Utilities
We list here the services available for deployment within a Data2Services project.
The d2s start command is provided when available. A docker run command is provided for every module.
Feel free to propose new services by opening a pull request or creating a new issue.
Each service runs in Docker. The services are configured to be deployed on a common network and share volumes in workspace/. Service configuration can be changed in the docker-compose.yml file or using deployments.
See also the list of tools to work with knowledge graphs published by STI Innsbruck at stiinnsbruck.github.io/kgs
Integrated services
Jupyter Notebooks
Deploy JupyterLab to use Notebooks to build or consume your RDF Knowledge Graph. Query your knowledge graph through its SPARQL endpoint or its HTTP OpenAPI, using Python or R.
The proposed deployment comes with example queries to perform data processing using tools such as Dipper, BioThings, or various RDF and Data Science libraries. Examples are also provided to start querying data from the produced RDF Knowledge Graph. See the GitHub repository.
Access on http://localhost:8888
Change the Notebook password in the docker-compose.yml file. Different passwords can be defined for different deployments.
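As an illustration of querying the knowledge graph from a Notebook, here is a minimal sketch using only the Python standard library. It follows the standard SPARQL Protocol (a GET request with a query parameter and a JSON results media type). The endpoint URL and the function names are assumptions made for this example and need to be adapted to your triplestore.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL: replace it with the SPARQL endpoint of your triplestore.
ENDPOINT = "http://localhost:7200/repositories/demo"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint: str, query: str) -> list:
    """Send a SELECT query and return the result bindings as a list of dicts."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as resp:
        return json.load(resp)["results"]["bindings"]
```

In a Notebook cell, calling run_select(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5") would return up to five bindings, each a dict keyed by variable name.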
Spark Notebooks
Deploy JupyterLab to use Notebooks to process data using Apache Spark. See the GitHub repository for more details about the build.
Access on http://localhost:8889
The default password is: password
Generate a hash for your password in a Notebook by running:
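A minimal sketch of such a hash, assuming the Notebook server accepts the legacy salted SHA-1 format sha1:<salt>:<digest> used by the classic Jupyter Notebook. The function names are made up for this example. Jupyter ships its own passwd helper (in notebook.auth for the classic Notebook, in jupyter_server.auth for Jupyter Server), which should be preferred when it is available, since it can use stronger algorithms.

```python
import hashlib
import secrets


def legacy_notebook_hash(passphrase: str, salt_len: int = 12) -> str:
    """Return 'sha1:<salt>:<digest>' with digest = SHA-1(passphrase + salt)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hexadecimal characters
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    return ":".join(("sha1", salt, hashlib.sha1(data).hexdigest()))


def check_legacy_hash(hashed: str, passphrase: str) -> bool:
    """Recompute the digest from the stored salt and compare it to the stored one."""
    algorithm, salt, digest = hashed.split(":", 2)
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    return hashlib.new(algorithm, data).hexdigest() == digest
```

The string returned by legacy_notebook_hash is what would be pasted into the password setting of the deployment.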
Docket multiomics data provider
DOCKET is a Dataset Overview, Comparison and Knowledge Extraction Tool built as a multiomics provider for the NCATS Translator project.
Access Notebooks at http://localhost:8002
RMLStreamer
Use the RDF Mapping Language (RML) to map your structured data (CSV, TSV, SQL, XML, JSON, YAML) to RDF. The RMLStreamer is a scalable implementation of RML that is still in development.
The RML mappings need to be defined in a file with the extension .rml.ttl, in the mapping folder of the dataset to transform, e.g. datasets/dataset_id/mapping/associations-mapping.rml.ttl
Start the required services:
Access at http://localhost:8078 to see running jobs.
Run the RMLStreamer:
Output goes to workspace/import/associations-mapping_rml_ttl-cohd.nt
See the original RMLStreamer documentation to deploy using Docker.
Nanobench
Nanobench is a web UI to publish Nanopublications.
Access on http://localhost:37373
Follow the web UI instructions to get started and publish nanopublications.
You can easily create and publish new templates following instructions at the nanobench-templates repository.
FAIR Data Point
FAIR Data Point (FDP) is a REST API for creating, storing, and serving FAIR metadata. This FDP implementation also presents a Web-based graphical user interface (GUI). The metadata contents are generated semi-automatically according to the FAIR Data Point software specification document.
More information about FDP and how to deploy can be found at FDP Deployment Documentation.
Apache Drill
Exposes tabular text files (CSV, TSV, PSV) as SQL, and enables queries on large datasets. Used by AutoR2RML and R2RML to convert tabular files to a generic RDF representation.
Access at http://localhost:8047/.
See on DockerHub.
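Besides the web UI, Apache Drill exposes a REST API that accepts SQL queries as JSON on /query.json. The sketch below shows how a tabular file could be queried from Python through that API. The function names and the file path in the usage note are assumptions made for illustration, and the address is the one given above.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # address of the Drill deployment given above


def build_drill_request(sql: str, base_url: str = DRILL_URL) -> urllib.request.Request:
    """Build a POST request for the /query.json endpoint of the Drill REST API."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, base_url: str = DRILL_URL) -> list:
    """Send the query and return the rows of the JSON response."""
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as resp:
        return json.load(resp)["rows"]
```

For example, run_drill_query("SELECT * FROM dfs.`/data/example.tsv` LIMIT 5") would return the first rows of a hypothetical file /data/example.tsv, provided that path is visible to the Drill container.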
Postgres
Popular SQL database.
The password is: pwd
See the Postgres guide for more details.
LIMES interlinking
LIMES is a tool developed by the DICE group to perform interlinking between RDF entities using various metrics, such as Cosine, ExactMatch, and Levenshtein.
Start the LIMES server:
Access at http://localhost:8090
See the official documentation to use the deployed REST API to submit LIMES jobs.
Postman can be used to perform HTTP POST queries on the API.
A newly released public Web UI can also be tried in the browser.
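To give an idea of what a string metric such as Levenshtein measures when interlinking entities, here is a small standalone sketch. It is not the LIMES implementation: the function names and the normalization of the distance into a similarity between 0 and 1 are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Two entities whose labels score above a chosen threshold would then be proposed as a link candidate.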
Executables and modules
d2s-sparql-operations
Execute SPARQL queries from a string, a URL, or multiple files using RDF4J. Available on DockerHub.
See documentation.
Comunica
Framework to perform federated queries over many different kinds of stores (triplestores, TPF, HDT).
RdfUpload
Upload RDF files to a triplestore.
See on DockerHub.
AutoR2RML
Automatically generate R2RML files from relational databases (SQL, PostgreSQL).
Can be combined with Apache Drill to process tabular files.
See on DockerHub.
R2RML
Convert relational databases to RDF using the R2RML mapping language.
Shared on /data/d2s
Can be combined with Apache Drill to process tabular files.
See on DockerHub.
xml2rdf
Streams XML to a generic RDF representation of the file's structure.
See on DockerHub.
json2xml
Convert JSON to XML using json2xml. This XML can then be converted to generic RDF.
Shared on your machine at /data/d2s-workspace
PyShEx
Validate RDF from a SPARQL endpoint against a ShEx file.
rdf2hdt
Convert RDF to HDT files. HDT (Header, Dictionary, Triples) is a binary serialization format for RDF that keeps big datasets compressed while maintaining search and browse operations without prior decompression.
Jena riot validate RDF
Validate RDF, or convert RDF from one serialization to another, using the Apache Jena riot tool. See Jena on DockerHub.
- Convert RDF to RDF
Jena riot does not accept an output file; it writes to standard output.
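Because the converted RDF is written to standard output, it has to be redirected to obtain a file. A minimal sketch of that redirection from Python follows. The helper name is made up for this example, and the usage note assumes that riot is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def run_to_file(command: list, output_path: str) -> None:
    """Run a command and write everything it prints on stdout to output_path."""
    with Path(output_path).open("wb") as out:
        # check=True raises CalledProcessError if the command exits with an error
        subprocess.run(command, stdout=out, check=True)
```

For example, run_to_file(["riot", "--output=turtle", "input.nt"], "output.ttl") would store the Turtle serialization of a hypothetical input.nt file in output.ttl.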
Use as GitHub Action:
Raptor rdf2rdf
Raptor is a small and efficient command-line tool to convert from one RDF format to another (nq, nt, ttl, rdf/xml). It can help fix triple normalization and encoding issues.
JSON-LD is not available. Available formats:
- ntriples
- turtle
- nquads
- rdfxml
See GitHub repository for Docker build.
rdf2neo
Convert RDF data to a Neo4j property graph by mapping the RDF to Cypher queries using Rothamsted/rdf2neo.
To be developed.
d2s-bash-exec
Simple container to execute Bash scripts from a URL (e.g. hosted on GitHub). Mainly used to download datasets. See download script example.
See on DockerHub.
Additional services
BridgeDb
BridgeDb links URI identifiers from various datasets (UniProt, PubMed).
RDF tools to try
Interesting tools to work with RDF that have not yet been tried.
- Astrea: generate SHACL shapes from ontologies.
- AtomGraph json2rdf: convert JSON to generic RDF based on its structure.
- List of RDF tools published by the Semantic Technology Institute (STI) Innsbruck: https://stiinnsbruck.github.io/kgs/