# Utilities
We list here the services available for deployment within a Data2Services project.
The `d2s start` command is provided when available. A `docker run` command is provided for every module.
Feel free to propose new services by opening a pull request or creating a new issue.
Each service runs in Docker. They are configured to be deployed on a common network, sharing volumes in `workspace/`. Service configuration can be changed in the `docker-compose.yml` file or using deployments.
See also the list of tools to work with knowledge graphs published by STI Innsbruck at stiinnsbruck.github.io/kgs.
## Integrated services

### Jupyter Notebooks

Deploy JupyterLab to use notebooks to build or consume your RDF knowledge graph. Query your knowledge graph through its SPARQL endpoint or the HTTP OpenAPI, using Python or R.
The proposed deployment comes with example queries to perform data processing using tools such as Dipper, BioThings, or various RDF and data science libraries. Examples are also provided to start querying data from the produced RDF knowledge graph. See the GitHub repository.
Access at http://localhost:8888
Change the notebook password in the `docker-compose.yml` file. Different passwords can be defined for different deployments.
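A minimal sketch of the relevant `docker-compose.yml` entry (the service name `jupyter-notebook` and the `PASSWORD` environment variable are assumptions based on common JupyterLab Docker images; check your compose file for the actual names):

```yaml
services:
  jupyter-notebook:        # assumed service name, check your docker-compose.yml
    environment:
      - PASSWORD=password  # replace with your own password
```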
### Spark Notebooks

Deploy JupyterLab to use notebooks to process data using Apache Spark. See the GitHub repository for more details about the build.
Access at http://localhost:8889
The default password is `password`. Generate a hash for your password in a notebook by running:
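The original snippet was lost in extraction. In a notebook cell the standard helper is `notebook.auth.passwd()`. An equivalent from the command line, reproducing the same `sha1:<salt>:<digest>` scheme with only the Python standard library (assumption: your deployment uses the classic sha1 hashing scheme):

```shell
# Generate a Jupyter-style password hash: sha1:<12-char salt>:<40-char digest>
HASH=$(python3 -c "
import hashlib, secrets
passphrase = 'password'     # replace with your own password
salt = secrets.token_hex(6) # 12 hex characters, as notebook.auth.passwd() uses
print('sha1:' + salt + ':' + hashlib.sha1((passphrase + salt).encode()).hexdigest())
")
echo "$HASH"
```

Paste the resulting hash into the deployment configuration in place of the default.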
### DOCKET multiomics data provider

DOCKET is a Dataset Overview, Comparison and Knowledge Extraction Tool, built as a multiomics data provider for the NCATS Translator project.
Access Notebooks at http://localhost:8002
### RMLStreamer

Use the RDF Mapping Language (RML) to map your structured data (CSV, TSV, SQL, XML, JSON, YAML) to RDF. The RMLStreamer is a scalable implementation of RML, currently in development.
The RML mappings need to be defined in a file with the extension `.rml.ttl`, in the `mapping` folder of the dataset to transform, e.g. `datasets/dataset_id/mapping/associations-mapping.rml.ttl`
Start the required services:
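The command block was lost in extraction. A sketch using the `d2s start` command mentioned above (the service names `rmlstreamer` and `rmltask` are assumptions; check your `docker-compose.yml` for the actual names):

```shell
# Start the RMLStreamer and its Flink task manager
d2s start rmlstreamer rmltask
```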
Access at http://localhost:8078 to see running jobs.
Run the RMLStreamer:
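The run command was lost in extraction. The RMLStreamer runs as an Apache Flink job, so a sketch along these lines applies (the container name, jar path, and option names are assumptions; check the RMLStreamer documentation for your version):

```shell
# Submit the RML mapping as a Flink job inside the RMLStreamer container
docker exec -it d2s-rmlstreamer /opt/flink/bin/flink run /opt/RMLStreamer.jar toFile \
  --mapping-file /mnt/workspace/datasets/cohd/mapping/associations-mapping.rml.ttl \
  --output-path /mnt/workspace/import/associations-mapping_rml_ttl-cohd.nt
```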
The output goes to `workspace/import/associations-mapping_rml_ttl-cohd.nt`
See the original RMLStreamer documentation to deploy using Docker.
### Nanobench

Nanobench is a web UI to publish nanopublications.
Access at http://localhost:37373
Follow the web UI instructions to get started and publish nanopublications.
You can easily create and publish new templates by following the instructions in the nanobench-templates repository.
### FAIR Data Point

FAIR Data Point (FDP) is a REST API for creating, storing, and serving FAIR metadata. This FDP implementation also provides a web-based graphical user interface (GUI). The metadata contents are generated semi-automatically according to the FAIR Data Point software specification document.
More information about FDP and how to deploy can be found at FDP Deployment Documentation.
### Apache Drill

Exposes tabular text files (CSV, TSV, PSV) as SQL tables and enables queries on large datasets. Used by AutoR2RML and R2RML to convert tabular files to a generic RDF representation.
Access at http://localhost:8047/.
See on DockerHub.
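Once Drill is running, tabular files can be queried over its REST API. A sketch using `curl` (the `dfs` storage plugin name and the file path are assumptions about your setup):

```shell
# Query a CSV file through Drill's REST API (POST /query.json)
curl -s -X POST http://localhost:8047/query.json \
  -H 'Content-Type: application/json' \
  -d '{"queryType": "SQL", "query": "SELECT * FROM dfs.`/data/mydata.csv` LIMIT 10"}'
```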
### Postgres

A popular SQL database.

The password is `pwd`.
See the Postgres guide for more details.
### LIMES interlinking

LIMES is a tool developed by the DICE group to perform interlinking between RDF entities using various metrics: Cosine, ExactMatch, Levenshtein...
Start the LIMES server:
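The command block was lost in extraction. A sketch using the `d2s start` command mentioned above (the service name `limes-server` is an assumption; check your `docker-compose.yml` for the actual name):

```shell
d2s start limes-server
```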
Access at http://localhost:8090
See the official documentation to use the deployed REST API to submit LIMES jobs.
Postman can be used to perform HTTP POST queries on the API.
A newly released public Web UI can also be tried in the browser.
## Executables and modules

### d2s-sparql-operations

Execute SPARQL queries from a string, a URL, or multiple files using RDF4J. Available on DockerHub.
See documentation.
### Comunica

A framework to perform federated queries over many different types of stores (triplestores, TPF, HDT).
### RdfUpload

Upload RDF files to a triplestore.
See on DockerHub.
### AutoR2RML

Automatically generate R2RML files from relational databases (SQL, PostgreSQL).

Can be combined with Apache Drill to process tabular files.
See on DockerHub.
### R2RML

Convert relational databases to RDF using the R2RML mapping language.

Shared on `/data/d2s`.

Can be combined with Apache Drill to process tabular files.
See on DockerHub.
### xml2rdf

Streams XML to a generic RDF representing the structure of the file.
See on DockerHub.
### json2xml

Convert JSON to XML using json2xml. This XML can then be converted to generic RDF.

Shared on your machine at `/data/d2s-workspace`.
### PyShEx

Validate RDF from a SPARQL endpoint against a ShEx file.
### rdf2hdt

Convert RDF to HDT files. HDT (Header, Dictionary, Triples) is a binary serialization format for RDF that keeps big datasets compressed while supporting search and browse operations without prior decompression.
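A conversion sketch using the `rdfhdt/hdt-cpp` Docker image (the image name and mounted paths are assumptions; adapt them to your setup):

```shell
# Convert an N-Triples file to HDT; the current directory is mounted as /data
docker run -it --rm -v $(pwd):/data rdfhdt/hdt-cpp \
  rdf2hdt /data/dataset.nt /data/dataset.hdt
```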
### Jena riot: validate RDF

Validate RDF, or convert between RDF formats, using the Apache Jena riot tool. See Jena on DockerHub.

- Convert RDF to RDF

Jena does not allow specifying an output file; it writes to standard output.
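Since riot writes to standard output, redirect it to a file. A sketch using the `stain/jena` Docker image (the image name and mounted paths are assumptions; adapt them to your setup):

```shell
# Convert N-Triples to Turtle; riot prints the converted RDF to stdout
docker run --rm -v $(pwd):/rdf stain/jena \
  riot --output=turtle /rdf/input.nt > output.ttl
```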
It can also be used as a GitHub Action.
### Raptor rdf2rdf

Raptor is a small and efficient command-line tool to convert from one RDF format to another (nq, nt, ttl, rdf/xml). It can help fix triple normalization and encoding issues.

JSON-LD is not available. Available formats:

- ntriples
- turtle
- nquads
- rdfxml
See the GitHub repository for the Docker build.
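A conversion sketch using `rapper`, Raptor's command-line tool (assuming it is available locally or inside the container; the file names are placeholders):

```shell
# Convert Turtle to N-Triples; rapper prints the converted triples to stdout
rapper -i turtle -o ntriples input.ttl > output.nt
```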
### rdf2neo

Convert RDF data to a Neo4j property graph by mapping the RDF to Cypher queries using Rothamsted/rdf2neo.

To be developed.
### d2s-bash-exec

A simple container to execute Bash scripts from a URL (e.g. hosted on GitHub). Mainly used to download datasets. See the download script example.
See on DockerHub.
## Additional services

### BridgeDb

BridgeDb links URI identifiers from various datasets (UniProt, PubMed).
### RDF tools to try

Interesting tools to work with RDF that have not yet been tried:

- Astrea: generate SHACL shapes from ontologies
- AtomGraph json2rdf: convert JSON to generic RDF based on its structure
- The list of RDF tools published by the Semantic Technology Institute (STI) Innsbruck: https://stiinnsbruck.github.io/kgs/