Utilities

We list here the services available for deployment within a Data2Services project.

A d2s start command is provided when available, and a docker run command is provided for every module.

Feel free to propose new services by opening a pull request or creating a new issue.

Each service runs in Docker. The services are configured to be deployed on a common network, sharing volumes in workspace/. Service configurations can be changed in the docker-compose.yml file or through deployments.

See also the list of tools to work with knowledge graphs published by STI Innsbruck at stiinnsbruck.github.io/kgs


Integrated services#

Jupyter Notebooks#

JupyterLab

Deploy JupyterLab to use Notebooks to build or consume your RDF Knowledge Graph. Query your knowledge graph through its SPARQL endpoint or its HTTP OpenAPI, using Python or R.

The proposed deployment comes with example queries to perform data processing using tools such as Dipper, BioThings, or various RDF and Data Science libraries. Examples are also provided to start querying data from the produced RDF Knowledge Graph. See the GitHub repository.

d2s start notebook
docker run --rm -it -p 8888:8888 \
-v $(pwd)/workspace:/notebooks/workspace \
-v $(pwd)/datasets:/notebooks/datasets \
-e PASSWORD="<your_secret>" \
-e GIT_URL="https://github.com/vemonet/translator-sparql-notebook" \
umids/jupyterlab:latest

Access on http://localhost:8888

Change the Notebook password in the docker-compose.yml file. Different passwords can be defined for different deployments.
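For example, a notebook cell can query your SPARQL endpoint with the SPARQLWrapper Python library (a minimal sketch: the endpoint URL is an example taken from this documentation, and SPARQLWrapper is assumed to be installed in the notebook environment):

from SPARQLWrapper import SPARQLWrapper, JSON

# Example endpoint, replace with your own SPARQL endpoint
sparql = SPARQLWrapper("https://graphdb.dumontierlab.com/repositories/ncats-red-kg")
sparql.setQuery("SELECT DISTINCT ?Concept WHERE { [] a ?Concept } LIMIT 10")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["Concept"]["value"])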


Spark Notebooks#

Spark JupyterLab

Deploy JupyterLab to use Notebooks to process data using Apache Spark. See the GitHub repository for more details about the build.

d2s start spark-notebook
docker run --rm -it -p 8889:8888 \
-v $(pwd)/workspace:/home/jovyan/work/workspace \
-v $(pwd)/datasets:/home/jovyan/work/datasets \
-e JUPYTER_ENABLE_LAB=yes \
umids/jupyterlab-spark \
start-notebook.sh --NotebookApp.password='sha1:9316432938f9:93985dffbb854d31308dfe0602a51db947fb7d80'

Access on http://localhost:8889

The default password is password

Generate a hash for your password in a Notebook by running:

from notebook.auth import passwd
passwd()
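Inside the Spark notebook you can then start a Spark session and run a small test job (a minimal PySpark sketch, assuming the default Spark setup shipped with this image):

from pyspark.sql import SparkSession

# Start a local Spark session and run a trivial computation to check the deployment
spark = SparkSession.builder.appName("d2s-test").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
spark.stop()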

Docket multiomics data provider#

DOCKET

DOCKET is a Dataset Overview, Comparison and Knowledge Extraction Tool, built as a multiomics data provider for the NCATS Translator project.

d2s start docket
docker run -d --rm --name docket \
-p 8002:8888 -e PYTHONPATH=/app \
-v $(pwd)/workspace/docket:/data \
umids/docket:latest

Access Notebooks at http://localhost:8002


RMLStreamer#

RMLMapper

Use the RDF Mapping Language (RML) to map your structured data (CSV, TSV, SQL, XML, JSON, YAML) to RDF. The RMLStreamer is a scalable implementation of RML in development.

The RML mappings need to be defined in a file with the extension .rml.ttl, placed in the mapping folder of the dataset to transform, e.g. datasets/dataset_id/mapping/associations-mapping.rml.ttl

Start the required services:

d2s start rmlstreamer rmltask

Access at http://localhost:8078 to see running jobs.

Run the RMLStreamer:

d2s rml cohd

Output goes to workspace/import/associations-mapping_rml_ttl-cohd.nt

See the original RMLStreamer documentation to deploy using Docker.


Nanobench#

Nanobench is a web UI to publish Nanopublications.

d2s start nanobench
docker run -d --rm --name nanobench -p 37373:37373 \
-v $(pwd)/workspace/.nanopub:/root/.nanopub \
-e NANOBENCH_API_INSTANCES="http://grlc.np.dumontierlab.com/api/local/local/ http://grlc.nanopubs.lod.labs.vu.nl/api/local/local/ http://130.60.24.146:7881/api/local/local/" \
nanopub/nanobench

Access on http://localhost:37373

Follow the web UI instructions to get started and publish nanopublications.

You can easily create and publish new templates following instructions at the nanobench-templates repository.


FAIR Data Point#

FAIR Data Point (FDP) is a REST API for creating, storing, and serving FAIR metadata. This FDP implementation also presents a Web-based graphical user interface (GUI). The metadata contents are generated semi-automatically according to the FAIR Data Point software specification document.

More information about FDP and how to deploy can be found at FDP Deployment Documentation.

d2s start fairdatapoint
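Once deployed, the FDP metadata can be retrieved over HTTP with content negotiation, for instance using Python requests (a minimal sketch: the localhost URL and port are assumptions that depend on your deployment configuration):

import requests

# The port is an assumption, check your deployment configuration for the actual value
response = requests.get("http://localhost:80", headers={"Accept": "text/turtle"})
print(response.text)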

Apache Drill#

Exposes tabular text files (CSV, TSV, PSV) as SQL tables, enabling queries on large datasets. Used by AutoR2RML and R2RML to convert tabular files to a generic RDF representation.

d2s start drill
docker run -dit --rm -p 8047:8047 -p 31011:31010 \
--name drill -v $(pwd)/workspace/input:/data:ro umids/apache-drill:latest

Access at http://localhost:8047/.

See on DockerHub.
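Once Drill is running, the tabular files mounted in /data can be queried in SQL through the Drill REST API, for instance with Python requests (a minimal sketch: the CSV file name is a placeholder):

import requests

# Query a CSV file mounted in the Drill container (the file name is a placeholder)
query = {"queryType": "SQL", "query": "SELECT * FROM dfs.`/data/myfile.csv` LIMIT 5"}
response = requests.post("http://localhost:8047/query.json", json=query)
print(response.json())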


Postgres#

Popular SQL database.

d2s start postgres
docker run --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=pwd -d -v $(pwd)/workspace/postgres:/data postgres

The password is pwd

See the Postgres guide for more details.
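A connection to this deployment can be tested from Python with psycopg2 (a minimal sketch matching the parameters of the docker run command above; psycopg2 is assumed to be installed):

import psycopg2

# Connection parameters match the docker run command above
conn = psycopg2.connect(host="localhost", port=5432, user="postgres", password="pwd", dbname="postgres")
cur = conn.cursor()
cur.execute("SELECT version();")
print(cur.fetchone())
conn.close()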


LIMES interlinking#

LIMES is a tool developed by the DICE group to perform interlinking between RDF entities using various metrics: Cosine, ExactMatch, Levenshtein...

Start the LIMES server:

d2s start limes-server

Access at http://localhost:8090

See the official documentation to use the deployed REST API to submit LIMES jobs.

Postman can be used to perform HTTP POST queries on the API.

A newly released public Web UI can also be tried in the browser.
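A LIMES job is typically submitted by posting an XML configuration to the server, for instance with Python requests (a hedged sketch: the /submit route, the config_file field name, and the limes-config.xml file are assumptions to adapt from the official documentation):

import requests

# The /submit route and the config_file field name are assumptions,
# check the LIMES server documentation for the exact API
with open("limes-config.xml", "rb") as f:
    response = requests.post("http://localhost:8090/submit", files={"config_file": f})
print(response.text)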


Executables and modules#

d2s-sparql-operations#

GitHub

Execute SPARQL queries from a string, a URL, or multiple files using RDF4J. Available on DockerHub.

docker run -it --rm umids/d2s-sparql-operations:latest -o select \
-q "select distinct ?Concept where {[] a ?Concept} LIMIT 10" \
-e "http://dbpedia.org/sparql"
# Provide the URL to a GitHub folder to execute all .rq files in it
docker run -it --rm umids/d2s-sparql-operations \
-e "https://graphdb.dumontierlab.com/repositories/public/statements" \
-o update -u my_username -p my_password \
-i "https://github.com/MaastrichtU-IDS/d2s-sparql-operations/tree/master/src/main/resources/insert-examples"

See documentation.


Comunica#

GitHub

Framework to perform federated queries over many different types of stores (triplestores, TPF, HDT).

docker run -it comunica/actor-init-sparql \
http://fragments.dbpedia.org/2015-10/en \
"CONSTRUCT WHERE { ?s ?p ?o } LIMIT 100"

RdfUpload#

Upload RDF files to a triplestore.

docker run -it --rm --link graphdb:graphdb -v $(pwd)/workspace/import:/data \
umids/rdf-upload:latest -m "HTTP" -if "/data" \
-url "http://graphdb:7200" -rep "test" \
-un "username" -pw "password"

See on DockerHub.


AutoR2RML#

Automatically generate R2RML files from relational databases (SQL, PostgreSQL).

docker run -it --rm --link drill:drill --link postgres:postgres -v $(pwd)/workspace/input:/data \
umids/autor2rml:latest -j "jdbc:drill:drillbit=drill:31010" -r \
-o "/data/d2s-workspace/mapping.trig" \
-d "/data/d2s-workspace" \
-u "postgres" -p "pwd" \
-b "https://w3id.org/d2s/" \
-g "https://w3id.org/d2s/graph"

Can be combined with Apache Drill to process tabular files.

See on DockerHub.


R2RML#

Convert Relational Databases to RDF using the R2RML mapping language.

docker run -it --rm --net d2s-core_network \
-v $(pwd)/workspace/input:/data \
umids/r2rml:latest \
--connectionURL jdbc:drill:drillbit=drill:31010 \
--mappingFile /data/mapping.trig \
--outputFile /data/rdf_output.nq \
--format NQUADS

The output file is written to the mounted /data folder (workspace/input on your machine).

Can be combined with Apache Drill to process tabular files.

See on DockerHub.


xml2rdf#

Streams XML into a generic RDF representation of the file structure.

docker run --rm -it -v $(pwd)/workspace/input:/data umids/xml2rdf:latest \
-i "/data/d2s-workspace/file.xml.gz" \
-o "/data/d2s-workspace/file.nq.gz" \
-g "https://w3id.org/d2s/graph"

See on DockerHub.


json2xml#

Convert JSON to XML using json2xml. This XML can then be converted to generic RDF.

docker run -it -v $(pwd)/workspace/input:/data vemonet/json2xml:latest -i /data/test.json

The converted file is shared in the mounted folder (workspace/input on your machine).


PyShEx#

Validate RDF from a SPARQL endpoint against a ShEx file.

git clone https://github.com/hsolbrig/PyShEx.git
docker build -t pyshex ./PyShEx/docker
docker run --rm -it pyshex -gn '' -ss -ut -pr \
-sq 'select ?item where{?item a <http://w3id.org/biolink/vocab/Gene>} LIMIT 1' \
https://graphdb.dumontierlab.com/repositories/ncats-red-kg \
https://github.com/biolink/biolink-model/raw/master/shex/biolink-modelnc.shex

rdf2hdt#

GitHub

Convert RDF to HDT files. HDT (Header, Dictionary, Triples) is a binary serialization format for RDF that keeps big datasets compressed while still supporting search and browse operations without prior decompression.

docker run -it --rm -v $(pwd)/workspace:/data \
rdfhdt/hdt-cpp rdf2hdt /data/input.nt /data/output.hdt
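The resulting HDT file can then be queried without decompressing it, for instance with the pyHDT Python library (a minimal sketch, assuming pyHDT is installed; the file path matches the command above):

from itertools import islice
from hdt import HDTDocument

# Load the compressed HDT file produced by the command above
document = HDTDocument("workspace/output.hdt")
# Empty strings act as wildcards for subject, predicate and object
triples, cardinality = document.search_triples("", "", "")
print(cardinality, "triples found")
for s, p, o in islice(triples, 10):
    print(s, p, o)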

Jena riot validate RDF#

GitHub

Validate RDF, or convert between RDF formats, using the Apache Jena riot tool. See Jena on DockerHub.

docker run --volume $(pwd)/workspace:/rdf stain/jena:3.14.0 riot --validate input.ttl

Convert RDF to RDF:

docker run --volume $(pwd)/workspace:/rdf stain/jena:3.14.0 riot --output=NQUADS input.ttl > output.nq

Jena riot does not allow specifying an output file; it writes to standard output.

Use as GitHub Action:

- uses: vemonet/jena-riot-action@v3.14
  with:
    input: my_file.ttl

Raptor rdf2rdf#

Raptor is a small and efficient command-line tool to convert from one RDF format to another (nq, nt, ttl, rdf/xml). It can help fix triple normalization and encoding issues.

JSON-LD is not supported; available formats:

  • ntriples
  • turtle
  • nquads
  • rdfxml
docker run -it --rm -v $(pwd)/workspace:/data \
umids/raptor-rdf2rdf -i ntriples -o rdfxml /data/kg.nt > /data/kg.xml

See GitHub repository for Docker build.

rdf2neo#

GitHub

Convert RDF data to a neo4j property graph by mapping the RDF to Cypher queries using Rothamsted/rdf2neo.

To be developed.


d2s-bash-exec#

Simple container to execute Bash scripts from a URL (e.g. hosted on GitHub). Mainly used to download datasets. See download script example.

docker run -it --rm -v $(pwd)/workspace/input:/data umids/d2s-bash-exec:latest https://raw.githubusercontent.com/MaastrichtU-IDS/d2s-project-template/master/datasets/stitch/download/download-stitch.sh

See on DockerHub.


Additional services#

BridgeDb#

BridgeDb links identifiers from various datasets (UniProt, PubMed).

docker run -p 8183:8183 bigcatum/bridgedb
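Identifier mappings can then be retrieved from the deployed service over HTTP (a minimal sketch with Python requests: the /Human/xrefs/L/{id} route follows the public BridgeDb webservice pattern, adapt the organism and system code to your data):

import requests

# Map Entrez Gene identifier 1234 to other identifier systems
# (route pattern assumed from the public BridgeDb webservice)
response = requests.get("http://localhost:8183/Human/xrefs/L/1234")
print(response.text)  # tab-separated list of mapped identifiers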

RDF tools to try#

Interesting tools to work with RDF that have not yet been tried.
