Start services

Run services such as triplestores to store your RDF knowledge graph, and interfaces or web UIs to access the triplestore data. A specific deployment config can be passed using the -d flag.

d2s start <service_name> -d <optional_deployment_config>

Volumes of all containers started by d2s are shared in the workspace/ folder.

d2s uses docker-compose to run the different services 🐳

In this documentation we will use a set of services to build the knowledge graph and access it using various interfaces.

List of services#

The service deployments are defined in the d2s-core/docker-compose.yml file.

Start the services described below using:

d2s start <service_name>

🔗 Graph databases#

See the detailed lists of available graph databases.

  • graphdb: commercial triplestore with a web UI and multiple repositories
  • virtuoso: Open Source triplestore with a faceted browser
  • blazegraph: Open Source lightweight triplestore
  • fuseki: Open Source SPARQL server built on top of Apache Jena and TDB.
  • allegroGraph: commercial triplestore
  • anzoGraph: commercial triplestore
  • ldf-server: Open Source Linked Data Fragments server, store and query compressed HDT files
  • neo4j: commercial property graph database

🖥️ Interfaces#

See the detailed lists of available interfaces.

  • biothings-studio: web UI to build and deploy BioThings APIs
  • into-the-graph: SPARQL web browser leveraging HCLS metadata, with YASGUI editor
  • api: HTTP OpenAPI with Swagger UI to query an RDF triplestore; accepts ReasonerStd queries
  • comunica: widget to query heterogeneous interfaces (SPARQL, HDT) using Comunica SPARQL and GraphQL

🔧 Utilities#

See the detailed lists of RDF utilities.

  • notebook: JupyterLab with template Notebooks to build and query the triplestore.
  • spark-notebook: JupyterLab with Apache Spark to process data.
  • docket: multiomics tool for dataset overview, comparison and knowledge extraction using Jupyter notebooks.
  • rmlstreamer: Apache Flink to process RML mappings
    • rmltask: dependency of rmlstreamer; both services must be running
  • drill: exposes tabular text files (CSV, TSV, PSV) as SQL using Apache Drill
  • postgres: popular Open Source SQL database
  • limes: server to perform interlinking between RDF entities using various metrics
  • nanobench: web UI to publish Nanopublications
  • mapeathor: converts Excel files into RML or YARRRML mappings

Start demo#

Different solutions can be used as the final triplestore. Here we will use Ontotext GraphDB as the final triplestore for the knowledge graph. In our experience, GraphDB is more stable and performs federated queries faster, and it offers a user-friendly administration interface.

For licensing reasons, GraphDB must be downloaded manually: provide your email address and you will receive an email with the URL to download the GraphDB standalone zip file (graphdb-free-9.1.1-dist.zip).

To install GraphDB easily, we recommend placing the zip file in your home folder before running d2s init. The home folder is the default location when d2s init asks for the path to the GraphDB zip file.

Start the services required to run the demonstration data transformation workflows: the GraphDB triplestore, Apache Drill, and Virtuoso as a temporary triplestore.

d2s start demo

⚠️ GraphDB might fail to start if not enough resources are available. We recommend raising the resource limits for Docker and stopping resource-intensive apps such as Slack, VSCode, or Skype.
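Before raising the limits, it can help to see what Docker currently has available. The following is a minimal sketch, assuming the `docker` CLI is installed and on your PATH. `bytes_to_gib` and `docker_resources` are hypothetical helper names introduced for illustration; they are not part of d2s.

```shell
# Hypothetical helper: convert a byte count to whole GiB (rounded down).
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Hypothetical helper: print the CPUs and memory visible to the Docker daemon.
# NCPU and MemTotal are fields reported by `docker info`.
docker_resources() {
  cpus="$(docker info --format '{{.NCPU}}')" || return 1
  mem_bytes="$(docker info --format '{{.MemTotal}}')" || return 1
  echo "CPUs: $cpus, memory: $(bytes_to_gib "$mem_bytes") GiB"
}
```

Call `docker_resources` before starting the demo. If the reported memory is low, GraphDB is more likely to fail at startup.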

  • Access the into-the-graph browser for GraphDB at http://localhost:8079
  • Access the HTTP Swagger API at http://localhost:8080
  • Access GraphDB at http://localhost:7200
  • Access the temporary Virtuoso at http://localhost:8890
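To confirm that the services answer on the ports listed above, a small loop over the URLs can be used. This is a sketch that assumes `curl` is installed; `check_endpoint` and `check_demo_endpoints` are hypothetical helper names, not d2s commands.

```shell
# Hypothetical helper: print "up" or "down" for one URL.
check_endpoint() {
  # -s hides progress output, -o /dev/null discards the response body,
  # --max-time 5 gives up after five seconds. curl exits with 0 when the
  # server answered, whatever the HTTP status code.
  if curl -s -o /dev/null --max-time 5 "$1" 2>/dev/null; then
    echo "up $1"
  else
    echo "down $1"
  fi
}

# Check the four demo endpoints listed above.
check_demo_endpoints() {
  for url in http://localhost:8079 http://localhost:8080 \
             http://localhost:7200 http://localhost:8890; do
    check_endpoint "$url"
  done
}
```

Call `check_demo_endpoints` a short while after `d2s start demo`. A service reported as "down" may simply still be starting, so retry before investigating further.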

If you use Blazegraph or Virtuoso as the final triplestore, you will need to enable CORS requests in your browser to allow communication between the into-the-graph browser and the triplestore.

An add-on to enable CORS can be easily installed for Firefox or Chrome.

Use a deployment config#

Services can be started with a specific deployment config. This lets you define variables and Docker parameters for a specific deployment in a complementary YAML file in the d2s-core/deployments folder.

See the deployments/trek.yml config for an example. The following parameters are usually defined in a deployment config:

  • the public URL of the service (nginx Virtual Host)
  • a different Docker image tag for a service (to use a different version)
  • passwords
  • resource limits

Start services with a deployment config:

d2s start graphdb virtuoso drill api rmlstreamer rmltask -d trek

Feel free to define a new deployment config if your services require different parameters from the ones defined in the main docker-compose.yml.
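To see which deployment configs are available, you can list the YAML files in the deployments folder. The sketch below assumes that the name passed to -d is the file name without its `.yml` extension, as the trek example suggests (deployments/trek.yml and `-d trek`). `list_deployments` is a hypothetical helper name, not a d2s command.

```shell
# Hypothetical helper: list the names that could be passed to
# `d2s start ... -d <name>`, assuming each config is a <name>.yml
# file in the given folder (default: d2s-core/deployments).
list_deployments() {
  dir="${1:-d2s-core/deployments}"
  for f in "$dir"/*.yml; do
    # Skip the unexpanded pattern when the folder has no .yml files.
    [ -e "$f" ] || continue
    basename "$f" .yml
  done
}
```

Run `list_deployments` from the folder that contains d2s-core, or pass the path to the deployments folder as the first argument.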

Manage services#

Show running services#

d2s services

Stop all services#

d2s stop --all

Stop specific services#

d2s stop virtuoso api
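Stopping and starting can be combined to restart a set of services. The following is a minimal sketch that uses only the `d2s stop` and `d2s start` commands shown on this page; `restart_services` is a hypothetical helper name, not a d2s command.

```shell
# Hypothetical helper: stop the given services, then start them again.
restart_services() {
  if [ "$#" -eq 0 ]; then
    echo "usage: restart_services <service_name>..." >&2
    return 1
  fi
  # Start again only if the stop step succeeded.
  d2s stop "$@" && d2s start "$@"
}
```

For example, `restart_services virtuoso api` runs `d2s stop virtuoso api` followed by `d2s start virtuoso api`. The sketch does not handle deployment configs: if the services were originally started with -d, start them again manually with the same config.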

Show running workflows#

You can get process information about running workflows, such as their process IDs (PIDs).

d2s process-running

Stop running workflow#

Autocomplete will show only the PID of running workflows.

d2s process-stop <workflow_pid>

If autocomplete doesn't work, retrieve the PID using d2s process-running.

Last updated on by Vincent Emonet