Graph databases

List of graph databases: RDF triplestores and property graphs.

RDF triplestores#

Ontotext GraphDB#


The Ontotext GraphDB™ triplestore includes a web UI, various data visualizations, OntoRefine, SHACL validation, RDFS/OWL reasoning to infer new triples, and the possibility to deploy multiple repositories. It is mainly built on the rdf4j framework.

Download the zip file of the GraphDB standalone free edition 9.3.0 and place it in d2s-core/support/graphdb before building the image using d2s update (this step is also prompted during d2s init).

d2s start graphdb
docker build -t graphdb --build-arg version=9.3.0 d2s-core/support/graphdb
docker run -d --rm --name graphdb -p 7200:7200 \
-v $(pwd)/workspace/graphdb:/opt/graphdb/home \
-v $(pwd)/workspace/import:/root/graphdb-import \
graphdb

Access at http://localhost:7200/

See the official Ontotext GraphDB™ documentation and the source code for Docker images for more details.
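GraphDB exposes each repository through the standard RDF4J REST API. Assuming a repository named demo exists (the default name used elsewhere in this documentation), a query like the following can be sent to http://localhost:7200/repositories/demo, e.g. with curl --data-urlencode:

```sparql
# Return a sample of 10 triples from the repository
SELECT * WHERE { ?s ?p ?o } LIMIT 10
```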

Obtain a license for more features, such as performance improvements, easy deployment using the official DockerHub image, or distributed deployment on multiple nodes with Kubernetes.

GraphDB allows bulk loading large files using a second container:

  • Change the repository to be created and loaded in workspace/graphdb/preload-config.ttl (default: demo)
  • Put the files to be loaded in workspace/import/preload 📩
  • Start graphdb-preload docker container
d2s start graphdb-preload
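The preload configuration is a standard RDF4J repository config file. A minimal sketch of what workspace/graphdb/preload-config.ttl might contain — the exact repository and sail types depend on your GraphDB edition and version, so treat this as illustrative:

```ttl
@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .

# Repository id "demo" matches the default mentioned above
[] a rep:Repository ;
    rep:repositoryID "demo" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:FreeSailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:FreeSail"
        ]
    ] .
```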

When the preload has completed, the graphdb-preload container stops. You can then copy the loaded repository from workspace/graphdb/preload-data/repositories to the running GraphDB folder:

cp -r workspace/graphdb/preload-data/repositories/* workspace/graphdb/data/repositories/

You can then access the newly loaded repository in the running GraphDB instance without any downtime.

See the additional d2s documentation about setting up GraphDB.


Virtuoso#


OpenLink Virtuoso triplestore. Available on DockerHub.

d2s start virtuoso
docker run --name virtuoso \
-p 8890:8890 -p 1111:1111 \
-e DBA_PASSWORD=dba \
-e SPARQL_UPDATE=true \
-e DEFAULT_GRAPH=https://w3id.org/d2s/graph \
-v $(pwd)/workspace/virtuoso:/data \
-d tenforce/virtuoso

Access at http://localhost:8890/ and SPARQL endpoint at http://localhost:8890/sparql.

Admin username: dba

CORS can be enabled by following these instructions. See our complete Virtuoso documentation for more details.

Clear the Virtuoso triplestore using this command:

docker exec -it d2s-virtuoso isql-v -U dba -P dba exec="RDF_GLOBAL_RESET ();"
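For bulk loading, Virtuoso ships with a built-in RDF loader that can be driven from the same isql-v shell. A sketch, assuming your Turtle files sit in a directory the server is allowed to read (it must be listed under DirsAllowed in virtuoso.ini):

```sql
-- Register every Turtle file in /data for loading into the d2s graph
ld_dir('/data', '*.ttl', 'https://w3id.org/d2s/graph');
-- Run the loader (start several of these in parallel for large datasets)
rdf_loader_run();
-- Persist the loaded data
checkpoint;
```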

Blazegraph#


A high-performance RDF graph database. See its documentation for Docker.

Blazegraph has not been developed for 4 years, but it is still efficient and is used by Wikidata. It is mainly built on the rdf4j framework.

d2s start blazegraph
# Start triplestore with specific UID and GID for the bulk load (UI)
# Tested on Ubuntu with $UID=1000 and nothing in $GROUPS (by default)
docker run --name blazegraph \
-e BLAZEGRAPH_UID=$UID \
-e BLAZEGRAPH_GID=$GROUPS \
-p 8082:8080 \
-v $(pwd)/workspace/import:/data \
lyrasis/blazegraph:2.1.5
# To bulk load, create the dataloader.txt configuration file:
cat > dataloader.txt <<EOF
namespace=kb
propertyFile=/RWStore.properties
fileOrDirs=/data
format=n-triples
defaultGraph=http://defaultGraph
quiet=false
verbose=0
closure=false
durableQueues=true
EOF
# And submit it with an HTTP POST request to load all N-Triples files in /data
curl -X POST \
--data-binary @dataloader.txt \
--header 'Content-Type:text/plain' \
http://localhost:8082/bigdata/dataloader

The UID and group ID need to be set to get the right permissions to bulk load a file (the example is given for Ubuntu). RWStore.properties can also be rewritten; see this example.
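For reference, a minimal RWStore.properties might look like the following — the property names come from the Blazegraph documentation, but the values here are assumptions to adapt:

```properties
# Journal file location and buffer mode
com.bigdata.journal.AbstractJournal.file=/data/blazegraph.jnl
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
# Triples mode (set quads=true to store named graphs instead)
com.bigdata.rdf.store.AbstractTripleStore.quads=false
# Disable inference and text indexing for faster bulk loads
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
```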

Access UI at http://localhost:8082/bigdata

SPARQL endpoint at http://localhost:8082/bigdata/sparql (8080 being the original port inside the container)

To clear the graph, go to the update tab and run clear all.

Follow these instructions to enable CORS on the Blazegraph SPARQL endpoint.


Jena Fuseki#


Fuseki is a SPARQL server on top of the Apache TDB RDF store, for single machines. It is mainly built on the Jena framework.

d2s start fuseki
docker run -d --name fuseki -p 3030:3030 -v $(pwd)/workspace/fuseki:/fuseki -v $(pwd)/workspace/import:/staging stain/jena-fuseki

Access at http://localhost:3030

Bulk load files into the demo dataset from workspace/import (the container needs to be stopped first):

docker-compose -f d2s-core/docker-compose.yml \
run -v $(pwd)/workspace/import:/staging \
stain/jena-fuseki ./load.sh demo test1.ttl test2.nt

If you don't specify any filenames to load.sh, all filenames directly under /staging that match these GLOB patterns will be loaded:

*.rdf *.rdf.gz *.ttl *.ttl.gz *.owl *.owl.gz *.nt *.nt.gz *.nquads *.nquads.gz

AllegroGraph#


AllegroGraph® is a modern, high-performance, persistent graph database. It supports SPARQL, RDFS++, and Prolog reasoning from numerous client applications.

d2s start allegrograph
docker run -d -m 1g -v $(pwd)/workspace/allegrograph:/data -p 10000-10035:10000-10035 --shm-size 1g --name allegrograph franzinc/agraph:v6.6.0

Access at http://localhost:10035

Default login: test / xyzzy

See official documentation for bulk load.

TODO: fix shared volumes


AnzoGraph#


AnzoGraph® DB by Cambridge Semantics. See its official documentation to deploy with Docker.

  • The unregistered free edition is limited to 8 GB of RAM, a single user, and single-node deployment.

  • Register to access the 16 GB single-node deployment for free.

  • Deploy AnzoGraph on a multi-server cluster for horizontal scaling with the Enterprise Edition 💶

d2s start anzograph
docker run -d -p 8086:8080 -p 8443:8443 --name anzograph -v $(pwd)/workspace/anzograph:/opt/anzograph cambridgesemantics/anzograph:2.0.2

Access at http://localhost:8086

Default login: admin / Passw0rd1.

Kubernetes deployment available using Helm.


Linked Data Fragments Server#


Technically not a triplestore, the Linked Data Fragments server supports the Memento protocol, enabling timestamped SPARQL querying over multiple linked data sources, e.g. HDT files or SPARQL endpoints.

d2s start ldf-server
docker run -p 8085:3000 -t -i --rm \
-v $(pwd)/workspace/hdt-archives:/data \
-v $(pwd)/workspace/ldfserver-config.json:/tmp/config.json \
umids/ldf-server:latest /tmp/config.json
# Query example (port 8085 on the host)
curl -IL -H "Accept-Datetime: Wed, 15 Apr 2013 00:00:00 GMT" "http://localhost:8085/timegate/dbpedia?subject=http%3A%2F%2Fdata2services%2Fmodel%2Fgo-category%2Fprocess"

HDT archives go in workspace/hdt-archives, and the config file is workspace/ldfserver-config.json.
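The subject parameter in the query example above must be percent-encoded. A small helper to build such a URL, assuming python3 is available on the host:

```shell
# Percent-encode a subject URI for the LDF server query string
subject='http://data2services/model/go-category/process'
encoded=$(python3 -c "import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=''))" "$subject")
echo "http://localhost:8085/timegate/dbpedia?subject=$encoded"
```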

Access at http://localhost:8085


Property graphs#

Neo4j#


Neo4j is a property graph database that does not support RDF. It uses Cypher as its query language.

d2s start neo4j
docker run -p 7474:7474 -p 7687:7687 -v $(pwd)/workspace/neo4j:/data neo4j

Access at http://localhost:7474, volume shared at workspace/neo4j

Log in with neo4j / neo4j and change the password.
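As a taste of Cypher — the node labels and property names below are made up for illustration:

```cypher
// Create two nodes linked by a relationship
CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'});

// Read the relationship back
MATCH (p:Person)-[:KNOWS]->(q:Person)
RETURN p.name, q.name;
```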


Additional triplestores#

Stardog#


Licensed RDF triplestore 📜

See the official Stardog documentation for Docker. A JavaScript wrapper is available to communicate with Stardog API and SPARQL endpoint.

docker run -v $(pwd)/workspace/stardog-license:/var/opt/stardog -e STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=2g" stardog/stardog:latest

✔️ If you have a Stardog license, then put stardog-license-key.bin in the workspace/stardog-license folder.

❌ If you don't have a license key, you can retrieve a trial license key via the command line when you first start Stardog.


MarkLogic#

Licensed RDF triplestore 📜

Follow the GitHub Docker instructions to deploy it.

You will need to download the MarkLogic Server 📥


RDFox#

Licensed RDF triplestore 📜

RDFox is an in-memory RDF triplestore that only supports triples (not quads). It is a main-memory, scalable, centralized data store that allows users to efficiently manage graph-structured data represented according to the RDF data model, run reasoning engines, and query that data using the SPARQL 1.1 query language.

See the documentation to deploy it using Docker.

Last updated by Vincent Emonet