Store RDF data

Store RDF data in a triplestore accessible by querying a SPARQL endpoint.

Publish to our public GraphDB triplestore#

Create a new repository on our GraphDB triplestore at https://graphdb.dumontierlab.com/

Ask for permissions

After creating an account, ask us for permission to create new repositories.

Create the GraphDB repository#

👩‍💻 Go to Setup > Repositories > Create Repository

Or click here: https://graphdb.dumontierlab.com/repository/create

👨‍💻 Choose the settings of your repository (leave the default if not mentioned here):

Ruleset: use RDFS-Plus (Optimized) by default, or an OWL ruleset if you are performing reasoning using OWL ontologies.
Supports SHACL validation: enable if you plan on using SHACL shapes to validate the RDF loaded in the repository.
- Visit https://maastrichtu-ids.github.io/shapes-of-you to find SHACL shapes
- Add new shapes to the IDS Shapes repository: https://github.com/MaastrichtU-IDS/shacl-shapes
Use context index: enable this to index the contexts (a.k.a. graphs).
For large datasets:
- Entity index size: increase this to 999999999
- Entity ID bit-size: increase this to 40

To access your repository:

SPARQL endpoint at https://graphdb.dumontierlab.com/repositories/my-repository
SPARQL endpoint to run update queries (e.g. INSERT): https://graphdb.dumontierlab.com/repositories/my-repository/statements
GraphDB admin web UI: https://graphdb.dumontierlab.com (switch repositories using the button at the top right of the screen)
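As a minimal sketch of how these endpoints can be called from a shell, the helpers below build the two URLs and wrap curl. The function names and the GRAPHDB_URL variable are hypothetical, my-repository is the placeholder repository ID used above, and the `query` and `update` form parameters are those of the standard SPARQL protocol. Adapt the repository ID and credentials to your own setup.

```shell
# Hypothetical helpers: GRAPHDB_URL and the function names are illustrative only.
GRAPHDB_URL="${GRAPHDB_URL:-https://graphdb.dumontierlab.com}"

# Print the SPARQL query endpoint URL of a repository ($1 = repository ID).
sparql_endpoint() {
  printf '%s/repositories/%s\n' "$GRAPHDB_URL" "$1"
}

# Print the SPARQL update endpoint URL of a repository ($1 = repository ID).
update_endpoint() {
  printf '%s/statements\n' "$(sparql_endpoint "$1")"
}

# Run a read query ($2) on repository $1 and print the results as JSON.
# Requires curl and read access to the repository.
run_query() {
  curl -sS -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$2" "$(sparql_endpoint "$1")"
}

# Run an update query ($2) on repository $1 with credentials $3 (user:password).
# Requires curl and write access to the repository.
run_update() {
  curl -sS -u "$3" --data-urlencode "update=$2" "$(update_endpoint "$1")"
}
```

For example, `run_query my-repository 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'` would print ten triples as JSON, provided the repository allows read access.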

Edit your repository access#

By default your repository will not be available publicly.

👩‍💻 Go to Users and Access

Change the Free Access Settings (top right of the page) to enable public access to read the SPARQL endpoint of your repository
- Find your repository and enable Read access (checkbox on the left)
You can also give Write access to other users
- We usually give Write access to the import_user, which is used in automated workflows (to automatically upload new data to the repository)

Optional: enable GraphDB search index#

You can enable the GraphDB Lucene search index to quickly search strings in your triplestore.

Here is an example that creates a search index for the rdfs:label, rdfs:comment and dct:description properties.

👨‍💻 Running this in your GraphDB repository SPARQL editor will insert the triples and the search index will be created (this might take some time). Feel free to edit the predicates indexed.

PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
INSERT DATA { 
    # luc:moleculeSize luc:setParam "1" .
    luc:includePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#label http://www.w3.org/2000/01/rdf-schema#comment http://purl.org/dc/terms/description" .
    luc:useRDFRank luc:setParam "yes" .
    luc:searchIndex luc:createIndex "true" .
}

Query the GraphDB search index:

PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?foundUri ?foundLabel ?score {
    ?foundLabel luc:searchIndex 'TEXT_TO_SEARCH*' ;
      luc:score ?score .
    ?foundUri ?p ?foundLabel .
} ORDER BY DESC(?score) LIMIT 200

Wildcard

The * wildcard at the end matches all strings starting with TEXT_TO_SEARCH.
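To run this prefix search from a script, the query can be generated for any search term and posted to the repository SPARQL endpoint. The sketch below assumes the search index has already been created. The function names are hypothetical, the endpoint URL is passed as an argument, and the search term must not contain single quotes since no escaping is done. Results are sorted by descending score here so that the best matches come first.

```shell
# Print the Lucene search query for a prefix search on the term given as $1.
# Assumption: the term contains no single quotes (no escaping is done here).
lucene_query() {
  cat <<EOF
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?foundUri ?foundLabel ?score {
    ?foundLabel luc:searchIndex '$1*' ;
      luc:score ?score .
    ?foundUri ?p ?foundLabel .
} ORDER BY DESC(?score) LIMIT 200
EOF
}

# Send the search for term $2 to the SPARQL endpoint $1 and print JSON results.
# Requires curl and read access to the repository.
search_index() {
  curl -sS -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$(lucene_query "$2")" "$1"
}
```

For example: `search_index https://graphdb.dumontierlab.com/repositories/my-repository protein`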

List of RDF triplestores#

Ontotext GraphDB#

The Ontotext GraphDB™ triplestore includes a web UI, various data visualizations, OntoRefine, SHACL validation, RDFS/OWL reasoning to infer new triples, and the possibility to deploy multiple repositories. It mainly uses the RDF4J framework.

Download the zip file of the latest GraphDB standalone free version, and place it in the same folder as the Dockerfile before building the image.

docker build -t graphdb --build-arg version=9.3.0 .
docker run -d --rm --name graphdb -p 7200:7200 \
    -v $(pwd)/workspace/graphdb:/opt/graphdb/home \
    -v $(pwd)/workspace/import:/root/graphdb-import \
    graphdb

Access at http://localhost:7200/

See the official Ontotext GraphDB™ documentation and the source code for Docker images for more details.

Obtain a license for more features such as performance improvement, easy deployment using the official DockerHub image or distributed deployment on multiple nodes with Kubernetes.

GraphDB can bulk load large files using a second container:

Change the repository to be created and loaded in workspace/graphdb/preload-config.ttl (default: demo)
Put the files to be loaded in workspace/import/preload 📩
Start the graphdb-preload Docker container

When the preload has completed, the graphdb-preload container stops. You can then copy the loaded repository from workspace/graphdb/preload-data/repositories to the running GraphDB folder:

cp -r workspace/graphdb/preload-data/repositories/* workspace/graphdb/data/repositories/

You can then access the newly loaded repository in the running GraphDB instance without downtime.
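The copy step can be wrapped in a small function that first checks that the preload actually produced repositories. This is only a sketch: the function name is hypothetical and the default workspace/graphdb location is taken from the paths used above.

```shell
# Copy preloaded repositories into the data folder of the running GraphDB.
# $1 = GraphDB home folder on the host (defaults to workspace/graphdb).
copy_preloaded() {
  home="${1:-workspace/graphdb}"
  src="$home/preload-data/repositories"
  dest="$home/data/repositories"
  # Stop early if the preload did not produce any repository folder.
  if [ ! -d "$src" ]; then
    echo "No preloaded repositories found in $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Copy the content of the source folder into the destination folder.
  cp -r "$src"/. "$dest"/
}
```

Calling `copy_preloaded` without arguments from the folder containing workspace/ performs the same copy as the command above.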

See additional d2s documentation about setting up GraphDB

Virtuoso#

OpenLink Virtuoso triplestore. Available on DockerHub.

docker run --name virtuoso \
    -p 8890:8890 -p 1111:1111 \
    -e DBA_PASSWORD=dba \
    -e SPARQL_UPDATE=true \
    -e DEFAULT_GRAPH=https://w3id.org/d2s/graph \
    -v $(pwd)/workspace/virtuoso:/data \
    -d tenforce/virtuoso

Access at http://localhost:8890/ and SPARQL endpoint at http://localhost:8890/sparql.

Admin username: dba

CORS can be enabled by following these instructions. See our complete Virtuoso documentation for more details.

Clear the Virtuoso triplestore using this command:

docker exec -it virtuoso isql-v -U dba -P dba exec="RDF_GLOBAL_RESET ();"

Blazegraph#

A high-performance RDF graph database. See its documentation for Docker.

It uses mainly the rdf4j framework.

# Start triplestore with specific UID and GID for the bulk load (UI)
# Tested on Ubuntu with $UID=1000 and nothing in $GROUPS (by default)
docker run --name blazegraph \
  -e BLAZEGRAPH_UID=$UID \
  -e BLAZEGRAPH_GID=$GROUPS \
  -p 8082:8080 \
  -v $(pwd)/workspace/import:/data \
  lyrasis/blazegraph:2.1.5

# To bulk load: create the dataloader.txt file
namespace=kb
propertyFile=/RWStore.properties
fileOrDirs=/data
format=n-triples
defaultGraph=http://defaultGraph
quiet=false
verbose=0
closure=false
durableQueues=true

# And submit it using an HTTP POST request to load all nt files in /data
# (port 8082 is the host port mapped in the docker run command above)
curl -X POST \
  --data-binary @dataloader.txt \
  --header 'Content-Type:text/plain' \
  http://localhost:8082/bigdata/dataloader

The UID and GID need to be set to have the right permissions to bulk load a file (example given for Ubuntu). RWStore.properties can also be rewritten, see example.

Access UI at http://localhost:8082/bigdata

SPARQL endpoint at http://localhost:8082/bigdata/sparql (port 8080, the original port, is only used inside the container)

To clear the graph, go to the Update tab and enter CLEAR ALL

Follow these instructions to enable CORS on the Blazegraph SPARQL endpoint.

Jena Fuseki#

Fuseki is a SPARQL server on top of the Apache Jena TDB RDF store, for single machines. It mainly uses the Jena framework.

docker run -d --name fuseki -p 3030:3030 -v $(pwd)/workspace/fuseki:/fuseki -v $(pwd)/workspace/import:/staging stain/jena-fuseki

Access at http://localhost:3030

Bulk load files from workspace/import into the demo dataset (the container needs to be stopped):

docker-compose -f d2s-core/docker-compose.yml \
  run -v $(pwd)/workspace/import:/staging \
  stain/jena-fuseki ./load.sh demo test1.ttl test2.nt

If you don't specify any filenames to load.sh, all filenames directly under /staging that match these GLOB patterns will be loaded:
*.rdf *.rdf.gz *.ttl *.ttl.gz *.owl *.owl.gz *.nt *.nt.gz *.nquads *.nquads.gz
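To preview which files would be picked up before running the load, the same patterns can be checked locally. The sketch below is not part of load.sh: the function name is hypothetical, and it only lists files directly under a host folder (workspace/import by default, the folder mounted as /staging above).

```shell
# List files directly under a folder that match the load.sh GLOB patterns.
# $1 = folder to inspect (defaults to workspace/import).
list_loadable() {
  dir="${1:-workspace/import}"
  for pattern in '*.rdf' '*.rdf.gz' '*.ttl' '*.ttl.gz' '*.owl' '*.owl.gz' \
                 '*.nt' '*.nt.gz' '*.nquads' '*.nquads.gz'; do
    # The unquoted $pattern lets the shell expand the glob inside the folder.
    for file in "$dir"/$pattern; do
      # An unmatched glob stays literal, so keep only existing regular files.
      if [ -f "$file" ]; then
        printf '%s\n' "${file##*/}"
      fi
    done
  done
}
```

Files in subfolders and files with other extensions are not listed, which mirrors the "directly under /staging" behavior described above.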

Stardog#

Download the free license first, then place it in the folder shared with Stardog.

See the official Stardog documentation for Docker. A JavaScript wrapper is available to communicate with Stardog API and SPARQL endpoint.

docker run -v $(pwd)/workspace/stardog:/var/opt/stardog -p 5820:5820 -e STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=12g" stardog/stardog:latest

Access at http://localhost:5820, volume shared at workspace/stardog

AllegroGraph#

AllegroGraph® is a modern, high-performance, persistent graph database. It supports SPARQL, RDFS++, and Prolog reasoning from numerous client applications.

docker run -d -m 1g -v $(pwd)/workspace/allegrograph:/data -p 10000-10035:10000-10035 --shm-size 1g --name allegrograph franzinc/agraph:v6.6.0

Access at http://localhost:10035

Default login: test / xyzzy

See official documentation for bulk load.

TODO: fix shared volumes

AnzoGraph#

AnzoGraph® DB by Cambridge Semantics. See its official documentation to deploy with Docker.

Unregistered Free edition limited to 8G RAM, single user and single node deployment.
Register to access the 16G single node deployment for free.
Deploy AnzoGraph on multi-server cluster for horizontal scaling with the Enterprise Edition 💶

docker run -d -p 8086:8080 -p 8443:8443 --name anzograph -v $(pwd)/workspace/anzograph:/opt/anzograph cambridgesemantics/anzograph:2.0.2

Access at http://localhost:8086

Default login: admin / Passw0rd1.

Kubernetes deployment available using Helm.

Linked Data Fragments Server#

Technically not a triplestore, but a server supporting the Memento protocol for timestamped querying over multiple linked data sources, e.g. HDT or SPARQL.

docker run -p 8085:3000 -t -i --rm \
    -v $(pwd)/workspace/hdt-archives:/data \
    -v $(pwd)/workspace/ldfserver-config.json:/tmp/config.json \
    umids/ldf-server:latest /tmp/config.json

# Query example
curl -IL -H "Accept-Datetime: Mon, 15 Apr 2013 00:00:00 GMT" "http://localhost:8085/timegate/dbpedia?subject=http%3A%2F%2Fdata2services%2Fmodel%2Fgo-category%2Fprocess"

HDT archives go in workspace/hdt-archives and the config file is in workspace/ldfserver-config.json

Access at http://localhost:8085

Property graphs#

Neo4j#

Neo4j is a property graph database that does not support RDF. It uses Cypher as its query language.

docker run -p 7474:7474 -p 7687:7687 -v $(pwd)/workspace/neo4j:/data neo4j

Access at http://localhost:7474, volume shared at workspace/neo4j

Log in with neo4j / neo4j and change the password.

Additional triplestores#

MarkLogic#

Licensed RDF triplestore 📜

Follow the GitHub Docker instructions to deploy it.

You will need to download the MarkLogic Server 📥

RDFox#

Licensed RDF triplestore 📜

RDFox is an in-memory triplestore that only supports triples. It is a scalable, centralized data store that allows users to efficiently manage graph-structured data represented according to the RDF data model, run reasoning engines, and query that data using the SPARQL 1.1 query language.

See the documentation to deploy it using Docker.