Setting up GraphDB
The Ontotext GraphDB triplestore is available on DockerHub in standard and enterprise editions.
If you wish to use the GraphDB free edition, you will need to download it from Ontotext and build the Docker image yourself:
- Provide your details to receive an email with the link to download GraphDB
- Download GraphDB as a stand-alone server, free version (zip)
- Put the downloaded .zip file in the GraphDB repository (cloned from GitHub)
- In the GraphDB repository, run the following, replacing CHANGE_ME with the version you downloaded:
  docker build -t graphdb --build-arg version=CHANGE_ME .

Run it

Go to http://localhost:7200/

Configure GraphDB

The memory allocated to the Java Virtual Machine can be increased, which helps especially if you run into Java heap space errors when working with big graphs. See the GraphDB documentation for the recommended Java memory allocation.
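
As a rough sketch of how the heap could be raised, the helpers below pass JVM options to the container at start-up. This assumes the image built earlier is tagged `graphdb` and that its start script reads the `GDB_JAVA_OPTS` environment variable; the function names, the container name and the `8g` default are illustrative only, not values recommended by the GraphDB documentation.

```shell
# Build the JVM options string for a given heap size (e.g. "8g").
# -Xms and -Xmx are set to the same value so the heap is not resized at runtime.
graphdb_java_opts() {
  local heap="${1:-8g}"   # illustrative default, not a recommendation
  printf '%s %s' "-Xms${heap}" "-Xmx${heap}"
}

# Start the container with the chosen heap size.
# Assumption: the image is tagged "graphdb" and honours GDB_JAVA_OPTS.
run_graphdb() {
  local heap="${1:-8g}"
  docker run -d --name graphdb -p 7200:7200 \
    -e GDB_JAVA_OPTS="$(graphdb_java_opts "$heap")" \
    graphdb
}

# Usage: run_graphdb 16g
```

Keeping the options in a small helper makes it easy to check the exact string that will be passed before starting the container.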

Preload a file

For datasets bigger than 5G statements, use the preload tool, which loads faster but only works on a stopped GraphDB. To avoid stopping the main GraphDB instance, preload into a temporary GraphDB, then copy the loaded repository into the running GraphDB.
- Set the repository to be created and loaded in workspace/graphdb/preload-config.ttl
- Put the files to be loaded in workspace/import/preload
- Run the docker-compose.yml for preload
When the preload has completed, the graphdb-preload container will stop. You can then copy the loaded repository to the running GraphDB folder.
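
The copy step could look like the sketch below. It assumes each GraphDB instance keeps its repositories in a `data/repositories/<repository ID>` folder; both folder paths in the usage line are placeholders for the volumes of your own setup, and `copy_preloaded_repo` is a hypothetical helper name.

```shell
# Copy one preloaded repository into the repositories folder of the running GraphDB.
# Arguments: repository ID, source repositories folder, target repositories folder.
copy_preloaded_repo() {
  local repo_id="$1" preload_repos="$2" target_repos="$3"
  if [ ! -d "$preload_repos/$repo_id" ]; then
    echo "repository '$repo_id' not found in $preload_repos" >&2
    return 1
  fi
  # Never overwrite a repository that already exists in the running instance.
  if [ -e "$target_repos/$repo_id" ]; then
    echo "refusing to overwrite existing $target_repos/$repo_id" >&2
    return 1
  fi
  mkdir -p "$target_repos"
  cp -r "$preload_repos/$repo_id" "$target_repos/"
}

# Usage (placeholder paths):
# copy_preloaded_repo test /path/to/graphdb-preload/data/repositories /path/to/graphdb/data/repositories
```

The running GraphDB may need to be restarted before the copied repository shows up.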

Create repository

Using cURL

Create the test repository. The repository settings can be edited in graphdb-test-repo-config.ttl.
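
A minimal sketch of the cURL call, assuming GraphDB listens on http://localhost:7200 and that the repository is created through the `/rest/repositories` endpoint of the GraphDB REST API by uploading the configuration file as a form field named `config`. The helper names and the `GRAPHDB_URL` variable are illustrative.

```shell
# Base URL of the GraphDB instance; override it through the environment if needed.
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# URL of the repository management endpoint (optional argument: base URL).
repositories_endpoint() {
  printf '%s/rest/repositories' "${1:-$GRAPHDB_URL}"
}

# Create a repository from a Turtle configuration file.
# -F sends a multipart/form-data POST with the file in the "config" field.
create_repository() {
  local config_file="${1:-graphdb-test-repo-config.ttl}"
  curl -sS -F "config=@${config_file}" "$(repositories_endpoint)"
}

# Usage: create_repository graphdb-test-repo-config.ttl
```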

Using the GUI

Go to Setup > Repositories > Create new repository
- Repository ID: test (or whatever you want it to be, but you will then need to change the default config used in the examples)
- Check the box Use context index
- Click Create

Manage user

🔓 The default password of the admin user is root.
Go to Setup, then Users and access.
To change your password:
- Click Edit admin user
- Enter a new password
- Click Save
By default, security is not enabled. Click Security is off to turn it on.
To create a new user, click Create new user.

Create Search Index

Execute this insert SPARQL query in the repository:

Query inferred statements

See the official Ontotext GraphDB documentation.

Use the HTTP API

See the Swagger UI.

Import file

Import the rdf_output.nq file (using server import), then check whether the import has finished.
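
Both steps can be sketched with cURL as below. This assumes the server import endpoint `/rest/data/import/server/<repository ID>` of the GraphDB REST API; the path differs between GraphDB releases, so check the Swagger UI of your instance. The helper names, `GRAPHDB_URL` and the `test` repository ID in the usage lines are illustrative.

```shell
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# URL of the server import endpoint for a repository (optional 2nd argument: base URL).
import_endpoint() {
  printf '%s/rest/data/import/server/%s' "${2:-$GRAPHDB_URL}" "$1"
}

# JSON body naming the file to import from the server import folder.
import_request_body() {
  printf '{"fileNames": ["%s"]}' "$1"
}

# Start importing a file that is already in the server import folder.
start_server_import() {
  local repo_id="$1" file_name="$2"
  curl -sS -X POST "$(import_endpoint "$repo_id")" \
    -H 'Content-Type: application/json' \
    -d "$(import_request_body "$file_name")"
}

# Fetch the import status of the server files as JSON.
import_status() {
  curl -sS -H 'Accept: application/json' "$(import_endpoint "$1")"
}

# Usage:
# start_server_import test rdf_output.nq
# import_status test
```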

Import URL

Importing from a URL does not seem to work, e.g. with https://archive.monarchinitiative.org/latest/rdf/ctd.ttl

Export graphs

Either all graphs of a repository or a single graph can be exported to N-Quads.
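
One way to do both exports is through the RDF4J-style `/repositories/<repository ID>/statements` endpoint, requesting N-Quads with the `Accept` header and selecting a single graph with the `context` parameter. The sketch below assumes that endpoint is reachable on http://localhost:7200; the helper names, the output file names and the example graph URI are illustrative.

```shell
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# URL of the statements endpoint of a repository (optional 2nd argument: base URL).
statements_endpoint() {
  printf '%s/repositories/%s/statements' "${2:-$GRAPHDB_URL}" "$1"
}

# Export every graph of a repository to an N-Quads file.
export_all_graphs() {
  local repo_id="$1" out_file="$2"
  curl -sS -H 'Accept: application/n-quads' \
    -o "$out_file" "$(statements_endpoint "$repo_id")"
}

# Export a single named graph. The context value is the graph URI in angle
# brackets; -G with --data-urlencode sends it as an encoded query parameter.
export_graph() {
  local repo_id="$1" graph_uri="$2" out_file="$3"
  curl -sS -G -H 'Accept: application/n-quads' \
    --data-urlencode "context=<${graph_uri}>" \
    -o "$out_file" "$(statements_endpoint "$repo_id")"
}

# Usage:
# export_all_graphs test export-all.nq
# export_graph test http://example.org/graph export-graph.nq
```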

Importing large files

Recommendations for importing large RDF files:
- In general, the JVM does not handle big heaps (>30GB) well, because full GC cycles become very expensive.
- If you load datasets larger than 4B RDF triples, use 40-bit identifiers to allow more than 2B unique RDF resources.
- For datasets bigger than 500M statements without inference, use the preload tool, which guarantees a sustained speed of 500M triples per hour.
- Lower the heap to 30GB. The OS will cache some of the files, so the rest of the RAM will still be used for file caching.
- Expect a substantial off-heap index (check the off-heap estimate in the GraphDB documentation).
When creating the repo:
owlim:entity-index-size "2000000000" ;
owlim:entity-id-size "40" ;