If you wish to use GraphDB free edition, you will need to download it from Ontotext and build the Docker image.
- Provide your information to receive an email with the link to download GraphDB
- Download GraphDB as the stand-alone server free version (zip)
- Put the downloaded `.zip` file in the GraphDB repository (cloned from GitHub)
- Build the Docker image in the GraphDB repository:

```shell
docker build -t graphdb --build-arg version=CHANGE_ME .
```
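The built image can then be started. This run command is a sketch only: the container name, port mapping, and data volume path are assumptions to adapt to your setup.

```shell
# Sketch: container name, port, and home-directory mount are assumptions.
docker run -d --name graphdb -p 7200:7200 \
  -v "$(pwd)/graphdb-data:/opt/graphdb/home" \
  graphdb
```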
Go to http://localhost:7200/
The memory allocated to the Java Virtual Machine can be increased, especially if you are facing heap space errors when working with big graphs. See the GraphDB documentation about recommended Java memory allocation.
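One way to raise the heap is the `GDB_JAVA_OPTS` environment variable read by GraphDB's startup scripts; whether your self-built image passes it through is an assumption to verify. For example, in a `docker-compose.yml`:

```yaml
services:
  graphdb:
    image: graphdb
    environment:
      # Assumption: the image's entrypoint forwards GDB_JAVA_OPTS to the JVM.
      GDB_JAVA_OPTS: "-Xms20g -Xmx20g"
```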
For datasets bigger than 5G statements, use the preload tool, which loads faster against a stopped GraphDB. To avoid stopping our main GraphDB instance, we preload using a temporary GraphDB, then copy the loaded repository into the running GraphDB.
- Change the repository to be created and loaded in the `docker-compose.yml` for preload
- Put the files to be loaded in the folder mounted by the preload container
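A preload `docker-compose.yml` could look like the sketch below. The image tag, preload binary path, flags, and mount paths are all assumptions based on the layout of Ontotext's GraphDB Docker setup; adjust them to your build.

```yaml
# Sketch only: image tag, preload flags, and mount paths are assumptions.
version: "3"
services:
  graphdb-preload:
    image: graphdb
    entrypoint:
      - /opt/graphdb/dist/bin/preload
      - --force
      - -c
      - /repo-config/graphdb-test-repo-config.ttl
      - /import/rdf_output.nq
    volumes:
      - ./preload-data:/opt/graphdb/home
      - ./import:/import
      - ./graphdb-test-repo-config.ttl:/repo-config/graphdb-test-repo-config.ttl
```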
When the preload has completed, the `graphdb-preload` container will stop. You can then copy the loaded repository to the running GraphDB folder:
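A minimal sketch of the copy, assuming the preload and main instances use the `preload-data` and `graphdb-data` volume folders and the repository is named `test` (all assumptions, adjust to your layout):

```shell
# Paths and repository name are assumptions.
docker stop graphdb-preload
cp -r preload-data/repositories/test graphdb-data/repositories/
```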
Edit the repository in `graphdb-test-repo-config.ttl`.
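For reference, a repository configuration in that file typically looks like the RDF4J-style Turtle below. This is a minimal sketch assuming GraphDB free's `graphdb:FreeSail` repository type; check the config template shipped with your GraphDB distribution.

```turtle
# Minimal sketch; adjust to the template in your GraphDB version.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix owlim: <http://www.ontotext.com/trree/owlim#> .

[] a rep:Repository ;
   rep:repositoryID "test" ;
   rdfs:label "Test repository" ;
   rep:repositoryImpl [
     rep:repositoryType "graphdb:FreeSailRepository" ;
     sr:sailImpl [
       sail:sailType "graphdb:FreeSail" ;
       owlim:ruleset "empty" ;
       owlim:enable-context-index "true"
     ]
   ] .
```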
Go to Setup > Repositories > Create new repository
- Repository ID: `test` (or whatever you want it to be, but you will need to change the examples' default config)
- Check the box Use context index
- Click Create
The `admin` user has a default password (check the GraphDB documentation for the current default).
Go to Setup, then Users and access
To change your password:
- Click Edit admin user
- Enter a new password
- Click Save
By default, security is not enabled; click Security is off to turn it on.
To create a new user, click Create new user.
Execute this SPARQL INSERT query in the repository:
See the Swagger UI.
Import the `rdf_output.nq` file (via server import).
Check if import is done:
This does not seem to work.
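The server-file import can also be triggered over the workbench REST API. The endpoint paths below are assumptions based on the GraphDB workbench API (verify them in the Swagger UI), and `test` is the example repository ID:

```shell
# Trigger a server-side import of a file already placed in the import folder.
curl -X POST http://localhost:7200/rest/data/import/server/test \
  -H 'Content-Type: application/json' \
  -d '{"fileNames": ["rdf_output.nq"]}'

# Poll the import status.
curl http://localhost:7200/rest/data/import/server/test
```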
Export all graphs of a repository to nquads:
Export one graph to nquads:
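A sketch of the corresponding REST calls against the RDF4J statements endpoint; the repository ID `test` and the graph IRI `http://example.org/graph1` are placeholders:

```shell
# Export all graphs as N-Quads.
curl -H 'Accept: application/n-quads' \
  "http://localhost:7200/repositories/test/statements" > all-graphs.nq

# Export a single graph: the context parameter is the graph IRI
# as an N-Triples term (<...>), URL-encoded.
curl -H 'Accept: application/n-quads' \
  "http://localhost:7200/repositories/test/statements?context=%3Chttp%3A%2F%2Fexample.org%2Fgraph1%3E" > graph1.nq
```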
Recommendations for importing large RDF files:
- Generally speaking, the JVM cannot handle big heaps (>30GB) well due to highly expensive full GC cycles.
- If you load datasets larger than 4B RDF triples, use 40-bit identifiers to enable more than 2B unique RDF resources.
- For datasets bigger than 500M statements without inference, use the preload tool, which guarantees a sustained speed of 500M triples per hour.
- Lower the heap to 30GB; the OS will cache some of the files, so the big RAM will still be used for file caching.
- Expect a substantial off-heap index (check the off-heap estimate in the GraphDB documentation).
When creating the repo, set:

```turtle
owlim:entity-index-size "2000000000" ;
owlim:entity-id-size "40" ;
```