Use the RDF Mapping Language (RML), a declarative mapping language, to map your structured data (CSV, TSV, XLSX, SPSS, SQL, XML, JSON, YAML) to RDF.
We recommend using YARRRML, a YAML syntax for writing RML mappings without writing the RDF by hand, to make RML mappings easier to define.
The Matey web UI 🦜 lets you write RML mappings in YAML using the simplified YARRRML syntax, and test them directly in the browser on a sample of the file to transform.
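As a starting point, here is a minimal YARRRML mapping for a CSV file, kept in a Python string so it can be pasted into Matey or saved to disk. It is only a sketch: the file people.csv, its `id` and `name` columns, and the `ex:` namespace are hypothetical, and `write_mapping` is an illustrative helper, not part of any RML tool.

```python
from pathlib import Path

# Hypothetical input: people.csv with columns "id" and "name".
# The ex: namespace is a placeholder; schema: is schema.org.
YARRRML_MAPPING = """\
prefixes:
  ex: "http://example.com/"
  schema: "https://schema.org/"

mappings:
  person:
    sources:
      - ['people.csv~csv']
    s: ex:$(id)
    po:
      - [a, schema:Person]
      - [schema:name, $(name)]
"""


def write_mapping(path: str, text: str = YARRRML_MAPPING) -> Path:
    """Save the mapping so it can be converted to RML or versioned with the data."""
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target
```

Each row of the CSV becomes one subject typed as schema:Person, with its name as a literal.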
Recommended workflow to create and test RML mappings:
- Use the Matey web UI 🦜 to write YARRRML mappings and test them against a sample of your data
- Copy the YARRRML mappings to a file with the extension
- Copy the RML mappings to a file with the same name, and the extension
- Optionally, automate the execution in a GitHub Actions workflow.
YARRRML mappings can also be converted to RML locally, or automatically in a pipeline, using the yarrrml-parser.
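A minimal Python sketch of the local conversion, assuming yarrrml-parser was installed globally with `npm i -g @rmlio/yarrrml-parser`. The parser reads the YARRRML file given with `-i` and writes the RML rules to the file given with `-o`. The helper names are illustrative.

```python
import shutil
import subprocess


def yarrrml_to_rml_command(yarrrml_file: str, rml_file: str) -> list[str]:
    """yarrrml-parser reads YARRRML with -i and writes the RML rules to -o."""
    return ["yarrrml-parser", "-i", yarrrml_file, "-o", rml_file]


def convert(yarrrml_file: str, rml_file: str) -> None:
    """Run the conversion; raises if the parser is missing or exits with an error."""
    if shutil.which("yarrrml-parser") is None:
        raise RuntimeError("yarrrml-parser not found: npm i -g @rmlio/yarrrml-parser")
    subprocess.run(yarrrml_to_rml_command(yarrrml_file, rml_file), check=True)
```

For example, convert("mapping.yarrrml.yml", "mapping.rml.ttl"), where both file names are placeholders.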
Example of a YARRRML mapping file using the split function on the
grel:p_string_sep separators need to be escaped with
- See the full list of available default functions.
- Additional functions can be added by packaging them in a .jar file, see the documentation.
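A sketch of a YARRRML mapping calling the GREL split function, with hypothetical data: movies.csv, whose `genres` column holds values like `Drama;Comedy`. A `;` separator is used because it needs no escaping. The `preview_split` helper is not part of any RML tool; it is a rough local approximation that assumes the separator is interpreted as a regular expression.

```python
import re

# Hypothetical data: movies.csv with columns "id" and "genres",
# where genres holds values such as "Drama;Comedy".
SPLIT_MAPPING = """\
prefixes:
  ex: "http://example.com/"
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"

mappings:
  movie:
    sources:
      - ['movies.csv~csv']
    s: ex:$(id)
    po:
      - p: ex:genre
        o:
          function: grel:string_split
          parameters:
            - [grel:valueParameter, $(genres)]
            - [grel:p_string_sep, ";"]
"""


def preview_split(value: str, separator: str) -> list[str]:
    """Rough local preview of the split, assuming the separator is a regular expression."""
    return re.split(separator, value)
```

Each resulting value becomes a separate ex:genre object for the movie.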
You can also generate N-Quads by adding the graph information to the rr:subjectMap in RML mappings (or simply g: in YARRRML):
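A sketch of both forms, kept as Python strings. All names (people.csv, its columns, the `ex:` IRIs and the `ex:peopleGraph` graph) are hypothetical. In YARRRML, `g:` at the mapping level puts every triple of that mapping in the named graph; the RML fragment shows a comparable graph map attached to the subject map.

```python
# Hypothetical names: people.csv, its "id"/"name" columns, and the ex: IRIs.
# g: assigns all triples generated by this mapping to one named graph,
# so the output becomes quads instead of triples.
GRAPH_MAPPING = """\
prefixes:
  ex: "http://example.com/"
  schema: "https://schema.org/"

mappings:
  person:
    sources:
      - ['people.csv~csv']
    s: ex:$(id)
    g: ex:peopleGraph
    po:
      - [schema:name, $(name)]
"""

# Comparable RML (Turtle fragment): a graph map inside the subject map.
RML_SUBJECT_MAP = """\
rr:subjectMap [
    rr:template "http://example.com/{id}" ;
    rr:graphMap [ rr:constant <http://example.com/peopleGraph> ]
] ;
"""
```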
The rmlmapper-java executes RML mappings to generate RDF knowledge graphs.
Not for large files
The RMLMapper loads all data in memory, so be careful when working with large datasets.
- Download the rmlmapper .jar file from https://github.com/RMLio/rmlmapper-java/releases
- Run the RML mapper:
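A Python sketch that builds and runs the RMLMapper command line, assuming Java is on the PATH. It uses the RMLMapper options `-m` (mapping file), `-o` (output file) and `-s` (serialization), plus the standard JVM flag `-Xmx` to raise the heap limit, which matters because all data is held in memory. The 4g default, the jar name and the file names are assumptions: use the name of the release you downloaded.

```python
import subprocess


def rmlmapper_command(
    jar: str,
    mapping: str,
    output: str,
    serialization: str = "nquads",
    max_heap: str = "4g",
) -> list[str]:
    """Build the RMLMapper call.

    -m is the RML mapping file, -o the output file and -s the RDF serialization.
    -Xmx is a JVM flag raising the heap limit, since the RMLMapper keeps all data in memory.
    """
    return [
        "java", f"-Xmx{max_heap}", "-jar", jar,
        "-m", mapping,
        "-o", output,
        "-s", serialization,
    ]


def run_rmlmapper(jar: str, mapping: str, output: str, **options: str) -> None:
    """Run the RMLMapper and raise CalledProcessError if it fails."""
    subprocess.run(rmlmapper_command(jar, mapping, output, **options), check=True)
```

For example, run_rmlmapper("rmlmapper.jar", "mapping.rml.ttl", "output.ttl", serialization="turtle").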
Run automatically in workflow
The RMLMapper can easily be run in GitHub Actions workflows; check out the Run workflows page for more details.
Work in progress
The RMLStreamer is still in development; some features, such as functions, are not implemented yet.
To run the RMLStreamer, you have two options:
- Start a single-node Apache Flink cluster using Docker on your machine.
- Use the DSRI Apache Flink cluster (especially for very large files).
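For the local option, a sketch of the Docker commands for a single-node Flink session cluster, following the pattern of the Flink Docker setup guide: one jobmanager and one taskmanager on a shared Docker network, with the web UI published on port 8081. The image tag is a parameter because it should match the Flink version your RMLStreamer release targets; the network name is an assumption.

```python
# Tells the taskmanager where to find the jobmanager (the container name below).
FLINK_PROPERTIES = "jobmanager.rpc.address: jobmanager"


def flink_docker_commands(image: str, network: str = "flink-network") -> list[list[str]]:
    """Commands for a local session cluster: one network, one jobmanager, one taskmanager."""
    shared = ["--rm", "--network", network, "--env", f"FLINK_PROPERTIES={FLINK_PROPERTIES}"]
    return [
        ["docker", "network", "create", network],
        # The jobmanager also serves the web UI, published on port 8081.
        ["docker", "run", "--name=jobmanager", *shared, "--publish", "8081:8081", image, "jobmanager"],
        ["docker", "run", "--name=taskmanager", *shared, image, "taskmanager"],
    ]
```

Run the first command once; the jobmanager and taskmanager containers keep running in the foreground, so start each in its own terminal.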
Check out the documentation on converting COHD with the RMLStreamer on the DSRI.
Copy the RMLStreamer.jar file, your mapping files and your data files to the Flink jobmanager pod before running it.
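A sketch of the copy step with the OpenShift CLI, assuming you are logged in to the DSRI project with `oc`. `oc cp <local path> <pod>:<remote path>` copies a file into a pod; the pod name, destination directory and file names used here are hypothetical.

```python
from pathlib import PurePosixPath


def oc_copy_commands(pod: str, files: list[str], dest_dir: str) -> list[list[str]]:
    """One `oc cp <local> <pod>:<dest_dir>/<file name>` command per file."""
    dest = dest_dir.rstrip("/")
    return [["oc", "cp", f, f"{pod}:{dest}/{PurePosixPath(f).name}"] for f in files]
```

Each returned command can be passed to subprocess.run(..., check=True).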
Example command to run the RMLStreamer from the Flink cluster master:
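A sketch of how such a submit command can be assembled, assuming the `flink` CLI is available on the master and an RMLStreamer version that accepts `toFile --mapping-file ... --output-path ...` (older releases used different option names, so check the help output of your jar). Flink's own `-p` option, placed before the jar, sets the job parallelism. All paths are hypothetical.

```python
from typing import Optional


def rmlstreamer_command(
    jar: str,
    mapping: str,
    output: str,
    parallelism: Optional[int] = None,
    flink_bin: str = "flink",
) -> list[str]:
    """Build a `flink run` call submitting the RMLStreamer with file output."""
    command = [flink_bin, "run"]
    if parallelism is not None:
        command += ["-p", str(parallelism)]  # Flink CLI option, must come before the jar
    command += [jar, "toFile", "--mapping-file", mapping, "--output-path", output]
    return command
```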
Check the progress
The progress of the job can be checked in the Apache Flink web UI.
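The web UI is backed by Flink's REST API, so the same status can also be polled from a script. A small sketch that lists jobs from the `/jobs/overview` endpoint; the default base URL assumes a local cluster on port 8081 (on the DSRI, use the route of your Flink deployment instead).

```python
import json
from urllib.request import urlopen


def fetch_jobs_overview(base_url: str = "http://localhost:8081") -> dict:
    """GET /jobs/overview from the Flink REST API."""
    with urlopen(f"{base_url.rstrip('/')}/jobs/overview", timeout=10) as response:
        return json.load(response)


def summarize_jobs(overview: dict) -> list[tuple[str, str, str]]:
    """Return (job id, job name, state) for each job, e.g. RUNNING or FINISHED."""
    return [(job["jid"], job["name"], job["state"]) for job in overview.get("jobs", [])]
```

For example, summarize_jobs(fetch_jobs_overview()) while the job is running.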
The N-Triples files produced in parallel by the RMLStreamer:
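When the output is split across several part files, they can be concatenated into a single file, since N-Triples stores one statement per line. A Python sketch, assuming the parts sit in one directory; the directory, glob pattern and output file name are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator


def merged_lines(parts: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield the non-blank lines of every part, each terminated by a newline."""
    for part in parts:
        for line in part:
            if line.strip():
                yield line if line.endswith("\n") else line + "\n"


def _read_lines(path: Path) -> Iterator[str]:
    with open(path, encoding="utf-8") as handle:
        yield from handle


def merge_ntriples(parts_dir: str, merged_file: str, pattern: str = "*") -> int:
    """Concatenate the part files (sorted by name) into merged_file; returns lines written."""
    target = Path(merged_file).resolve()
    parts = [p for p in sorted(Path(parts_dir).glob(pattern)) if p.is_file() and p.resolve() != target]
    count = 0
    with open(target, "w", encoding="utf-8") as out:
        for line in merged_lines(_read_lines(p) for p in parts):
            out.write(line)
            count += 1
    return count
```

Files are streamed line by line, so the merge does not need to hold the whole output in memory.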
Connect to node2 via SSH; the http_proxy variable needs to be changed temporarily to access the DSRI:
Reactivate the proxy (
Check the generated COHD file on node2 at:
Replace wrong triples:
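What makes a triple wrong depends on the dataset, so the replacement pairs in this sketch are hypothetical. It streams a large N-Triples file line by line and applies plain-text replacements, so the file is never loaded into memory at once.

```python
from typing import Iterable, Iterator, Mapping


def fix_lines(lines: Iterable[str], replacements: Mapping[str, str]) -> Iterator[str]:
    """Apply every (wrong -> right) text replacement to each line."""
    for line in lines:
        for wrong, right in replacements.items():
            line = line.replace(wrong, right)
        yield line


def fix_file(src: str, dst: str, replacements: Mapping[str, str]) -> None:
    """Write a corrected copy of src to dst, one line at a time."""
    with open(src, encoding="utf-8") as inp, open(dst, "w", encoding="utf-8") as out:
        out.writelines(fix_lines(inp, replacements))
```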
The COHD repository will be created in /data/graphdb-preload/data; copy it to the main GraphDB: