Python is a good option for preprocessing data when a task is not supported by RML.
For example, you can use pandas to quickly add an id column based on another column (here, the name column lowercased and with spaces removed), to be used to build the entity URI:
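A minimal sketch of that preprocessing step, assuming a DataFrame with a `name` column. The sample rows are hypothetical; in practice you would load your own file, for example with `pd.read_csv()`.

```python
import pandas as pd

# Hypothetical sample data standing in for your own input file.
df = pd.DataFrame({"name": ["Alice Smith", "Bob  Jones"]})

# Derive the id from the name column: lowercase it and remove all whitespace.
df["id"] = df["name"].str.lower().str.replace(r"\s+", "", regex=True)

print(df)
```

The resulting `id` values contain no spaces, which makes them easier to append to a base URI.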
Add a requirements.txt file at the root of your repository listing all libraries required to run your Python scripts.
Install the dependencies with `pip install -r requirements.txt`.
You can perform the conversion to RDF using the RDFLib library.
You can easily map any structured data (CSV, TSV, XLSX, SPSS, SQL, XML, JSON, YAML...) to RDF using Python and
For example, to map a CSV with 2 columns
Entity ID and
Dipper includes subpackages and modules to create graphical models of this data, including:
- Models package for generating common sets of triples, including common OWL axioms, complex genotypes, associations, evidence and provenance models
- Graph package for building graphs with RDFLib or streaming n-triples
- Source package containing fetchers and parsers that interface with remote databases and web services
kglab is an abstraction layer in Python for building knowledge graphs. It integrates with popular graph libraries and is built atop pandas, RDFLib, pySHACL, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc. Check the kglab documentation.
- Load an RDF graph with
- Validate the RDF with
- RDFS, OWLRL and SKOS inference
- Generate nodes/edges statistics
- Generate graph embeddings from RDF subgraphs