Convert with Python
Python for preprocessing
Python is a good option for preprocessing tasks that RML does not support.
For example, you can use pandas to quickly add an id column derived from another column (here, the name column lowercased with spaces removed), which can then be used to build the entity URI.
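The id column described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the sample names are invented for the example, and in practice the DataFrame would usually come from a file via pd.read_csv.

```python
import pandas as pd

# Hypothetical sample data (assumption); normally loaded with pd.read_csv("your-file.csv")
df = pd.DataFrame({"name": ["Alice Smith", "Bob  Jones", "Carol"]})

# Build the id from the name column: remove all whitespace, then lowercase
df["id"] = df["name"].str.replace(r"\s+", "", regex=True).str.lower()

print(df)
```

The resulting id values contain no spaces, so they can be appended to a base namespace to form entity URIs. Names with other characters that are unsafe in URIs would need additional cleaning.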
Add a requirements.txt file at the root of your repository with all libraries required to run your Python scripts.
Install the dependencies with: pip install -r requirements.txt
Python for RDF conversion
RDFLib
You can perform the conversion to RDF using the RDFLib library.
You can easily map any structured data (CSV, TSV, XLSX, SPSS, SQL, XML, JSON, YAML...) to RDF using Python and rdflib.
For example, a CSV with two columns, Entity ID and Entity name, can be mapped row by row to RDF triples.
Dipper
Dipper is a Python package to generate RDF triples from common scientific resources. It has been used to build and expose RDF from multiple sources for the Monarch Initiative.
Dipper includes subpackages and modules to create graphical models of this data, including:
- Models package for generating common sets of triples, including common OWL axioms, complex genotypes, associations, evidence and provenance models
- Graph package for building graphs with RDFLib or streaming N-Triples
- Source package containing fetchers and parsers that interface with remote databases and web services
Data Science on knowledge graphs
kglab
kglab is an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries and built atop Pandas, RDFlib, pySHACL, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc. Check the kglab documentation.
kglab features:
- Load an RDF graph with rdflib
- Validate the RDF with pySHACL
- RDFS, OWL RL and SKOS inference
- Generate nodes/edges statistics
- Generate graph embeddings from RDF subgraphs
pyRDF2Vec
pyRDF2Vec is a Python implementation and extension of RDF2Vec that creates a 2D feature matrix from a knowledge graph for downstream ML tasks. Check the pyRDF2Vec documentation.