Convert with Python
Python for preprocessing
Python is a good solution for preprocessing data when a task is not supported by RML.
For example, you can use pandas to quickly add an id column based on another column (here, the name column with spaces removed and lowercased), to be used to build the entity URI.
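The id column described above could be derived along these lines. This is a minimal sketch: the sample rows are made up for illustration, and in practice the DataFrame would be loaded from your own file (for example with pd.read_csv).

```python
import pandas as pd

# Hypothetical input data, standing in for a file loaded with pd.read_csv(...)
df = pd.DataFrame({"name": ["Alice Smith", "Bob Jones"]})

# Build the id from the name column: strip spaces, then lowercase
df["id"] = df["name"].str.replace(" ", "", regex=False).str.lower()

print(df)
```

The resulting id values contain no spaces, so they can be appended to a base namespace to form the entity URI.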
Add a requirements.txt file at the root of your repository listing all libraries required to run your Python scripts.
Command to install the dependencies: pip install -r requirements.txt
Python for RDF conversion
RDFLib
You can perform the conversion to RDF using the RDFLib library.
You can map any structured data (CSV, TSV, XLSX, SPSS, SQL, XML, JSON, YAML...) to RDF using Python and rdflib.
For example, a CSV with two columns, Entity ID and Entity name, can be mapped row by row to RDF triples.
Dipper
Dipper is a Python package to generate RDF triples from common scientific resources. It has been used to build and expose RDF from multiple sources for the Monarch Initiative.
Dipper includes subpackages and modules to create graphical models of this data, including:
- Models package for generating common sets of triples, including common OWL axioms, complex genotypes, associations, evidence and provenance models
- Graph package for building graphs with RDFLib or streaming n-triples
- Source package containing fetchers and parsers that interface with remote databases and web services
Data Science on knowledge graphs
kglab
kglab is an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries, atop Pandas, RDFlib, pySHACL, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc. Check the kglab documentation.
kglab features:
- Load an RDF graph with rdflib
- Validate the RDF with pySHACL
- RDFS, OWL RL and SKOS inference
- Generate nodes/edges statistics
- Generate graph embeddings from RDF subgraphs
pyRDF2Vec
pyRDF2Vec is a Python implementation and extension of RDF2Vec to create a 2D feature matrix from a knowledge graph for downstream ML tasks. Check the pyRDF2Vec documentation.