Define the data model

Define a conceptual model

Define a conceptual model of the data you want to convert.
For example, with CSV files or relational database tables, most of the time:
- each file/table represents a class (aka. type),
- each row is an entity,
- each column is a property of this entity.
Source: https://kgbook.org
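The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `https://example.org/` namespace, the `rows_to_triples` helper, and the sample table are all hypothetical, and a real conversion would normally reuse terms from existing ontologies rather than minting one property per column.

```python
import csv
import io

BASE = "https://example.org/"  # hypothetical namespace, replace with your own
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(csv_text, class_name, id_column):
    """Yield (subject, predicate, object) triples for each row of a CSV table."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each row becomes one entity, identified by the value of its id column.
        subject = f"{BASE}{class_name.lower()}/{row[id_column]}"
        # The table itself maps to the class of the entity.
        yield (subject, RDF_TYPE, f"{BASE}{class_name}")
        for column, value in row.items():
            # Each remaining non-empty column becomes a property of the entity.
            if column != id_column and value:
                yield (subject, f"{BASE}{column}", value)


# Hypothetical sample table: one class (Person), two rows, two property columns.
sample = "id,name,age\n1,Alice,34\n2,Bob,\n"
triples = list(rows_to_triples(sample, "Person", "id"))
for triple in triples:
    print(triple)
```

Empty cells are skipped rather than emitted as empty literals, which is one possible design choice; you may prefer to handle missing values differently.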
If you create a diagram for your conceptual model, we encourage you to add an image of it to the model folder of your repository.

Tools to build your model

The right tool depends on what you are trying to achieve. If you want to build a complete OWL ontology, a specialized tool like Protege is better suited.
If you just want to define a schema with only a few classes and properties, your favorite drawing tool will probably be enough. A popular tool for defining data models is Diagram.io (previously yed and draw.io). There is also the Graffoo tool to generate an ontology from your Diagram.io diagram.
Here is a non-exhaustive list of tools specialized for defining data models:
- Protege is the most popular and mature tool to build OWL ontologies. It is available as a desktop version and a web version (the desktop version has more features).
- Gra.fo is a commercial website that lets you define your model using a graphical interface with nodes and edges. It can be useful for small, simple models, but you will need to pay to unlock advanced features.

Search for ontology concepts

You will need to define the classes and relations for the properties in your data. The easiest way is to reuse classes and properties from existing models (aka. ontologies). Some properties are standard, like rdf:type and rdfs:label, but for more specific concepts it is best to find an existing data model that matches yours.
Write an example RDF entity in the Turtle format for each class you expect to create. Put the file(s) in the model folder.
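As a rough illustration of what such an example entity could look like, the sketch below renders one entity as Turtle from Python. Everything specific here is an assumption made for the example: the `to_turtle` helper, the `ex:` namespace, the entity `ex:person1`, and the choice of `schema:Person` with `rdfs:label` and `schema:jobTitle`. It only handles plain string literals.

```python
def to_turtle(subject, rdf_type, properties, prefixes):
    """Render one entity as a Turtle snippet (plain string literals only)."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in prefixes.items()]
    lines.append("")
    body = [f"{subject} a {rdf_type}"]
    for predicate, value in properties.items():
        # Escape backslashes, quotes and newlines so the literal stays valid Turtle.
        escaped = (
            value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        body.append(f'    {predicate} "{escaped}"')
    # Predicates of the same subject are separated by ";" and closed by ".".
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


prefixes = {
    "ex": "https://example.org/",  # hypothetical namespace for your own entities
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "https://schema.org/",
}
example = to_turtle(
    "ex:person1",
    "schema:Person",
    {"rdfs:label": "Alice", "schema:jobTitle": "Data steward"},
    prefixes,
)
print(example)
```

The printed text can be saved as a `.ttl` file in the model folder; writing the Turtle by hand works just as well for a handful of classes.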
You can search for relevant concepts in existing models in ontology repositories:
- Linked Open Vocabularies (LOV) for generic ontologies
- BioPortal for biomedical concepts by the NCBO 🇺🇸
- OntologyLookupService for biomedical concepts by the EBI 🇪🇺
- AgroPortal for agronomy by INRIA 🌾
- EcoPortal for ecology by Life Watch Italy
- Bartoc.org for social science and digital humanities
Here is a list of popular ontologies for generic or biomedical concepts:
- Semanticscience Integrated Ontology (SIO), a simple, integrated ontology of types and relations for rich description of objects, processes and their attributes.
- BioLink Model, a high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.) and their associations.
- Schema.org, a collaborative project to define schemes for structured data on the Internet, on web pages, in email messages, and beyond.
  - It describes various classes, such as schema:Person, schema:MedicalGuideline, schema:Review, schema:ScholarlyArticle, schema:MedicalScholarlyArticle, schema:Dataset, etc.
  - Extensions are available, such as BioSchemas for biological data.
  - Alternatively, you can look into Google Data Types, which are mainly built from schema.org and allow you to describe and index your website using RDF (JSON-LD).
- DublinCore (dc, dct, dctypes), one of the most generic vocabularies (includes properties such as dc:identifier, dct:description, dct:creator, dct:license, dct:rights...)
- PAV: Provenance, Authoring and Versioning ontology
- PROV: The Provenance Ontology, another ontology to describe provenance in more detail
- DCAT: Data Catalog Vocabulary, to describe datasets
- NCIT: National Cancer Institute Thesaurus, a vocabulary for clinical care, translational and basic research, and public information and administrative activities.
If you need to know which URI is behind a mysterious prefix, http://prefix.cc is a handy service to resolve prefixes, e.g. http://prefix.cc/bl
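Once you know the namespace behind each prefix, expanding a prefixed name such as dct:creator into a full URI is a simple lookup. The sketch below uses a small hand-written prefix map; the `PREFIXES` dictionary and the `expand` helper are hypothetical names for this example, and only a few of the vocabularies listed above are included.

```python
# Namespaces for some of the vocabularies mentioned above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "dcat": "http://www.w3.org/ns/dcat#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name like 'dct:creator' into a full URI."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in prefixes:
        # Unknown prefixes are reported instead of being guessed.
        raise KeyError(f"Unknown prefix in {curie!r}")
    return prefixes[prefix] + local_name


print(expand("dct:creator"))
print(expand("rdfs:label"))
```

Raising an error for unknown prefixes, rather than returning the input unchanged, makes typos in prefixed names visible early.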

Define a validation shape

Write a SHACL or ShEx shape file describing exactly the model (classes and properties) you expect to use, and put it in the model folder. This will be used later to validate the created KG.
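To give an idea of what a shape file contains, the sketch below generates a minimal SHACL node shape as Turtle text from Python. It is an illustration, not a complete shape: the `shacl_shape` helper, the `ex:PersonShape` name, and the chosen class and properties are assumptions for the example, and the only constraint expressed is that each listed property must appear at least once.

```python
def shacl_shape(shape_name, target_class, required_paths):
    """Build a minimal SHACL node shape requiring one value per listed property."""
    if not required_paths:
        raise ValueError("At least one property path is required")
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix schema: <https://schema.org/> .",
        "@prefix ex: <https://example.org/> .",  # hypothetical namespace
        "",
        f"{shape_name} a sh:NodeShape ;",
        # The shape applies to every instance of the target class.
        f"    sh:targetClass {target_class} ;",
    ]
    # One property constraint per path: the property must be present.
    constraints = [
        f"    sh:property [ sh:path {path} ; sh:minCount 1 ]"
        for path in required_paths
    ]
    lines.append(" ;\n".join(constraints) + " .")
    return "\n".join(lines) + "\n"


shape = shacl_shape("ex:PersonShape", "schema:Person", ["rdfs:label", "schema:jobTitle"])
print(shape)
```

In practice you would likely add further constraints by hand, such as sh:datatype or sh:maxCount, and save the result as a Turtle file in the model folder.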