A crucial step in constructing a knowledge graph is turning input data into linked data, using ontologies as connectors. When you start researching knowledge graphs (like I did a few months ago), it quickly becomes clear that this step is central. But how do you actually do it?
This post is meant to be a minimal example that shows how to generate an RDF file from a relational database table.
Basic setup: RDF Mapping Language
In order to generate an RDF file, I have to write some kind of mapping file that defines the rules for how my input data and ontologies play together. One way to do that is RML, which stands for RDF Mapping Language (other methods and tools are listed here). It is developed by a group of researchers at Ghent University, and they provide a whole toolset and good documentation for working with RML (the specification can be found here).
RML is an extension of R2RML. In contrast to RML, R2RML is a W3C standard. Both RML and R2RML are used to declare rules for generating RDF.
Important for this minimal example:
- RML is built on top of R2RML.
- R2RML takes input from relational databases.
- RML also takes flat files, like JSON and CSV files.
The downside: Made for machines
If you look at an RML mapping file, for instance here, it's obvious that I don't want to write one by hand. It's made for machine consumption, not for a human brain.
The solution: YA(RRR)ML
The research group from Ghent knows that as well, which is why they developed YARRRML:
"YARRRML is a human-friendly text-based representation of RML rules."
So I can write rules in YAML, which are then transformed into RML or R2RML. For instance, this is the YAML mapping for the example I linked above.
There is also a web-based version called Matey. It's really nice and very helpful, especially for a smooth start!
I won't go into detail here on how to build that YAML file from flat files. This is covered very well on the website and in the accompanying tutorials.
But what I missed was a small example that works with a relational database.
A minimal example: DBMS to YARRRML to R2RML
Assume I have a database with one table I want to access. It stores information on books.
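To have something concrete to map, here is a sketch of what such a table might look like. It assumes the columns book_id, name, isbn and author (taken from the $(…) references in the mapping rules) and uses an in-memory SQLite database as a stand-in for MariaDB, so the column types and the sample rows are hypothetical placeholders.

```python
import sqlite3

# Stand-in for the MariaDB table "books_table" (SQLite, so no server is needed).
# Column names are inferred from the mapping; types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE books_table (
        book_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        isbn    TEXT,
        author  TEXT
    )
    """
)

# Made-up placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO books_table (book_id, name, isbn, author) VALUES (?, ?, ?, ?)",
    [
        (1, "An Example Book", "000-0-00-000000-1", "Jane Doe"),
        (2, "Another Example", "000-0-00-000000-2", "John Roe"),
    ],
)
conn.commit()

# Each row will later become one schema:Book resource.
for row in conn.execute("SELECT book_id, name, isbn, author FROM books_table ORDER BY book_id"):
    print(row)
```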
In the YAML file, the data is accessed via sources, where you declare the database table(s) as well as a queryFormulation (see specs). You can also include SQL queries in the rules file. Besides that, the file defines prefixes and mappings, which are covered extensively on rml.io.
I save the following part as rules.yml:
```yaml
prefixes:
  ex: "http://example.com/"
  schema: "https://schema.org/"

sources:
  books:
    table: books_table
    queryFormulation: mysql

mappings:
  BookMapping:
    sources: books
    s: http://example.com/$(book_id)
    po:
      - [a, schema:Book]
      - [schema:name, $(name)]
      - [schema:isbn, $(isbn)]
      - [schema:author, $(author)]
```
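To make the rules less abstract, the following plain-Python sketch spells out what BookMapping asks for, one row at a time: a subject IRI built from book_id, one rdf:type triple, and one literal triple per mapped column, written out as N-Triples. It only illustrates the meaning of the mapping; it is not how yarrrml-parser or morph-kgc work internally, and it skips details such as the IRI-safe escaping R2RML applies to template values. The sample rows are hypothetical.

```python
# Hypothetical rows, standing in for the result of querying books_table.
rows = [
    {"book_id": 1, "name": "An Example Book", "isbn": "000-0-00-000000-1", "author": "Jane Doe"},
    {"book_id": 2, "name": "Untitled Draft", "isbn": None, "author": None},
]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/"      # prefix ex:
SCHEMA = "https://schema.org/"  # prefix schema:

# The po entries of BookMapping: property IRI -> source column.
PREDICATE_COLUMNS = {
    SCHEMA + "name": "name",
    SCHEMA + "isbn": "isbn",
    SCHEMA + "author": "author",
}


def literal(value):
    """Render a value as an N-Triples plain string literal."""
    text = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{text}"'


def map_book(row):
    """Apply the BookMapping rules to one row and return N-Triples lines."""
    subject = f"<{EX}{row['book_id']}>"  # s: http://example.com/$(book_id)
    triples = [f"{subject} <{RDF_TYPE}> <{SCHEMA}Book> ."]  # [a, schema:Book]
    for predicate, column in PREDICATE_COLUMNS.items():
        value = row.get(column)
        if value is None:  # a NULL cell yields no triple, as in R2RML
            continue
        triples.append(f"{subject} <{predicate}> {literal(value)} .")
    return triples


for row in rows:
    print("\n".join(map_book(row)))
```

The second row shows why a sparse table still produces valid output: missing values simply lead to fewer triples for that book rather than empty literals.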
The rml.io group also offers a CLI parser, yarrrml-parser, that transforms the YAML file into RML or R2RML. Since I work with a MariaDB database, I need R2RML as output.
To get an R2RML mapping file based on rules.yml:
```shell
yarrrml-parser -i rules.yml -o mapping.r2rml.ttl -f R2RML
```
Last step: Create RDF from R2RML
Now, there is only one step missing: Take the mapping file and apply it to the input data.
There are certainly many ways to do that; I use kglab. It's an abstraction layer that combines many popular Python libraries for working with linked data.
One of the libraries included in kglab is morph-kgc, which takes a database connection plus R2RML mappings and generates RDF from them. Exactly what I want to do!
(Even though the kglab tutorials are very good, they also assume that an RML file miraculously already exists.)
As a template I used kglab’s tutorial on “Using morph-kgc to input from relational databases, CSV, etc”.
In config.ini, I point to the R2RML mapping as well as to my MySQL/MariaDB database:
```ini
[DataSource1]
mappings=mappings/mapping.r2rml.ttl
db_url=mysql+pymysql://root:password@(local)host:port/database_name
```
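One practical snag with db_url: if the password contains characters such as @ or /, the connection URL breaks. SQLAlchemy-style URLs expect such characters to be percent-encoded, for example with urllib.parse.quote_plus. The sketch below builds the [DataSource1] section that way. The helper name and its parameters are hypothetical, and I have not verified how morph-kgc's own config reader treats % characters, so treat this as a starting point rather than a tested recipe.

```python
import configparser
from urllib.parse import quote_plus


def build_morph_kgc_config(mapping_file, user, password, host, port, database):
    """Hypothetical helper: build the [DataSource1] section of config.ini.

    User name and password are percent-encoded so characters such as '@'
    or '/' cannot break the mysql+pymysql:// connection URL.
    """
    db_url = (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    # interpolation=None stops configparser from treating the '%' escapes
    # produced by quote_plus as its own interpolation syntax
    config = configparser.ConfigParser(interpolation=None)
    config["DataSource1"] = {"mappings": mapping_file, "db_url": db_url}
    return config
```

Calling config.write() on an open file handle then produces an ini file with the same two keys as the hand-written version.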
Then I follow the tutorial provided in kglab's documentation:
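```python
from pathlib import Path

import kglab

namespaces = {
    "ex": "http://example.com/",
    "schema": "https://schema.org/",
}

kg = kglab.KnowledgeGraph(
    name="A KG example",
    namespaces=namespaces,
)

# run morph-kgc with the data source(s) defined in config.ini
kg.materialize("config.ini")

# make sure the output directory exists before writing files into it
Path("output").mkdir(exist_ok=True)

# save RDF as Turtle
kg.save_rdf("output/rdf-triples.ttl")

# save RDF as JSON-LD
kg.save_jsonld("output/rdf-triples.jsonld")
```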
That’s it!
I can open rdf-triples.ttl or rdf-triples.jsonld and think about how to improve my data model.
How to improve (and what’s not working)
I haven't figured out yet how to build a named graph using graph: ex:mygraph in the YARRRML file. My guess is that this is what causes the error "Found an invalid graph termtype" when using morph-kgc.
Also, I am undecided whether to put logic into the YAML file or to build SQL views and just get the data from those.
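For the SQL-view option, here is a sketch of what that could look like: the cleanup (trimming titles, stripping hyphens from ISBNs, dropping rows without an ISBN) lives in a view, and the mapping stays a plain column-to-property mapping. R2RML allows a logical table to be a view, so table: in rules.yml could point at the view instead of books_table. The view name books_for_rdf, the cleanup rules and the sample rows are hypothetical, and SQLite stands in for MariaDB.

```python
import sqlite3

# SQLite stand-in for the MariaDB database; same columns as books_table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_table (
        book_id INTEGER PRIMARY KEY,
        name    TEXT,
        isbn    TEXT,
        author  TEXT
    );
    INSERT INTO books_table VALUES
        (1, '  An Example Book ', '000-0-00-000000-1', 'Jane Doe'),
        (2, 'Untitled Draft', NULL, NULL);

    -- Hypothetical view: cleaning and filtering happen in SQL,
    -- so the mapping only has to reference clean columns.
    CREATE VIEW books_for_rdf AS
    SELECT book_id,
           TRIM(name)             AS name,
           REPLACE(isbn, '-', '') AS isbn,
           author
    FROM books_table
    WHERE isbn IS NOT NULL;
    """
)

rows = conn.execute("SELECT book_id, name, isbn, author FROM books_for_rdf").fetchall()
print(rows)
```

The trade-off is where the logic is visible: in a view it can be tested with plain SQL and reused by other tools, while in the YAML file it stays next to the ontology terms it feeds.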
Resources and Links
- RML.io: software, tools and links to examples on how to use YARRRML, RML…
- Tutorial: generating Linked Data with YARRRML, from rml.io
- kglab: Python abstraction layer for working with knowledge graphs
- Paper on differences between RML and R2RML
- Example RML mappings for inspiration
- Overview of different methods for converting tabular data to RDF in the FAIR Cookbook: An inventory of tools for converting your data to RDF