A crucial step in building a knowledge graph is turning input data into linked data, with ontologies as connectors. When you start researching knowledge graphs (as I did a few months ago), it quickly becomes clear how central this step is. But how do you do it?
This post is meant to be a minimal example that shows how to write an RDF file from a relational database table.
Basic setup: RDF Mapping Language
In order to generate an RDF file, I have to write some kind of mapping file that defines the rules for how my input data and ontologies play together. One way to do that is RML, which stands for RDF Mapping Language (other methods and tools are listed here). It is developed by a group of researchers at Ghent University, who provide a whole toolset and good documentation for working with RML. (The specification can be found here.)
RML is an extension of R2RML. In contrast to RML, R2RML is a W3C standard. Both RML and R2RML are used to declare rules on how to generate RDF files.
Important for this minimal example:
- RML is built on top of R2RML.
- R2RML takes input from relational databases.
- RML also takes flat files, like json and csv files.
The downside: Made for machines
If you look at an RML mapping file, for instance here, it looks like something I don’t want to work with. It’s clearly made for machine consumption and not for a human brain.
The solution: YA(RRR)ML
The research group from Ghent knows that as well, which is why they developed YARRRML:
“YARRRML is a human-friendly text-based representation of RML rules.”
So I can write some YAML and define rules, which are then transformed to RML or R2RML. For instance, this is the YAML mapping for the example I linked above.
There is also a web-based version called Matey. Really nice and very helpful, especially for a smooth start!
I won’t go into details here on how to build that YAML file from flat files. This is covered very well on the website and in the accompanying tutorials.
But what I missed was a small example to work with a relational database.
A minimal example: DBMS to YARRRML to R2RML
Assume I have a database with one table I want to access. It stores information on books.
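To make the rest of the example concrete, here is a minimal sketch of what such a table could look like. It uses Python's built-in sqlite3 module as a stand-in for MariaDB so that it runs without a database server. The table and column names (books_table, book_id, name, isbn, author) are the ones used in the rules file of this post; the column types and the sample row are made up for illustration.

```python
import sqlite3


def create_books_table(connection):
    """Create a small books table and insert one made-up row."""
    connection.execute(
        """
        CREATE TABLE books_table (
            book_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            isbn    TEXT,
            author  TEXT
        )
        """
    )
    connection.execute(
        "INSERT INTO books_table (book_id, name, isbn, author) VALUES (?, ?, ?, ?)",
        (1, "An Example Book", "978-0-00-000000-2", "Jane Doe"),  # hypothetical data
    )
    connection.commit()


# An in-memory database is enough for experimenting
connection = sqlite3.connect(":memory:")
create_books_table(connection)
rows = connection.execute(
    "SELECT book_id, name, isbn, author FROM books_table"
).fetchall()
print(rows)
```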
You can access this data with sources, where you declare the database table(s) as well as a queryFormulation (see specs). You can also include SQL queries in the rules file. Moreover, this YAML file defines prefixes and mappings, which are covered extensively on rml.io.
I save the following part as rules.yml:
prefixes:
  ex: "http://example.com/"
  schema: "https://schema.org/"
sources:
  books:
    table: books_table
    queryFormulation: mysql
mappings:
  BookMapping:
    sources: books
    s: http://example.com/$(book_id)
    po:
      - [a, schema:Book]
      - [schema:name, $(name)]
      - [schema:isbn, $(isbn)]
      - [schema:author, $(author)]
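To get a feeling for what these rules mean, the following sketch mimics the template expansion by hand for a single row: every $(column) reference is replaced by the value of that column, and a is read as shorthand for rdf:type. This is not how yarrrml-parser or morph-kgc are implemented. The helper names expand_template, expand_prefix and row_to_triples as well as the sample row are hypothetical and only serve as illustration.

```python
import re

PREFIXES = {"ex": "http://example.com/", "schema": "https://schema.org/"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Predicate/object pairs as in the po section of rules.yml (without the "a" rule)
PREDICATE_OBJECTS = [
    ("schema:name", "$(name)"),
    ("schema:isbn", "$(isbn)"),
    ("schema:author", "$(author)"),
]


def expand_template(template, row):
    """Replace every $(column) reference with the row's value for that column."""
    return re.sub(r"\$\(([^)]+)\)", lambda match: str(row[match.group(1)]), template)


def expand_prefix(term):
    """Turn a prefixed name such as schema:name into a full IRI."""
    prefix, local_name = term.split(":", 1)
    return PREFIXES[prefix] + local_name


def row_to_triples(row):
    """Build (subject, predicate, object) tuples for one table row."""
    subject = expand_template("http://example.com/$(book_id)", row)
    triples = [(subject, RDF_TYPE, expand_prefix("schema:Book"))]
    for predicate, object_template in PREDICATE_OBJECTS:
        triples.append(
            (subject, expand_prefix(predicate), expand_template(object_template, row))
        )
    return triples


# Hypothetical row, shaped like a record from books_table
row = {"book_id": 1, "name": "An Example Book", "isbn": "978-0-00-000000-2", "author": "Jane Doe"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```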
Of course, the rml.io group also offers a CLI parser that transforms the YAML file to RML or R2RML. Since I work with a MariaDB database, I need R2RML as output.
Generate the mapping.r2rml.ttl file from rules.yml:
yarrrml-parser -i rules.yml -o mapping.r2rml.ttl -f R2RML
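If the conversion should be part of a Python script, the same call can be wrapped with subprocess. This is only a sketch: it assumes that yarrrml-parser is installed and available on the PATH, and the function names build_parser_command and run_parser are made up.

```python
import shutil
import subprocess


def build_parser_command(rules_path, output_path, output_format="R2RML"):
    """Return the argument list for the yarrrml-parser call used in this post."""
    return ["yarrrml-parser", "-i", rules_path, "-o", output_path, "-f", output_format]


def run_parser(rules_path, output_path, output_format="R2RML"):
    """Run yarrrml-parser and raise an error if it is missing or fails."""
    if shutil.which("yarrrml-parser") is None:
        raise FileNotFoundError("yarrrml-parser was not found on the PATH")
    command = build_parser_command(rules_path, output_path, output_format)
    subprocess.run(command, check=True)


# Only build and show the command here; call run_parser(...) to execute it
command = build_parser_command("rules.yml", "mapping.r2rml.ttl")
print(" ".join(command))
```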
Last step: Create RDF from R2RML
Now, there is only one step missing: Take the mapping file and apply it to the input data.
There are certainly many ways to do that; I use kglab. It is an abstraction layer that combines many popular Python libraries for working with linked data.
One of the libraries included in kglab is morph-kgc, which takes a DBMS and R2RML files as input and generates an RDF file. Exactly what I want to do!
(Even though the kglab tutorials are very good, they also assume that an RML file miraculously already exists.)
As a template I used kglab’s tutorial on “Using morph-kgc to input from relational databases, CSV, etc”.
In the config.ini I point to the R2RML mapping as well as to my MySQL/MariaDB database:
[DataSource1]
mappings=mappings/mapping.r2rml.ttl
db_url=mysql+pymysql://root:password@(local)host:port/database_name
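Passwords that contain characters such as @ or / break a database URL unless they are percent-encoded. The sketch below builds the db_url with urllib.parse.quote_plus and writes the two settings with Python's configparser. The function name build_db_url and all connection values (user, password, host, port, database name) are placeholders, and I have not checked how morph-kgc itself treats % characters when it reads the file.

```python
import configparser
import io
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, database):
    """Build a SQLAlchemy-style URL for the pymysql driver with encoded credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# interpolation=None keeps configparser from treating "%" in the URL as special
config = configparser.ConfigParser(interpolation=None)
config["DataSource1"] = {
    "mappings": "mappings/mapping.r2rml.ttl",
    # placeholder credentials, not real ones
    "db_url": build_db_url("root", "p@ss/word", "localhost", 3306, "database_name"),
}

buffer = io.StringIO()
config.write(buffer)  # use open("config.ini", "w") instead to write the real file
config_text = buffer.getvalue()
print(config_text)
```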
Then I follow the tutorial provided in kglab’s documentation:
import kglab

namespaces = {
    "ex": "http://example.com/",
    "schema": "https://schema.org/",
}

kg = kglab.KnowledgeGraph(
    name="A KG example",
    namespaces=namespaces,
)

# apply the R2RML mapping referenced in config.ini to the database
kg.materialize("config.ini")

# save RDF as ttl
kg.save_rdf("output/rdf-triples.ttl")

# save RDF as jsonld
kg.save_jsonld("output/rdf-triples.jsonld")
That’s it!
I can open rdf-triples.ttl or rdf-triples.jsonld and think about how to improve my data model.
How to improve (and what’s not working)
I haven’t figured out yet how to build a named graph by using graph: ex:mygraph in the YARRRML file. I guess the problem is the error Found an invalid graph termtype, which appears when using morph-kgc.
Also, I am undecided whether to put logic into the YAML file or to build SQL views and just pull the data from those.
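For the SQL view option, a minimal sketch could look like this. It uses Python's built-in sqlite3 module as a stand-in for MariaDB. The view name books_view, the sample rows and the cleaning rule (trimming whitespace and dropping rows without an ISBN) are made up for illustration. The table entry in the YARRRML source would then name the view instead of the table.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE books_table (
        book_id INTEGER PRIMARY KEY,
        name    TEXT,
        isbn    TEXT,
        author  TEXT
    );
    INSERT INTO books_table VALUES (1, '  An Example Book ', '978-0-00-000000-2', 'Jane Doe');
    INSERT INTO books_table VALUES (2, 'Book Without ISBN', NULL, 'John Doe');

    -- The cleaning logic lives in the database instead of in the mapping file
    CREATE VIEW books_view AS
    SELECT book_id, TRIM(name) AS name, isbn, author
    FROM books_table
    WHERE isbn IS NOT NULL;
    """
)

view_rows = connection.execute(
    "SELECT book_id, name, isbn, author FROM books_view"
).fetchall()
print(view_rows)
```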
Resources and Links
- RML.io: Software, tools and links to examples on how to use YARRRML, RML …
- Tutorial: generating Linked Data with YARRRML from rml.io
- kglab: Python abstraction layer for working with knowledge graphs
- Paper on differences between RML and R2RML
- Example RML mappings for inspiration
- Overview on different methods on how to convert tabular data to RDF in FAIR Cookbook: An inventory of tools for converting your data to RDF