RDF, RML, YARRRML: A basic tutorial to create Linked Data from a relational database table

A crucial step in building a knowledge graph is turning input data into linked data, using ontologies as connectors. When you start researching knowledge graphs (as I did a few months ago), it quickly becomes clear how central this step is. But how do you actually do it?

This post is meant as a minimal example that shows how to generate an RDF file from a relational database table.

Basic setup: RDF Mapping Language

To generate an RDF file, I have to write some kind of mapping file that defines how my input data and ontologies play together. One way to do that is RML, the RDF Mapping Language (other methods and tools are listed here). It is developed by a group of researchers at Ghent University, who provide a whole toolset and good documentation for working with RML (the specification can be found here).

RML is an extension of R2RML. Unlike RML, R2RML is a W3C standard. Both are languages for declaring rules that generate RDF from input data.

Important for this minimal example:

  • RML is built on top of R2RML.
  • R2RML takes input from relational databases.
  • RML additionally takes flat files, such as JSON and CSV files.

The downside: Made for machines

If you look at an RML mapping file (for instance, here), it quickly becomes clear that this is not something I want to write by hand. It’s clearly made for machine consumption, not for the human brain.

The solution: YA(RRR)ML

The research group in Ghent knows this as well, which is why they developed YARRRML:

“YARRRML is a human-friendly text-based representation of RML rules.”

See specs here.

So I can write some YAML that defines the rules, which are then transformed into RML or R2RML. For instance, this is the YARRRML mapping for the example I linked above.

There is also a web-based editor called Matey. Really nice and very helpful, especially for getting started!

I won’t go into detail here on how to build that YAML file for flat files; this is covered very well on the website and in the accompanying tutorials.

What I missed, though, was a small example that works with a relational database.

A minimal example: DBMS to YARRRML to R2RML

Assume I have a database with one table that I want to access. It stores information on books.

You access this data via sources, where you declare the database table(s) as well as a queryFormulation (see specs). Instead of a table, you can also include SQL queries in the rules file. In addition, the YAML file defines prefixes and mappings, which are covered extensively on rml.io.
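To have something concrete to map, here is a minimal sketch of what books_table could look like. The column names (book_id, name, isbn, author) are inferred from the $(...) references in rules.yml; the sample rows are made up, and I use an in-memory SQLite database instead of MariaDB so the snippet runs without a server. It also illustrates the idea behind a query source: the columns a SELECT returns are the names the mapping can reference.

```python
import sqlite3

# Hypothetical stand-in for the MariaDB table "books_table", using an
# in-memory SQLite database; column names are inferred from rules.yml.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE books_table (
        book_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        isbn    TEXT,
        author  TEXT
    )
    """
)

# Made-up sample rows, for illustration only
conn.executemany(
    "INSERT INTO books_table (book_id, name, isbn, author) VALUES (?, ?, ?, ?)",
    [
        (1, "Example Book One", "isbn-001", "Jane Doe"),
        (2, "Example Book Two", "isbn-002", "John Roe"),
    ],
)

# A query source behaves like this SELECT: the column names it returns
# are what $(...) references in the mapping can point to.
cursor = conn.execute("SELECT book_id, name, isbn, author FROM books_table")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```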

I save the following part as rules.yml:

prefixes:
  ex: "http://example.com/"
  schema: "https://schema.org/"

sources:
  books:
    table: books_table
    queryFormulation: mysql

mappings:
  BookMapping:
    sources: books
    s: http://example.com/$(book_id)
    po:
      - [a, schema:Book]
      - [schema:name, $(name)]
      - [schema:isbn, $(isbn)]
      - [schema:author, $(author)]
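To make the semantics of BookMapping concrete: for every row of the source, the subject IRI is filled in from the template http://example.com/$(book_id), and each po entry becomes one triple, with the shortcut a standing for rdf:type. The following stdlib-only sketch emulates that idea by emitting N-Triples; it is purely illustrative and not how yarrrml-parser or morph-kgc work internally. The sample row and the simplified escaping are my own assumptions.

```python
# Illustrative, stdlib-only emulation of BookMapping from rules.yml.
# This is NOT how yarrrml-parser or morph-kgc work internally.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "https://schema.org/"
SUBJECT_TEMPLATE = "http://example.com/{book_id}"  # s: http://example.com/$(book_id)

# po entries with a $(column) reference: predicate IRI -> column name
PREDICATE_COLUMNS = {
    SCHEMA + "name": "name",
    SCHEMA + "isbn": "isbn",
    SCHEMA + "author": "author",
}


def literal(value):
    """Render a plain N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def map_row(row):
    """Turn one table row (a dict of column -> value) into N-Triples lines.

    Simplification: the key is inserted into the IRI as-is, without the
    IRI-safe percent-encoding a real R2RML processor applies.
    """
    subject = f"<{SUBJECT_TEMPLATE.format(book_id=row['book_id'])}>"
    triples = [f"{subject} <{RDF_TYPE}> <{SCHEMA}Book> ."]  # [a, schema:Book]
    for predicate, column in PREDICATE_COLUMNS.items():
        if row.get(column) is not None:  # a NULL value yields no triple
            triples.append(f"{subject} <{predicate}> {literal(row[column])} .")
    return triples


# Made-up sample row
row = {"book_id": 1, "name": "Example Book", "isbn": "isbn-001", "author": "Jane Doe"}
for line in map_row(row):
    print(line)
```

Skipping NULL values mirrors R2RML, where a term map that evaluates to NULL produces no triple.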

Of course, the rml.io group also offers a CLI parser (yarrrml-parser) that transforms the YAML file into RML or R2RML. Since I work with a MariaDB database, I need R2RML as output.

Generate the R2RML mapping file mapping.r2rml.ttl from rules.yml:

yarrrml-parser -i rules.yml -o mapping.r2rml.ttl -f R2RML

Last step: Create RDF from R2RML

Now, there is only one step missing: Take the mapping file and apply it to the input data.

There are certainly many ways to do that; I use kglab. It’s an abstraction layer that combines many popular Python libraries for working with linked data.

One of the libraries included in kglab is morph-kgc, which takes a database connection and an R2RML mapping and generates RDF from them. Exactly what I want to do!

(Even though the kglab tutorials are very good, they too assume that an RML file miraculously already exists.)

As a template I used kglab’s tutorial on “Using morph-kgc to input from relational databases, CSV, etc”.

In config.ini, I point to the R2RML mapping as well as to my MySQL/MariaDB database:

[DataSource1]
mappings=mappings/mapping.r2rml.ttl
db_url=mysql+pymysql://root:password@host:port/database_name
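The db_url value follows the SQLAlchemy URL format dialect+driver://user:password@host:port/database. If the password contains characters such as @ or /, they have to be URL-encoded, otherwise they are read as URL delimiters. A small sketch that assembles such a URL; the function name build_db_url and all values are hypothetical placeholders. Whether the resulting % signs need extra escaping inside config.ini depends on how morph-kgc parses the file (Python's configparser treats % specially under its default interpolation), which I have not verified.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, database, driver="mysql+pymysql"):
    """Hypothetical helper: assemble a SQLAlchemy-style database URL.

    Format: dialect+driver://user:password@host:port/database
    User name and password are percent-encoded so that characters like
    '@' or '/' are not mistaken for URL delimiters.
    """
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Placeholder values only
url = build_db_url("root", "p@ss/word", "localhost", 3306, "database_name")
print(url)  # mysql+pymysql://root:p%40ss%2Fword@localhost:3306/database_name
```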

Then I follow the tutorial in kglab’s documentation:

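from pathlib import Path

import kglab

# prefixes used when serializing the output (same as in rules.yml)
namespaces = {
    "ex": "http://example.com/",
    "schema": "https://schema.org/",
}

kg = kglab.KnowledgeGraph(
    name="A KG example",
    namespaces=namespaces,
)

# run morph-kgc with the mappings and database declared in config.ini
kg.materialize("config.ini")

# make sure the output folder exists before saving
Path("output").mkdir(exist_ok=True)

# save RDF as Turtle
kg.save_rdf("output/rdf-triples.ttl")

# save RDF as JSON-LD
kg.save_jsonld("output/rdf-triples.jsonld")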

That’s it!

I can open rdf-triples.ttl or rdf-triples.jsonld and think about how to improve my data model.

How to improve (and what’s not working)

I haven’t figured out yet how to build a named graph by using graph: ex:mygraph in the YARRRML file. My guess is that it is related to the error “Found an invalid graph termtype”, which appears when using morph-kgc.

Also, I am undecided whether to put the logic into the YAML file or to build SQL views and just get the data from those.
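The SQL-view option could look roughly like this minimal sketch, again with SQLite and made-up data instead of MariaDB. The view name books_clean and its cleanup rules are hypothetical; the point is only that the logic moves into SQL while the mapping stays a plain column-to-property mapping. R2RML’s rr:tableName may name a table or a view, so the YARRRML source could then use table: books_clean, assuming yarrrml-parser translates it the same way as for a table. MariaDB also provides TRIM and REPLACE, though other string functions differ between the two databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books_table (book_id INTEGER PRIMARY KEY, name TEXT, isbn TEXT, author TEXT)"
)
# Made-up row with messy values
conn.execute(
    "INSERT INTO books_table VALUES (1, '  Example Book  ', '123-4-56-789012-3', 'Jane Doe')"
)

# Hypothetical view: the cleanup (trimming names, stripping hyphens from
# ISBNs) happens in SQL instead of in the YARRRML rules.
conn.execute(
    """
    CREATE VIEW books_clean AS
    SELECT book_id,
           TRIM(name)             AS name,
           REPLACE(isbn, '-', '') AS isbn,
           author
    FROM books_table
    """
)

row = conn.execute("SELECT book_id, name, isbn, author FROM books_clean").fetchone()
print(row)
```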

Resources and Links