I have experience in several areas:
Data Journalism and Data Science
Under Surveillance: How Location Data Jeopardizes German Security (Bayerischer Rundfunk, 2024)
Together with netzpolitik.org, we identified people with access to intelligence services in Germany. This was possible because we had obtained hundreds of gigabytes of smartphone GPS positions. Such data is not only a massive privacy problem, but also a risk to national security.

- tagesschau.de: Das gefährliche Geschäft mit den Standortdaten (German)
- tagesschau-Podcast 11km: Investigativ-Recherche: Wie Handy-Daten zum nationalen Sicherheitsrisiko werden (German)
- BR-Podcast „Der Funkstreifzug“: Handel mit Standortdaten: Gefahr für die Innere Sicherheit (German)
- report München: Broadcast on 16 July 2024 at 21:45 on Das Erste or in the ARD-Mediathek (German)
- BR24: Standortdaten: Spionagerisiko für Militär und Geheimdienste (German)
- netzpolitik.org: Die große Datenhändler-Recherche im Überblick (German)
- netzpolitik.org: Firma verschenkt 3,6 Milliarden Standorte von Menschen in Deutschland (German)
- netzpolitik.org: Jetzt testen: Wurde mein Handy-Standort verkauft? (German)
- netzpolitik.org: How data brokers turn our privacy into money and jeopardise national security (English)
Internet youth protection filter blocks educational content (Bayerischer Rundfunk, 2024)
Germany’s official youth protection filter often prevents access to websites with educational content for teenagers. This issue affects not only private providers but also public institutions, as revealed by investigations conducted by BR Data and netzpolitik.org. We set up a test
- Tagesschau.de: Aufklärungsseiten für Jugendliche nicht erreichbar
- BR24: Offizieller Jugendschutz-Filter blockiert Aufklärungsseiten
- Netzpolitik.org: Deutschlands wichtigster Jugendschutz-Filter blockiert Hilfsangebote
Kandidatencheck for the state elections in Bavaria (Bayerischer Rundfunk, 2023)
A total of 1,035 direct candidates from 15 parties stood in the 2023 state elections in Bavaria. Who are they and what do they stand for? We asked all of them the same questions on the most relevant topics of the election campaign, and more than 800 of them took part.

BR24: Landtagswahl: Checken Sie hier Ihre Kandidaten
Training data for AI: We Are All Raw Material for AI (Bayerischer Rundfunk, 2023)
Training data for artificial intelligence include enormous amounts of images and text gathered from millions of websites. Our analysis of the LAION 5B dataset shows that it frequently contains sensitive and private data – usually without the knowledge of those concerned.

Elisa Harlan and I were awarded the Datenschutz Medienpreis of the Berufsverband der Datenschutzbeauftragten Deutschlands (BvD).
How China Is Instrumentalizing the FAO (ARD/BR/MDR/SWR, 2023)
A story about dangerous pesticides and geostrategic interests in the context of the „Silk Road“. An investigation into Chinese influence on the United Nations. My contribution was the analysis of leaked data and documents from the Food and Agriculture Organization of the United Nations (FAO).

Film in German (with English subtitles)
Audit of raciness APIs (Bayerischer Rundfunk, 2023)
Several companies offer to check images for raciness and output a probability value. Services like this are used in content moderation. At BR Data, we examined four of these services (Google, Amazon Web Services, Microsoft and Sightengine) and found (1) a gender bias and (2) significantly different ratings for the same image, depending on the manufacturer of the software.
BR24.de: Zu sexy: Wie KI-Algorithmen Frauen benachteiligen können
I talked about it on 11km, a tagesschau podcast (in German)
This investigation was a cooperation with Guardian US. Their story: ‘There is no standard’: investigation finds AI algorithms objectify women’s bodies
Interview about the methodology (Online-Recherche Newsletter)
Remove NA: A knowledge graph about queer history (Prototype Fund, 2022)
The Remove NA project links data science and domain knowledge with the goal of weaving queer data into the web of open, linked data. Results, an essay, and methodology are available on
- http://queerdata.forummuenchen.org/en in English and
- http://queerdata.forummuenchen.org in German

The project was funded by the Prototype Fund and the German Federal Ministry of Education and Research.

Operation „Honigbiene“ (Süddeutsche Zeitung, 2019)

Anyone entering China by land must know what to expect: border police search the smartphone, then an app extracts a lot of private information. We obtained the code of the surveillance app and analysed it.
- Main story: Operation „Honigbiene“ (in English)
- Methodology I: Wie die SZ die Ausspäh-App geknackt hat
- Methodology II (Video): Wie die SZ die chinesische Polizei-App analysiert hat
The story was awarded the Journalistenpreis für Informatik of Universität Saarland and received an „honorable mention“ from the research network surveillance-studies.org.
„Blaue Bücher, rosa Bücher“ (Süddeutsche Zeitung, 2019)

Pirates for boys and fairies for girls? We investigated 50,000 German-language children’s books and identified enduring gender stereotypes. I worked primarily on the network analysis. Nominated for the Digital Humanities Award 2018.
- Main story „Blaue Bücher, rosa Bücher“
- Methodology: So sind wir an die Daten gekommen
- A Twitter Thread about the investigation
“Wie hat Ihr Stimmkreis gewählt?” (Süddeutsche Zeitung, 2018)



Writing by numbers: After an election, journalism has two tasks. First, report the results instantly (which is easy), and second, interpret the results (which is not that easy). We wanted to do both automatically and as fast as possible for all election districts in Bavaria and Hesse. The approach was auto-generated texts and visualisations based on the results of every single district: Did a district vote exceptionally? Similarly to the national level? Just slightly differently? The method for finding these differences was the Jenks natural breaks algorithm, which sorts values into classes so that the variation within each class is as small as possible.
- Article: Wie hat Ihr Stimmkreis gewählt?
- Examples: München-Mitte vs. Regen, Freyung-Grafenau
- Blogpost: Automatisierter Journalismus: Schreiben nach Zahlen
#analysis #datavis #textgeneration #datapipeline #rstats
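The Jenks step mentioned above can be illustrated with a small sketch. The project itself was built with R (see #rstats); the Python version below is only an illustration and not the newsroom's code. It assumes that the input is each district's deviation from the overall result in percentage points. The function name `jenks_groups` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def jenks_groups(values: Sequence[float], n_classes: int) -> List[List[float]]:
    """Split values into n_classes groups of neighbouring values so that
    the total squared deviation from each group's mean is minimal."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums allow the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # split[k][j]: start index of the last class in that best solution.
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the stored split points to recover the classes.
    groups: List[List[float]] = []
    j = n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


if __name__ == "__main__":
    # Hypothetical deviations of one party's district results (percentage points).
    deviations = [-9.5, -8.0, -1.0, -0.5, 0.0, 0.5, 1.0, 7.5, 9.0]
    for group in jenks_groups(deviations, 3):
        print(group)
```

With three classes, the sample separates into districts far below the overall result, districts close to it, and districts far above it. Each class could then be mapped to a different text template.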
Das gespaltene Parlament (Süddeutsche Zeitung, 2018)

Using text mining in political reporting. How does a right-wing populist party change the atmosphere and the debates in the German parliament? Answers can be found in the official transcripts of the Bundestag. The story was awarded the Nannenpreis 2019.
- Main story in German or in English
- Data and code can be found on GitHub.
- Methodology: Das steckt in den Bundestagsprotokollen
#textmining #datamining #dataanalysis #datavis #rstats
“Wie wir über Umfragen berichten” (Süddeutsche Zeitung, 2017)
Show more uncertainty to be more precise. Traditionally, media outlets report on a new poll in the following style:
“If an election were held today, party x would get y percent of the votes. This is a decline of z percent compared to the previous week.”
Covering polls like this is an oversimplification and can even be dangerous. Polls have a real impact on the decisions of politicians and voters, e.g. through feedback loops. Pollsters try to mirror the views of a whole electorate by asking 1000 to 2000 people, so there is inevitably uncertainty. The approach: make the visualization more complex but more precise by showing that uncertainty.

- Code on Github
- Methodology: Wie wir über Umfragen berichten – in English: How we report on polls
- Umfragen sind nur ein Schnappschuss der Gegenwart – in English: Surveys are just a snapshot of the present
- Süddeutsche Zeitung is improving the way media reports on political polls: @GENinnovate (https://twitter.com/GENinnovate) about the project
#datavis #rstats #statistics #pollingdata
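How much uncertainty comes with 1000 to 2000 respondents can be estimated with a textbook formula. The sketch below assumes a simple random sample and a normal approximation, which real polls only approximate. It is not necessarily the method used in the project, and the function names and sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def margin_of_error(share: float, sample_size: int, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for a poll share (0..1).

    Assumption: simple random sample, normal approximation.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value, about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sqrt(share * (1.0 - share) / sample_size)


def poll_interval(share: float, sample_size: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Lower and upper bound of the interval, clipped to the range 0..1."""
    moe = margin_of_error(share, sample_size, confidence)
    return max(0.0, share - moe), min(1.0, share + moe)


if __name__ == "__main__":
    # Hypothetical poll: a party at 30 percent among 1000 respondents.
    low, high = poll_interval(0.30, 1000)
    print(f"30% could plausibly be anywhere between {low:.1%} and {high:.1%}")
```

Under these assumptions, a party polling at 30 percent among 1000 respondents carries a margin of roughly 2.8 percentage points in either direction, so a reported change of one point from week to week may be nothing but noise.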
Der Facebook-Faktor – Wie das soziale Netzwerk die Wahl beeinflusst (Süddeutsche Zeitung, 2017)

Getting an idea of the black box Facebook: Investigating the political sphere on Facebook by crawling the pages of political parties and active users. We evaluated more than one million public Facebook likes from just under 5,000 politically interested Facebook users.
Awarded the Acatech prize for Tech Journalism 2017.
- Main story: Der Facebook Faktor
- Methodology: So haben wir die Daten recherchiert
- Von AfD bis Linkspartei – so politisch ist Facebook
- Was links und rechts verbindet – und trennt
- Wie es in Facebooks Echokammern aussieht – von links bis rechts
- All stories
#socialmedia #datavis #analysis #rstats
Research
Digitization Strategies of German Federal States (Katharina Brunner, Andreas Jager, Thomas Hess, Ursula Münch), Bavarian Research Institute for Digital Transformation (bidt):
How Can Politics Shape the Digital Transformation? The study traces the development of strategies at the state level and examines how defined measures can be steered and implemented.

Software
I published the following open-source software packages:
Generative Art
The R package generativeart lets you create images based on many thousands of points. The position of every single point is calculated by a formula with random parameters. Because of the random numbers, every image looks different.

- Get the code on Github
- Blog post: Generative Art: How thousands of points can form beautiful images
- Blog post: When Two Points on a Circle Form a Line
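The idea of a formula with random parameters can be sketched in a few lines. The example below is written in Python rather than R and does not use or reproduce the generativeart package. The formula, the parameter ranges and the function name `generate_points` are assumptions chosen purely for illustration.

```python
import math
import random
from typing import List, Tuple


def generate_points(n_points: int, seed: int) -> List[Tuple[float, float]]:
    """Compute n_points (x, y) positions from one formula with random parameters."""
    rng = random.Random(seed)
    # The random parameters are drawn once per image; a new seed gives a new image.
    a, b, c, d = (rng.uniform(1.0, 4.0) for _ in range(4))
    points = []
    for i in range(n_points):
        # t runs once around the circle, from 0 up to (but excluding) 2*pi.
        t = 2.0 * math.pi * i / n_points
        x = math.cos(a * t) + math.sin(b * t) / 2.0
        y = math.sin(c * t) + math.cos(d * t) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    image_one = generate_points(5000, seed=1)
    image_two = generate_points(5000, seed=2)
    print(image_one[:3])
    print(image_two[:3])
```

The resulting coordinates can be handed to any plotting library as a scatter plot. Fixing the seed makes an image reproducible, while changing it yields a different picture from the same formula.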
Destatis Cleaner
Update May 2020: This package is no longer needed. The Federal Statistical Office of Germany, Destatis, listened to its users: You can now download data as a flat CSV file or use an API.
The CSV files of Destatis, the Federal Statistical Office of Germany, don’t comply with common standards for a tidy, ready-to-use, machine-readable dataset. This tool helps to jump-start the data analysis by doing the time-consuming cleaning tasks.

You can find Destatis Cleaner on apps.katharinabrunner.de/destatiscleaner/
If you are an R user, you can work with the destatiscleanr package. You can find the code and instructions on GitHub.
germanpolls
A few files of code to get German polling data from wahlrecht.de onto your computer: GitHub
Technical Writing/Tutorials
- How to Network: An Introduction to Network Analysis and Visualization with R
- Introduction to relational data models (dm R-package)
- Filtering in relational data models (dm R-package)
- Joining in relational data models (dm R-package)