How to Export Your Mozilla Firefox History as a Dataframe in R

The goal of this post is to export your Mozilla Firefox browsing history and import it into R as a dataframe.

Browser history data

Firefox saves your browsing history in a file called places.sqlite. This file contains several tables, like bookmarks, favicons or the history.

To get a dataframe with visited websites, you need two tables from the sqlite file:

  1. moz_historyvisits: it contains every visit to a website, with its date and time. Websites are referenced by an id instead of a readable URL.
  2. moz_places: it maps each website id to its actual URL.

More on the database schema:

Import the data into R

sqlite files can be imported with the package RSQLite.

First, find the places.sqlite file on your computer. You can get the path by visiting about:support in Firefox and looking for the profile directory.

library(RSQLite)
library(purrr)
library(here)

# connect to database
con <- dbConnect(drv = RSQLite::SQLite(), 
                 dbname = "path/to/places.sqlite",
                 bigint="character")

# get all tables
tables <- dbListTables(con)

# remove internal tables
tables <- tables[tables != "sqlite_sequence"]
# create a list of dataframes
list_of_df <- purrr::map(tables, ~{
  dbGetQuery(conn = con, statement=paste0("SELECT * FROM '", .x, "'"))
})
# give the list of dataframes some names
names(list_of_df) <- tables
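If you only need the browsing history, an alternative to loading every table is to let SQLite join the two tables described above. The following is a minimal sketch, not the approach used in this post: it builds a tiny in-memory database with made-up rows so that it runs without a Firefox profile. Only the table and column names (moz_historyvisits.place_id, moz_historyvisits.visit_date, moz_places.id, moz_places.url) are taken from this post; everything else is an assumption for illustration.

```r
library(DBI)
library(RSQLite)

# In-memory stand-in for places.sqlite (toy data, not a real profile)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "moz_places", data.frame(
  id  = c(1L, 2L),
  url = c("https://www.r-project.org/", "https://example.com/page"),
  stringsAsFactors = FALSE
))

dbWriteTable(con, "moz_historyvisits", data.frame(
  place_id   = c(1L, 2L, 1L),
  # made-up PRTime values (microseconds since 1970-01-01 UTC)
  visit_date = c(1546300800000000, 1546300860000000, 1546300920000000)
))

# Let SQLite do the join: one row per visit, with the readable URL attached
visits <- dbGetQuery(con, "
  SELECT v.place_id, p.url, v.visit_date
  FROM moz_historyvisits AS v
  LEFT JOIN moz_places AS p ON v.place_id = p.id
  ORDER BY v.visit_date
")

# Close the connection when you are done
dbDisconnect(con)

visits
```

With a real profile you would replace ":memory:" with the path to places.sqlite and skip the two dbWriteTable() calls.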

Extract browser history

Next, we extract the two tables with the information we need, join them and keep only the visited URL, the time of the visit and the URL id.

There are two caveats:

  1. The timestamps are saved in the PRTime format, which is basically a Unix timestamp in microseconds, so you have to convert it to a human-readable format.
  2. The URLs contain the full path. To get only the domain, use the urltools package, e.g. getting twitter.com instead of twitter.com/cutterkom.

library(dplyr)    # left_join(), select(), mutate() and the pipe
library(stringr)  # str_remove()
library(urltools)
# get the two dataframes 
history <- list_of_df[["moz_historyvisits"]]
urls <- list_of_df[["moz_places"]]

df <- left_join(history, urls, by = c("place_id" = "id")) %>% 
  select(place_id, url, visit_date) %>% 
  # convert the timestamp from microseconds to seconds, then to a date-time
  mutate(date = as.POSIXct(as.numeric(visit_date)/1000000, origin = '1970-01-01', tz = 'GMT'),
  # extract the domains from the URL, e.g. `twitter.com` instead of `twitter.com/cutterkom`
  # and strip a leading `www.`
         domain = str_remove(urltools::domain(url), "^www\\."))
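To make the timestamp conversion concrete, here is a small worked example in base R. The PRTime value is made up for illustration; it is stored as a character string because that is what bigint = "character" returns.

```r
# Hypothetical PRTime value: microseconds since 1970-01-01 00:00:00 UTC
prtime <- "1546300800000000"

# Divide by one million to get seconds, the unit as.POSIXct() expects
seconds <- as.numeric(prtime) / 1e6

visit_time <- as.POSIXct(seconds, origin = "1970-01-01", tz = "GMT")

format(visit_time, "%Y-%m-%d %H:%M:%S")
# "2019-01-01 00:00:00"
```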

When Two Points on a Circle Form a Line

There are many ways to produce computer-generated abstract images. I show you one of them, written in R, that leads to images like these:

First of all, let’s set the stage with a config part:

#### load packages
#### instead of tidyverse you can also use just ggplot2, purrr and magrittr
library(here)
library(tidyverse)

####
#### Utility functions necessary: 
#### You can find them in the generativeart package on Github: github.com/cutterkom/generativeart.
#### Here they are stored in `src/generate_img.R`.
####
source(here("src/generate_img.R"))
NR_OF_IMG <- 1
LOGFILE_PATH <- "logfile/logfile.csv"

The base concept is:

  • form a starting distribution of the points
  • transform the data

In this case, our starting point is a circle. I create the data with a function called get_circle_data(). The function was proposed on Stack Overflow by Joran Elias.

get_circle_data <- function(center = c(0,0), radius = 1, npoints = 100){
  tt <- seq(0, 2*pi, length.out = npoints)
  xx <- center[1] + radius * cos(tt)
  yy <- center[2] + radius * sin(tt)
  return(data.frame(x = xx, y = yy))
}

The circle dataframe goes straight into generate_data(), where every point on the circle is connected to exactly one other point. The connections between pairs of coordinates are random, see sample(nrow(df2)):

generate_data <- function() {
  print("generate data")
  df <- get_circle_data(c(0,0), 1, npoints = 100)
  df2 <- df %>% 
    mutate(xend = x,
           yend = y) %>% 
    select(-x, -y)
  df2 <- df2[sample(nrow(df2)),]
  df <- bind_cols(df, df2)
  return(df)
} 
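The pairing step can also be sketched in base R with a fixed seed, so the result is repeatable. This is an illustration under assumptions, not the package code: the seed and the small npoints are arbitrary. Note that sample() produces a random permutation, so a point can occasionally be paired with itself.

```r
# Same helper as above, repeated so this sketch runs on its own
get_circle_data <- function(center = c(0, 0), radius = 1, npoints = 100) {
  tt <- seq(0, 2 * pi, length.out = npoints)
  data.frame(x = center[1] + radius * cos(tt),
             y = center[2] + radius * sin(tt))
}

set.seed(42)  # arbitrary seed, only to make the shuffle repeatable

circle <- get_circle_data(npoints = 6)

# Shuffle the rows: row i of `shuffled` becomes the end point of segment i
shuffled <- circle[sample(nrow(circle)), ]

segments <- data.frame(
  x    = circle$x,
  y    = circle$y,
  xend = shuffled$x,
  yend = shuffled$y
)

segments
```

Every start point and every end point lies on the unit circle, and each point is used exactly once as an end point.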

The dataframe is the input to a plotting function based on ggplot2::geom_segment():

generate_plot <- function(df, file_name, coord) {
  print("generate plot")
  plot <- df %>% 
    ggplot() +
    geom_segment(aes(x = x, y = y, xend = xend, yend = yend), color = "black", size = 0.25, alpha = 0.6) +
    theme_void() +
    coord_equal()
  
  print("image saved...")
  plot
}

Now we have all the parts needed to run the wrapper function generate_img() from the generativeart package, which actually creates the image:

generate_img()

From here, you can play with the input parameters to generate different looking images. You can change these variables in get_circle_data():

  • center = c(0,0): changes nothing when you draw only one circle, the center can be anywhere
  • radius = 1: numbers greater than 1 for rings within the circle
  • npoints = 100: Higher numbers for denser circle lines

You can find the code in an .Rmd script on Github.

Automated Journalism: Writing by Numbers

Radar is a news agency from Great Britain whose source is open data. With the help of software, its journalists then write not one text, but many texts at once:

Our journalists select the most promising data, mine the data to find the story, develop the different angles and then compose a template that instructs the technology on what sentence to write as it computes the numbers in the spread sheet. We are writing stories as mini-algorithms for each new set of data.


More on this in this article: How RADAR became front page news: Lessons from the first year of an automated news agency

At Süddeutsche Zeitung we did this as well for the state elections in autumn: a statistical model compared the result of every electoral district with all other electoral districts and generated texts that depended on the outcome. For example, for München-Mitte:

Our tweets from back then, after the election:

The R Package to Create Generative Art


Do you want to create #generativeart with #rstats? I made a package for this purpose. It is called generativeart and you can find it on Github.

You can find more images on my Instagram account @cutterkom.

Description

One overly simple but useful definition is that generative art is art programmed using a computer that intentionally introduces randomness as part of its creation process.
Why Love Generative Art? – Artnome

The R package generativeart lets you create images based on many thousands of points. The position of every single point is calculated by a formula, which has random parameters. Because of the random numbers, every image looks different.

In order to make an image reproducible, generativeart implements a log file that saves the file_name, the seed and the formula.
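Why logging the seed is enough to reproduce an image can be shown with a small base R sketch. It is not the package's implementation: the file name, the seed, the formula text and the temporary log path are all hypothetical; only the three logged fields (file_name, seed, formula) come from the description above.

```r
log_path <- tempfile(fileext = ".csv")  # hypothetical log location

seed <- 4711                                # hypothetical seed
formula_text <- "x = x_i^2 - sin(y_i^2)"    # hypothetical formula

# First run: draw the random parameters and log how they were produced
set.seed(seed)
first_run <- runif(5)

log_entry <- data.frame(
  file_name = "image-001.png",
  seed      = seed,
  formula   = formula_text,
  stringsAsFactors = FALSE
)
write.csv(log_entry, log_path, row.names = FALSE)

# Later: read the log and replay the same seed
logged <- read.csv(log_path, stringsAsFactors = FALSE)
set.seed(logged$seed)
second_run <- runif(5)

# The random numbers, and therefore the image, are the same
identical(first_run, second_run)
```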


Generative Art: How thousands of points can form beautiful images

These images are based on simple points. This post explains how it works.

1. Step: Point, points, points …

The starting point is a rectangular grid that is populated with many thousands of points, in this case 3,969.

The rectangle is placed in a coordinate system, so every point has two coordinates (x, y).

2. Step:

Now, the position of every single point is transformed. This new position is calculated by a formula, which has random parameters. Because of the random numbers, every image looks different.

For example, using a combination of sine, cosine and the random factor:

Circle resembling shapes are created by using a polar coordinate system:
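Both steps can be sketched in a few lines of base R. Everything specific here is an assumption made for illustration: the grid is taken to be 63 x 63 (which gives 3,969 points), the coordinate range, the formula and the seed are made up, and the polar step simply reinterprets x as an angle and y as a radius, which is one way to mimic a polar coordinate system.

```r
set.seed(2019)  # arbitrary seed so the sketch is repeatable

# Step 1: a grid of 63 x 63 = 3,969 points (assumed dimensions)
n <- 63
grid <- expand.grid(
  x_i = seq(-pi, pi, length.out = n),
  y_i = seq(-pi, pi, length.out = n)
)

# Step 2: move every point with a formula that has random parameters
a <- runif(1, min = -1, max = 1)
b <- runif(1, min = -1, max = 1)

moved <- data.frame(
  x = grid$x_i + a * sin(grid$y_i^2),
  y = grid$y_i + b * cos(grid$x_i^2)
)

# Polar variant: treat x as the angle and y as the radius
polar <- data.frame(
  x = moved$y * cos(moved$x),
  y = moved$y * sin(moved$x)
)

# Uncomment to look at the results:
# plot(moved, pch = ".", asp = 1, axes = FALSE, ann = FALSE)
# plot(polar, pch = ".", asp = 1, axes = FALSE, ann = FALSE)
```

Changing the seed changes a and b, and with them the whole image.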

Do it yourself

I wrote a package called generativeart, which helps to create these kinds of images with R.

You can get the package on Github.