Get the direct URL of a Wikimedia Commons image by file name from the MediaWiki API in R/rstats

Assume you have the file name of an image on Wikimedia Commons, like MJK_16033_Rosa_von_Praunheim_(Berlinale_2018)_crop.jpg, and you want to get the direct URL of that image. In this case, the URL is https://upload.wikimedia.org/wikipedia/commons/8/85/MJK_16033_Rosa_von_Praunheim_%28Berlinale_2018%29_crop.jpg.
The part between commons/ and the file name (here 8/85) is derived from the MD5 hash of the file name (see the first comment), so it can be calculated. It can also be retrieved from the MediaWiki API.

A function to do that in R with the httr2 package:

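get_url_of_wikimedia_img_file <- function(wiki_media_file_name) {
  # Ask the MediaWiki API for the imageinfo (URL only) of the given file;
  # req_url_query() takes care of encoding the parameter values
  res <- httr2::request("https://en.wikipedia.org/w/api.php") |>
    httr2::req_url_query(
      action = "query", titles = paste0("File:", wiki_media_file_name),
      prop = "imageinfo", iiprop = "url", format = "json"
    ) |>
    httr2::req_perform() |>
    httr2::resp_body_json()
  # Take the first returned page instead of hard-coding the page id key `-1`
  res$query$pages[[1]]$imageinfo[[1]]$url
}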
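
As an alternative to the API call, the path can also be computed locally from the MD5 hash of the file name. The following is a minimal sketch under a few assumptions: the helper name commons_direct_url is made up for this example, the file is assumed to live on Wikimedia Commons, and the name is normalised the way the Commons FAQ describes (spaces become underscores, first letter upper case). It only uses base R, so the name is written to a temporary file because tools::md5sum() hashes files.

```r
# Sketch: compute the direct upload URL of a Wikimedia Commons file locally.
# `commons_direct_url` is a hypothetical helper name, not part of any package.
commons_direct_url <- function(file_name) {
  # Assumption (Commons FAQ): titles use underscores instead of spaces
  # and start with an upper-case letter.
  name <- gsub(" ", "_", enc2utf8(file_name), fixed = TRUE)
  substr(name, 1, 1) <- toupper(substr(name, 1, 1))

  # tools::md5sum() hashes files, so write the exact bytes of the name
  # (no trailing newline) to a temporary file first.
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeBin(charToRaw(name), tmp)
  hash <- unname(tools::md5sum(tmp))

  # Path components: first hex digit, then the first two hex digits.
  paste0(
    "https://upload.wikimedia.org/wikipedia/commons/",
    substr(hash, 1, 1), "/", substr(hash, 1, 2), "/",
    utils::URLencode(name, reserved = TRUE)
  )
}

commons_direct_url("MJK_16033_Rosa_von_Praunheim_(Berlinale_2018)_crop.jpg")
```

For the example file this should reproduce the URL shown at the top of the post. Unlike the API request, the local calculation does not tell you whether the file actually exists, so for an automated import the API call is probably still the safer check.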

P.S. Why do I need to do that? I want to automatically import those images into Omeka S, and if a URL does not work, the whole Omeka object won’t be created.

Comments

Albin Larsson 16 October 2024

@blog the part between "commons/" and the filename is from the MD5 hash of the filename, so you should just calculate it. The Wikimedia Commons FAQ has more information and examples: https://commons.wikimedia.org/wiki/Commons:FAQ#What_are_the_strangely_named_components_in_file_paths?

Katharina 17 October 2024

Thanks! I updated the post

Frank Reichert 17 February 2025

@blog Cool! I did something similar to read out Wikimedia Commons categories (in order to display them in the DFG Viewer via a METS/MODS XML file), see https://www.vermessungs-bibliothek.de/commons-dfg-viewer. I didn't know about the MD5 hash either and only just found out about it here!
