Browsertrix: Run with `urlFile`

22 February 2024, in Data + Code

How to archive websites in bulk with Browsertrix in a local Docker container:

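# Flags, in order:
#   -v urls.txt     mount the text file with the URLs (one per line) into the container
#   -v crawls       mount the output directory; crawls and profiles live here
#   --profile       use a previously created browser profile
#   --urlFile       read the seed URLs from the mounted file
#   --scopeType     "page" fetches only the URLs from the file, no crawling beyond them
#   --generateWACZ  package the result as a WACZ file
#   --collection    save everything in a specific collection/directory
docker run \
  -v "$PWD/urls.txt:/urls.txt" \
  -v "$PWD/crawls:/crawls/" \
  -it webrecorder/browsertrix-crawler crawl \
  --profile /crawls/profiles/profile.tar.gz \
  --urlFile /urls.txt \
  --scopeType page \
  --generateWACZ \
  --collection collection-name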
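
The command above expects a urls.txt with one URL per line. If the list comes from a spreadsheet export or a copy-paste, it may help to tidy it first. A minimal sketch, assuming a raw input file; the function name clean_urls and the file name raw-urls.txt are made up for illustration and are not part of Browsertrix:

```shell
# Hypothetical helper: normalise a raw URL list before passing it to --urlFile.
# Assumes one URL per line in the input file given as $1.
clean_urls() {
  # 1. strip Windows carriage returns
  # 2. trim leading and trailing whitespace
  # 3. keep only lines that start with http:// or https://
  # 4. drop duplicates while preserving the original order
  tr -d '\r' < "$1" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -E '^https?://' \
    | awk '!seen[$0]++'
}

# Usage (file names are placeholders):
#   clean_urls raw-urls.txt > urls.txt
```

Filtering on the scheme also drops blank lines and stray notes, so the crawler only receives lines that look like URLs.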

#Digital Archive #Internet Archive
