Browsertrix: Run with `urlFile`

How to archive websites in bulk with Browsertrix in a local Docker container:

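# The comments are collected here because a comment placed between
# backslash-continued lines would cut the command short.
# -v urls.txt: pass the txt file into the container as a volume
# -v crawls: mount the output directory, then run the crawler
# --profile: use a browser profile that was previously created
# --urlFile: set the file containing the URLs
# --scopeType page: get only the URLs from the file, no crawling beyond this
# --collection: save everything in a specific collection/directory
docker run \
  -v "$PWD/urls.txt:/urls.txt" \
  -v "$PWD/crawls:/crawls/" \
  -it webrecorder/browsertrix-crawler crawl \
  --profile /crawls/profiles/profile.tar.gz \
  --urlFile /urls.txt \
  --scopeType page \
  --generateWACZ \
  --collection collection-name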
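
The command above expects a urls.txt file and a crawls directory in the current working directory. A minimal sketch for preparing both, assuming the file holds one URL per line; the two URLs are hypothetical placeholders, not part of the original setup:

```shell
# Create the output directory that gets mounted at /crawls/.
mkdir -p crawls

# Write the URL list, one URL per line (placeholder URLs, replace with your own).
cat > urls.txt <<'EOF'
https://example.com/
https://example.org/about
EOF

# Sanity check: print how many lines look like http(s) URLs.
grep -cE '^https?://' urls.txt
```

If the count printed at the end does not match the number of pages you intended to archive, check the file for blank lines or missing URL schemes before starting the crawl.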
