author    Jordan <me@jordan.im>	2022-02-14 21:02:12 -0700
committer Jordan <me@jordan.im>	2022-02-14 21:02:12 -0700
commit    a6a6fef1c7cc7d6878e8aa36541565fb3e0c9747 (patch)
tree      7928f9229c26a12917a2303408dd6ce4fb691432 /README.md
parent    13996013034f19d0d5ddf00a2926d2a117610170 (diff)
download  crawl-a6a6fef1c7cc7d6878e8aa36541565fb3e0c9747.tar.gz
          crawl-a6a6fef1c7cc7d6878e8aa36541565fb3e0c9747.zip
crawl, readme: record assembled seed URLs to seed_urls file
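The README bullet added below documents a behavior change in the crawler itself: the seed URLs assembled at startup are written to a seed_urls file, so the exact seed list used for a crawl is preserved alongside the WARC output. A minimal sketch of that idea, assuming the tool is written in Go and using hypothetical names (writeSeedFile, outDir, seeds) rather than crawl's actual code:

```go
// Sketch only: not the actual crawl implementation. Writes each assembled
// seed URL on its own line to a "seed_urls" file in the output directory,
// recording the final seed list used for the crawl.
package main

import (
	"bufio"
	"log"
	"os"
	"path/filepath"
)

// writeSeedFile is a hypothetical helper; crawl's real function name and
// output location may differ.
func writeSeedFile(outDir string, seeds []string) error {
	f, err := os.Create(filepath.Join(outDir, "seed_urls"))
	if err != nil {
		return err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	for _, u := range seeds {
		if _, err := w.WriteString(u + "\n"); err != nil {
			return err
		}
	}
	return w.Flush()
}

func main() {
	seeds := []string{"https://example.com/", "https://example.org/"}
	if err := writeSeedFile(".", seeds); err != nil {
		log.Fatal(err)
	}
}
```

One URL per line keeps the file easy to grep and reusable as seed input for a later crawl.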
Diffstat (limited to 'README.md')
-rw-r--r--  README.md | 1 +
1 file changed, 1 insertion(+), 0 deletions(-)
diff --git a/README.md b/README.md
index 5c740fc..128088c 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ Notable changes include:
 * update ignore regex set per updates to
   [ArchiveBot](https://github.com/ArchiveTeam/ArchiveBot)
 * max default WARC size 100 MB -> 5 GB
+* record assembled seed URLs to seed_urls file
 
 This tool can crawl a bunch of URLs for HTML content, and save the
 results in a nice WARC file. It has little control over its traffic,