author    ale <ale@incal.net>  2018-08-30 17:57:42 +0100
committer ale <ale@incal.net>  2018-08-30 17:57:42 +0100
commit    86a0bd2d15a07662fdae4e24589b08706c2e80b9 (patch)
tree      344cd315b8cd7ad8becb8578b7cd6f07d9d4826c
parent    198633167822d6f96f43da273515a78df4dab2df (diff)
download  crawl-86a0bd2d15a07662fdae4e24589b08706c2e80b9.tar.gz
          crawl-86a0bd2d15a07662fdae4e24589b08706c2e80b9.zip
Mention trickle as a possible bandwidth limiter
Since crawl does not provide bandwidth limiting directly, tell users there is an external solution. If crawl ever implements this on its own, the notice can be removed.
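
A minimal sketch of that workaround, assuming trickle is installed and that crawl takes URLs as positional arguments (the URL here is only an example); trickle's -d/-u rates are in KB/s:

    # Run crawl under trickle in standalone mode (-s), capped at
    # roughly 100 KB/s download and 50 KB/s upload.
    trickle -s -d 100 -u 50 crawl https://example.com/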
-rw-r--r--  README.md | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 58403ab..0de9d15 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,11 @@ A very simple crawler
 This tool can crawl a bunch of URLs for HTML content, and save the
 results in a nice WARC file. It has little control over its traffic,
-save for a limit on concurrent outbound requests. Its main purpose is
-to quickly and efficiently save websites for archival purposes.
+save for a limit on concurrent outbound requests. An external tool
+like `trickle` can be used to limit bandwidth.
+
+Its main purpose is to quickly and efficiently save websites for
+archival purposes.
 
 The *crawl* tool saves its state in a database, so it can be safely
 interrupted and restarted without issues.
 