author | ale <ale@incal.net> | 2018-08-30 17:57:42 +0100
---|---|---
committer | ale <ale@incal.net> | 2018-08-30 17:57:42 +0100
commit | 86a0bd2d15a07662fdae4e24589b08706c2e80b9 (patch) |
tree | 344cd315b8cd7ad8becb8578b7cd6f07d9d4826c |
parent | 198633167822d6f96f43da273515a78df4dab2df (diff) |
Mention trickle as a possible bandwidth limiter
Since such bandwidth limiting is not provided by crawl directly, tell
users there is another solution. Once/if crawl implements that on its
own, that notice could be removed.
-rw-r--r-- | README.md | 7 ++++--
1 file changed, 5 insertions(+), 2 deletions(-)
@@ -3,8 +3,11 @@ A very simple crawler
 
 This tool can crawl a bunch of URLs for HTML content, and save the
 results in a nice WARC file. It has little control over its traffic,
-save for a limit on concurrent outbound requests. Its main purpose is
-to quickly and efficiently save websites for archival purposes.
+save for a limit on concurrent outbound requests. An external tool
+like `trickle` can be used to limit bandwidth.
+
+Its main purpose is to quickly and efficiently save websites for
+archival purposes.
 
 The *crawl* tool saves its state in a database, so it can be safely
 interrupted and restarted without issues.
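As a usage sketch of the approach the new README text suggests: `trickle` wraps an arbitrary command and shapes its traffic in userspace, so a crawl can be rate-limited without any support in crawl itself. The rates and the crawl arguments below are illustrative assumptions, not options documented by this commit.

```shell
# Cap the wrapped process at ~512 KB/s down and ~64 KB/s up
# (trickle's -d and -u take kilobytes per second).
# The crawl invocation here is only an example.
trickle -s -d 512 -u 64 crawl https://example.com/
```

`-s` runs trickle standalone, without the `trickled` daemon; drop it if a system-wide trickled instance should arbitrate bandwidth across several processes. Note that trickle works by preloading a shim over the socket library, so it only affects dynamically linked executables.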