From 86a0bd2d15a07662fdae4e24589b08706c2e80b9 Mon Sep 17 00:00:00 2001
From: ale
Date: Thu, 30 Aug 2018 17:57:42 +0100
Subject: Mention trickle as a possible bandwidth limiter

Since such bandwidth limiting is not provided by crawl directly, tell
users there is another solution. Once/if crawl implements that on its
own, that notice could be removed.
---
 README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 58403ab..0de9d15 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,11 @@ A very simple crawler
 
 This tool can crawl a bunch of URLs for HTML content, and save the
 results in a nice WARC file. It has little control over its traffic,
-save for a limit on concurrent outbound requests. Its main purpose is
-to quickly and efficiently save websites for archival purposes.
+save for a limit on concurrent outbound requests. An external tool
+like `trickle` can be used to limit bandwidth.
+
+Its main purpose is to quickly and efficiently save websites for
+archival purposes.
 
 The *crawl* tool saves its state in a database, so it can be safely
 interrupted and restarted without issues.
-- 
cgit v1.2.3-54-g00ecf
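
For context on the README text added above: `trickle` is a userspace
bandwidth shaper that wraps an arbitrary command. A minimal sketch of
the kind of invocation it suggests; `-s`, `-d`, and `-u` are trickle's
standalone mode and download/upload caps in KB/s, while the crawl
arguments are illustrative assumptions, not crawl's documented CLI:

```sh
# Run crawl standalone under trickle (-s, no trickled daemon needed),
# capping download to 200 KB/s and upload to 50 KB/s.
# The target URL is a placeholder.
trickle -s -d 200 -u 50 crawl https://example.com/
```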