path: root/README.md
Age | Commit message | Author
2022-03-24 | links, crawl: dramatically reduce memory usage | Jordan
To prevent excessive memory usage and OOM crashes, store response bodies temporarily on the filesystem, wget-style, and delete them once processed, rather than storing and passing them around in memory buffers.
2022-02-14 | crawl, readme: record assembled seed URLs to seed_urls file | Jordan
2022-02-11 | readme: go get -> go install (deprecated), misc updates | Jordan
2022-02-10 | crawl, readme: max default WARC size 100 MB -> 5 GB | Jordan
2022-02-10 | readme: typo correction, spacing | Jordan
2022-02-10 | readme: document changes from upstream | Jordan
2019-12-04 | Fix installation instructions | ale
2019-11-13 | Add contact email address | ale
2019-01-02 | Add multi-file output | ale
The output stage can now write to size-limited, rotating WARC files using a user-specified pattern, so that output files are always unique.
2018-09-02 | Fix typo | ale
2018-09-02 | Explicitly mention the crawler limitations | ale
2018-08-30 | Mention trickle as a possible bandwidth limiter | ale
Since crawl does not provide bandwidth limiting itself, tell users that another solution exists. If and when crawl implements it natively, the notice can be removed.
2018-08-30 | Improve install instructions a bit more | ale
2018-08-30 | Update installation instructions | ale
2017-12-19 | Provide better defaults for command-line options | ale
Defaults that are more suitable to real-world site archiving.
2017-12-19 | Add a README | ale