crawl (master): A simple recursive web crawler which stores content in the WARC/1.0 format
Owner: Jordan
path: root / cmd
Age         Commit message                                        Author
2019-01-20  Refactor Handlers in terms of a Publisher interface   ale
2019-01-02  Add multi-file output                                 ale
2018-12-06  Apply --excludes to related resources too             ale
2018-09-02  Add --exclude and --exclude-file options              ale
2018-08-31  Add a simple test for the full WARC crawler           ale
2018-08-31  Explicitly delegate retry logic to handlers           ale
2018-08-31  Improve error handling, part two                      ale
2018-08-31  Improve error checking                                ale
2017-12-19  Provide better defaults for command-line options      ale
2017-12-19  Exit gracefully on signals                            ale
2017-12-19  Use a global http.Client with sane settings           ale
2017-12-19  Update cmd/links to new scope syntax                  ale
2017-12-19  Add tags (primary/related) to links                   ale
2015-07-03  minor golint fixes                                    ale
2015-06-29  clean up the state directory when done                ale
2015-06-28  add ignore list from ArchiveBot                       ale
2014-12-20  move URLInfo logic into the Crawler itself            ale
2014-12-20  make Scope checking more modular                      ale
2014-12-20  move link extraction to a common location             ale
2014-12-20  move the WARC code into its own package               ale
2014-12-19  initial commit                                        ale