path: root/cmd/links
Age         Commit message                                        Author
2022-03-24  misc: update handler signatures, tests, housekeeping  Jordan
2022-02-10  misc: update crawl paths to reflect fork location     Jordan
2020-02-17  Fix the Handler in cmd/links                          ale
2019-01-20  Refactor Handlers in terms of a Publisher interface   ale
            Introduce an interface to decouple the Enqueue functionality from the Crawler implementation.
2018-08-31  Explicitly delegate retry logic to handlers           ale
            Makes it possible to retry requests for temporary HTTP errors (429, 500, etc).
2018-08-31  Improve error handling, part two                      ale
            Handler errors are fatal, so that an error writing the WARC output will cause the crawl to abort.
2018-08-31  Improve error checking                                ale
            Detect write errors (both on the database and to the WARC output) and abort with an error message. Also fix a bunch of harmless lint warnings.
2017-12-19  Update cmd/links to new scope syntax                  ale
2014-12-20  move URLInfo logic into the Crawler itself            ale
2014-12-20  make Scope checking more modular                      ale
2014-12-20  move link extraction to a common location             ale
2014-12-20  move the WARC code into its own package               ale
            Now generates well-formed, indexable WARC files.
2014-12-19  initial commit                                        ale