path: root/crawler_test.go
Age  Commit message  Author
2022-03-24  misc: update handler signatures, tests, housekeeping  Jordan
2020-02-17  Propagate the link tag through redirects  ale
In order to do this we have to plumb it through the queue and the Handler interface, but it should allow fetches of the resources associated with a page via the IncludeRelatedScope even if it's behind a redirect.
2019-01-20  Refactor Handlers in terms of a Publisher interface  ale
Introduce an interface to decouple the Enqueue functionality from the Crawler implementation.
2018-08-31  Add a simple test for the full WARC crawler  ale
2018-08-31  Explicitly delegate retry logic to handlers  ale
Makes it possible to retry requests for temporary HTTP errors (429, 500, etc).
2017-12-19  Add tags (primary/related) to links  ale
This change allows more complex scope boundaries, including loosening edges a bit to include related resources of HTML pages (which makes for more complete archives if desired).