path: root/crawler.go
Age | Commit message | Author
2022-03-24 | crawler: close temporary descriptor in advance of defer (performance) | Jordan
2022-03-24 | crawler: continue crawl when context deadline exceeded (timeout) | Jordan
2022-03-24 | crawler: rm temporary body store once processed in advance of defer | Jordan
2022-03-24 | misc: update handler signatures, tests, housekeeping | Jordan
2021-06-19 | Ignore URL decode errors | ale
2020-08-20 | Panic instead of just dying with fatal error | ale
2020-07-30 | Panic on fatal errors | ale
2020-02-17 | Propagate the link tag through redirects | ale
2019-01-20 | Refactor Handlers in terms of a Publisher interface | ale
2019-01-19 | Replace URLInfo with a simple URL presence check | ale
2018-12-27 | Normalize URLs before checking if they are in scope | ale
2018-08-31 | Do not drop /index.html at the end of URLs | ale
2018-08-31 | Explicitly delegate retry logic to handlers | ale
2018-08-31 | Improve error handling, part two | ale
2018-08-31 | Improve error checking | ale
2017-12-19 | Exit gracefully on signals | ale
2017-12-19 | Simplify redirectHandler.Handle | ale
2017-12-19 | Add tags (primary/related) to links | ale
2017-12-18 | Switch to github.com/syndtr/goleveldb | ale
2015-07-03 | minor golint fixes | ale
2015-06-29 | clean up the state directory when done | ale
2015-06-29 | improve queue code; golint fixes | ale
2014-12-20 | move URLInfo logic into the Crawler itself | ale
2014-12-20 | add a prefix iterator to gobDb | ale
2014-12-20 | make Scope checking more modular | ale
2014-12-20 | move the WARC code into its own package | ale
2014-12-19 | initial commit | ale