Age | Commit message | Author | |
---|---|---|---|
2022-02-10 | misc: update crawl paths to reflect fork location | Jordan | |
2020-02-17 | Fix the Handler in cmd/links | ale | |
2019-01-20 | Refactor Handlers in terms of a Publisher interface. Introduce an interface to decouple the Enqueue functionality from the Crawler implementation. | ale | |
2018-08-31 | Explicitly delegate retry logic to handlers. Makes it possible to retry requests for temporary HTTP errors (429, 500, etc). | ale | |
2018-08-31 | Improve error handling, part two. Handler errors are fatal, so that an error writing the WARC output will cause the crawl to abort. | ale | |
2018-08-31 | Improve error checking. Detect write errors (both on the database and to the WARC output) and abort with an error message. Also fix a bunch of harmless lint warnings. | ale | |
2017-12-19 | Update cmd/links to new scope syntax | ale | |
2014-12-20 | move URLInfo logic into the Crawler itself | ale | |
2014-12-20 | make Scope checking more modular | ale | |
2014-12-20 | move link extraction to a common location | ale | |
2014-12-20 | move the WARC code into its own package. Now generates well-formed, indexable WARC files. | ale | |
2014-12-19 | initial commit | ale | |