Age | Commit message | Author |
2022-03-24 | misc: update handler signatures, tests, housekeeping | Jordan |
2022-03-24 | links, crawl: dramatically reduce memory usage | Jordan |
2022-02-14 | client, crawl: fix/simplify net.Dialer overrides | Jordan |
2022-02-14 | crawl, readme: record assembled seed URLs to seed_urls file | Jordan |
2022-02-10 | crawl, readme: max default WARC size 100 MB -> 5 GB | Jordan |
2022-02-10 | misc: update crawl paths to reflect fork location | Jordan |
2022-02-10 | client, crawl: --bind, support making outbound requests from a particular add... | Jordan |
2022-02-10 | crawl: set User-Agent header to appear like Firefox on Windows | Jordan |
2022-02-10 | crawl: include crawl start date in directory name | Jordan |
2022-02-10 | crawl: create new directory to store crawl contents, resume param | Jordan |
2022-02-10 | crawl, scope: recurse infinitely by default | Jordan |
2020-08-26 | Minor logging fixes | ale |
2020-08-23 | Fix the crawl.go tests | ale |
2020-08-23 | Allow setting DNS overrides using the --resolve option | ale |
2020-07-30 | Retry requests on transport-level errors | ale |
2020-02-17 | Fix the Handler in cmd/links | ale |
2020-02-17 | Propagate the link tag through redirects | ale |
2019-01-20 | Refactor Handlers in terms of a Publisher interface | ale |
2019-01-02 | Add multi-file output | ale |
2018-12-06 | Apply --excludes to related resources too | ale |
2018-09-02 | Add --exclude and --exclude-file options | ale |
2018-08-31 | Add a simple test for the full WARC crawler | ale |
2018-08-31 | Explicitly delegate retry logic to handlers | ale |
2018-08-31 | Improve error handling, part two | ale |
2018-08-31 | Improve error checking | ale |
2017-12-19 | Provide better defaults for command-line options | ale |
2017-12-19 | Exit gracefully on signals | ale |
2017-12-19 | Use a global http.Client with sane settings | ale |
2017-12-19 | Update cmd/links to new scope syntax | ale |
2017-12-19 | Add tags (primary/related) to links | ale |
2015-07-03 | minor golint fixes | ale |
2015-06-29 | clean up the state directory when done | ale |
2015-06-28 | add ignore list from ArchiveBot | ale |
2014-12-20 | move URLInfo logic into the Crawler itself | ale |
2014-12-20 | make Scope checking more modular | ale |
2014-12-20 | move link extraction to a common location | ale |
2014-12-20 | move the WARC code into its own package | ale |
2014-12-19 | initial commit | ale |