Age  Commit message  (Author)
2022-02-14  client, crawl: fix/simplify net.Dialer overrides  (Jordan)
2022-02-14  crawl, readme: record assembled seed URLs to seed_urls file  (Jordan)
2022-02-11  readme: go get -> go install (deprecated), misc updates  (Jordan)
2022-02-10  misc: add Makefile  (Jordan)
2022-02-10  crawl, readme: max default WARC size 100 MB -> 5 GB  (Jordan)
2022-02-10  license: add Jordan (me)  (Jordan)
2022-02-10  readme: typo correction, spacing  (Jordan)
2022-02-10  misc: update crawl paths to reflect fork location  (Jordan)
2022-02-10  readme: document changes from upstream  (Jordan)
2022-02-10  gen-ignores, ignore_patterns: update to exclude unsupported Perl syntax, backreferences  (Jordan)
2022-02-10  ignore_patterns: update to reflect current ArchiveBot ignore set  (Jordan)
2022-02-10  client, crawl: --bind, support making outbound requests from a particular address  (Jordan)
2022-02-10  crawl: set User-Agent header to appear like Firefox on Windows  (Jordan)
2022-02-10  crawl: include crawl start date in directory name  (Jordan)
2022-02-10  crawl: create new directory to store crawl contents, resume param  (Jordan)
2022-02-10  crawl, scope: recurse infinitely by default  (Jordan)
2021-07-12  Merge branch 'renovate/github.com-puerkitobio-goquery-1.x' into 'master'  (ale)
    Update module github.com/PuerkitoBio/goquery to v1.7.1
    See merge request ale/crawl!3
2021-07-11  Update module github.com/PuerkitoBio/goquery to v1.7.1  (renovate)
2021-06-19  Merge branch 'renovate/github.com-google-go-cmp-0.x' into 'master'  (ale)
    Update module github.com/google/go-cmp to v0.5.6
    See merge request ale/crawl!5
2021-06-19  Merge branch 'renovate/github.com-pborman-uuid-1.x' into 'master'  (ale)
    Update module github.com/pborman/uuid to v1.2.1
    See merge request ale/crawl!2
2021-06-19  Merge branch 'renovate/github.com-puerkitobio-purell-0.x' into 'master'  (ale)
    Update module github.com/PuerkitoBio/purell to v0.1.0
    See merge request ale/crawl!4
2021-06-19  Update module github.com/google/go-cmp to v0.5.6  (renovate)
2021-06-19  Update module github.com/PuerkitoBio/purell to v0.1.0  (renovate)
2021-06-19  Update module github.com/pborman/uuid to v1.2.1  (renovate)
2021-06-19  Merge branch 'renovate/configure' into 'master'  (ale)
    Configure Renovate
    See merge request ale/crawl!1
2021-06-19  Add renovate.json  (renovate)
2021-06-19  Ignore URL decode errors  (ale)
    This is an internal inconsistency that should be investigated.
2021-06-19  go mod vendor  (ale)
2020-08-26  Minor logging fixes  (ale)
2020-08-26  Rename the package to avoid conflicts  (ale)
2020-08-24  Fix typo in CI config  (ale)
2020-08-24  Build Debian packages via CI  (ale)
2020-08-23  Minor fixes to Debian packaging  (ale)
2020-08-23  Fix the crawl.go tests  (ale)
2020-08-23  Add minimal Debian packaging  (ale)
2020-08-23  Allow setting DNS overrides using the --resolve option  (ale)
2020-08-20  Panic instead of just dying with fatal error  (ale)
2020-07-30  Retry requests on transport-level errors  (ale)
2020-07-30  Panic on fatal errors  (ale)
    This allows users of crawl-as-a-library to recover from unexpected errors as a last resort.
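
The practical consequence for library users can be sketched with Go's built-in `recover`: a panic can be intercepted by the caller, whereas a process exit cannot. In the minimal example below, `runCrawl` is a hypothetical stand-in for a library call that panics, and `safeRun` is an assumed wrapper name; neither is part of the crawl API.

```go
package main

import (
	"errors"
	"fmt"
)

// runCrawl is a hypothetical stand-in for a library entry point that
// panics when it hits a fatal error.
func runCrawl() {
	panic(errors.New("fatal: storage unavailable"))
}

// safeRun calls fn and converts a panic, if any, into an ordinary error.
func safeRun(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	if err := safeRun(runCrawl); err != nil {
		fmt.Println(err) // recovered from panic: fatal: storage unavailable
	}
}
```

Note that `recover` only catches panics raised on the same goroutine, so a wrapper like this would not help if the library panicked from a goroutine it started itself.
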
2020-02-17  Fix the Handler in cmd/links  (ale)
2020-02-17  Propagate the link tag through redirects  (ale)
    In order to do this we have to plumb it through the queue and the Handler interface, but it should allow fetches of the resources associated with a page via the IncludeRelatedScope even if it's behind a redirect.
2019-12-04  Fix installation instructions  (ale)
2019-11-13  Add contact email address  (ale)
2019-11-13  Update dependencies (legacy and go.mod)  (ale)
2019-10-07  Add a vendor dependency used for tests  (ale)
2019-10-07  Parse links in inline style blocks  (ale)
2019-09-26  Switch to latest Go image for CI test  (ale)
2019-09-26  Add Go module support  (ale)
2019-09-26  Update vendored dependencies  (ale)
2019-01-20  Refactor Handlers in terms of a Publisher interface  (ale)
    Introduce an interface to decouple the Enqueue functionality from the Crawler implementation.