crawl (branch: master)
A simple recursive web crawler which stores content in the WARC/1.0 format
Jordan
Date | Commit message | Author
2022-02-10 | misc: add Makefile | Jordan
2022-02-10 | crawl, readme: max default WARC size 100 MB -> 5 GB | Jordan
2022-02-10 | license: add Jordan (me) | Jordan
2022-02-10 | readme: typo correction, spacing | Jordan
2022-02-10 | misc: update crawl paths to reflect fork location | Jordan
2022-02-10 | readme: document changes from upstream | Jordan
2022-02-10 | gen-ignores, ignore_patterns: update to exclude unsupported Perl syntax, back... | Jordan
2022-02-10 | ignore_patterns: update to reflect current ArchiveBot ignore set | Jordan
2022-02-10 | client, crawl: --bind, support making outbound requests from a particular add... | Jordan
2022-02-10 | crawl: set User-Agent header to appear like Firefox on Windows | Jordan
2022-02-10 | crawl: include crawl start date in directory name | Jordan
2022-02-10 | crawl: create new directory to store crawl contents, resume param | Jordan
2022-02-10 | crawl, scope: recurse infinitely by default | Jordan
2021-07-12 | Merge branch 'renovate/github.com-puerkitobio-goquery-1.x' into 'master' | ale
2021-07-11 | Update module github.com/PuerkitoBio/goquery to v1.7.1 | renovate
2021-06-19 | Merge branch 'renovate/github.com-google-go-cmp-0.x' into 'master' | ale
2021-06-19 | Merge branch 'renovate/github.com-pborman-uuid-1.x' into 'master' | ale
2021-06-19 | Merge branch 'renovate/github.com-puerkitobio-purell-0.x' into 'master' | ale
2021-06-19 | Update module github.com/google/go-cmp to v0.5.6 | renovate
2021-06-19 | Update module github.com/PuerkitoBio/purell to v0.1.0 | renovate
2021-06-19 | Update module github.com/pborman/uuid to v1.2.1 | renovate
2021-06-19 | Merge branch 'renovate/configure' into 'master' | ale
2021-06-19 | Add renovate.json | renovate
2021-06-19 | Ignore URL decode errors | ale
2021-06-19 | go mod vendor | ale
2020-08-26 | Minor logging fixes | ale
2020-08-26 | Rename the package to avoid conflicts | ale
2020-08-24 | Fix typo in CI config | ale
2020-08-24 | Build Debian packages via CI | ale
2020-08-23 | Minor fixes to Debian packaging | ale
2020-08-23 | Fix the crawl.go tests | ale
2020-08-23 | Add minimal Debian packaging | ale
2020-08-23 | Allow setting DNS overrides using the --resolve option | ale
2020-08-20 | Panic instead of just dying with fatal error | ale
2020-07-30 | Retry requests on transport-level errors | ale
2020-07-30 | Panic on fatal errors | ale
2020-02-17 | Fix the Handler in cmd/links | ale
2020-02-17 | Propagate the link tag through redirects | ale
2019-12-04 | Fix installation instructions | ale
2019-11-13 | Add contact email address | ale
2019-11-13 | Update dependencies (legacy and go.mod) | ale
2019-10-07 | Add a vendor dependency used for tests | ale
2019-10-07 | Parse links in inline style blocks | ale
2019-09-26 | Switch to latest Go image for CI test | ale
2019-09-26 | Add Go module support | ale
2019-09-26 | Update vendored dependencies | ale
2019-01-20 | Refactor Handlers in terms of a Publisher interface | ale
2019-01-19 | Replace URLInfo with a simple URL presence check | ale
2019-01-02 | Add multi-file output | ale
2018-12-28 | Updated dependencies | ale