crawl
master
A simple recursive web crawler which stores content in the WARC/1.0 format
Jordan
Age         Commit message                                          Author
2020-08-24  Build Debian packages via CI                            ale
2020-08-23  Minor fixes to Debian packaging                         ale
2020-08-23  Fix the crawl.go tests                                  ale
2020-08-23  Add minimal Debian packaging                            ale
2020-08-23  Allow setting DNS overrides using the --resolve option  ale
2020-08-20  Panic instead of just dying with fatal error            ale
2020-07-30  Retry requests on transport-level errors                ale
2020-07-30  Panic on fatal errors                                   ale
2020-02-17  Fix the Handler in cmd/links                            ale
2020-02-17  Propagate the link tag through redirects                ale
2019-12-04  Fix installation instructions                           ale
2019-11-13  Add contact email address                               ale
2019-11-13  Update dependencies (legacy and go.mod)                 ale
2019-10-07  Add a vendor dependency used for tests                  ale
2019-10-07  Parse links in inline style blocks                      ale
2019-09-26  Switch to latest Go image for CI test                   ale
2019-09-26  Add Go module support                                   ale
2019-09-26  Update vendored dependencies                            ale
2019-01-20  Refactor Handlers in terms of a Publisher interface     ale
2019-01-19  Replace URLInfo with a simple URL presence check        ale
2019-01-02  Add multi-file output                                   ale
2018-12-28  Updated dependencies                                    ale
2018-12-27  Normalize URLs before checking if they are in scope     ale
2018-12-27  Merge branch 'master' of git.autistici.org:ale/crawl    ale
2018-12-06  Apply --excludes to related resources too               ale
2018-09-02  Fix typo                                                ale
2018-09-02  Explicitly mention the crawler limitations              ale
2018-09-02  Add --exclude and --exclude-file options                ale
2018-09-02  Minimal support for <video> and <object> tags           ale
2018-08-31  Do not drop /index.html at the end of URLs              ale
2018-08-31  Add a simple test for the full WARC crawler             ale
2018-08-31  Explicitly delegate retry logic to handlers             ale
2018-08-31  Improve error handling, part two                        ale
2018-08-31  Use a buffered Writer for WARC output                   ale
2018-08-31  Improve error checking                                  ale
2018-08-31  Update dependencies                                     ale
2018-08-30  Mention trickle as a possible bandwidth limiter         ale
2018-08-30  Improve install instructions a bit more                 ale
2018-08-30  Update installation instructions                        ale
2017-12-19  Provide better defaults for command-line options        ale
2017-12-19  Merge branch 'master' of git.autistici.org:ale/crawl    ale
2017-12-19  Exit gracefully on signals                              ale
2017-12-19  Add a README                                            ale
2017-12-19  Use a global http.Client with sane settings             ale
2017-12-19  Crawl IFRAMEs as related resources                      ale
2017-12-19  Simplify redirectHandler.Handle                         ale
2017-12-19  Add license                                             ale
2017-12-19  Update cmd/links to new scope syntax                    ale
2017-12-19  Skip data: URLs                                         ale
2017-12-19  Add tags (primary/related) to links                     ale