crawl (master): A simple recursive web crawler which stores content in the WARC/1.0 format
Owner: Jordan
path: root / cmd
Age         Commit message                                        Author
2019-01-20  Refactor Handlers in terms of a Publisher interface   ale
2019-01-02  Add multi-file output                                 ale
2018-12-06  Apply --excludes to related resources too             ale
2018-09-02  Add --exclude and --exclude-file options              ale
2018-08-31  Add a simple test for the full WARC crawler           ale
2018-08-31  Explicitly delegate retry logic to handlers           ale
2018-08-31  Improve error handling, part two                      ale
2018-08-31  Improve error checking                                ale
2017-12-19  Provide better defaults for command-line options      ale
2017-12-19  Exit gracefully on signals                            ale
2017-12-19  Use a global http.Client with sane settings           ale
2017-12-19  Update cmd/links to new scope syntax                  ale
2017-12-19  Add tags (primary/related) to links                   ale
2015-07-03  minor golint fixes                                    ale
2015-06-29  clean up the state directory when done                ale
2015-06-28  add ignore list from ArchiveBot                       ale
2014-12-20  move URLInfo logic into the Crawler itself            ale
2014-12-20  make Scope checking more modular                      ale
2014-12-20  move link extraction to a common location             ale
2014-12-20  move the WARC code into its own package               ale
2014-12-19  initial commit                                        ale