crawl (branch: master)
A simple recursive web crawler which stores content in the WARC/1.0 format
Jordan
Age         Author  Commit message
2019-01-02  ale     Add multi-file output
2018-12-28  ale     Updated dependencies
2018-12-27  ale     Normalize URLs before checking if they are in scope
2018-12-27  ale     Merge branch 'master' of git.autistici.org:ale/crawl
2018-12-06  ale     Apply --excludes to related resources too
2018-09-02  ale     Fix typo
2018-09-02  ale     Explicitly mention the crawler limitations
2018-09-02  ale     Add --exclude and --exclude-file options
2018-09-02  ale     Minimal support for <video> and <object> tags
2018-08-31  ale     Do not drop /index.html at the end of URLs
2018-08-31  ale     Add a simple test for the full WARC crawler
2018-08-31  ale     Explicitly delegate retry logic to handlers
2018-08-31  ale     Improve error handling, part two
2018-08-31  ale     Use a buffered Writer for WARC output
2018-08-31  ale     Improve error checking
2018-08-31  ale     Update dependencies
2018-08-30  ale     Mention trickle as a possible bandwidth limiter
2018-08-30  ale     Improve install instructions a bit more
2018-08-30  ale     Update installation instructions
2017-12-19  ale     Provide better defaults for command-line options
2017-12-19  ale     Merge branch 'master' of git.autistici.org:ale/crawl
2017-12-19  ale     Exit gracefully on signals
2017-12-19  ale     Add a README
2017-12-19  ale     Use a global http.Client with sane settings
2017-12-19  ale     Crawl IFRAMEs as related resources
2017-12-19  ale     Simplify redirectHandler.Handle
2017-12-19  ale     Add license
2017-12-19  ale     Update cmd/links to new scope syntax
2017-12-19  ale     Skip data: URLs
2017-12-19  ale     Add tags (primary/related) to links
2017-12-18  ale     Add CI configuration (test only)
2017-12-18  ale     Add support for @import syntax in css
2017-12-18  ale     Update location of the uuid package
2017-12-18  ale     Add vendor deps
2017-12-18  ale     Switch to github.com/syndtr/goleveldb
2015-07-03  ale     minor golint fixes
2015-06-29  ale     clean up the state directory when done
2015-06-29  ale     improve queue code; golint fixes
2015-06-28  ale     add ignore list from ArchiveBot
2015-06-28  ale     fix timestamp format
2014-12-20  ale     move URLInfo logic into the Crawler itself
2014-12-20  ale     add a prefix iterator to gobDb
2014-12-20  ale     add tests to scope.go
2014-12-20  ale     make Scope checking more modular
2014-12-20  ale     relax the CSS url() regexp
2014-12-20  ale     move link extraction to a common location
2014-12-20  ale     move the WARC code into its own package
2014-12-19  ale     initial commit