index : crawl (master)
A simple recursive web crawler which stores content in the WARC/1.0 format
Jordan
path: root/crawler.go
Age         Commit message                                       Author
2021-06-19  Ignore URL decode errors                             ale
2020-08-20  Panic instead of just dying with fatal error         ale
2020-07-30  Panic on fatal errors                                ale
2020-02-17  Propagate the link tag through redirects             ale
2019-01-20  Refactor Handlers in terms of a Publisher interface  ale
2019-01-19  Replace URLInfo with a simple URL presence check     ale
2018-12-27  Normalize URLs before checking if they are in scope  ale
2018-08-31  Do not drop /index.html at the end of URLs           ale
2018-08-31  Explicitly delegate retry logic to handlers          ale
2018-08-31  Improve error handling, part two                     ale
2018-08-31  Improve error checking                               ale
2017-12-19  Exit gracefully on signals                           ale
2017-12-19  Simplify redirectHandler.Handle                      ale
2017-12-19  Add tags (primary/related) to links                  ale
2017-12-18  Switch to github.com/syndtr/goleveldb                ale
2015-07-03  minor golint fixes                                   ale
2015-06-29  clean up the state directory when done               ale
2015-06-29  improve queue code; golint fixes                     ale
2014-12-20  move URLInfo logic into the Crawler itself           ale
2014-12-20  add a prefix iterator to gobDb                       ale
2014-12-20  make Scope checking more modular                     ale
2014-12-20  move the WARC code into its own package              ale
2014-12-19  initial commit                                       ale