Age | Commit message | Author | |
---|---|---|---|
2022-02-10 | misc: update crawl paths to reflect fork location | Jordan | |
2019-10-07 | Parse links in inline style blocks | ale | |
2018-09-02 | Minimal support for <video> and <object> tags | ale | |
2018-08-31 | Improve error checking. Detect write errors (both on the database and to the WARC output) and abort with an error message. Also fix a bunch of harmless lint warnings. | ale | |
2017-12-19 | Crawl IFRAMEs as related resources | ale | |
2017-12-19 | Skip data: URLs | ale | |
2017-12-19 | Add tags (primary/related) to links. This change allows more complex scope boundaries, including loosening edges a bit to include related resources of HTML pages (which makes for more complete archives if desired). | ale | |
2017-12-18 | Add support for @import syntax in CSS | ale | |
2015-06-29 | improve queue code; golint fixes. The queuing code now performs proper lease accounting, and it will not return a URL twice if the page load is slow. | ale | |
2014-12-20 | relax the CSS url() regexp | ale | |
2014-12-20 | move link extraction to a common location | ale | |