path: root/analysis
Age  Commit message  Author
2022-02-10  misc: update crawl paths to reflect fork location  Jordan
2019-10-07  Parse links in inline style blocks  ale
2018-09-02  Minimal support for <video> and <object> tags  ale
2018-08-31  Improve error checking  ale
    Detect write errors (both on the database and to the WARC output) and abort with an error message. Also fix a bunch of harmless lint warnings.
2017-12-19  Crawl IFRAMEs as related resources  ale
2017-12-19  Skip data: URLs  ale
2017-12-19  Add tags (primary/related) to links  ale
    This change allows more complex scope boundaries, including loosening edges a bit to include related resources of HTML pages (which makes for more complete archives if desired).
2017-12-18  Add support for @import syntax in css  ale
2015-06-29  improve queue code; golint fixes  ale
    The queuing code now performs proper lease accounting, and it will not return a URL twice if the page load is slow.
2014-12-20  relax the CSS url() regexp  ale
2014-12-20  move link extraction to a common location  ale
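
Several of the commits above concern link extraction from CSS: the url() regexp, @import support, skipping data: URLs, and tagging links as primary or related. The following is a minimal sketch of how those pieces could fit together. It is not the repository's actual code. The names outlink, linkTag, tagPrimary, tagRelated, cssURLRx, cssImportRx and extractCSSLinks are all hypothetical, and the regular expressions are deliberately loose approximations of CSS syntax.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkTag is a hypothetical stand-in for the primary/related distinction.
type linkTag int

const (
	tagPrimary linkTag = iota // navigation links, e.g. <a href>
	tagRelated                // resources needed to render a page
)

// outlink is a hypothetical record for one extracted link.
type outlink struct {
	URL string
	Tag linkTag
}

// cssURLRx matches url(x), url('x') and url("x"), allowing whitespace
// around the argument. This also covers the `@import url(...)` form.
var cssURLRx = regexp.MustCompile(`url\(\s*['"]?([^'")]+?)\s*['"]?\s*\)`)

// cssImportRx matches the bare string form: @import "x" or @import 'x'.
var cssImportRx = regexp.MustCompile(`@import\s+['"]([^'"]+)['"]`)

// extractCSSLinks returns the links found in a CSS fragment, tagged as
// related resources. data: URLs are skipped and duplicates are dropped.
func extractCSSLinks(css string) []outlink {
	var out []outlink
	seen := make(map[string]bool)
	add := func(u string) {
		u = strings.TrimSpace(u)
		if u == "" || seen[u] || strings.HasPrefix(strings.ToLower(u), "data:") {
			return
		}
		seen[u] = true
		out = append(out, outlink{URL: u, Tag: tagRelated})
	}
	for _, m := range cssURLRx.FindAllStringSubmatch(css, -1) {
		add(m[1])
	}
	for _, m := range cssImportRx.FindAllStringSubmatch(css, -1) {
		add(m[1])
	}
	return out
}

func main() {
	css := `@import "base.css";
body { background: url( 'bg.png' ); }
.icon { background: url(data:image/png;base64,AAAA); }`
	for _, l := range extractCSSLinks(css) {
		fmt.Println(l.URL, l.Tag == tagRelated)
	}
}
```

Everything found in a stylesheet is tagged as related here, on the assumption that CSS references are rendering dependencies rather than pages to navigate to; a scope check could then admit related links one hop beyond the primary boundary.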