59 Commits (a6e1644ddf565b66281407fb1dbd374e6b92b677)

Author SHA1 Message Date
Tanner Collin c9fb9bd5df Add Lobsters to feed 3 years ago
Jason Schwarzenberger 4e5dc65461 don't rescrape if simple. 3 years ago
Jason Schwarzenberger 33a25fa34e allow re-scraping if simple scraper was used. 3 years ago
Jason Schwarzenberger da7f6330bf improve meta data scraping. 3 years ago
Jason Schwarzenberger 2a2bf4d671 add excerpt and scraper details. 3 years ago
Tanner Collin d8a0b77765 Blacklist sec.gov website 4 years ago
Jason Schwarzenberger 3b885e4327 renaming things. 4 years ago
Jason Schwarzenberger 5668fa5dbc fix mistake. 4 years ago
Jason Schwarzenberger b771b52501 add regex to get a unique ref from each sitemap/category based article url. 4 years ago
Jason Schwarzenberger 6a91b9402f split categories, sitemap and other crap out of news.py 4 years ago
Jason 00954c6cac local browser scraper 4 years ago
Jason Schwarzenberger c1b7877f4b remove limit. 4 years ago
Jason Schwarzenberger 7b8cbfc9b9 try to make feed only determined by the max age. 4 years ago
Jason 006db2960c change to 3 days 4 years ago
Jason Schwarzenberger 0f39446a61 tz aware for use in settings. 4 years ago
Jason Schwarzenberger 4488e2c292 add an `excludes` list of substrings for urls in the settings for sitemap/category. 4 years ago
Jason Schwarzenberger 9bfc6fc6fa scraper settings, ordering and loop. 4 years ago
Jason Schwarzenberger 9edc8b7cca move scraping for article content to files. 4 years ago
Jason Schwarzenberger db6aad84ec fix mistake. 4 years ago
Jason Schwarzenberger 29f8a8b8cc add news site categories feed. 4 years ago
Tanner Collin 9a279d44b1 Add header to get content type 4 years ago
Jason b759f46582 use extruct for opengraph/json-ld/microdata of articles 4 years ago
Jason Schwarzenberger 736cdc8576 fix mistake. 4 years ago
Jason Schwarzenberger 244d416f6e settings config of sitemap/substack publications. 4 years ago
Jason Schwarzenberger 76f1d57702 sitemap based feed. 4 years ago
Jason Schwarzenberger 4e64cf682a add the bulletin. 4 years ago
Jason Schwarzenberger c5fe5d25a0 add substack.py top sites, replacing webworm.py 4 years ago
Jason 283a2b1545 fix webworm comments 4 years ago
Jason Schwarzenberger 0d6a86ace2 fix webworm dates. 4 years ago
Jason Schwarzenberger f23bf628e0 add webworm/substack as a feed. 4 years ago
Tanner Collin ca78a6d7a9 Move feed and Praw config to settings.py 4 years ago
Tanner Collin 4579dfce00 Improve logging 4 years ago
Tanner Collin feba8b7aa0 Make qotnews work with WaPo 4 years ago
Tanner Collin 6cf2f01b08 Adjust feeds 4 years ago
Tanner Collin 6576eb1bac Adjust content-type request timeout 4 years ago
Tanner Collin 9a449bf3ca Remove extra logging 4 years ago
Tanner Collin d7f0643bd7 Add more logging 4 years ago
Tanner Collin f1c846acd0 Remove get first image 4 years ago
Tanner Collin 850b30e353 Add requests timeouts and temporary logging 4 years ago
Tanner Collin 6430fe5e9f Check content-type 4 years ago
Tanner Collin 2822974b6e Stop using archive.is on articles (hits CAPTCHAs) 4 years ago
Tanner Collin 2d80b19414 Grab comments on manually submitted links 4 years ago
Tanner Collin db5097ac57 Drop articles more than two days old 5 years ago
Tanner Collin 2edb3ceba7 Allow manual submission of articles 5 years ago
Tanner Collin edc4c439d7 Prefetch first images 5 years ago
Tanner Collin f8998b687e Fix crash from domain and ext check bug 5 years ago
Tanner Collin e4f81472fc Fix copy/paste error, switch to info logging 5 years ago
Tanner Collin 810e8c5ead Archive WSJ articles first, catch KeyboardInterrupt 5 years ago
Tanner Collin 19e9a80be1 Archive Bloomberg articles first 5 years ago
Tanner Collin 0053147226 Ignore certain files and domains, remove refs 5 years ago