Commit Graph

52 Commits

Author SHA1 Message Date
Jason Schwarzenberger
3b885e4327 renaming things. 2020-11-17 15:54:14 +13:00
Jason Schwarzenberger
5668fa5dbc fix mistake. 2020-11-17 12:54:54 +13:00
Jason Schwarzenberger
b771b52501 add regex to get a unique ref from each sitemap/category based article url. 2020-11-17 12:38:28 +13:00
Jason Schwarzenberger
6a91b9402f split categories, sitemap and other crap out of news.py 2020-11-16 15:30:33 +13:00
Jason
00954c6cac local browser scraper 2020-11-11 09:26:54 +00:00
Jason Schwarzenberger
c1b7877f4b remove limit. 2020-11-09 17:54:50 +13:00
Jason Schwarzenberger
7b8cbfc9b9 try to make feed only determined by the max age. 2020-11-09 17:50:58 +13:00
Jason Schwarzenberger
bfa4108a8e Merge remote-tracking branch 'tanner/master' 2020-11-09 16:08:28 +13:00
Jason
006db2960c change to 3 days 2020-11-09 01:36:51 +00:00
Jason Schwarzenberger
0f39446a61 tz aware for use in settings. 2020-11-05 16:30:55 +13:00
Jason Schwarzenberger
4488e2c292 add an excludes list of substrings for urls in the settings for sitemap/category. 2020-11-05 15:51:59 +13:00
Jason Schwarzenberger
9bfc6fc6fa scraper settings, ordering and loop. 2020-11-04 15:47:12 +13:00
Jason Schwarzenberger
9edc8b7cca move scraping for article content to files. 2020-11-04 15:00:58 +13:00
Jason Schwarzenberger
db6aad84ec fix mistake. 2020-11-04 11:12:01 +13:00
Jason Schwarzenberger
29f8a8b8cc add news site categories feed. 2020-11-04 11:08:50 +13:00
9a279d44b1 Add header to get content type 2020-11-03 20:27:43 +00:00
Jason
b759f46582 use extruct for opengraph/json-ld/microdata of articles 2020-11-03 10:31:36 +00:00
Jason Schwarzenberger
736cdc8576 fix mistake. 2020-11-03 17:04:46 +13:00
Jason Schwarzenberger
244d416f6e settings config of sitemap/substack publications. 2020-11-03 17:01:29 +13:00
Jason Schwarzenberger
5f98a2e76a Merge remote-tracking branch 'tanner/master' into master And adding relevant setings.py.example/etc. 2020-11-03 16:44:02 +13:00
Jason Schwarzenberger
76f1d57702 sitemap based feed. 2020-11-03 16:00:03 +13:00
Jason Schwarzenberger
4e64cf682a add the bulletin. 2020-11-03 12:41:16 +13:00
Jason Schwarzenberger
c5fe5d25a0 add substack.py top sites, replacing webworm.py 2020-11-03 12:28:39 +13:00
Jason
283a2b1545 fix webworm comments 2020-11-02 22:06:43 +00:00
Jason Schwarzenberger
0d6a86ace2 fix webworm dates. 2020-11-03 10:31:14 +13:00
Jason Schwarzenberger
f23bf628e0 add webworm/substack as a feed. 2020-11-02 17:09:59 +13:00
ca78a6d7a9 Move feed and Praw config to settings.py 2020-11-02 02:26:54 +00:00
4579dfce00 Improve logging 2020-11-02 00:13:43 +00:00
feba8b7aa0 Make qotnews work with WaPo 2020-10-29 04:55:34 +00:00
6cf2f01b08 Adjust feeds 2020-10-03 23:41:57 +00:00
6576eb1bac Adjust content-type request timeout 2020-08-14 03:57:43 +00:00
9a449bf3ca Remove extra logging 2020-07-08 02:36:40 +00:00
d7f0643bd7 Add more logging 2020-07-08 02:36:40 +00:00
f1c846acd0 Remove get first image 2020-07-08 02:36:40 +00:00
850b30e353 Add requests timeouts and temporary logging 2020-07-08 02:36:40 +00:00
6430fe5e9f Check content-type 2020-07-08 02:36:40 +00:00
2822974b6e Stop using archive.is on articles (hits CAPTCHAs) 2019-12-15 22:47:33 +00:00
2d80b19414 Grab comments on manually submitted links 2019-12-02 23:15:51 +00:00
db5097ac57 Drop articles more than two days old 2019-11-08 21:50:33 +00:00
2edb3ceba7 Allow manual submission of articles 2019-11-08 05:55:30 +00:00
edc4c439d7 Prefetch first images 2019-10-19 07:33:06 +00:00
f8998b687e Fix crash from domain and ext check bug 2019-10-16 08:56:31 +00:00
e4f81472fc Fix copy/paste error, switch to info logging 2019-10-16 05:26:47 +00:00
810e8c5ead Archive WSJ articles first, catch KeyboardInterrupt 2019-10-15 21:03:47 +00:00
19e9a80be1 Archive Bloomberg articles first 2019-10-08 08:00:50 +00:00
0053147226 Ignore certain files and domains, remove refs 2019-09-24 08:22:06 +00:00
23cdbc9292 Render reddit markdown, poll tildes better, add utils 2019-08-28 04:13:02 +00:00
fc8ce79e33 Try outline.com for reader mode first 2019-08-25 23:49:08 +00:00
cf9e197e6c Fix tildes comments parsing bug 2019-08-25 07:46:22 +00:00
1b6c8fc6cb Add tildes to feeds 2019-08-25 00:36:26 +00:00