Jason Schwarzenberger | f5ccd844da | fix import error. | 2020-11-16 15:41:09 +13:00
Jason Schwarzenberger | 6a91b9402f | split categories, sitemap and other crap out of news.py | 2020-11-16 15:30:33 +13:00
Jason Schwarzenberger | b23e470317 | move reddit thresholds as settings variables. | 2020-11-16 10:11:39 +13:00
Jason Schwarzenberger | 7420b5ece9 | fix microdata multiple authors | 2020-11-12 17:33:46 +13:00
Jason Schwarzenberger | 64ced635cc | fix mistake. | 2020-11-12 17:15:29 +13:00
Jason Schwarzenberger | 9318627f1b | ability to pass in multiple site maps/category urls. | 2020-11-12 17:11:51 +13:00
Jason Schwarzenberger | 3d0a3f1577 | support list based json-ld authors. | 2020-11-12 15:08:23 +13:00
Jason Schwarzenberger | 587b10c438 | recursive sitemaps (sitemap indexes) | 2020-11-12 14:56:46 +13:00
Jason | 00954c6cac | local browser scraper | 2020-11-11 09:26:54 +00:00
Jason Schwarzenberger | 3169af3002 | hostname from settings. | 2020-11-11 09:46:27 +13:00
Jason Schwarzenberger | d588a60930 | add source to searchable attributes. | 2020-11-11 09:37:54 +13:00
Jason Schwarzenberger | 408e2870b2 | tzinfo and microdata schema urls. | 2020-11-10 16:51:27 +13:00
Jason Schwarzenberger | 44b8b36547 | add data cast in query. | 2020-11-10 15:50:18 +13:00
Jason Schwarzenberger | 1d78b1c592 | fix favicon url. | 2020-11-10 15:34:21 +13:00
Jason Schwarzenberger | 0374794536 | Sitemap and Category to get favicon into icon property of story. | 2020-11-10 15:22:27 +13:00
Jason Schwarzenberger | 5efc6ef2d3 | add related stories (in api only) | 2020-11-10 14:09:56 +13:00
Jason Schwarzenberger | 4ec50e20cb | feed thread loop. | 2020-11-10 10:10:38 +13:00
Jason Schwarzenberger | c1b7877f4b | remove limit. | 2020-11-09 17:54:50 +13:00
Jason Schwarzenberger | 7b8cbfc9b9 | try to make feed only determined by the max age. | 2020-11-09 17:50:58 +13:00
Jason Schwarzenberger | bfa4108a8e | Merge remote-tracking branch 'tanner/master' | 2020-11-09 16:08:28 +13:00
Jason Schwarzenberger | 0bd0d40a31 | use json type in sqlite. | 2020-11-09 15:45:10 +13:00
Jason Schwarzenberger | 4e04595415 | fix search. | 2020-11-09 15:44:44 +13:00
Jason | 006db2960c | change to 3 days | 2020-11-09 01:36:51 +00:00
Jason Schwarzenberger | 1f063f0dac | undo log level change | 2020-11-06 11:20:34 +13:00
Jason Schwarzenberger | 1658346aa9 | fix news.py feed. | 2020-11-06 10:37:43 +13:00
Jason Schwarzenberger | 2dbc702b40 | switch to python-dateutil for parser, reverse sort xml feeds. | 2020-11-06 10:02:39 +13:00
Jason Schwarzenberger | 1c4764e67d | sort sitemap feed by lastmod time. | 2020-11-06 09:30:15 +13:00
Jason | ee49d2021e | newsroom | 2020-11-05 20:28:55 +00:00
Jason | c391c50ab1 | use localize | 2020-11-05 04:15:31 +00:00
Jason Schwarzenberger | 095f0d549a | use replace. | 2020-11-05 16:57:08 +13:00
Jason Schwarzenberger | c21c71667e | fix date issue. | 2020-11-05 16:41:15 +13:00
Jason Schwarzenberger | c3a2c91a11 | update requirements.txt | 2020-11-05 16:33:50 +13:00
Jason Schwarzenberger | 0f39446a61 | tz aware for use in settings. | 2020-11-05 16:30:55 +13:00
Jason Schwarzenberger | 351059aab1 | fix excludes. | 2020-11-05 15:59:13 +13:00
Jason Schwarzenberger | 4488e2c292 | add an excludes list of substrings for urls in the settings for sitemap/category. | 2020-11-05 15:51:59 +13:00
Jason Schwarzenberger | afda5b635c | disqus test. | 2020-11-05 14:23:51 +13:00
Jason Schwarzenberger | 0fc1a44d2b | fix issue in substack. | 2020-11-04 17:40:29 +13:00
Jason Schwarzenberger | 9fff1b9e46 | avoid duplicate articles listed on the category page | 2020-11-04 17:14:42 +13:00
Jason Schwarzenberger | 16b59f6c67 | try stop bad pages. | 2020-11-04 16:34:31 +13:00
Jason Schwarzenberger | 939f4775a7 | better settings example. | 2020-11-04 15:52:34 +13:00
Jason Schwarzenberger | 9bfc6fc6fa | scraper settings, ordering and loop. | 2020-11-04 15:47:12 +13:00
Jason Schwarzenberger | 6ea9844d00 | remove useless try blocks. | 2020-11-04 15:37:19 +13:00
Jason Schwarzenberger | 1318259d3d | imply referrer is substack. | 2020-11-04 15:21:07 +13:00
Jason Schwarzenberger | 98a0c2257c | increase declutter timeout. | 2020-11-04 15:15:00 +13:00
Jason Schwarzenberger | e6976db25d | fix tabs | 2020-11-04 15:04:20 +13:00
Jason Schwarzenberger | 9edc8b7cca | move scraping for article content to files. | 2020-11-04 15:00:58 +13:00
Jason Schwarzenberger | d718d05a04 | fix dates for newsroom. | 2020-11-04 11:53:16 +13:00
Jason Schwarzenberger | 9f4ff4acf0 | remove unnecessary sitemap.xml request. | 2020-11-04 11:22:15 +13:00
Jason Schwarzenberger | db6aad84ec | fix mistake. | 2020-11-04 11:12:01 +13:00
Jason Schwarzenberger | 29f8a8b8cc | add news site categories feed. | 2020-11-04 11:08:50 +13:00