Commit Graph

  • 00954c6cac local browser scraper Jason 2020-11-11 09:26:54 +0000
  • 637bc38476 fix mistake. Jason Schwarzenberger 2020-11-11 17:21:31 +1300
  • 164b7e72c4 basically add declutter like capabilities. Jason Schwarzenberger 2020-11-11 17:16:04 +1300
  • 3169af3002 hostname from settings. Jason Schwarzenberger 2020-11-11 09:46:27 +1300
  • d588a60930 add source to searchable attributes. Jason Schwarzenberger 2020-11-11 09:37:54 +1300
  • 408e2870b2 tzinfo and microdata schema urls. Jason Schwarzenberger 2020-11-10 16:51:27 +1300
  • 44b8b36547 add data cast in query. Jason Schwarzenberger 2020-11-10 15:50:18 +1300
  • 4f49684194 remove logos from utils.js Jason Schwarzenberger 2020-11-10 15:38:48 +1300
  • 1d78b1c592 fix favicon url. Jason Schwarzenberger 2020-11-10 15:34:21 +1300
  • 0374794536 Sitemap and Category to get favicon into icon property of story. Jason Schwarzenberger 2020-11-10 15:22:27 +1300
  • 943a1cfa4f reader server Jason Schwarzenberger 2020-11-10 14:56:21 +1300
  • 9cee370a25 tvnz icon Jason Schwarzenberger 2020-11-10 14:10:02 +1300
  • 5efc6ef2d3 add related stories (in api only) Jason Schwarzenberger 2020-11-10 14:09:56 +1300
  • 4ec50e20cb feed thread loop. Jason Schwarzenberger 2020-11-10 10:09:48 +1300
  • c1b7877f4b remove limit. Jason Schwarzenberger 2020-11-09 17:54:50 +1300
  • 7b8cbfc9b9 try to make feed only determined by the max age. Jason Schwarzenberger 2020-11-09 17:50:58 +1300
  • bfa4108a8e Merge remote-tracking branch 'tanner/master' Jason Schwarzenberger 2020-11-09 16:08:28 +1300
  • 0bd0d40a31 use json type in sqlite. Jason Schwarzenberger 2020-11-09 15:45:10 +1300
  • 4e04595415 fix search. Jason Schwarzenberger 2020-11-09 15:44:44 +1300
  • 006db2960c change to 3 days Jason 2020-11-09 01:36:51 +0000
  • 1f063f0dac undo log level change Jason Schwarzenberger 2020-11-06 11:20:34 +1300
  • 1658346aa9 fix news.py feed. Jason Schwarzenberger 2020-11-06 10:37:43 +1300
  • 2dbc702b40 switch to python-dateutil for parser, reverse sort xml feeds. Jason Schwarzenberger 2020-11-06 10:02:39 +1300
  • 1c4764e67d sort sitemap feed by lastmod time. Jason Schwarzenberger 2020-11-06 09:30:15 +1300
  • ee49d2021e newsroom Jason 2020-11-05 20:28:55 +0000
  • c391c50ab1 use localize Jason 2020-11-05 04:15:31 +0000
  • 095f0d549a use replace. Jason Schwarzenberger 2020-11-05 16:57:08 +1300
  • c21c71667e fix date issue. Jason Schwarzenberger 2020-11-05 16:41:15 +1300
  • c3a2c91a11 update requirements.txt Jason Schwarzenberger 2020-11-05 16:33:50 +1300
  • 0f39446a61 tz aware for use in settings. Jason Schwarzenberger 2020-11-05 16:30:55 +1300
  • 351059aab1 fix excludes. Jason Schwarzenberger 2020-11-05 15:59:13 +1300
  • 4488e2c292 add an excludes list of substrings for urls in the settings for sitemap/category. Jason Schwarzenberger 2020-11-05 15:51:59 +1300
  • afda5b635c disqus test. Jason Schwarzenberger 2020-11-05 14:23:51 +1300
  • 0fc1a44d2b fix issue in substack. Jason Schwarzenberger 2020-11-04 17:40:29 +1300
  • 9fff1b9e46 avoid duplicate articles listed on the category page Jason Schwarzenberger 2020-11-04 17:14:42 +1300
  • 16b59f6c67 try stop bad pages. Jason Schwarzenberger 2020-11-04 16:34:31 +1300
  • 939f4775a7 better settings example. Jason Schwarzenberger 2020-11-04 15:52:34 +1300
  • 9bfc6fc6fa scraper settings, ordering and loop. Jason Schwarzenberger 2020-11-04 15:47:12 +1300
  • 6ea9844d00 remove useless try blocks. Jason Schwarzenberger 2020-11-04 15:37:19 +1300
  • 1318259d3d imply referrer is substack. Jason Schwarzenberger 2020-11-04 15:21:07 +1300
  • 98a0c2257c increase declutter timeout. Jason Schwarzenberger 2020-11-04 15:14:51 +1300
  • e6976db25d fix tabs Jason Schwarzenberger 2020-11-04 15:04:20 +1300
  • 9edc8b7cca move scraping for article content to files. Jason Schwarzenberger 2020-11-04 15:00:58 +1300
  • 33e21e7f30 fix mistake. Jason Schwarzenberger 2020-11-04 12:45:01 +1300
  • 892a99eca6 add + expander in place of collapser. Jason Schwarzenberger 2020-11-04 12:43:15 +1300
  • d718d05a04 fix dates for newsroom. Jason Schwarzenberger 2020-11-04 11:53:16 +1300
  • d1795eb1b8 add radionz and newsroom logos. Jason Schwarzenberger 2020-11-04 11:30:56 +1300
  • 9f4ff4acf0 remove unnecessary sitemap.xml request. Jason Schwarzenberger 2020-11-04 11:22:15 +1300
  • db6aad84ec fix mistake. Jason Schwarzenberger 2020-11-04 11:12:01 +1300
  • 29f8a8b8cc add news site categories feed. Jason Schwarzenberger 2020-11-04 11:08:50 +1300
  • 9a279d44b1 Add header to get content type Tanner Collin 2020-11-03 20:27:43 +0000
  • abf8589e02 fix sitemap Jason 2020-11-03 10:53:40 +0000
  • b759f46582 use extruct for opengraph/json-ld/microdata of articles Jason 2020-11-03 10:31:36 +0000
  • 736cdc8576 fix mistake. Jason Schwarzenberger 2020-11-03 17:04:46 +1300
  • 244d416f6e settings config of sitemap/substack publications. Jason Schwarzenberger 2020-11-03 17:01:29 +1300
  • e506804666 Clean code up Tanner Collin 2020-11-03 03:45:56 +0000
  • 5f98a2e76a Merge remote-tracking branch 'tanner/master' into master Jason Schwarzenberger 2020-11-03 16:44:02 +1300
  • 0567cdfd9b move sort to render. Jason Schwarzenberger 2020-11-03 16:30:22 +1300
  • 4f90671cec order feed by reverse chronological Jason Schwarzenberger 2020-11-03 16:21:23 +1300
  • e63a1456a5 add logos. Jason Schwarzenberger 2020-11-03 16:07:07 +1300
  • 76f1d57702 sitemap based feed. Jason Schwarzenberger 2020-11-03 16:00:03 +1300
  • de80389ed0 add logos. Jason Schwarzenberger 2020-11-03 12:48:19 +1300
  • 4e64cf682a add the bulletin. Jason Schwarzenberger 2020-11-03 12:41:16 +1300
  • c5fe5d25a0 add substack.py top sites, replacing webworm.py Jason Schwarzenberger 2020-11-03 12:28:39 +1300
  • 283a2b1545 fix webworm comments Jason 2020-11-02 22:06:43 +0000
  • 0d6a86ace2 fix webworm dates. Jason Schwarzenberger 2020-11-03 10:31:14 +1300
  • f23bf628e0 add webworm/substack as a feed. Jason Schwarzenberger 2020-11-02 16:07:05 +1300
  • ca78a6d7a9 Move feed and Praw config to settings.py Tanner Collin 2020-11-02 02:26:54 +0000
  • 7acce407e9 Fix index.html indentation Tanner Collin 2020-11-02 00:38:34 +0000
  • 5281672000 Fix noscript font color Tanner Collin 2020-11-02 00:36:11 +0000
  • e59acefda9 Remove Whoosh Tanner Collin 2020-11-02 00:22:40 +0000
  • cbc802b7e9 Try Hackernews API twice Tanner Collin 2020-11-02 00:17:22 +0000
  • 4579dfce00 Improve logging Tanner Collin 2020-11-02 00:13:43 +0000
  • 0d16bec6f6 Fix table width CSS Tanner Collin 2020-11-01 00:47:18 +0000
  • feba8b7aa0 Make qotnews work with WaPo Tanner Collin 2020-10-29 04:55:34 +0000
  • ee5105743d Upgrade readability Tanner Collin 2020-10-29 01:24:13 +0000
  • 72802a6fcf Show exerpt of hidden comments Tanner Collin 2020-10-27 00:41:36 +0000
  • 99d3a234f4 Fix bug with rendering text nodes Tanner Collin 2020-10-10 21:07:54 +0000
  • f95df227f1 Add instructions to download search server Tanner Collin 2020-10-04 21:21:19 +0000
  • b82095ca7a Add buttons to collapse / expand comments Tanner Collin 2020-10-26 21:57:10 +0000
  • 992c1c1233 Monkeypatch earlier Tanner Collin 2020-10-24 22:30:00 +0000
  • 88d2216627 Add a script to delete a story Tanner Collin 2020-10-03 23:42:21 +0000
  • 6cf2f01b08 Adjust feeds Tanner Collin 2020-10-03 23:41:57 +0000
  • 607573dd44 Add buttons to convert <pre> to <p> Tanner Collin 2020-10-03 23:23:25 +0000
  • c554ecd890 Add a line on UI to make search results obvious Tanner Collin 2020-08-14 03:58:11 +0000
  • 6576eb1bac Adjust content-type request timeout Tanner Collin 2020-08-14 03:57:43 +0000
  • 472af76d1a Adjust port Tanner Collin 2020-08-14 03:57:18 +0000
  • 4727d34eb6 Delete displayed-attributes when init search Tanner Collin 2020-08-14 03:56:47 +0000
  • 0e086b60b8 Remove business subreddit from feed Tanner Collin 2020-08-14 03:55:28 +0000
  • b46ce36c63 Update requirements Tanner Collin 2020-07-08 05:24:32 +0000
  • 9a449bf3ca Remove extra logging Tanner Collin 2020-07-07 20:55:13 +0000
  • 0bd9f05250 Fix crash when HN feed fails Tanner Collin 2020-07-07 20:53:46 +0000
  • 9c116bde4a Remove document img and ignore r/technology Tanner Collin 2020-07-06 21:44:16 +0000
  • ebedaef00b Tune search rankings and attributes Tanner Collin 2020-07-06 21:43:57 +0000
  • d7f0643bd7 Add more logging Tanner Collin 2020-07-04 22:39:38 +0000
  • eb1137299d Remove article numbers Tanner Collin 2020-07-04 22:38:36 +0000
  • 72d4a68929 Remove pre-fetching image Tanner Collin 2020-07-04 00:29:04 +0000
  • f1c846acd0 Remove get first image Tanner Collin 2020-07-04 00:27:15 +0000
  • 850b30e353 Add requests timeouts and temporary logging Tanner Collin 2020-07-04 00:25:41 +0000
  • d614ad0743 Integrate with external MeiliSearch server Tanner Collin 2020-06-27 22:53:39 +0000