Today guardian.co.uk rolled out a major upgrade to the RSS feeds. Our feeds now contain the full content of each article so that you can take guardian.co.uk with you wherever you prefer to get your news.
Fast-forward to today and not only do they no longer provide the full content of articles in their feeds (those clickthroughs and ad impressions being all-important), but not even their developer blog has been spared. This is pretty disappointing.
When time is short or my brain is full, I have two ways of marking content as worth reading at some point in the future:
if it’s in Google Reader I star it
if it’s on the wild wild web then I add it to delicious and tag it ‘readme’
The fact that I have over 600 ‘readme’ items in delicious, going back to 2004, tells us one of two things:
I am not reading those items, or
I am not untagging them once read.
Sadly for me, the answer is the former, and I’ve not previously worked out a way of making a serious dent in the number of unread articles without declaring bankruptcy and potentially starting again – except of course that I would still have no strategy for actually reading them!
Enter http://www.tabbloid.com/ – a two-year-old (yet new to me) service from HP. You add any number of feeds you like and, on a daily or weekly schedule, it grabs those feeds, merges the results, sorts them by time, selects the most recent items and generates a PDF which it then emails to you.
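The merge, sort and select steps described above are simple enough to sketch. This is only an illustration of the idea, not Tabbloid's actual implementation: the `Item` type, the `latest_items` function and the sample entries are all made up for the example, and feed fetching and PDF generation are left out entirely.

```python
from datetime import datetime
from typing import NamedTuple


class Item(NamedTuple):
    """A hypothetical, stripped-down feed entry."""
    title: str
    published: datetime


def latest_items(feeds, limit):
    """Merge several lists of items and return the `limit` newest, newest first."""
    merged = [item for feed in feeds for item in feed]
    merged.sort(key=lambda item: item.published, reverse=True)
    return merged[:limit]


# Made-up sample data standing in for two already-parsed feeds.
starred = [
    Item("Starred A", datetime(2010, 3, 1)),
    Item("Starred B", datetime(2010, 3, 5)),
]
readme = [Item("Readme A", datetime(2010, 3, 3))]

for item in latest_items([starred, readme], limit=2):
    print(item.published.date(), item.title)
```

With the sample data, the two newest items across both lists are printed, newest first.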
I’m going for a weekly delivery of both my starred items and readme items – my first one arrived in my inbox the other day, I printed it out and am very happy indeed. Of course it means that each week I’m giving myself a job to go through my Tabbloid printout and de-star or remove the tag in delicious, but at least I’m making progress!
That isn’t to say there aren’t any pain points with this whole process – I haven’t yet sussed how to queue up video items tagged with “watchme” for example, or watch videos I’ve starred in Google Reader. Presumably the answer involves parsing the feeds, grabbing the video where possible, encoding it to a phone-friendly format and then subscribing on a mobile feedreader, but that sounds like a lot of work right now for a relatively small issue. I’m more than happy to have a piece of my online reading experience come offline with me, ready whenever I am.
As I recently wrote, I have a new-found interest in ebooks (I also bought four new textbooks from O’Reilly using a BOGOF offer to pick up 97 Things Every Programmer Should Know, 97 Things Every Project Manager Should Know, Beautiful Code and The Art of Agile Development).
Aldiko has a built-in browser for the feedbooks.com catalog, but also gives you the ability to add your own catalogs. A friend told me that Calibre, a popular ebook management program, has a web interface which one of the other popular Android ebook readers (WordPlayer) could be pointed at in order to add custom catalogs. After a quick trial and a few Google searches, I realised that WordPlayer actually subscribes to an XML file hosted on http://localhost/calibre/stanza.
Opening this file shows it to be Atom, where each entry is a small metadata container and the link element is used to reference the actual book and images that represent it, like this:
Another few searches showed this to be a draft specification called openpub. Aldiko supports this, so adding the /stanza URL to a custom catalog works there too! Voila, custom catalogs in Aldiko. Marvellous!
It should only require a tiny bit of work to write code that serves a catalog straight from the filesystem without the overhead of Calibre (which I found to be quite heavyweight). This is what I have started here.
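As a rough idea of what serving a catalog from the filesystem might involve, here is a minimal sketch that turns a directory of .epub files into an Atom document with one entry per book. It is not the code linked above and it does not follow the openpub draft: `build_catalog`, the base URL and the choice of a bare `link` element with `type="application/epub+zip"` are assumptions made for illustration, and required Atom elements such as `id` and `updated` are omitted for brevity.

```python
import os
import urllib.parse
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_catalog(book_dir, base_url, title="My books"):
    """Return an Atom document (as a string) listing every .epub in book_dir."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element("{%s}feed" % ATOM_NS)
    ET.SubElement(feed, "{%s}title" % ATOM_NS).text = title
    for name in sorted(os.listdir(book_dir)):
        if not name.lower().endswith(".epub"):
            continue  # ignore anything that is not an ebook
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        # Use the filename (minus extension) as a stand-in for real metadata.
        ET.SubElement(entry, "{%s}title" % ATOM_NS).text = os.path.splitext(name)[0]
        href = base_url.rstrip("/") + "/" + urllib.parse.quote(name)
        # Assumed link shape; a real catalog would also set rel and cover images.
        ET.SubElement(
            entry,
            "{%s}link" % ATOM_NS,
            {"type": "application/epub+zip", "href": href},
        )
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical usage: list the books in the current directory.
    print(build_catalog(".", "http://localhost/books"))
```

A real version would read titles and authors from inside each book rather than from the filename, and would need something to serve the resulting document and the files themselves over HTTP.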
Load the podcasting application, mark all items and hit “send -> Bluetooth”. Contrary to what you might expect, this will send an OPML file listing your subscriptions to your PC.
Edit the list ready for import
Open your new Podcasting.opml file in a text editor
Find/replace all instances of url= with xmlUrl=
Immediately after the opening <body> tag put <outline title="podcasts" text="podcasts">
Just before the closing </body> tag put </outline>
(I also duplicated all the text="blah" attributes with title="blah" but I don’t know if this is actually necessary)
Import the list of podcasts
Load Google Reader
Click “Settings” in the top right
Go to the Import/Export tab
Find your Podcasting.opml file and upload!
You should now find that you have a new folder called “podcasts” in your Google Reader containing all the podcasts from your Nokia device.
Even nicer – if you make the folder public (Settings -> Folders and Tags) you can import the OPML from Google Reader directly into other applications by giving the URL http://www.google.com/reader/public/subscriptions/user/USERID/label/podcasts where USERID is the long number in the URL of the “view public page” link next to your public podcasts folder in Settings -> Folders and Tags.
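The manual edits to Podcasting.opml described earlier (renaming url= to xmlUrl= and wrapping everything inside the body in a single outline) could also be scripted. The following is a sketch only: `prepare_opml` is a made-up name, the sample OPML is invented rather than taken from a real Nokia export, and the string-based approach assumes the file contains exactly one well-formed body element.

```python
import re
from xml.sax.saxutils import quoteattr


def prepare_opml(opml_text, folder="podcasts"):
    """Apply the three manual edits to an OPML document held as a string."""
    # 1. Rename url= attributes to xmlUrl= (case-sensitive, whole word only).
    text = re.sub(r"\burl=", "xmlUrl=", opml_text)
    # 2. Open a wrapping outline immediately after the opening <body> tag.
    name = quoteattr(folder)
    opening = "<outline title=%s text=%s>" % (name, name)
    text = re.sub(r"(<body[^>]*>)", lambda m: m.group(1) + opening, text, count=1)
    # 3. Close the wrapping outline just before the closing </body> tag.
    head, sep, tail = text.rpartition("</body>")
    return head + "</outline>" + sep + tail


# Invented sample input, standing in for the file sent from the phone.
sample = (
    '<opml version="1.0"><body>'
    '<outline text="Some Show" url="http://example.test/feed.xml"/>'
    "</body></opml>"
)

if __name__ == "__main__":
    print(prepare_opml(sample))
```

Reading the file, passing its contents through `prepare_opml` and writing the result back out would replace the text-editor steps; the upload to Google Reader stays manual.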
This means that venus-ng will, at a particular point in time, give you an accurate representation of your currently unread Newsgator feed entries. Here is the output from the newsgator.com web aggregator and venus-ng:
venus-ng does not mark feeds as read on the Newsgator server when it retrieves them, although that will likely get added when I have a test Newsgator account set up.
It is currently a fork because I’ve had to modify feedparser.py in a few ways which probably stop it working with other data sources:
I’ve changed the way it deals with passed-in urllib2 handlers
I’ve commented out the HTTP 401 response behaviour (since I’m passing it an HTTPBasicAuthHandler already)
It always passes through an additional X-NGAPIToken HTTP header containing a Newsgator API key
As far as I can tell, the handler refactoring should be fine, but the 401-handling and extra HTTP header seem like deal-breakers.
I have no idea how to stop the 401 handler in _FeedURLHandler() conflicting with that in urllib2.HTTPBasicAuthHandler.
I suspect there is a good solution in subclassing urllib2.HTTPBasicAuthHandler to provide the additional Newsgator HTTP header but I’ve not worked out some of the details yet.
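For what it is worth, the subclassing idea might look something like the sketch below. It uses Python 3's `urllib.request` (the successor to `urllib2`) so that it runs on a current interpreter, and it is not the code in venus-ng: the class name, the `sent_headers` helper and the token value are placeholders, and it does nothing about the 401 conflict in feedparser mentioned above.

```python
import urllib.request


class TokenBasicAuthHandler(urllib.request.HTTPBasicAuthHandler):
    """Basic-auth handler that also adds an X-NGAPIToken header to requests."""

    def __init__(self, api_token, password_mgr=None):
        super().__init__(password_mgr)
        self.api_token = api_token

    def http_request(self, req):
        # Request pre-processing hook: the opener calls this before sending.
        req.add_header("X-NGAPIToken", self.api_token)
        # Defer to the parent's own hook where the Python version defines one.
        parent_hook = getattr(super(), "http_request", None)
        return parent_hook(req) if parent_hook else req

    https_request = http_request


def sent_headers(req):
    """Return the request's headers with lower-cased names, for inspection."""
    return {key.lower(): value for key, value in req.header_items()}


# Placeholder token; a real one would come from the Newsgator account.
handler = TokenBasicAuthHandler("hypothetical-token")
opener = urllib.request.build_opener(handler)

if __name__ == "__main__":
    # No network access here: call the hook directly and inspect the result.
    req = handler.http_request(urllib.request.Request("http://example.test/feed"))
    print(sent_headers(req))
```

Credentials would still be registered in the usual way, through `handler.add_password(...)`, and the resulting opener (or the handler itself) passed on to whatever does the fetching.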
You can get the latest source via bzr get http://philwilson.org/code/venus-ng – there is a sample newsgator.ini file in the /examples directory, but it relies on you already having a Newsgator account and some feeds set up.
Once I’d traced through the Venus code to semi-understand it, this was quite straightforward to do (deal-breaking fork-causers aside) so were Google Reader to introduce an official API it would not take long to integrate.
I recently broke the graphics drivers on my Windows Vista installation, so re-partitioned and now run Ubuntu full-time at home.
On Windows I use FeedDemon as my full-time aggregator. It has a degree of speed and polish unmatched by any other web or desktop aggregator.
This means that all my feeds are automatically synced with newsgator.com – a web-based aggregator which is not fast and not particularly polished. It might be polished for all I know, but it’s so slow that I tend to just give up (sync with Google Reader is coming).
FeedDemon has significantly raised the bar for any aggregator I use. Web-based tools no longer cut it, in particular when I have hundreds of feeds and, at times, thousands of unread items.
On Ubuntu the options for a native aggregator are Straw or Liferea. Both are currently undergoing rewrites. Liferea seems like the better option for me, and it has a plugin system which is appealing, but there’s no sync with any online tools.
NewsGator have an HTTP-based API (PDF reference and sample code which requires a minor tweak to run) which is quite straightforward. It gives back data which can be consumed by the Universal Feed Parser. Venus uses the Universal Feed Parser in planet/spider.py after fetching data to create the cache which powers it.
This time last year I wrote a very very basic wxWidgets tool for browsing the Venus cache. A modification to planet/spider.py to use the NewsGator API would seem like an easy way forward, whilst gaining all the power of the Venus filters, plugins and existing XSLTs.