Online ebook catalogs in Atom

As I recently wrote, I have a new-found interest in ebooks (I also bought four new textbooks from O’Reilly using a BOGOF offer to pick up 97 Things Every Programmer Should Know, 97 Things Every Project Manager Should Know, Beautiful Code and The Art of Agile Development).

I mainly read ebooks on my Android device, using Aldiko.

Aldiko has a built-in browser for the feedbooks.com catalog, but also gives you the ability to add your own catalogs. A friend told me that Calibre, a popular ebook management program, has a web interface which one of the other popular Android ebook readers (WordPlayer) could be pointed at in order to add custom catalogs. After a quick trial and a few Google searches, I realised that WordPlayer actually subscribes to an XML file hosted on http://localhost/calibre/stanza.

Opening this file shows it to be Atom, where each entry is a small metadata container and the link element is used to reference the actual book and images that represent it, like this:


    <link type="application/epub+zip" href="/get/epub/3"/>
    <link rel="x-stanza-cover-image" type="image/jpeg" href="/get/cover/3"/>
    <link rel="x-stanza-cover-image-thumbnail" type="image/jpeg" href="/get/thumb/3"/>

Another few searches showed this to be a draft specification called openpub. Aldiko supports this, so adding the /stanza URL to a custom catalog works there too! Voila, custom catalogs in Aldiko. Marvellous!
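As a rough illustration of how a reader application might consume such a catalog, here is a minimal sketch that pulls the book and cover links out of each entry using Python's standard library. The sample feed and the `book_links` helper are my own assumptions, modelled on the link elements shown above; a real Calibre catalog will carry more metadata than this.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so queries need a prefix bound to it.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample catalog, modelled on the link elements shown above.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example Book</title>
    <link type="application/epub+zip" href="/get/epub/3"/>
    <link rel="x-stanza-cover-image" type="image/jpeg" href="/get/cover/3"/>
    <link rel="x-stanza-cover-image-thumbnail" type="image/jpeg" href="/get/thumb/3"/>
  </entry>
</feed>"""


def book_links(feed_xml):
    """Return one dict per entry with its title, epub href and cover href."""
    root = ET.fromstring(feed_xml)
    books = []
    for entry in root.findall("atom:entry", ATOM_NS):
        book = {"title": entry.findtext("atom:title", default="", namespaces=ATOM_NS)}
        for link in entry.findall("atom:link", ATOM_NS):
            if link.get("type") == "application/epub+zip":
                book["epub"] = link.get("href")
            elif link.get("rel") == "x-stanza-cover-image":
                book["cover"] = link.get("href")
        books.append(book)
    return books


if __name__ == "__main__":
    for book in book_links(SAMPLE):
        print(book)
```

The hrefs are relative, so a client would resolve them against the URL the catalog was fetched from.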

It should only require a tiny bit of work to write code that serves a catalog straight from the filesystem without the overhead of Calibre (which I found to be quite heavyweight). This is what I have started here.
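To give an idea of the shape such code might take, here is a minimal sketch (not the project mentioned above) that walks a directory and emits one Atom entry per .epub file. The `build_catalog` name, the `/books/` base URL and the feed title are all assumptions for illustration, and a properly valid Atom feed would also need `id`, `updated` and author elements, which I have left out.

```python
import os
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def build_catalog(book_dir, base_url="/books/"):
    """Build an Atom feed (as a string) with one entry per .epub in book_dir."""
    # Serialise Atom as the default namespace rather than with an ns0: prefix.
    ET.register_namespace("", ATOM)
    feed = ET.Element("{%s}feed" % ATOM)
    ET.SubElement(feed, "{%s}title" % ATOM).text = "My ebooks"

    for name in sorted(os.listdir(book_dir)):
        if not name.lower().endswith(".epub"):
            continue  # skip anything that is not an epub
        entry = ET.SubElement(feed, "{%s}entry" % ATOM)
        # Use the file name without its extension as a stand-in title.
        ET.SubElement(entry, "{%s}title" % ATOM).text = os.path.splitext(name)[0]
        ET.SubElement(entry, "{%s}link" % ATOM, {
            "type": "application/epub+zip",
            "href": base_url + urllib.parse.quote(name),
        })

    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(build_catalog("."))
```

Serving the resulting string over HTTP, along with the files themselves, is left to whatever web server is already to hand.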

Importing blog posts and comments from Blogger to WordPress

I tried this a year ago, only to experience epic fail.

I tried this yesterday and it was a marvellous success.

Around this time last year I was locked out of my Google account and decided to move what I could over to my own server (a process I’ve still not completed!). As part of that move I used BloggerBackup to export all of my blog posts and comments and tried to import them into WordPress, which didn’t work. I was resigned to writing a script to do the import, but ran into a WordPress date parsing bug which I had trouble tracking down. However, since my old blog was still available as static HTML on my server, I wasn’t really that worried about it.

Last night I tried the built-in WordPress import from Blogger. It uses OAuth to authenticate and then allows import of your posts and comments from the comfort of a couple of clicks in the WordPress admin interface. All very smooth, all very easy (apart from the slightly worrying disparity between the number of imported elements and the totals). I’ll have to move my images, but that’s no real bother.

My archives now go all the way back to May 2002 when it was a co-blog with my housemate of the time who is now an arty-philoso-programmer in Australia. Before that I maintained my blog by hand and I’m not sure I have copies.

A quick “thanks” to my colleague Tom Natt who helped me fix my .htaccess changes so that old links and Google searches still work (also thanks to Mark Pilgrim’s Cruft-free URLs in Movable Type which I could rather tragically remember as a useful post from five years ago).

Parsing Atom with libxml2

Whilst trying to parse some Atom (my Blogger backup) with libxml2, I appear to have run into the same problem that Aristotle hit two years ago in “XPath vs the default namespace: easy things should be easy”. To wit: you can’t match on the default namespace in XPath.


    >>> import libxml2
    >>> doc = libxml2.parseFile("/home/pip/allposts.xml")
    >>> results = doc.xpathEval("//feed")
    >>> len(results)
    0

Unbelievable.

Immediate potential solutions:

  1. XSLT my Atom document to add “atom:” to all my default-namespaced elements
  2. use an entirely different method of parsing
  3. remove the atom namespace declaration from the top of the file
  4. something else

Option 3 looks like the only sane route to take in this one-off job, but I’m quite surprised that I have to do it at all.

Actually, this turned out to be my fault – I was parsing two documents at the same time, one with a namespace declaration set correctly (for parsing my Atom file), and one with no namespaces set. I used the latter for my xpath query, which clearly didn’t work – many thanks to everyone who left a comment!
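For anyone hitting the same wall, here is a small sketch of the difference between an unprefixed and a prefixed query. Note that it uses Python's standard xml.etree.ElementTree rather than libxml2, purely so that it runs without extra packages, and the sample feed is made up. With the libxml2 bindings I believe the equivalent is to register a prefix on an XPath context (`xpathNewContext()` followed by `xpathRegisterNs()`) and query `//atom:feed`, but treat that as an untested assumption here.

```python
import xml.etree.ElementTree as ET

# Any prefix will do; what matters is the namespace URI it is bound to.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical stand-in for a Blogger backup file.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# An unprefixed name means "no namespace", so nothing in an Atom feed matches.
unprefixed = root.findall(".//entry")

# Binding a prefix to the Atom namespace makes the same query match.
prefixed = root.findall(".//atom:entry", ATOM_NS)

titles = [e.findtext("atom:title", namespaces=ATOM_NS) for e in prefixed]

if __name__ == "__main__":
    print(len(unprefixed), len(prefixed), titles)
```

The prefix in the query does not need to appear in the document itself; only the namespace URI has to agree.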

HOWTO download your Google Reader starred items

How to create a backup of your starred items in Google Reader, should the need ever arise:


  • Log in to Google Reader
  • Click ‘Settings’ in the top-right of the window
  • Click the ‘Tags’ tab
  • Check the “Your starred items” box
  • Click the “Change sharing…” dropdown box and select “public”
  • Now click on ‘View public page’, which has appeared to the right of “Your starred items” (this will open in a new window by default)
  • In the right-hand column there is a link to a feed. Right-click it and save it to disk.

Congratulations, you now have an Atom feed of your starred items to do with as you wish. With any luck it will even be valid.