Requesting a URL with Python

My Pimoroni Flotilla has spent a year at the bottom of my drawer, but I have finally busted it out. It has a Python API, so I figured this was as good a time as any to use Python 3 for the first time in many years.

Part of the kit is an LED matrix. I want to use this to display how many people there are in space right now.

There is a great website that answers this question – howmanypeopleareinspacerightnow.com – and it also has a JSON endpoint. It turns out there are lots of ways of getting this information from Python. Here are three I tried.

requests

Requests calls itself “HTTP for Humans”, and the code is short and concise, but I need to send an extra header with the request because of some user-agent filtering on the server.


import requests

url = 'http://www.howmanypeopleareinspacerightnow.com/peopleinspace.json'
headers = {'user-agent': 'space-requestor/0.1'}

response = requests.get(url, headers=headers)

print(response.status_code) # 200
print(response.text) # response body as text

inspacenow = response.json() # response body as JSON object

print(inspacenow["people"][0]["name"]) # 'Peggy Whitson'

urllib3

urllib3 describes itself as a ‘powerful, sanity-friendly HTTP client’ and it’s more verbose than I’d like for my simple case, but feels like it might scale into a larger application quite well.


import urllib3
import json

url = 'http://www.howmanypeopleareinspacerightnow.com/peopleinspace.json'

http = urllib3.PoolManager()
response = http.request('GET', url)

print(response.status) # 200
print(response.data) # response body as byte string

inspacenow = json.loads(response.data.decode('utf-8')) # response body as JSON object

print(inspacenow["people"][0]["name"]) # Peggy Whitson
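For completeness, urllib3 will also send the user-agent header from earlier via its headers argument – a quick sketch, assuming the same filtering applies here:

import urllib3

url = 'http://www.howmanypeopleareinspacerightnow.com/peopleinspace.json'
headers = {'user-agent': 'space-requestor/0.1'}

http = urllib3.PoolManager()
response = http.request('GET', url, headers=headers) # extra header sent with the request

print(response.status) # 200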

unirest

Unirest is a collection of eight HTTP client libraries for different programming languages, all supporting near-identical request idioms, which makes it easy to use if you work in more than one language yourself.

Sadly, the Python library, which I’ve used before very happily, does not work with Python 3.

If it worked, the main call would have looked something like response = unirest.get(url), which is a brevity I deeply appreciate!
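For comparison, under Python 2 the whole thing would have looked roughly like this – the response attribute names are from memory of the unirest API, so treat them as an assumption:

import unirest

url = 'http://www.howmanypeopleareinspacerightnow.com/peopleinspace.json'

# Python 2 only; .body should be the parsed JSON when the server sends a
# JSON content type (attribute names recalled from memory, not checked)
response = unirest.get(url, headers={'user-agent': 'space-requestor/0.1'})

print(response.code)                      # 200
print(response.body["people"][0]["name"]) # 'Peggy Whitson'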

Conclusion

Requests is very popular and seems like a good, solid choice for HTTP requests in Python 3 applications. It’s what I’ll be using for my humans-in-space monitor!

CouchDB in desktop applications

Following my last post, I was considering writing a Venus filter which adds all feed items to a CouchDB database. This could then be queried by a modified wxVenus, a webapp (using the CouchDB jQuery library), or whatever.
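Something like this rough sketch is what I have in mind – it assumes Venus hands each entry to a filter as XML on stdin and expects it back on stdout, and that a ‘feeds’ database already exists on a local CouchDB server:

import sys
from couchdb import Server

# read the entry Venus pipes in, store a copy, then pass it through unchanged
entry_xml = sys.stdin.read()

db = Server('http://127.0.0.1:5984/')['feeds']
db.create({'entry': entry_xml})

sys.stdout.write(entry_xml)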

Thinking specifically about wxVenus, which is a desktop application: CouchDB is like MySQL in that the server must be up and running before your application tries to use it, and (afaik) there is no way to embed the server itself into your application, which places quite a burden on the user.

My initial plan was to use SQLite, which I can embed and use happily without another daemon running beforehand, but that would mean setting up a schema and doing all that tedious INSERTing, SELECTing and so on (I appreciate I could go all ORM on its ass, but again the development effort is much higher than with CouchDB).

So, what to do? I suspect that for the moment I’ll go about getting CouchDB all nice and integrated, but it doesn’t look like it’d leave me with an application people can download, install the dependencies, and just run, does it?

Storing feedparser objects in CouchDB

sudo apt-get install python-feedparser
easy_install jsonpickle
sudo apt-get install couchdb
easy_install couchdb
sudo couchdb

Open a new terminal

python
import feedparser, jsonpickle
from couchdb import Server
s = Server('http://127.0.0.1:5984/')   # connect to the local CouchDB server
len(s)                                 # number of databases before...
db = s.create('feeds')                 # ...creating one for the feeds...
len(s)                                 # ...and after
doc = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
doc['feed']['title']                   # sanity-check the parsed feed
len(doc.feed.links)
pfeed = jsonpickle.encode(doc)         # pickle the feedparser object to a JSON string
db.create({'feed1' : pfeed})           # store it as a new document

The last call outputs a document ID – use it in place of DOC_ID_HERE below.

cfeed = db['DOC_ID_HERE']                   # fetch the document back by its ID
dfeed = jsonpickle.decode(cfeed['feed1'])   # rebuild the original feedparser object
dfeed['feed']['title']
len(dfeed.feed.links)

venus-ng

venus-ng is a fork of Venus which uses Newsgator to provide both the reading list and the feeds.

This means that venus-ng will, at a particular point in time, give you an accurate representation of your currently unread Newsgator feed entries. Here is the output from the newsgator.com web aggregator and venus-ng:

[Screenshot: newsgator.com unread feeds] [Screenshot: venus-ng unread feeds]

venus-ng does not mark feeds as read on the Newsgator server when it retrieves them, although that will likely get added when I have a test Newsgator account set up.

It is currently a fork because I’ve had to modify feedparser.py in a few ways which probably stop it working with other data sources:

  1. I’ve changed the way it deals with passed-in urllib2 handlers
  2. I’ve commented out the HTTP 401 response behaviour (since I’m passing it an HTTPBasicAuthHandler already)
  3. It always passes through an additional X-NGAPIToken HTTP header containing a Newsgator API key

As far as I can tell, the handler refactoring should be fine, but the 401 handling and the extra HTTP header seem like deal-breakers.

I have no idea how to stop the 401 handler in _FeedURLHandler() conflicting with that in urllib2.HTTPBasicAuthHandler.

I suspect there is a good solution in subclassing urllib2.HTTPBasicAuthHandler to provide the additional Newsgator HTTP header but I’ve not worked out some of the details yet.
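Roughly what I have in mind, as a sketch only – the class name, token, realm and URI below are made up, and this is not the venus-ng code:

import urllib2

class NewsgatorAuthHandler(urllib2.HTTPBasicAuthHandler):
    """Basic auth plus the extra Newsgator API-key header on every request."""
    def __init__(self, api_token, *args, **kwargs):
        urllib2.HTTPBasicAuthHandler.__init__(self, *args, **kwargs)
        self.api_token = api_token

    def http_request(self, request):
        # urllib2 calls <protocol>_request on each handler before sending,
        # so every outgoing request picks up the header
        request.add_header('X-NGAPIToken', self.api_token)
        return request

    https_request = http_request

passwords = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwords.add_password(None, 'http://services.newsgator.com/', 'username', 'password')

handler = NewsgatorAuthHandler('MY-API-TOKEN', passwords)
opener = urllib2.build_opener(handler)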

You can get the latest source via bzr get http://philwilson.org/code/venus-ng – there is a sample newsgator.ini file in the /examples directory, but it relies on you already having a Newsgator account and some feeds set up.

Once I’d traced through the Venus code to semi-understand it, this was quite straightforward to do (deal-breaking fork-causers aside), so if Google Reader were to introduce an official API it would not take long to integrate.

NewsGator + Venus?

I recently broke the graphics drivers on my Windows Vista installation, so re-partitioned and now run Ubuntu full-time at home.

On Windows I use FeedDemon as my full-time aggregator. It has a degree of speed and polish unmatched by any other web or desktop aggregator.

This means that all my feeds are automatically synced with newsgator.com – a web-based aggregator which is neither fast nor particularly polished. Although it might be polished – I don’t know; it’s so slow that I tend to just give up (sync with Google Reader is coming).

FeedDemon has significantly raised the bar for any aggregator I use. Web-based tools no longer cut it, in particular when I have hundreds of feeds and, at times, thousands of unread items.

On Ubuntu the options for a native aggregator are Straw or Liferea. Both are currently undergoing rewrites. Liferea seems like the better option for me, and it has a plugin system which is appealing, but there’s no sync with any online tools.

NewsGator have an HTTP-based API (PDF reference and sample code which requires a minor tweak to run) which is quite straightforward. It gives back data which can be consumed by the Universal Feed Parser. Venus uses the Universal Feed Parser in planet/spider.py after fetching data to create the cache which powers it.
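In other words, something like this rough sketch – the URL is a placeholder rather than a real NewsGator endpoint, and the API-key header is an assumption:

import urllib2
import feedparser

# placeholder URL; the API key goes in an X-NGAPIToken header
req = urllib2.Request('http://example.invalid/newsgator/feed',
                      headers={'X-NGAPIToken': 'MY-API-TOKEN'})
data = urllib2.urlopen(req).read()

# the response is feed data the Universal Feed Parser can consume,
# just as planet/spider.py does with ordinary feeds
parsed = feedparser.parse(data)
for entry in parsed.entries:
    print(entry.title)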

This time last year I wrote a very very basic wxWidgets tool for browsing the Venus cache. A modification to planet/spider.py to use the NewsGator API would seem like an easy way forward, whilst gaining all the power of the Venus filters, plugins and existing XSLTs.

I might just have to try that.

Setting up Trac on Debian Etch with Apache 1.3 (a brief guide)

This is a summary of what I got from the Trac installation instructions here, here, here and here. My life would have been easier if I was running Apache2, but for the site in question, I’m not.

The version numbers I am working with:

  • apache – 1.3.34-4.1
  • python – 2.4.4-2
  • libapache-mod-python – 2:2.7.11-2
  • Trac – 0.11b2

Install easy_install, followed by the Trac requirements:

$ easy_install Pygments
$ easy_install Genshi
$ easy_install Trac
$ easy_install sqlite
$ apt-get install libapache-mod-python
$ apt-get install python-pysqlite2
$ cd ~
$ mkdir -p trac/myprojectname
$ trac-admin trac/myprojectname initenv

(enter the details you need or just keep hitting Enter to accept the defaults – it’s all configurable later)

Run the tracd line given to you at the end of the install and make sure it works (you’ll probably need to use your IP address at this point because tracd won’t bind to a hostname).

Add this inside your VirtualHost:

<Location /wherever/you/like>
  SetHandler python-program
  PythonHandler trac.web.modpython_frontend
  PythonOption TracEnv /absolute/path/trac/myprojectname
  PythonOption TracUriRoot /wherever/you/like
  PythonDebug On
</Location>

Patch /usr/lib/python2.4/site-packages/Trac-0.11b2-py2.4.egg/trac/web/modpython_frontend.py with the code from http://trac.edgewall.org/wiki/TracModPython2.7 (yes, it’s all needed). The “Known Issues” at the end of that code apply, most notably “There may be a character set issue” – for me this manifested itself in the <title> element of the page, where a run of garbled characters separated my project name from the word “Trac” instead of a long hyphen.

wxVenus

bzr get http://philwilson.org/code/wxvenus

wxVenus is, at the moment, a desktop tool for browsing the cache that a local Venus installation creates when it runs. It is written in wxPython and is dependent on lxml.
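The core of it is not much more than this sketch – assuming the Venus cache directory holds one Atom entry per file, and with a made-up cache path:

import os
from lxml import etree

ATOM = '{http://www.w3.org/2005/Atom}'
cache_dir = os.path.expanduser('~/venus/cache') # made-up path

# walk the cache and print each entry's title
for name in os.listdir(cache_dir):
    tree = etree.parse(os.path.join(cache_dir, name))
    title = tree.findtext(ATOM + 'title')
    if title:
        print(title)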

[Screenshot: wxVenus]

It is also the first Python program of more than ten lines that I’ve ever written, and given that we’ve already established I am very bad at Python, the code quality is very low.

The long-term intention is to provide a cross-platform desktop tool which uses either a local or remote Venus installation as its aggregator and data source. At the moment I am using Lighthouse to track progress, but the free account doesn’t let me expose my tickets publicly (although I will use the API to do this). Update: I’ve since moved to Google Code, because Lighthouse was closed and my local Trac install was slower than you could possibly imagine.

Really this is a lesson in Bazaar, Python, wxWidgets and XML parsing. Hopefully I will end up with a tool I can use. So far I’m learning a lot 🙂

Parsing Atom with libxml2

Whilst trying to parse some Atom (my Blogger backup) with libxml2, I appear to have run into the same problem that Aristotle hit two years ago in “XPath vs the default namespace: easy things should be easy” – to wit, you can’t match on the default namespace in XPath.


>>> import libxml2
>>> doc = libxml2.parseFile("/home/pip/allposts.xml")
>>> results = doc.xpathEval("//feed")
>>> len(results)
0

Unbelievable.

Immediate potential solutions:

  1. XSLT my Atom document to add “atom:” to all my default-namespaced elements
  2. use an entirely different method of parsing
  3. remove the atom namespace declaration from the top of the file
  4. something else

Option 3 looks like the only sane route to take in this one-off job, but I’m quite surprised that I have to do it at all.
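For the record, the usual workaround (a variant of option 4) is to register the Atom namespace against a prefix on an XPath context and query with that prefix – a quick sketch, assuming the file uses the Atom 1.0 namespace:

import libxml2

doc = libxml2.parseFile("/home/pip/allposts.xml")

# register a prefix for the Atom namespace, then use it in the query
# instead of relying on the default namespace
ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs('atom', 'http://www.w3.org/2005/Atom')
results = ctxt.xpathEval('//atom:feed')
print(len(results)) # 1

ctxt.xpathFreeContext()
doc.freeDoc()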

Actually, this turned out to be my fault – I was parsing two documents at the same time, one with a namespace declaration set correctly (for parsing my Atom file), and one with no namespaces set. I used the latter for my xpath query, which clearly didn’t work – many thanks to everyone who left a comment!