April 4th, 2009 by
Phil
Following my last post I was considering writing a Venus filter which adds all feed items into a CouchDB database. This could then be queried by a modified wxVenus or a webapp (using the CouchDB jQuery library) or whatever.
Thinking specifically about wxVenus, which is a desktop application, CouchDB is like MySQL in that you must have the server up and running before your application tries to use it, and (afaik) there is no way to embed the server itself into your application, which places quite a bit of burden on the user.
My initial plans were to use SQLite which I can embed and use happily without another daemon running beforehand, but would mean I have to set up a schema and do all that tedious INSERTing, SELECTing and so on (I appreciate I could go all ORM on its ass, but again the development effort is much much higher than that with CouchDB).
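To give a sense of the SQLite boilerplate in question, here is a minimal sketch using the standard-library sqlite3 module. The table name, column names and helper functions are all made up for illustration; nothing like this exists in wxVenus yet.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) an embedded SQLite store for feed items."""
    conn = sqlite3.connect(path)
    # The schema has to be declared up front, unlike a CouchDB document.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id TEXT PRIMARY KEY,
               feed_title TEXT,
               title TEXT,
               link TEXT
           )"""
    )
    return conn


def add_item(conn, item_id, feed_title, title, link):
    """Insert one feed item, replacing any existing row with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO items (id, feed_title, title, link) "
        "VALUES (?, ?, ?, ?)",
        (item_id, feed_title, title, link),
    )
    conn.commit()


def items_for_feed(conn, feed_title):
    """Return (title, link) tuples for one feed, ordered by title."""
    cur = conn.execute(
        "SELECT title, link FROM items WHERE feed_title = ? ORDER BY title",
        (feed_title,),
    )
    return cur.fetchall()
```

No daemon is needed, which is the appeal, but every new field means another schema change and another round of SQL.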
So, what to do? I suspect that for the moment I’ll go about getting CouchDB all nice and integrated, but it doesn’t look like it’d leave me with an application people can download, install the dependencies, and just run, does it?
Tagged: couchdb, python, venus, wxvenus |
4 Comments »
March 31st, 2009 by
Phil
sudo apt-get install python-feedparser
easy_install jsonpickle
sudo apt-get install couchdb
easy_install couchdb
sudo couchdb
Open a new terminal and start the Python interpreter:
python
import feedparser, jsonpickle
from couchdb import Server
s = Server('http://127.0.0.1:5984/')
len(s)
db = s.create('feeds')
len(s)
doc = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
doc['feed']['title']
len(doc.feed.links)
pfeed = jsonpickle.encode(doc)
db.create({'feed1' : pfeed})
# outputs DOC_ID; substitute it for DOC_ID_HERE below
cfeed = db['DOC_ID_HERE']
dfeed = jsonpickle.decode(cfeed['feed1'])
dfeed['feed']['title']
len(dfeed.feed.links)
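The session above boils down to a serialise, store, fetch, decode round trip. Here is a rough sketch of that pattern as two helper functions, using stand-ins so it runs without a server: a plain dict plays the part of the CouchDB database, the standard-library json module replaces jsonpickle, and the function names are hypothetical. Unlike jsonpickle, json only gives back plain dicts, so attribute access such as dfeed.feed.links is not restored.

```python
import json
import uuid


def store_feed(db, parsed_feed, key="feed1"):
    """Serialise a parsed feed and store it under a freshly generated ID.

    `db` is any mapping; here it stands in for a CouchDB database.
    """
    doc_id = uuid.uuid4().hex
    db[doc_id] = {key: json.dumps(parsed_feed)}
    return doc_id


def load_feed(db, doc_id, key="feed1"):
    """Fetch a stored document and decode the feed back into plain dicts."""
    return json.loads(db[doc_id][key])


# A stand-in for feedparser output: just nested dicts and lists.
sample = {
    "feed": {
        "title": "Sample Feed",
        "links": [{"rel": "alternate", "href": "http://example.org/"}],
    }
}
fake_db = {}
doc_id = store_feed(fake_db, sample)
restored = load_feed(fake_db, doc_id)
```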
Tagged: couchdb, python, syndication |
1 Comment »
March 28th, 2009 by
Phil
venus-ng is a fork of Venus which uses Newsgator to provide both the reading list and the feeds.
This means that venus-ng will, at a particular point in time, give you an accurate representation of your currently unread Newsgator feed entries. Here is the output from the newsgator.com web aggregator and venus-ng:

venus-ng does not mark feeds as read on the Newsgator server when it retrieves them, although that will likely get added when I have a test Newsgator account set up.
It is currently a fork because I’ve had to modify feedparser.py in a few ways which probably stop it working with other data sources:
- I’ve changed the way it deals with passed-in urllib2 handlers
- I’ve commented out the HTTP 401 response behaviour (since I’m passing it an HTTPBasicAuthHandler already)
- It always passes through an additional X-NGAPIToken HTTP header containing a Newsgator API key
As far as I can tell, the handler refactoring should be fine, but the 401-handling and extra HTTP header seem like deal-breakers.
I have no idea how to stop the 401 handler in _FeedURLHandler() conflicting with that in urllib2.HTTPBasicAuthHandler.
I suspect there is a good solution in subclassing urllib2.HTTPBasicAuthHandler to provide the additional Newsgator HTTP header but I’ve not worked out some of the details yet.
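A rough sketch of what that subclass might look like, written against Python 3's urllib.request (the successor to urllib2) so that it runs on a current interpreter. The class name and constructor argument are hypothetical, and this has not been tried against the real Newsgator API; it only shows that a handler's http_request hook can add the header before the request goes out.

```python
import urllib.request


class NewsgatorAuthHandler(urllib.request.HTTPBasicAuthHandler):
    """Basic-auth handler that also sends a Newsgator API token header.

    Hypothetical sketch: the header name comes from the post, the rest
    is illustrative.
    """

    def __init__(self, api_token, password_mgr=None):
        super().__init__(password_mgr)
        self.api_token = api_token

    def http_request(self, req):
        # Pre-processing hook: the opener calls this before sending.
        req.add_unredirected_header("X-NGAPIToken", self.api_token)
        # Let the stock handler do its own pre-processing as well.
        return super().http_request(req)

    # Use the same hook for https:// URLs.
    https_request = http_request


handler = NewsgatorAuthHandler("demo-token")
opener = urllib.request.build_opener(handler)
```

Because the header is added inside the handler, the copy of feedparser.py would not need to know about the API key at all, assuming it accepts handlers passed in from outside.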
You can get the latest source via bzr get http://philwilson.org/code/venus-ng – there is a sample newsgator.ini file in the /examples directory, but it relies on you already having a Newsgator account and some feeds set up.
Once I’d traced through the Venus code to semi-understand it, this was quite straightforward to do (deal-breaking fork-causers aside) so were Google Reader to introduce an official API it would not take long to integrate.
Tagged: python, syndication, venus |
9 Comments »
March 9th, 2009 by
Phil
I recently broke the graphics drivers on my Windows Vista installation, so re-partitioned and now run Ubuntu full-time at home.
On Windows I use FeedDemon as my full-time aggregator. It has a degree of speed and polish unmatched by any other web or desktop aggregator.
This means that all my feeds are automatically synced with newsgator.com – a web-based aggregator which is not fast and not particularly polished. Although it might be polished, I don’t know, it’s so slow that I tend to just give up (sync with Google Reader is coming).
FeedDemon has significantly raised the bar for any aggregator I use. Web-based tools no longer cut it, in particular when I have hundreds of feeds and, at times, thousands of unread items.
On Ubuntu the options for a native aggregator are Straw or Liferea. Both are currently undergoing rewrites. Liferea seems like the better option for me, and it has a plugin system which is appealing, but there’s no sync with any online tools.
NewsGator have an HTTP-based API (PDF reference and sample code which requires a minor tweak to run) which is quite straightforward. It gives back data which can be consumed by the Universal Feed Parser. Venus uses the Universal Feed Parser in planet/spider.py after fetching data to create the cache which powers it.
This time last year I wrote a very very basic wxWidgets tool for browsing the Venus cache. A modification to planet/spider.py to use the NewsGator API would seem like an easy way forward, whilst gaining all the power of the Venus filters, plugins and existing XSLTs.
I might just have to try that.
Tagged: feeddemon, newsgator, planet, python, syndication, ubuntu, wxvenus, wxwidgets |
No Comments »
March 21st, 2008 by
Phil
This is a summary of what I got from the Trac installation instructions here, here, here and here. My life would have been easier if I was running Apache2, but for the site in question, I’m not.
The version numbers I am working with:
- apache – 1.3.34-4.1
- python – 2.4.4-2
- libapache-mod-python – 2:2.7.11-2
- Trac – 0.11b2
Install easy_install, followed by the Trac requirements:
$ easy_install Pygments
$ easy_install Genshi
$ easy_install Trac
$ easy_install sqlite
$ apt-get install libapache-mod-python
$ apt-get install python-pysqlite2
$ cd ~
$ mkdir -p trac/myprojectname
$ trac-admin trac/myprojectname initenv
(enter the details you need or just keep hitting Enter to accept the defaults – it’s all configurable later)
Type the tracd line given to you at the end of the install and make sure it runs (you will probably need your IP address at this point because it won’t bind to a hostname).
Add this inside your VirtualHost:
<Location /wherever/you/like>
SetHandler python-program
PythonHandler trac.web.modpython_frontend
PythonOption TracEnv /absolute/path/trac/myprojectname
PythonOption TracUriRoot /wherever/you/like
PythonDebug On
</Location>
Patch /usr/lib/python2.4/site-packages/Trac-0.11b2-py2.4.egg/trac/web/modpython_frontend.py with code from http://trac.edgewall.org/wiki/TracModPython2.7 (yes, it’s all needed) – the “Known Issues” at the end of the code apply, most notably “There may be a character set issue” – for me this manifested itself in the <title> element of the page with a “–” separating my project name from the word “Trac” rather than a long hyphen.
Tagged: linux, python, tools, trac, web |
2 Comments »
March 17th, 2008 by
Phil
bzr get http://philwilson.org/code/wxvenus
wxVenus is, at the moment, a desktop tool for browsing the cache that a local Venus installation creates when it runs. It is written in wxPython and is dependent on lxml.

It is also the first Python program of greater than ten lines that I’ve ever written, and given that we’ve already established I am very bad at it, the code quality is very low.
The long-term intention is to provide a cross-platform desktop tool which uses either a local or remote Venus installation as its aggregator and data source. Originally I was using Lighthouse to track progress, but the free account doesn’t let me expose my tickets publicly (although I had planned to use the API to do this). I’ve since moved to Google Code because Lighthouse was closed and my local Trac install was slower than you could possibly imagine.
Really this is a lesson in Bazaar, Python, wxWidgets and XML parsing. Hopefully I will end up with a tool I can use. So far I’m learning a lot.
Tagged: bzr, python, syndication, wxvenus |
3 Comments »
November 26th, 2007 by
Phil
Whilst trying to parse some Atom (my Blogger backup) with libxml2 I appear to have run into the same problem that Aristotle hit two years ago in XPath vs the default namespace: easy things should be easy, to wit: The story is that you can’t match on the default namespace in XPath.
>>> import libxml2
>>> doc = libxml2.parseFile("/home/pip/allposts.xml")
>>> results = doc.xpathEval("//feed")
>>> len(results)
0
Unbelievable.
Immediate potential solutions:
- XSLT my Atom document to add “atom:” to all my default-namespaced elements
- use an entirely different method of parsing
- remove the atom namespace declaration from the top of the file
- something else
Option 3 looks like the only sane route to take in this one-off job, but I’m quite surprised that I have to do it at all.
Actually, this turned out to be my fault – I was parsing two documents at the same time, one with a namespace declaration set correctly (for parsing my Atom file), and one with no namespaces set. I used the latter for my xpath query, which clearly didn’t work – many thanks to everyone who left a comment!
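For reference, the underlying rule is that a bare element name in a query never matches an element in a default namespace; the query has to bind a prefix to the namespace URI. Here is a minimal sketch of that, using the standard library's xml.etree.ElementTree rather than libxml2 so that it is self-contained. The sample feed and the "atom" prefix are made up for illustration. As far as I know the libxml2 equivalent is to register the prefix on an XPath context before evaluating.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# A tiny stand-in for the Blogger backup: everything is in the
# default (Atom) namespace.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# A bare name looks for <entry> in *no* namespace, so nothing matches.
unqualified = root.findall("entry")

# Binding a prefix to the Atom namespace makes the same query work.
ns = {"atom": ATOM_NS}
qualified = root.findall("atom:entry", ns)
titles = [entry.findtext("atom:title", namespaces=ns) for entry in qualified]
```

Here unqualified comes back empty while qualified finds both entries, which is the same symptom as the zero-length result above.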
Tagged: atom, blogger, libxml2, python, real |
11 Comments »