Personal web proxy: MindRetrieve

24 April, 2005

Since Jim stuck it in his bookmarks a few weeks ago I’ve been using MindRetrieve both at home and at work to proxy my web browsing, and I really can’t fault it.

As you browse, it caches and indexes local copies of webpages so that you can search through them later, and even view the cached version if you’re offline. It strips the pages of all HTML except for simple paragraphs, presumably for easier search indexing, which means no CSS or images, so the cached version is as plain as can be, but this is a fair exchange for disk space and proxy speed. It’s written in Python and uses (I think) PyLucene for the indexing, and is, as a result, very quick. Additionally, since it just uses a query parameter on the search page, it’s very easy and quick to set up a Firefox keyword so you can search from the address bar: I just type “find dvd” and up come all the pages I’ve viewed with “DVD” in them, ranked by relevance.
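To give a rough idea of what "stripping a page down to simple paragraphs" might look like, here is a minimal sketch using only Python's standard library. This is not MindRetrieve's actual code (I haven't checked how it does it); the names `ParagraphExtractor` and `strip_to_paragraphs` are made up for the example, and it assumes reasonably well-formed `<p>` markup.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: keep the text inside <p> elements, drop everything else."""

    SKIPPED = {"script", "style"}  # contents of these tags are never kept

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None   # list of text pieces while inside a <p>, else None
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "p":
            self._flush()      # an unclosed earlier <p> ends here
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> are ignored, but their text is kept.
        if self._buffer is not None and not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace so each paragraph is one clean line.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._buffer = None


def strip_to_paragraphs(html):
    """Return the plain text of each paragraph in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()    # make the parser emit any text it is still holding
    parser._flush()   # keep a trailing paragraph that was never closed
    return parser.paragraphs
```

Feeding it a page with styles, images and navigation `<div>`s returns just a list of paragraph strings, which is the sort of lightweight text you would then hand to an indexer.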

I’ve used a fair few personal proxies in the last few years, ranging from the very simple to the very complex, but MindRetrieve really hits the sweet spot for me. It’s easy to install, set up and use. The proxying is so fast that you forget you’re using a proxy at all, and the searches are fast and the results are relevant. It has the added benefit that the admin UI and search results are all editable because they’re plain HTML and CSS, so you can make them look however you want, even to the extent of adding JavaScript to add in linkbacks and so on if you like.

Simple and does the job; I don’t think praise gets much higher than that 🙂



24 April, 2005 at 18:58

I just got Beagle running today and it comes with a Firefox extension that tells the Beagle daemon to automatically index every page you visit. Seeing as it combines this index with IM logs, feeds and files, I think I prefer this to a proxy. It also doesn’t give the (however minimal) speed decrease of routing everything through a proxy. No cached pages though.

24 April, 2005 at 19:21

Yes, if I could get Beagle running on Windows, and anything running in the .NET runtime didn’t slow my machine to a complete crawl (even with 1GB RAM), that’s what I’d prefer to use. 😉

Certainly all of the main desktop search tools I’ve used have been complete memory hogs, and intrusive on my day-to-day computer usage. MindRetrieve has been memory-light and fast. I have indexers for the rest so I can certainly live with it until something better comes along.

15 June, 2005 at 18:01

This is a great tool. I hadn’t ever used one until you posted your blog entry. As I do my usual work stuff, IT slave labor :), and a lot of personal research, I’ve gotten into the habit of saving the more important articles, because some sites tend to redesign and then I can’t find the article at all, or at least not easily. Now I get the best of both worlds: I can visit the page and, if it’s gone, go back to my cache.

You are the man,

A fellow PHPBF4 guy