Connotea: for scientists

Doing a little digging on the topic of my last post, I found Connotea, which is described as a derivative of del.icio.us. It is apparently similar to an independent effort called CiteULike.

At first, it seems like an awful lot of duplication: the core is basically a clone of del.icio.us. The biggest difference seems to be that it treats URLs as handles to actual bibliographic entries, which are extracted at bookmarking time from the pages being bookmarked, and the bibliographic handle is the “primary key” (I wonder what happens if two URLs point to the same biblio entry). The analysis works on a few major sites so far, including PubMed and Amazon. Having the bibliographic data then lets them integrate with citation management software (like EndNote). If enough of one’s sources are found online, then I can certainly see that being a useful tool; I spent way too much time entering LaTeX bibliographies over the years.
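To make the “primary key” idea concrete, here is a minimal sketch in Python of one way a store keyed on the bibliographic handle could behave when two URLs resolve to the same entry. This is only a guess at a plausible design, not a description of how Connotea works; `Entry`, `BookmarkStore`, `FAKE_PAGES`, the example URLs and the `pmid:123` handle are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One bibliographic record plus every URL and tag attached to it."""
    handle: str
    urls: set = field(default_factory=set)
    tags: set = field(default_factory=set)


class BookmarkStore:
    """Hypothetical store that keys bookmarks on the bibliographic handle."""

    def __init__(self, extract_handle):
        # extract_handle(url) -> str stands in for the page-analysis step.
        self._extract_handle = extract_handle
        self._entries = {}

    def bookmark(self, url, tags=()):
        handle = self._extract_handle(url)
        # Two URLs with the same handle land on the same Entry.
        entry = self._entries.setdefault(handle, Entry(handle))
        entry.urls.add(url)
        entry.tags.update(tags)
        return entry


# Made-up lookup table standing in for real page scraping.
FAKE_PAGES = {
    "http://example.org/abstract/123": "pmid:123",
    "http://example.com/mirror?id=123": "pmid:123",
}

store = BookmarkStore(FAKE_PAGES.__getitem__)
store.bookmark("http://example.org/abstract/123", ["genomics"])
merged = store.bookmark("http://example.com/mirror?id=123", ["to-read"])
print(sorted(merged.urls))
print(sorted(merged.tags))
```

Under this design the second bookmark does not create a new record: both URLs and both tags end up on the one entry, which is one answer to the question in the parenthetical above.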

But is the new feature “worth” having a segregated social bookmarking service (and data pool) just for scientists?

First, will it work? Assuming that the system is bootstrapped, my guess is: probably. The social aspect of del.icio.us, i.e. the tag-sharing, link-exploring and folksonomy-building, will probably work just fine in a “vertical” community such as scientists or lawyers (assuming a high enough degree of participation). The profession-specific shared bookmarking service could very well make folksonomy development go a tad faster, within well-defined communities with a shared jargon (although I feel that jargon semantics don’t carry across subfields, with one field’s definition of a term quite at odds with another’s). Paul Kedrosky will be happy to see another vertical search concept (if he doesn’t know about it already!).

Apart from the duplication of effort, which is only theoretically bad, one obvious downside of the verticalization of the tool is that people doing interdisciplinary work (e.g. scientific lawyers, aka patent lawyers) will probably suffer from the compartmentalization of the meta-data — but they’re used to it by now.

Most interesting to me is the notion that the folks at Nature may have figured out a possible new feature/concept for systems like del.icio.us. Maybe it’s worth considering the possibilities that follow from doing more in-depth analysis of the “stuff” being bookmarked, and extracting the key parts of the content of interest, as opposed to focusing (as technologists would naturally do) on the “simple bit”, i.e. the URL. After all, the URL isn’t what’s interesting; it’s the stuff in the page that is.

As an example, this morning I bookmarked the page on Gawker that was my introduction to the Starbucks corporate anthem (warning, it’s depressing as hell). I bookmarked the page because “it was there”, but it would be nice for the system to know that what’s key about that page is the link to the MP3 file, not just the words that Gawker uses to introduce it. If others have bookmarked another page that happens to include the same link, del.icio.us wouldn’t let me know about it. A version of something like Connotea that knew about link structures might.
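As a rough illustration of what “knowing about link structures” could mean, the sketch below indexes each bookmarked page by the links it contains, so that bookmarking a second page with the same outbound link surfaces the first. It is a toy under stated assumptions: `LinkAwareBookmarks`, `outbound_links` and all the URLs are hypothetical, and the page HTML is passed in directly rather than fetched.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def outbound_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


class LinkAwareBookmarks:
    """Hypothetical bookmark index keyed on the links inside each page."""

    def __init__(self):
        self._pages_by_link = defaultdict(set)

    def bookmark(self, page_url, html):
        """Record the page; return other bookmarked pages sharing a link."""
        related = set()
        for link in outbound_links(html):
            related |= self._pages_by_link[link]
            self._pages_by_link[link].add(page_url)
        related.discard(page_url)
        return related


bookmarks = LinkAwareBookmarks()
bookmarks.bookmark(
    "http://blog-a.example/post",
    '<a href="http://media.example/anthem.mp3">listen</a>',
)
related = bookmarks.bookmark(
    "http://blog-b.example/entry",
    '<p>Wow: <a href="http://media.example/anthem.mp3">mp3</a></p>',
)
print(related)  # {'http://blog-a.example/post'}
```

A real service would also have to normalize URLs and decide which links on a page matter (the MP3, not the navigation bar), which is the hard part this sketch skips.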

As my kids say, very instering.

Coalesce Bloglines feeds?

Due in part to my initial waffling about domain names (and subsequent struggles with Apache and WordPress configuration), Bloglines now has multiple URLs that map to the same actual feed. This isn’t a problem (I hope) for readers, because of DNS redirects and Apache rewrite rules. However, it’s a very minor annoyance for me when it comes to understanding my readership trends (as represented by the Bloglines contingent): I have to look at the statistics for nine different Bloglines IDs.

Is there any way to tell Bloglines that some feeds should be merged? (I have tried “permanent redirects”, but I’m not convinced that they have had any effect on the Bloglines database structure.)

More broadly: it’s interesting that bloglines didn’t put in the infrastructure to let people “claim” feeds, the way Technorati does. It would allow authors to help bloglines serve bloglines readers, to automate URL changes and the like, thereby making them a higher-value aggregator from the author’s point of view (note that I don’t yet understand the value in claiming a technorati feed).

Bloglines & Twisted

Interesting bit found in my webserver access logs: - - [08/Nov/2004:00:07:32 -0500] "GET /blog/index.cgi/?flav=rss HTTP/1.0" 404 24542 "-" "Twisted PageGetter"

Googling “Twisted PageGetter” confirms that it’s a spider that comes with Twisted Python. Once more, Python’s in the web spidering business, this time with Bloglines.
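For anyone who wants to go hunting for the same thing in their own logs, here is a small sketch that pulls the user agent out of an Apache combined-format log line. The regular expression is my own approximation of that format, not something taken from Apache, and the host `192.0.2.1` is a placeholder address, since the real one is not shown in the line above.

```python
import re

# Approximate pattern for Apache's "combined" log format (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Placeholder host; the rest is the line quoted in the post.
line = (
    '192.0.2.1 - - [08/Nov/2004:00:07:32 -0500] '
    '"GET /blog/index.cgi/?flav=rss HTTP/1.0" 404 24542 "-" "Twisted PageGetter"'
)

fields = parse_line(line)
print(fields["agent"], fields["status"])
```

Counting lines per `agent` value over a whole log file would then show which spiders are still asking for URLs that return 404.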

Apache, redirect old RSS feed?

In switching to the new blogging software, I have broken the URLs for the old RSS feeds. If anyone has feedback on how I can use Apache .htaccess and the like to redirect a URL that goes through a file that doesn’t exist (…/index.cgi/index.rss) over to another URL (…/blog/wp-rss2.php), please let me know. To those using aggregators to read this site, 1) you probably can’t see this =), and 2) sorry.

Update: Fixed with some Redirects.

New blogging engine

I was getting lousy performance out of pyblosxom, for reasons that I’m sure have nothing to do with pyblosxom itself, and everything to do with my abuse of it and my lack of a deep enough understanding of how pingbacks, trackbacks, etc. worked. Also, I didn’t really have the energy to build my own comment-spam filter. Finally, it had served its purpose: it gave me a good feel for what blogging software does. Now I was happy to move on to something with more “GOOBE” (Good Out Of Box Experience), polish, templates, features, etc. I tried Blogger again, and found it too slow tonight. Looked at Drupal, but its generality scared me. Settled back on WordPress, which I’d played with before and which had impressed me. It has an excellent user interface, trivial installation, and seems to have a strong “aftermarket” of plugins and themes. I’m using the Kubrick theme, tweaked with minor CSS changes and a picture that my brother Ivan took on his trip through eastern Europe a few years ago.

So, Will & friends, don’t take it badly — I was a happy customer and then my needs as a customer changed. Thanks for pyblosxom, and keep on trucking.

New features for all you readers: comments are back, the RSS feeds are nicer (e.g. they have dates, which the old one didn’t, which is why all of the old posts that were imported neatly don’t have any history of when they were posted — oh well). I’m sure there’ll be more as I figure this tool out.

Taking comments out

I got hit by comment spam (I guess that makes this a real blog, sigh), and don’t have the time to implement a countermeasure (or the inclination, really), so I’m taking comments off for now. Email me if you want to contribute; it’s not like I was getting a lot of non-spam comments anyway.