THOUGHTS ON EFFICIENT PUBLISH AND SUBSCRIBE METHODOLOGY AND PUBSUBHUBBUB

I’ve been thinking a lot about PubSubHubbub and the applicability of efficient but unreliable messaging services to the applications I work on. PubSubHubbub, if you don’t know, is a Google-authored protocol that piggybacks onto Atom and RSS feeds to provide efficient updates of new content in the publish-subscribe model. It’s efficient because it specifies a mechanism to push updates out to subscribed entities, avoiding the lame (Google’s words, not mine) polling for updates that is the de facto standard for RSS consumption these days. It’s unreliable because there is no mechanism in the spec that guarantees delivery of a message. Of course, this is not a failing of the spec so much as a narrow definition of how a simple design is supposed to solve a simple problem. For twitter-like real-time updates of content on an aggregate website, Google Reader for example, network bandwidth is saved because Google Reader wouldn’t have to poll all its subscriptions repeatedly, and a user would potentially see updates to a subscribed resource in a more timely fashion. As I see it, there are two issues with this system:
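To make the push mechanics concrete, here’s a minimal sketch of the subscription half of the protocol using the parameters from the PubSubHubbub 0.3 spec (hub.mode, hub.topic, hub.callback, hub.verify, hub.lease_seconds). The hub and URLs are placeholders for illustration:

```python
from urllib.parse import urlencode

HUB_URL = "https://pubsubhubbub.appspot.com/"        # a public hub (example)
TOPIC_URL = "https://example.com/feed.atom"          # the feed to follow
CALLBACK_URL = "https://example.com/push-callback"   # where the hub POSTs updates

def build_subscribe_body(topic, callback, lease_seconds=86400):
    """Form-encode the core subscription parameters from the 0.3 spec."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": "async",
        "hub.lease_seconds": str(lease_seconds),
    })

body = build_subscribe_body(TOPIC_URL, CALLBACK_URL)
# POST `body` to HUB_URL as application/x-www-form-urlencoded; the hub then
# fetches the callback with a hub.challenge parameter to verify intent, and
# afterwards POSTs new feed content to the callback as it arrives.
```

The subscriber never contacts the publisher directly; all the coordination above goes through the hub, which is exactly the design trade-off discussed next.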

  1. All of the intelligence is on the “hub” server that publishers send update notifications to and that pushes the updates out to subscribers. This is, again, by design, and keeps the other pieces of the puzzle, subscribers and publishers, as little changed as possible. However, an issue with the hub means subscribers and publishers are cut off from each other. Is there a specification that tells the subscriber to resume polling in the absence of new data within an expected time period? In essence, there may be a need to build fault-tolerance into the system.
  2. Ideally, I’d like to be able to extend the metaphor down to a desktop client. It would be grand if I could write a client that could subscribe to a resource with a central hub server and then keep a port open, listening for publication notifications. The problem is that such a client would not be a persistent subscriber in the way that a website would be, and can be expected to be on and offline at random intervals. Part of the problem can be solved by using the hub.lease_seconds parameter in the spec to specify how long a hub should service a subscriber before receiving a new lease request (like an IP lease). However, this approach is inefficient if a client subscribes to many feeds on the hub and needs to resubscribe to all of them at the end of every lease period. I wish the spec could batch up a subscriber’s feeds into one group for leasing. Of course, one could augment the spec to provide this ability, but I wish it were baked in for standardization purposes. (Also to potentially take advantage of spec-adhering code libraries.)
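Issue #1, the fallback-to-polling question, isn’t covered by the spec, but a client could approximate it itself. Here’s a hedged sketch of a silence watchdog; the class and method names are hypothetical, not anything from the spec, and the grace factor is an arbitrary choice:

```python
import time

class SubscriptionState:
    """Tracks when a hub last pushed for a topic, to detect a silent hub."""

    def __init__(self, topic, expected_interval):
        self.topic = topic
        self.expected_interval = expected_interval  # rough seconds between updates
        self.last_push = time.time()

    def note_push(self):
        """Call whenever the hub delivers a notification for this topic."""
        self.last_push = time.time()

    def should_poll(self, now=None, grace=2.0):
        """True once the hub has been silent for `grace` x the expected interval."""
        now = time.time() if now is None else now
        return (now - self.last_push) > grace * self.expected_interval

sub = SubscriptionState("https://example.com/feed.atom", expected_interval=600)
# A real client would run a timer, and when should_poll() turns True, fetch
# the feed directly and perhaps re-send the subscribe request to the hub.
```

The same loop is a natural place to hang lease renewal for issue #2: iterate over every SubscriptionState nearing expiry and re-send its subscribe request, which is exactly the per-feed busywork a batched lease in the spec would eliminate.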

Dave Winer, visionary technologist, understands the plumbing of the internet like few others and envisions pubsubhubbub playing a part (message brokering) in a decentralized version of twitter. Certainly, there’s hope, but unless a means to accommodate transient participants (issue #2) is addressed, I don’t see PubSubHubbub gaining traction outside of a few large producers and consumers such as the Bloggers, Yahoos and WordPresses of the world. The smaller pieces of the food chain will have to go it alone or augment the spec for their own use. My biggest take-away from this whole exercise is the idea, now bubbling up to the forefront of my consciousness, of leaving a port open in a client app that just listens for HTTP requests and responds to them in an event-driven manner.

Now I’m off to finish my tutorial series on XPages and Dojo Drag and Drop…

UPDATE: I am (clearly) not alone.