The trouble with versioning (v1)

xmarkjenkinsx

Stop complaining. No one else gets it right, either.

Versioning posts is hard. Google doesn’t do it. Bloglines beta used to partially do it (but they turned it off). But I want Assetbar to fully support versioning. It can make you want to hit your head against the wall.

So what is this “versioning” business?

Joe Blogger writes a post, then presses “publish”. Depending on what blogging software he’s using, the post generally gets a Globally Unique ID (GUID), title, pubdate and some other metadata.

Then a feed reader like Google Reader, Bloglines or Assetbar fetches the Atom/RSS feed. It reads the GUID, title, etc. and puts the asset into a data store.

Since the GUID is unique and it’s a well-behaved blogging platform, that post is uniquely identified as new. When Suzy fires up her Google Reader app, she can now see there’s a new post to read.

Answer the question: what /should/ happen?

So that’s all fine and good, but what happens when Joe Blogger goes back and makes edits to his post? What does the blogging software do? What do the feed readers do? What should they do?

If the blogging software is tight(*), it updates the pub date, but keeps the GUID the same. The next time the feed fetcher comes along, it uniquely IDs the post from the GUID, notes the new pubdate and then saves the updated content. But what should be displayed to Suzy Reader?

If she hasn’t seen the original post, giving her the new version is the right thing to do. But what if Suzy Reader has already seen the first version?

In Google Reader, you only ever see the current version(**) of the post. Suzy doesn’t get to see multiple versions, and she never sees any indication that the post has been updated. Bloglines beta used to treat the update like a new post, and you had to wade through each version.
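To make the difference concrete, here is a minimal sketch of the two behaviors described above. Everything in it is made up for illustration (the `Entry` type, the function names, the sample GUID and dates); it is not how either reader is actually built, just the shape of the two policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    guid: str
    pubdate: str
    content: str


def ingest_overwrite(store: dict, entry: Entry) -> None:
    """Keep only the latest copy per GUID (roughly the Google Reader behavior)."""
    store[entry.guid] = entry


def ingest_as_new_item(items: list, entry: Entry) -> None:
    """List every changed copy as its own item (roughly the old Bloglines beta behavior)."""
    if entry not in items:
        items.append(entry)


# Hypothetical sample data: one post, edited once.
first = Entry("post-1", "2008-01-01T10:00", "Hello wrold")
edited = Entry("post-1", "2008-01-01T10:05", "Hello world")

latest_only = {}
every_copy = []
for e in (first, edited, edited):  # the third fetch sees no change
    ingest_overwrite(latest_only, e)
    ingest_as_new_item(every_copy, e)

print(len(latest_only), len(every_copy))  # one item vs. two items with the same title
```

With the first policy Suzy has no way to tell the post ever said "wrold". With the second she gets the same title twice in her list.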

Good enough is also boring

Google Reader’s way is simple and it works in most cases: don’t show the old data and not many will complain. But it’s incomplete. You simply don’t get versions and you can’t see any changes. If you really care about feeds, this is insufficient. (I also don’t agree with their policy of marking old posts as “read” in your account regardless of whether or not you actually read them. But that’s their decision.)

Bloglines’ method is arguably better, because you at least get access to the data. But it has the drawback that Suzy gets notified of every update, whether minor or major. She may look at her Scoble feed and see 10 items with the same title. Depending on how many feeds Suzy has, how often the updates are made, and how much Suzy cares about the changes, this can be either very useful or terribly annoying.

Let it be noted that I have personally seen 64 versions of a single post from Guy Kawasaki. You don’t want to experience that.

So here’s our dilemma: we wanted to take an alternate approach that is a best-of-both-worlds. We want to keep all the feed versions for the fan(atic) for whom detail is important, while not overwhelming the average or power user. We also want to be resilient against broken feeds, and very useful to professional readers. I think we achieved our goals, but in the process, we had to rewrite some core system components. And we’re still fixing them. Ahem.

versions, baby

Temporary Fail, Future Win

What was our solution that pooched the system?

Every time an edit/new version of a post comes in, we save the new version “with” the original asset. So if you’re seeing a post for the first time, it might already be up to version 4. But the UI notes that it’s the 4th version, and provides you a simple way to see all the other versions of the asset. When the 5th and 6th and 10th versions of a post come in, Suzy isn’t forced to read it again, but she always has access to all the data.
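Here is a rough sketch of that idea, assuming an in-memory store. The class and method names (`Asset`, `VersionStore`, `badge` and so on) are invented for this example and are not Assetbar’s actual code, which sits on a distributed store.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    guid: str
    versions: list = field(default_factory=list)  # oldest first

    def add_version(self, content: str) -> bool:
        """Append content as a new version unless it matches the latest one."""
        if self.versions and self.versions[-1] == content:
            return False
        self.versions.append(content)
        return True


class VersionStore:
    def __init__(self):
        self.assets = {}   # guid -> Asset
        self.read = set()  # (reader, guid) pairs that have been read

    def ingest(self, guid: str, content: str) -> bool:
        asset = self.assets.setdefault(guid, Asset(guid))
        return asset.add_version(content)

    def mark_read(self, reader: str, guid: str) -> None:
        self.read.add((reader, guid))

    def is_unread(self, reader: str, guid: str) -> bool:
        return (reader, guid) not in self.read

    def badge(self, guid: str) -> str:
        """Text the UI could show next to the post title."""
        return f"version {len(self.assets[guid].versions)}"


store = VersionStore()
for text in ["draft", "draft, fixed typo", "toned down", "final"]:
    store.ingest("post-1", text)

print(store.badge("post-1"))               # Suzy's first look is already at version 4
store.mark_read("suzy", "post-1")
store.ingest("post-1", "final, really")
print(store.is_unread("suzy", "post-1"))   # a fifth version does not force a re-read
print(store.assets["post-1"].versions[0])  # but every older version is still there
```

The design choice that matters is that read state hangs off the asset (the GUID), not off an individual version, so new versions pile up quietly behind a post Suzy has already dealt with.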

I think every feed reader should do this:

Display versions well

Now when authors edit posts for clarity, or tone down some mean-ness, Suzy Reader can see all the versions on Assetbar. And in playing around with this for the last two weeks, I’ve seen some pretty interesting edits that I didn’t get with Google Reader. And if people want it, we could easily add a feature so that Suzy can get a list of the assets from any or all feeds that have changed [n-times] since she last viewed them.
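That “changed [n-times] since she last viewed them” list could be as simple as the sketch below. It assumes we track, per reader, how many versions an asset had when she last looked; the function name and both dictionaries are hypothetical.

```python
def changed_since_viewed(version_counts: dict, last_seen: dict, min_changes: int = 1) -> list:
    """Return GUIDs that gained at least min_changes versions since the reader last viewed them.

    version_counts: guid -> number of versions currently stored
    last_seen:      guid -> number of versions that existed at the reader's last view
    Assets the reader has never viewed are skipped; those are plain unread posts.
    """
    changed = []
    for guid, count in version_counts.items():
        seen = last_seen.get(guid)
        if seen is not None and count - seen >= min_changes:
            changed.append(guid)
    return changed


# Hypothetical sample data.
version_counts = {"post-a": 6, "post-b": 2, "post-c": 3}
suzy_last_seen = {"post-a": 4, "post-b": 2}

print(changed_since_viewed(version_counts, suzy_last_seen))                 # ['post-a']
print(changed_since_viewed(version_counts, suzy_last_seen, min_changes=3))  # []
```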

Where’s my Beta?

So our new versioning thing is really sweet, and it opens the door for us to handle versioned comments (edit your comment, but the other versions are still accessible), versioned photos, and versioned everything. This new core capability is going to be really useful for more than versions of feed posts–it’s almost wiki-izing everything.

But in creating our versioning scheme, we broke the assumptions of our mirroring scheme and had to redo that. Since we rely on a distributed database filesystem, keeping all the right parties mirrored and in sync is vital to the overall operation of the system.

Our system basically works like this:
[Diagram: Assetbar technical overview]

Bottom line is that we hope to have this locking-mirroring mess sorted out soon and start opening the system to you, the patient public. It will be fun, and you’ll have versions, too.

Cheers!

This post brought to you by: Lavazza Super Crema Espresso

(*) Another thing that makes versioning such a bother is that the pub software isn’t tight. In fact, it’s mostly terrible.

GUIDs aren’t created (I’m looking at you, MacRumors), shared feed items mishandle create dates, FeedBurner adds and subtracts “flare” tracking gifs, WordPress adds random numbers that change with every HTTP request, and pub dates are different on two hosts serving the identical asset. And those are just the problems when they’re sending semi-valid feeds. So many feeds are just busted, the Internet basically shouldn’t work.
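One way to keep that kind of noise from turning into bogus “versions” is to compare a cleaned-up fingerprint of the content instead of the raw bytes. The sketch below is only an illustration: the two noise patterns (a 1-pixel-wide image and a `nocache` query parameter) are invented stand-ins, and real feeds would need rules tuned per publisher.

```python
import hashlib
import re

# Hypothetical noise rules, invented for this example.
NOISE_PATTERNS = [
    re.compile(r'<img[^>]*\bwidth="1"[^>]*>', re.IGNORECASE),  # 1-pixel tracking images
    re.compile(r'[?&]nocache=\d+'),                            # per-request random number
]


def fingerprint(html: str) -> str:
    """Hash the content after stripping known noise and collapsing whitespace."""
    for pattern in NOISE_PATTERNS:
        html = pattern.sub("", html)
    html = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


fetch_1 = '<p>Hello</p><img src="http://example.com/t.gif?nocache=123" width="1" height="1">'
fetch_2 = '<p>Hello</p>  <img src="http://example.com/t.gif?nocache=987" width="1" height="1">'
real_edit = '<p>Hello, world</p>'

print(fingerprint(fetch_1) == fingerprint(fetch_2))    # True: same post, different noise
print(fingerprint(fetch_1) == fingerprint(real_edit))  # False: an actual edit
```

A new version would then be saved only when the fingerprint changes, not whenever the feed bytes differ.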

(**) Google Reader only shows you the current version, but GFS certainly has the ability to store dated versions of files. It would really surprise me if they weren’t keeping all of the versioned data on GFS. But then, I’m surprised they use the Squid caching proxy. (what to use instead) Hopefully, they’ll update Reader to let Suzy see all previous copies of all posts. And maybe they’ll stop marking posts as read when you haven’t seen them yet, too. Give more power to the people!
