Twitter-proxy: Any Interest?

-----
Update: Over 100,000 paying subscriber views on our premium content service.
-----

The stories of Twitter’s frequent downtime don’t need repeating here. Instead, I want to ask the community whether there is any interest in addressing the problem.

As many are aware, Twitter’s problem with scaling is not RoR, it’s not Joyent, NTT, or … Twitter’s scaling problem is exactly the same thing that makes it valuable: their database of users. And getting a traditional SQL/relational DB to scale horizontally is pretty tough. Sharding works for some apps but not others.

It so happens that our new distributed database technology is rather well suited for twitter-style high-volume reliable messaging. If there is sufficient community interest, we could help solve downtime by putting together a “twitter-proxy” that keeps twitter users on twitter but provides an additional layer of data accessibility in the ecosystem. The aim is not to compete, just to help keep users happy.

Consider the messaging problem:

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has about two choices of how they can dispatch that message:

  1. PUSH the message to the queues of each of his 6,864 followers, or
  2. Wait for the 6,864 followers to log in, then PULL the message.

The trouble with #2 is that people like Robert also follow 6,800 people. And it’s unacceptable for him to log in and then have to wait for the system to open records on 6,800 people (across multiple db shards), sort the records by date, and finally render the data. Users would be hating on the HUGE latency.

So, the twitter model is almost certainly #1. Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.
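To make the trade-off concrete, here is a minimal sketch of both options using in-memory dictionaries. Everything in it (the names `followers`, `inbox`, `outbox`, `post_push`, `read_pull`, and so on) is made up for illustration; it is not how Twitter actually stores anything, just the shape of the two approaches.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and per-user storage.
followers = defaultdict(set)   # author -> users who follow the author
following = defaultdict(set)   # user -> authors the user follows
inbox = defaultdict(list)      # user -> [(timestamp, author, text)], used by PUSH
outbox = defaultdict(list)     # author -> [(timestamp, text)], used by PULL


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def post_push(author, timestamp, text):
    """Option 1: copy the message into every follower's inbox at write time."""
    for user in followers[author]:
        inbox[user].append((timestamp, author, text))
    return len(followers[author])  # number of writes performed


def read_push(user):
    """Reading is cheap: the timeline is already sitting in one place."""
    return sorted(inbox[user], reverse=True)


def post_pull(author, timestamp, text):
    """Option 2: a single write to the author's own outbox."""
    outbox[author].append((timestamp, text))
    return 1  # number of writes performed


def read_pull(user):
    """Reading is expensive: open every followed author's records, then sort."""
    timeline = [
        (timestamp, author, text)
        for author in following[user]
        for timestamp, text in outbox[author]
    ]
    return sorted(timeline, reverse=True)
```

With PUSH, the cost of a post grows with the author's follower count. With PULL, the cost of a page load grows with the number of people the reader follows.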

How many writes?

A 6000X multiplication factor:

Do you see a scaling problem with this scenario?

Scoble writes something–boom–6,800 writes are kicked off. 1 for each follower.

Michael Arrington replies–boom–another 6,600 writes.

Jason Calacanis jumps in –boom–another 6,500 writes.

Beyond the 19,900 writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the 19,900 reads + 19,900 writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.

And here’s the kicker: that giant processing and delivery effort, possibly a combined 100K disk IOs, was caused by 3 users, each sending just one tiny 140-character message. How innocent it all seemed.
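A back-of-the-envelope version of that estimate looks like this. The follower counts are the rounded ones used above. The per-delivery costs (one follower lookup, one shard lookup, one accounting write, two replicas) are assumptions picked for illustration, not measurements of Twitter's system.

```python
# Rounded follower counts from the example above.
FOLLOWERS = {"Scoble": 6_800, "Arrington": 6_600, "Calacanis": 6_500}


def estimate_ios(deliveries, replication=2,
                 follower_reads=1, shard_reads=1, ack_writes=1):
    """Rough disk IO count for fanning out `deliveries` message copies.

    ASSUMPTIONS (not from any real system): each delivery costs one
    follower-list read, one shard-lookup read, one write per replica,
    and one accounting write to mark the delivery as successful.
    """
    reads = deliveries * (follower_reads + shard_reads)
    writes = deliveries * replication
    acks = deliveries * ack_writes
    return reads + writes + acks


deliveries = sum(FOLLOWERS.values())
print(deliveries)                               # 19900
print(estimate_ios(deliveries))                 # 99500
print(estimate_ios(deliveries, replication=3))  # 119400
```

Under those assumptions, three tiny messages land right around the 100K mark, and a third replica pushes it well past that.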

Now, are there any questions why twitter goes down when there’s any kind of event?

I once ran at the Bruce Jenner Classic

This is where we (potentially) come into the picture: we’ve spent the last 2 years developing a web architecture built on our horizontally scalable distributed database, and this kind of [lookup | message passing | writing] is what it eats for breakfast. We haven’t had any twitter-sized days, but we are seeing the architecture scale as designed.

You know how Yahoo News or Google News or NYTimes or CNN show everybody the same stories, and after you read them, the front page is boring? It’s a big database problem: you have to keep track of what every user has read, and SQL falls short. Our system is designed to scale horizontally so it can keep track of what hundreds of millions of individuals have read, and then show users the [highest rated | most viewed | etc.] stories that are new to them. But since we don’t have a deal with any news guys yet, we’re building out the most database intensive feed reader on the planet. It has plenty of nifty features not found in Google Reader, Bloglines, etc. But that’s an aside.
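On a single machine, the "new to you" idea is only a few lines. The sketch below is a toy with made-up names (`read_by_user`, `mark_read`, `top_unread`) and says nothing about how our distributed system does it. The hard part is keeping that per-user read set for hundreds of millions of users, which is exactly what this toy does not attempt.

```python
import heapq
from collections import defaultdict

# Hypothetical per-user read history: user_id -> set of story ids already read.
read_by_user = defaultdict(set)


def mark_read(user_id, story_id):
    read_by_user[user_id].add(story_id)


def top_unread(user_id, story_scores, n=3):
    """Return the n highest-scoring stories this user has not read yet.

    `story_scores` maps story id -> score (rating, view count, etc.).
    """
    seen = read_by_user[user_id]
    unread = (
        (score, story_id)
        for story_id, score in story_scores.items()
        if story_id not in seen
    )
    return [story_id for _, story_id in heapq.nlargest(n, unread)]
```

Two users looking at the same scores get different front pages as soon as one of them has read something.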

The Idea: twitter-proxy for the people

Addressing Twitter’s downtime could be pretty straightforward. It could work much like a (pseudo-reverse) proxy:

  1. You enter your twitter credentials on the proxy site
  2. You can post your tweets to the proxy. If twitter is up, we’ll post there, too.
  3. We’ll get your friend list and GET and store their tweets in our db.

When Twitter is up and fully functional, twitter-proxy contains a mirror of all the tweets from each of the registered twitter-proxy members and the people they follow.

[Figure: the twitter-proxy idea]

When twitter is down, you can still post a tweet to twitter-proxy. That message will immediately be available to anyone who is in our system. (How are they in the proxy system? Either they registered directly, or they are being followed BY someone who registered, so we automatically grabbed their status updates.)
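Here is a rough sketch of that flow as a toy class. Nothing here talks to Twitter: `upstream_post` and `upstream_is_up` are hypothetical callables standing in for whatever client would do that, and all the class and method names are invented. Note one assumption beyond what is described above: tweets posted while Twitter is down are queued in `pending` and replayed later.

```python
class TwitterProxy:
    """Toy model of the proxy flow; not a real Twitter client."""

    def __init__(self, upstream_post, upstream_is_up):
        self.upstream_post = upstream_post    # hypothetical: callable(user, text)
        self.upstream_is_up = upstream_is_up  # hypothetical: callable() -> bool
        self.follows = {}   # registered user -> set of users they follow
        self.store = []     # local mirror: (sequence number, user, text)
        self.pending = []   # ASSUMPTION: tweets to replay once Twitter is back

    def register(self, user, friends):
        """Step 1: a user signs up and we record their friend list."""
        self.follows[user] = set(friends)

    def in_system(self, user):
        """True if the user registered or is followed by someone who did."""
        if user in self.follows:
            return True
        return any(user in friends for friends in self.follows.values())

    def _save(self, user, text):
        self.store.append((len(self.store) + 1, user, text))

    def mirror(self, user, text):
        """Step 3: store a tweet fetched from Twitter for a known user."""
        if self.in_system(user):
            self._save(user, text)

    def post(self, user, text):
        """Step 2: always store locally; forward to Twitter only if it is up."""
        if user not in self.follows:
            raise PermissionError(f"{user} is not registered with the proxy")
        self._save(user, text)
        if self.upstream_is_up():
            self.upstream_post(user, text)
        else:
            self.pending.append((user, text))

    def flush_pending(self):
        """Replay queued tweets to Twitter; returns how many were sent."""
        sent = 0
        while self.pending and self.upstream_is_up():
            user, text = self.pending.pop(0)
            self.upstream_post(user, text)
            sent += 1
        return sent

    def timeline(self, user):
        """Newest-first tweets from the people this user follows."""
        wanted = self.follows.get(user, set())
        return [(u, t) for _, u, t in reversed(self.store) if u in wanted]
```

The point of the design is that `post` and `timeline` never depend on Twitter being reachable. Only the forwarding step does.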

Ground rules:

  1. You should be able to access this system with nothing more than your existing twitter credentials. No separate login.
  2. We would expose a twitter-compatible API so outside clients would “just work”. (e.g. change the /etc/hosts file to resolve twitter.com to another IP)

Twitter is the new mail

Because twitter has done such a great job with their API, the net effect of a twitter-proxy is that you could still send and receive your twitter messages, directly from twitter or via twitter-proxy. If your friends are sending SMS messages to twitter, they would still end up at twitter-proxy.

Communications Infrastructure

The win is that when twitter goes down, there is another component of the ecosystem that can be alive and healthy. Messages sent via the twitter-proxy system would get to every user on the proxy system. (Again, they either registered directly or are followed by someone who did register.) And twitter users stay twitter users. No one is split off to different, competing platforms.

We don’t have any experience with an SMS->HTTP gateway, so if twitter is down, the only way to get messages to and from your friends via the proxy is HTTP. That means a web page or web client. But hey, use your iPhone if you’re out and about.

Moreover, we should be able to support fast search, and the RSS/Atom feeds of people’s tweets would be available in real time, too. Built into the system could be other niceties such as “how many people viewed this tweet” and top-read tweets (that are new to you). It’s up to your imagination.

Caveats

First of all, we won’t embark on any twitter-proxy system if the twitter folks aren’t cool with it. We would need their OK, first.

Second, enough of you–twitter diehards–need to tell us you want such a system. From where we’re at, it shouldn’t take long to build it, if there’s enough demand.

If you want it–let us know, loudly.

Thanks for reading.

-Israel

17 Comments

  1. Posted February 8, 2008 at 8:02 pm | Permalink

    this sounds like a neat idea, i just wonder if the guys at twitter are already thinking on something similar for their scaling problems..

  2. Posted February 9, 2008 at 5:24 am | Permalink

    I wonder how sometimes the solution comes from thinking out of the box. It’s a good proposition. It will be good to know what the Twitter team is thinking about this problem.

  3. joearnold
    Posted February 9, 2008 at 8:06 am | Permalink

    So how is twitter ever going to be a “mission-critical part of their communication suite” if it _isn’t_ an open system. The fact that you have to ask twitter if this is okay smells.

  4. mofooker
    Posted February 9, 2008 at 8:46 am | Permalink

    First off, thanks for setting the record straight on Twitter’s scalability problem not being RoR related, that misconception has been floating around way too long. I am sick of hearing that crap from non-techies and techies alike who aren’t worth a shit mouthing off about Twitter RoR scalability problem.

    Back to my main point, I think it is a great idea, in fact I thought of the same idea myself several weeks ago but it looks like you have the technology ready to go. I would up your suggestion by saying go build a Twitter clone instead of just being a proxy. Many Twitter users have openly said they would switch to a reliable alternative, so just do it. And there is no need to ask for permission, screw that, you are in no way obligated to Twitter not to build a clone.

  5. Posted February 9, 2008 at 9:52 am | Permalink

    HELL YES!

    anything, anything that could make twitter more reliable.

  6. scorpion032
    Posted February 9, 2008 at 9:58 am | Permalink

    Oh Yea, Of all I know, Evans will surely have it done ;). But why a proxy? U can implement the same design on twitter itself.

  7. bngu
    Posted February 9, 2008 at 10:23 am | Permalink

    I developed my site in Ruby on Rails and would be interested in exploring your solution for peer-distributed file system database. Is there a way I can sign up for the private beta?

  8. Posted February 9, 2008 at 2:02 pm | Permalink

    bngu, part of our medium-term plan is to let developers code against our system, but our platform isn’t ready for arbitrary outside use yet.

    However, we’re more than happy to take a look at your app and see if there’s a quicker fit. Let me know. Cheers. israel–assetbar.com

    scorpion032,
    Yes, twitter already uses proxy servers. Our idea is that our alternate data store would/should scale and provide better access for lots of users. Happy to talk to twitter directly about this. We just don’t happen to know any of those guys and wanted to see if there’s community interest, first.

  9. bngu
    Posted February 9, 2008 at 6:22 pm | Permalink

    Israel,
    How would you like to go about looking at my app to see if there is a quicker fit? You can check out my site at http://www.jiggyme.com. You can reach me at bob_ngu at yahoo dot com.

  10. Posted February 9, 2008 at 8:56 pm | Permalink

    Why would you do this? Why not just sell your technology to Twitter and have them do it themselves?

    I can’t see the Twitter guys going for this, but you never know. I wish Twitter was more reliable, and I think your idea is definitely interesting.

  11. tomnixon
    Posted February 10, 2008 at 1:25 am | Permalink

    This is a really interesting idea.
    It strikes me that Google could make a move into this space by open-sourcing the Jaiku code so that anyone could set up and host their own Twitter proxy. This could pave the way towards a distributed Twitter, much like the global email system.

  12. Posted February 10, 2008 at 3:35 am | Permalink

    Guys, I think you’re thoroughly missing the point. Messages need to be filtered BEFORE they hit the database. http://www.texttechnologies.com/2008/02/09/scalable-twitter/ spells out why and how. It’s supported by http://www.dbms2.com/2008/01/16/twitter-could-easily-be-made-reliable/ .

    The key piece here is CEP (think Coral8 or StreamBase). They filter 250,000 messages/second without blinking, before any kind of aggressive clustering. That’s the kind of scalable that will be needed if Twitter is to grow beyond the early adopter niche.

    CAM

  13. Posted February 10, 2008 at 6:42 am | Permalink

    Curt,
    I love your post and 100% agree. And I’m a big fan of CEP–I looked at the Stanford streams project back in the day.

    I hope I didn’t miss the point — I don’t think I precluded complex pre-processing and in-memory queues.. 🙂 We just said we could worry about the messaging and storage if there’s demand for a reliable, scalable twitter-proxy system.

    By the way, do you have a public source for the “million registered users” figure? The best private data I have indicates twitter is dramatically smaller than assumed. I’ve even heard active users under 20K (!), but I don’t know how current that data is.

    Does anyone have data on # of registered users and tweets/day?

  14. invisiblebirds
    Posted February 12, 2008 at 11:11 am | Permalink

    I would like to see this. Of course, I’d also like an invite code for Assetbar – I was one of the first users for Onstad’s Achewood ‘bar and was immediately impressed. Pretty please?

  15. Posted February 12, 2008 at 3:57 pm | Permalink

    Invisiblebirds, with such kind words I HAVE to give you an assetbar invite. Bring FF/Safari and your OPML…

  16. Posted February 16, 2008 at 3:03 pm | Permalink

    The 1 million figure is what Twitter claims, no? Or anyway something high 6 figure and growing. I think the number of ACTIVE users, however, is as you suggest vastly lower. Look at the speed of your timeline vs. the speed of the public timeline, and you’ll probably conclude that the people you follow put out a non-trivial fraction of all the posts.

    CAM

  17. Posted April 28, 2008 at 5:23 am | Permalink

    This is the best article about the scalability problems of Twitter I’ve read so far.

    Microblogs are changing the world, in many, many aspects.

    Within 5 minutes a message can be posted to thousands upon thousands of people.

    All major languages seem to have their microblog service(s) running at the moment.

    It’s fascinating to witness this day and age !

    Pieter Jansegers
    http://microblogs.ning.com


23 Trackbacks/Pingbacks

  1. […] Assetbar, a new Google Reader competitor in early beta mode, have made an interesting proposal to host a proxy for Twitter’s database on their database system: It so happens that our new distributed database technology is rather well suited for twitter-style […]

  2. […] This post by the people at Assetbar on an always-up twitter-proxy is strikingly similar to the learn-to-code project I’ve been working on, a twitter backup and repeater. Of course, they’re building theirs to be scalable, while I’m building mine for me (and maybe a few friends.) That’s the difference between programmers who know what they’re doing and a product manager just having some fun in his spare time. This entry was written by greg and posted on February 8, 2008 at 7:08 pm. Bookmark the permalink. Follow any comments here with the RSS feed for this post. Post a comment or leave a trackback: Trackback URL. « Art or science? […]

  3. […] Twitter-proxy: Any Interest? « Assetbar: drinks and recipes (tags: architecture proxy twitter) […]

  4. […] Because twitter has done such a great job with their API, the net effect of a twitter-proxy is that you could could still send and receive your twitter messages, directly from twitter, or via twitter-proxy. If your friends are sending SMS messages to twitter, they would still end up at twitter-proxy. Source: Twitter-proxy: Any Interest? « Assetbar: drinks and recipes […]

  5. […] To any ex-Yahoo’s out there, if the kind of problems described in this post sound interesting to you, we’re always hiring. Give me a holler. […]

  6. […] sounds like others are thinking similar things. Actually, it may not just be twitter’s growing pains as the main inspiration […]

  7. […] The Yahoo layoffs yesterday have me wondering: Is this really a good thing for Microsoft’s proposed acquisition? I don’t think so–especially if Google becomes the talent beneficiary. […]

  8. […] that postscript Dare links back to a post on the AssetBar blog where there was a post talking about creating a decentralized version of Twitter. It is this […]

  9. […] [TWITTER] Twitter-proxy: Any Interest?, assetbar.wordpress.com, via:l33t.reddit.com […]

  10. […] others, Dave is picking up some ideas that the guys from Assetbar discussed a while back: #4 It must be possible to use your clone when Twitter goes down and then […]

  11. […] Twitter-proxy: Any Interest? […]

  12. […] built into our architecture. We had previously asked if there was any interest in us making a twitter-proxy and the post garnered quite a bit of interest. But somehow a direct clone didn’t seem […]

  13. […] Twitter-proxy: Any Interest? « Assetbar: drinks and recipes (tags: scaling twitter performance database) […]

  14. […] hosting providers like Joyent. They have recognized a fundamental problem with their service, as pointed out by Assetbar in an earlier post who wanted to offer a proxy service for […]

  15. […] Dare looks to Israel's analysis of the impact that this follow relationship has. […]

  17. […] both within Twitter and without have framed the conversation as an architectural challenge. Meanwhile the nattering classes have […]

  18. […] Twitter-proxy: Any Interest? (Assetbar) Never thought about the scaling problems of Twitter in such detail, but indeed, this is a challenging problem. Looking at the profile of the mentioned example ‘twitters’ they even have more followers…Of course, twitter now has $15M to worry about it… (tags: twitter architecture databases scalability performance) […]

  19. […] it’s what Dare Obasanjo posits, building on what Israel says on the Assetbar blog, and they’re both way smarter than […]

  20. […] how Twitter has the wrong architecture for the application it’s become. Not only does Isreal’s post [Twitter-proxy: Any Interest?] accurately describes the problem with the logical model for Twitter’s “followers” […]

  21. […] Twitter-proxy: Any Interest? « Assetbar: drinks and recipes (tags: twitter) […]

  22. […] Twitter-proxy: Any Interest? « Assetbar: drinks and recipes (tags: web2.0 twitter scaling scalability programming performance) […]
