Skip to main content

Hueniverse: Scaling a Microblogging Service - Part I

Popularity Report

Total Popularity Score: 0

Loading...
Loading...
Loading...
Loading...
Loading...
Loading...

Rank

Bookmark History

Saved by 9 people (0 private), first by anonymouse user on 2008-05-14


Public Sticky notes

Scaling a Microblogging Service - Part I

Highlighted by rickyrobinson

Scaling a Microblogging Service - Part I

Highlighted by j_thompson

The retrieval system is where things are not as simple. Unlike webmail services where refreshing a user’s inbox only queries a very simple data set (is there anything new in MY inbox?), refreshing a user’s home page on Twitter queries a much more complex data set (are there any new updates in ALL my friends’ pages?) and the nature of the service means that the ratio of reads to writes is significantly different from most other web services.

It is these constant queries that bring Twitter down during popular events. The fact that a single request for a user’s home page or an API request for a user’s friends timeline must perform a complex iterative query is what causing some requests to take too long, at best timeout and at worst cause the entire system to lag behind. These custom views are expensive and mean that it is much more difficult to cache the final result of a request.

Going through a timeline request, the server first looks up the list of people the user is following. Then for each person, checks if their timeline is public or private, and if private, if the requesting user has the rights to view it. If the user has rights, the last few messages are retrieved, and the same is repeated for each person being followed. When done, all the messages are collected, sorted, and the latest messages are converted from their internal representation to the requested format (JASON, XML, RSS, or ATOM).

Highlighted by smoody

The retrieval system is where things are not as simple. Unlike webmail services where refreshing a user’s inbox only queries a very simple data set (is there anything new in MY inbox?), refreshing a user’s home page on Twitter queries a much more complex data set (are there any new updates in ALL my friends’ pages?) and the nature of the service means that the ratio of reads to writes is significantly different from most other web services.

Highlighted by bankwatch