Dare Obasanjo aka Carnage4Life - Project Cassandra: Facebook'...

Bookmark History

Saved by 4 people (0 private), first by an anonymous user on 2008-07-14


Public Comment

on 2008-07-30 by alexband

Facebook Cassandra

on 2008-07-30 by alexband

Disk is the new tape

Public Sticky notes

Cassandra has several optimizations to make writes cheaper. A write operation doesn't immediately cause a write to the disk. Instead, the record is updated in memory and the operation is appended to the commit log. Periodically, the list of pending writes is processed and flushed to disk. As part of the flush, the set of pending writes is analyzed and redundant writes are eliminated. The writes are also sorted so that the disk is written to sequentially, which significantly reduces the time spent seeking on the hard drive and limits the impact of random writes on the system. How important is reducing seek time when accessing data on a hard drive? It can make the difference between taking hours versus days to flush a hundred gigabytes of writes to a disk. Disk is the new tape.

Highlighted by alexband
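
The write path described in the highlight above can be sketched roughly as follows. This is only an illustrative toy, not Cassandra's actual code: the class name `WriteBuffer`, its methods, and the choice to clear the log after a flush are all assumptions made for the example. It shows the three ideas from the passage: log the write, update memory, and flush deduplicated writes in sorted order.

```python
class WriteBuffer:
    """Toy write path (hypothetical): log the write, update memory, flush sorted."""

    def __init__(self):
        self.commit_log = []  # every write is recorded here, in arrival order
        self.pending = {}     # in-memory view holding the latest value per key
        self.disk = []        # stand-in for sequentially written on-disk batches

    def write(self, key, value):
        self.commit_log.append((key, value))
        # Overwriting the key makes any earlier pending write to it redundant.
        self.pending[key] = value

    def flush(self):
        # Sort by key so the batch can be written to disk in one sequential pass.
        batch = sorted(self.pending.items())
        self.disk.append(batch)
        self.pending.clear()
        # Assumption: logged entries are no longer needed once they are flushed.
        self.commit_log.clear()
        return batch


buf = WriteBuffer()
buf.write("user:42", "v1")
buf.write("user:07", "a")
buf.write("user:42", "v2")  # supersedes the first write to user:42
flushed = buf.flush()
```

Three writes go in, but only two records reach the simulated disk, and they arrive in key order rather than arrival order.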

The Cassandra data model is fairly straightforward. The entire system is a giant table with lots of rows. Each row is identified by a unique key. Each row has a column family, which can be thought of as the schema for the row. A column family can contain thousands of columns, each a tuple of {name, value, timestamp}, and/or super columns, each a tuple of {name, column+}, where column+ means one or more columns. This is very similar to the data model behind Google's BigTable.

Highlighted by alexband
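
As a rough illustration of the data model in the highlight above, the nesting can be written down as plain Python types. The class names, field names, and sample keys here are hypothetical and chosen only to mirror the description (row key, column family, column as {name, value, timestamp}, super column as {name, column+}); they are not Cassandra's API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    # A column is a tuple of {name, value, timestamp}.
    name: str
    value: bytes
    timestamp: int


@dataclass
class SuperColumn:
    # A super column is a tuple of {name, column+}: a name plus one or more columns.
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)


@dataclass
class ColumnFamily:
    # Acts like the schema for a row; may hold columns and/or super columns.
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)
    super_columns: Dict[str, SuperColumn] = field(default_factory=dict)


# The "giant table": each unique row key maps to that row's column family.
table: Dict[str, ColumnFamily] = {}

profile = ColumnFamily(name="Profile")  # hypothetical column family name
profile.columns["email"] = Column("email", b"someone@example.com", 1216000000)
inbox = SuperColumn(name="inbox")
inbox.columns["msg1"] = Column("msg1", b"hello", 1216000001)
profile.super_columns["inbox"] = inbox
table["user:42"] = profile
```

A lookup then walks the same path the text describes: row key, then column family, then column or super column by name.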

About a week ago, the Facebook Data team quietly released the Cassandra Project on Google Code. The Cassandra project has been described as a cross between Google's BigTable and Amazon's Dynamo storage systems. An overview of the project is given in the SIGMOD presentation on Cassandra, available at SlideShare.

Highlighted by jangondol

Facebook has followed their lead by developing Cassandra, which they admit is inspired by BigTable.

Highlighted by jangondol

At first glance, this is a very nice addition to the world of Open Source software by the Facebook team. Kudos.

Found via James Hamilton.

Highlighted by jangondol