Thursday, February 05, 2009

BigTable

This article is taken from the IT toolbox Data Management section.

About BigTable

Bigtable is a proprietary, scalable distributed database system from Google Inc., a well-known US-based Internet services company. It is built on top of the Google File System (GFS), a distributed file system that runs across clusters of computers. Bigtable is designed mainly to manage petabytes of structured data distributed over thousands of servers, specifically for Google products.

Bigtable is used in over sixty Google products and projects, including Google Search, Google App Engine, Google Analytics, Google Finance, Orkut, Personalized Search, Writely, Google Earth, YouTube, Google Reader, and Google Maps.

Bigtable handles demanding workloads that vary from one Google application to another, ranging from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users.

Basic Architecture of Bigtable

Although some Bigtable operations resemble those of a traditional database, Bigtable is not based on the relational database model. Architecturally, it is designed as a sparse, distributed, persistent, multidimensional sorted map. Each value in the map is an uninterpreted array of bytes, indexed by a row key, a column key, and a timestamp. Row keys and column keys are themselves treated as arbitrary strings. Tables are kept sorted by row key and are split dynamically into row ranges called tablets, which are distributed across servers.
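The sorted-map idea can be sketched in a few lines of Python. This is only a toy, single-process illustration, not Bigtable's actual API: the class name SparseSortedMap, its methods, and the sample row and column names are all assumptions made up for the example. It stores only the cells that are written (sparse), keeps keys in sorted order, and returns the newest timestamp first.

```python
import bisect


class SparseSortedMap:
    """Toy model of a map from (row key, column key, timestamp) to bytes.

    Hypothetical helper for illustration; not part of any Google library.
    """

    def __init__(self):
        # Keys are (row, column, -timestamp) so that, within one cell,
        # the most recent version sorts first.
        self._keys = []
        self._cells = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if it is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", 3, b"<html>v3</html>")
table.put("com.cnn.www", "contents:", 5, b"<html>v5</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("org.example", "contents:", 1, b"<html>hi</html>")

print(table.get("com.cnn.www", "contents:"))  # b'<html>v5</html>'
```

Because the keys are sorted by row, a call such as table.scan("com.", "com/") visits only the rows whose key starts with "com.", which is the property that makes splitting a table into contiguous row ranges practical.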

The Bigtable data model gives clients dynamic control over data layout and format. Through their choice of schema, clients can control the locality of their data, and schema parameters let them choose whether data is served from memory or from disk. Bigtable does not interpret the data it stores. Client applications often serialize structured and semi-structured data into these strings and index it using row and column names of their own choosing.

Bigtable can be used as both an input source and an output target for MapReduce jobs, which run parallel computations over large datasets on clusters of computers. Bigtable relies on Chubby, a distributed lock service, to synchronize its distributed components when they access shared resources.
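To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not Google's MapReduce API: the function names map_reduce, emit_tld, and count, and the sample rows, are assumptions made up for the example. A real job would read its rows from a table and run the map and reduce phases in parallel on many machines.

```python
from collections import defaultdict


def map_reduce(rows, mapper, reducer):
    """Run mapper over every row, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for row_key, columns in rows:
        for key, value in mapper(row_key, columns):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def emit_tld(row_key, columns):
    # Row keys here are reversed hostnames, so the first label is the
    # top-level domain (illustrative convention for this example).
    yield row_key.split(".")[0], 1


def count(key, values):
    return sum(values)


# Hypothetical rows standing in for the result of a table scan.
rows = [
    ("com.cnn.www", {"contents:": b"<html>...</html>"}),
    ("com.example", {"contents:": b"<html>...</html>"}),
    ("org.example", {"contents:": b"<html>...</html>"}),
]

pages_per_tld = map_reduce(rows, emit_tld, count)
print(pages_per_tld)  # {'com': 2, 'org': 1}
```

The result dictionary could in turn be written back as rows of another table, which is the sense in which a table can act as both input and output of such a job.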

For further information, see the following references:

http://labs.google.com/papers/bigtable.html

http://www.usenix.org/events/osdi06/tech/chang/chang_html/index.html
