What kind of databases is Google working on today? Still Bigtable? Recently Google published a white paper describing F1, their distributed database for the mission-critical AdWords system (the name comes from genetics: “Filial 1 hybrid”). The AdWords database “is over 100 TB, serves up to hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day. Availability reaches five nines, even in the presence of unplanned outages, and observable latency on our web applications has not increased compared to the old MySQL system” [Google White Paper]. Google claims that the database supports typical NoSQL features such as high scalability and high availability. It also supports typical relational features such as ACID transactions and SQL. IMO, the database is much more relational than NoSQL-style. Some of its main characteristics are summarized below.
Basic Architecture
F1 is built on top of Spanner. Spanner is a lower-level data store that is responsible for persistence, replication, caching, data sharding, transactions, etc. Spanner servers interact with the Colossus File System (CFS).
Because F1 servers do not hold any data themselves, more F1 servers can easily be added. Data is synchronously replicated across multiple, widely distributed datacenters. As a result, commit latencies of 50-150 ms are rather high (see also “Latency and Throughput” below).
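To get a feeling for what 50-150 ms per commit means, here is a quick back-of-the-envelope calculation. It is only a sketch and assumes a single client that commits one transaction at a time and waits for each commit to finish; the function name is made up for illustration and nothing here comes from the white paper itself.

```python
def serial_commits_per_second(commit_latency_ms):
    """Upper bound on commits per second for one client that
    waits for each commit before starting the next (assumption)."""
    return 1000.0 / commit_latency_ms


if __name__ == "__main__":
    # The two ends of the 50-150 ms commit latency range quoted above.
    for latency_ms in (50, 150):
        rate = serial_commits_per_second(latency_ms)
        print(f"{latency_ms} ms per commit: at most {rate:.1f} serial commits/s")
```

So a strictly serial client would top out at roughly 7 to 20 commits per second, which is why work has to be spread over many clients or grouped into fewer, larger transactions.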
Data Model
Unlike many of those loud and trendy NoSQL databases today, F1 is not schemaless. F1 has a data model that is comparable to the relational model, with some differences. Tables in an F1 schema can be organized as a hierarchy of parent and child tables, and the rows of a child table can be clustered with the rows of their parent table. Primary and foreign keys are an important part of efficient data manipulation and retrieval. Local and global indexes can be used to speed up queries.
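The clustering of child rows with their parent can be pictured with a small sketch. It assumes that a child table's primary key starts with the parent's primary key, so that sorting all rows by key places each child right after its parent. The table names (Customer, Campaign), the sample keys and the helper functions are hypothetical and only serve as an illustration; this is not F1's actual storage code.

```python
# Hypothetical sample rows, identified only by their primary key tuples.
customers = [(1,), (2,)]                 # (customer_id,)
campaigns = [(1, 10), (1, 11), (2, 20)]  # (customer_id, campaign_id)


def clustered_order(parent_keys, child_keys):
    """Return all rows in key order, so each child row directly
    follows the parent row whose key is a prefix of its own key."""
    tagged = [(key, "Customer") for key in parent_keys]
    tagged += [(key, "Campaign") for key in child_keys]
    # Tuples compare element by element; a shorter prefix sorts first.
    return sorted(tagged, key=lambda row: row[0])


def rows_under(parent_key, ordered_rows):
    """Return the parent row and all child rows sharing its key prefix."""
    n = len(parent_key)
    return [row for row in ordered_rows if row[0][:n] == parent_key]


if __name__ == "__main__":
    ordered = clustered_order(customers, campaigns)
    for key, table in ordered:
        print(table, key)
    print(rows_under((1,), ordered))
```

In this toy layout, everything that belongs to customer 1 sits in one contiguous key range, so fetching a customer together with its campaigns becomes a single range read instead of a join across scattered rows.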