References to blogs, tweets, discussions, etc. that caught my attention during the last week.
Data Modeling
Data independence, the key idea of the RDBMS, is illustrated in “Relational algebra-How it makes Relational Databases go faster” by Kyle Hailey. Algebraic optimization is done by the database system, not by the programmer as in many NoSQL databases.
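A classic example of such an algebraic rewrite is selection pushdown: σ_p(R ⋈ S) = σ_p(R) ⋈ S whenever the predicate p references only columns of R. The Python sketch below is not taken from the article. It uses a hypothetical list-of-dicts representation of relations and hypothetical example data, and it shows that the plan as written and the rewritten plan return the same rows. A real optimizer also chooses among such equivalent plans using cost estimates, which the sketch leaves out.

```python
from typing import Callable

# Hypothetical representation: a relation is a list of rows, a row is a dict.
Row = dict[str, object]
Relation = list[Row]

def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection sigma_pred(rel): keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def join(left: Relation, right: Relation, key: str) -> Relation:
    """Equi-join on a shared column `key` (naive nested loops, for clarity)."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[key] == rrow[key]]

# Hypothetical example data.
employees = [
    {"dept_id": 1, "name": "Ann", "salary": 90},
    {"dept_id": 2, "name": "Bob", "salary": 60},
    {"dept_id": 1, "name": "Cid", "salary": 50},
]
departments = [
    {"dept_id": 1, "dept": "R&D"},
    {"dept_id": 2, "dept": "Sales"},
]

def well_paid(row: Row) -> bool:
    # The predicate references only employee columns, so it may be pushed down.
    return row["salary"] > 55

# Plan as written: join everything, then filter.
naive = select(join(employees, departments, "dept_id"), well_paid)
# Rewritten plan: filter employees first, so the join compares fewer row pairs.
pushed = join(select(employees, well_paid), departments, "dept_id")

print(naive == pushed)              # True
print([r["name"] for r in pushed])  # ['Ann', 'Bob']
```

The nested-loop join compares 3 × 2 = 6 row pairs in the first plan but only 2 × 2 = 4 in the second. With data independence, the programmer writes the query once and the database system is free to pick the cheaper equivalent form.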
A fuss is regularly made about inefficient schema evolution in RDBMSs. Simply dumping data as text files into Hadoop is not really the solution. Hadoop gives you many choices of file format, and Avro is one that supports schema evolution, as Gwen Shapira describes in “The problem of managing schemas”.
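Avro's schema evolution rests on schema resolution: an Avro data file stores the writer's schema in its header, a reader may supply its own (possibly newer) schema, and the two are resolved field by field by name. The following is a simplified Python sketch of the record rules from the Avro specification, assuming records are already decoded into dicts and ignoring type promotion, aliases, and nested types. It is not the Avro library's API, and `resolve_record` and the `User` schemas are hypothetical.

```python
from typing import Any

def resolve_record(datum: dict[str, Any], writer: dict, reader: dict) -> dict[str, Any]:
    """Read a record written with `writer` as if it had been written with `reader`.

    Simplified sketch of Avro's record resolution rules: fields are matched by
    name; writer-only fields are ignored; reader-only fields take their default.
    Type promotion, aliases and nested schemas are deliberately left out.
    """
    if writer["name"] != reader["name"]:
        raise ValueError("record names do not match")
    writer_fields = {f["name"] for f in writer["fields"]}
    result: dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]       # present in both schemas: keep the value
        elif "default" in field:
            result[name] = field["default"]  # added in the reader schema: use its default
        else:
            raise ValueError(f"field {name!r} is not in the writer schema and has no default")
    return result                            # writer-only fields are dropped

# Hypothetical schemas: v2 drops "fax" and adds "email" with a default.
user_v1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "fax", "type": "string"},
]}
user_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]}

old_record = {"id": 1, "name": "Ann", "fax": "555-0100"}
print(resolve_record(old_record, user_v1, user_v2))
# {'id': 1, 'name': 'Ann', 'email': ''}
```

This is why adding a field with a default is a compatible change: files written with `user_v1` remain readable by code expecting `user_v2` without rewriting the stored data.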
Data Architecture
Popular NoSQL and Hadoop blog articles of 2014:
- “DZone Best of the Year: NoSQL Zone Edition” by G. Ryan Spain (DZone blog)
- “Top Ten Popular Hadoop Blog Posts of 2014” by Jules S. Damji (Hortonworks blog)
- “Top 10 Hadoop Blogs of 2014” by Karen Whipple (MapR blog)
- “The Top 10 Posts of 2014 from the Cloudera Engineering Blog” by Justin Kestelyn (Cloudera blog)
Data Storage
“Notes on machine-generated data, year-end 2014” by Curt Monash is a compact summary of the kinds of machine-generated data, their database structures, continuous events, and streaming and memory-centric processing.