Data Modeling
Data Modeling and NoSQL? Data modeling becomes increasingly important in the “schema-less” world, because a suitable data model ensures data quality, performance, and other characteristics. The slides “Data Modeling deep dive” by MongoDB Inc. illustrate four MongoDB use cases.
“Driving Keys and relationship history, one or more tables?” by Roelant Vos shows two alternative ways to implement a driving key relationship defined in a Data Vault Link table. A detailed example illustrates the approaches with one Satellite or with two (or more) Satellites.
Data Architecture
“Cache is the new RAM” by Carlos Bueno deals with several past tech cycles in which a technology was hyped as the solution to almost any problem, e.g. sharding, NoSQL, MapReduce, etc. And what about 2014 and 2016? In-Memory/RAM again? Very worthwhile and entertaining to read, with some tongue-in-cheek messages.
Data Storage
Apache Hadoop 2.6.0 has been released as the fourth major release of 2014, with nearly 900 Jira issues resolved. The announcement from A. Murthy lists some of the changes, with major topics such as:
- heterogeneous storage tiers + archival storage
- security features (e.g. transparent data at rest encryption (beta))
- support for long-running services in YARN
- rolling upgrades
See also the Hortonworks, Cloudera, or MapR blogs for more details.
There is, and will be, a lot of change around program execution in the Hadoop stack, e.g. Tez, Spark, etc. But what about HDFS data storage? Changes like Parquet and ORCfile were rather small. In “Hadoop’s next refactoring?”, Curt Monash argues that a more fundamental change in data storage would make sense, to address issues like caching and especially in-memory inter-program data exchange.
Instance caging was introduced in Oracle 11gR2 as a means to limit the CPU resources available to an instance. OS processor group integration goes a step further and allows an instance to be bound to a named subset of the available CPUs. See Nikolay Manchev’s blog post “Processor group integration in Oracle Database 12c” for a detailed description.
Data Tools
Data Visualization
The infographic “The Internet is a zoo: The ideal length of everything online” by Mark Uzunian shows statistics on character counts, header length, etc. that attract the most attention, focusing on quantitative criteria (e.g. slides: 6 min – 61 slides, blogs: 1500 words, headers: 6 words).
Data Divers
Download the free report “When Hardware Meets Software: How the Internet of Things Transforms Design and Manufacturing”, published by O’Reilly.
Oracle 12c Parallel execution white paper.