Data Modeling
The blog article “Data Vault 2.0 Staging Area learnings & suggestions” by Roelant Vos shows an approach to generate hash keys for Data Vault 2.0 in the staging layer for business keys, link relationships and hash difference determination.
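As a rough illustration of the idea, here is a minimal sketch of hash key and hash difference generation. It is not taken from the article: the MD5 algorithm, the `|` delimiter, the trim/upper-case normalization and the function names `hash_key` and `hash_diff` are all assumptions chosen for the example.

```python
import hashlib

# Assumed delimiter between key parts; it prevents ("a", "bc") and ("ab", "c")
# from producing the same concatenated string.
DELIMITER = "|"


def _join(values, upper):
    """Convert values to trimmed strings (None becomes '') and join them."""
    parts = []
    for value in values:
        text = "" if value is None else str(value).strip()
        parts.append(text.upper() if upper else text)
    return DELIMITER.join(parts)


def hash_key(*business_keys):
    """Hash key for a hub (one business key) or a link (several business keys).

    Business keys are upper-cased so that 'abc' and 'ABC' map to the same key.
    """
    payload = _join(business_keys, upper=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()


def hash_diff(attribute_values):
    """Hash difference over descriptive attributes, in a fixed column order.

    Case is preserved here so that a change in capitalization counts as a change.
    """
    payload = _join(attribute_values, upper=False)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    customer_hk = hash_key(" c-1001 ")                  # hub hash key
    order_link_hk = hash_key("c-1001", "o-77")          # link hash key
    customer_hd = hash_diff(["Jane Doe", "Berlin", None])  # satellite hash diff
    print(customer_hk, order_link_hk, customer_hd)
```

Computing these values once in the staging layer means that hubs, links and satellites can all be loaded from the same staged row without recalculating the hashes.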
The video “Introduction to Cassandra Data Modeling” by dbtube covers the Cassandra storage model and data modeling.
“Pour Some Schema On Me: The Secret Behind Every Enterprise Information Lake” by Murthy Mathiprakasam on the Informatica blog underlines the need to care about schemas and metadata: just pouring log data, sensor data, etc. into a data lake is not sufficient to ensure data quality in the long run.
Data Architecture
Link to ThoughtWorks’ “Rethink Dallas” videos on agile topics:
- Agile architecture (Molly Bartlett Dishman, Martin Fowler)
- The death of agile (Dave Thomas)
- Rethinking the agile enterprise (Brandon Byars)
Data Storage
“The Top 10 Posts of 2014 from the Cloudera Engineering Blog” by Justin Kestelyn contains many articles dealing with right-time capabilities for the Hadoop ecosystem, e.g. Spark, Kafka, Impala.
MongoDB acquires the storage engine WiredTiger. Press release: “MongoDB acquires WiredTiger Inc.”
Frits Hoogland started an in-depth blog series about Oracle PGA:
- Oracle database operating system memory allocation management for PGA
- Oracle database operating system memory allocation management for PGA – part 2: Oracle 11.2
- Oracle database operating system memory allocation management for PGA – part 3: Oracle 11.2.0.4 and AMM: Quiz
- Oracle database operating system memory allocation management for PGA – part 4: Oracle 11.2.0.4 and AMM
Data Flow
Hortonworks webinar recap on Kafka & Storm with recording, slides, and Q&A: “Discover HDP 2.2: Apache Kafka and Apache Storm for Stream Data Processing”. Apache Storm 0.9.3 was recently released with improvements in HDFS, HBase and Kafka integration. The new release allows Storm to write into Kafka, so Storm can now use Kafka both as a source and as a target: “Storm 0.9.3 Released”.
Data Visualization
“10 significant visualization developments: July to December 2014” by Andy Kirk and “The Best Data Visualization Projects of 2014” by Nathan Yau showcase great visualizations from 2014.
Data Statistics
The Google Research blog post “Automatically making sense of data” is about automatically discovering insights from data and providing a human-readable explanation (see The Automatic Statistician project site).
The title says it all: “Open-Sourced Advanced Analytics is increasing…” by Alexander Linden.
Data Quotes
“If you have data, you have a schema. Whether you want one or not.” tweeted by Karen Lopez.