DataBeat week 50/2014

References to blogs, tweets, discussions, etc. that caught my attention during the last week.

Data Modeling

A blog post with a link to a web session and source code on how to use BIML to generate a Data Vault. BIML (Business Intelligence Markup Language) is an XML dialect for defining BI assets like tables, ETL flows, etc. See “Auto Generate Data Vault using Biml – Part 1 – Webinar Content” by Peter Avenent.

A LinkedIn discussion about “Data Vault, Data Virtualisation and agile DW” addresses the current approach of making the Data Mart layer on top of a Data Vault more agile as well. With faster hardware and/or in-memory column-oriented DBs, views in the Data Mart layer are often sufficient.
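To make the idea of a view-based Data Mart layer a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (hub_customer, sat_customer, dim_customer, customer_hk, load_dts) are assumptions made up for illustration and are not taken from the discussion; a real Data Vault will carry more metadata columns and will most likely run on a different database.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory Data Vault (one hub, one satellite) and a
    Data Mart dimension that exists only as a view on top of it.
    All names are hypothetical and chosen for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hub_customer (
            customer_hk TEXT PRIMARY KEY,  -- hash key
            customer_id TEXT,              -- business key
            load_dts    TEXT
        );
        CREATE TABLE sat_customer (
            customer_hk TEXT,
            load_dts    TEXT,
            name        TEXT,
            city        TEXT,
            PRIMARY KEY (customer_hk, load_dts)
        );

        INSERT INTO hub_customer VALUES
            ('hk1', 'C-1', '2014-12-01'),
            ('hk2', 'C-2', '2014-12-01');
        INSERT INTO sat_customer VALUES
            ('hk1', '2014-12-01', 'Alice', 'Bern'),
            ('hk1', '2014-12-08', 'Alice', 'Basel'),
            ('hk2', '2014-12-01', 'Bob',   'Zurich');

        -- The "virtual" dimension: no data is copied, the view only
        -- picks the latest satellite row per hub key.
        CREATE VIEW dim_customer AS
        SELECT h.customer_id, s.name, s.city
        FROM hub_customer h
        JOIN sat_customer s ON s.customer_hk = h.customer_hk
        WHERE s.load_dts = (
            SELECT MAX(s2.load_dts)
            FROM sat_customer s2
            WHERE s2.customer_hk = s.customer_hk
        );
        """
    )
    return conn


def current_customers(conn):
    """Query the view like any physical dimension table."""
    return conn.execute(
        "SELECT customer_id, name, city FROM dim_customer ORDER BY customer_id"
    ).fetchall()
```

Because the dimension is only a view, a change to the mart structure means changing the view definition rather than reloading a table, which is roughly where the agility argument comes from.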

Data Storage

“Archiving everything with Hadoop” by Mark Cusack on Roberto Zicari’s blog mentions three key features that Hadoop has to provide in order to be suitable as long-term storage: schema preservation, security/governance, and SQL access.

Data Flow

Mark Rittman started a three-part blog series on Hadoop ETL using MapReduce, YARN, Tez, and Spark, with examples and an overview of how the tools work.

Gwen Shapira lists several links for more information about Kafka: “Getting started with Kafka – Resources”.

ETL tools are widely used in the classical DWH because of their supposed productivity and maintenance advantages compared to manual coding. But what is the role of ETL tools if the code for data loading is generated automatically? Roelant Vos gives his view in his blog article “Do we still want to automate against ETL tools?”.

Data Tools

Reference to the “Impala Cookbook” compiled by Cloudera’s Impala team covering schema and physical design, memory usage, query tuning basics, etc.

Data Visualization

Mike Bostock’s d3.js 3.5 is now available on GitHub. d3.js is a powerful JavaScript visualization library for HTML and SVG.

“Star statistician Hans Rosling takes on Ebola” by Science Magazine / Kai Kupferschmidt. Rosling is well known for his inspiring talks featuring great visualisations.

1 Comment

Smart Shyam on April 30, 2015 at 08:05

Nice article!
ETL are biggies gearing up to handle big data and offering services that Hadoop is not (like metadata management). This convert to ETL from distributed applications still thinks the future is challenging enough for ETL Tools to continue to excite the IT specialist and business analyst. The key to recognizing whether or not your ETL tool is relevant is focusing on the value the ETL tool brings to the data.
