Book Review "Data Architecture: A Primer for the Data Scientist" by W.H. Inmon / D. Linstedt

The book “Data Architecture: A Primer for the Data Scientist” by W.H. Inmon / D. Linstedt carries the subtitle “Big Data, Data Warehouse and Data Vault”, which summarizes the main focus of the book well.

The first chapter introduces and defines structured and unstructured data. Unstructured data is further divided into repetitive and nonrepetitive. Repetitive unstructured data (e.g. metering data, clickstream data, etc.) is mainly processed by Hadoop/NoSQL-centric tools, while nonrepetitive unstructured data (e.g. emails, documents) must be processed by textual disambiguation. Repetitive unstructured data wrongly gets the most attention nowadays because it is rather easy to manage compared to nonrepetitive unstructured data. But the latter is the most important for data scientists to work on.

The next chapters introduce Big Data, Data Warehouse and Data Vault. All pieces are put together in the remaining chapters following “6.1 A brief history of Data Architecture”. Distillation and filtering are the types of processing applied to repetitive unstructured data, while nonrepetitive unstructured data requires various contextualization techniques, e.g. acronym resolution, tagging, stop word processing, word stemming, etc.
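To make the contextualization techniques a bit more concrete, here is a minimal Python sketch of three of them: acronym resolution, stop word processing and word stemming. It is not taken from the book. The lookup tables, the suffix list and the function names `stem` and `contextualize` are hypothetical and chosen for illustration only, and the stemmer is a deliberately naive suffix stripper rather than a real stemming algorithm.

```python
import re

# Hypothetical lookup tables; a real system would use domain-specific ones.
ACRONYMS = {"dwh": "data warehouse", "etl": "extract transform load"}
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to"}
SUFFIXES = ("ing", "ed", "s")  # naive suffix list, not a real stemmer


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contextualize(text):
    """Lowercase and tokenize, expand acronyms, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    resolved = []
    for token in tokens:
        # Acronym resolution: replace a known acronym by its expansion.
        resolved.extend(ACRONYMS.get(token, token).split())
    kept = [t for t in resolved if t not in STOP_WORDS]
    return [stem(t) for t in kept]


print(contextualize("The DWH is loading emails"))
```

The order of the steps is a design choice of this sketch: acronyms are expanded first so that the expanded words also pass through stop word removal and stemming.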

The last chapter sums up the “composite data architecture” in terms of the timeliness of data:

Integration of data in the Data Warehouse / Data Vault
Analytics and Archival of Big Data
Metadata within and across environments

Data is detailed and granular as a system of record.

The book provides an architectural, high-level overview of Big Data, DWH and Data Vault. It ends with a concise data architecture blueprint: the “composite data architecture” combines the authors’ past work (e.g. “DWH 2.0” by W.H. Inmon and “Data Vault” by D. Linstedt) in one architecture. It is one view of how to combine different types of data and provides a lot of ideas to follow.

“If you are building a one-story, one-room log cabin in the forest, you don’t need much of a blueprint. But if you are building a large, complex expensive multistory building in the middle of a city, you need blueprints. There is much to be considered when it comes to building a multistoried structure in the middle of a modern city. And there is the same complexity and expense when it comes to a modern information infrastructure for technology and data.” (quotation extracted from the book, page 329)

Some negative remarks:

I found it strange and somewhat inconvenient that there are subchapter titles (e.g. for 1.1, 1.2, 2.1, 2.2, etc.) but never a chapter title (e.g. for 1, 2, etc.).
Unstructured text is discussed in detail – which is good. Image data, video data and audio data are neglected.
There are many illustrations to brighten up the text, but some of the illustrations are IMO too mundane.
Some repetitions of content.

1 Comment

Aviv Liberman on October 27, 2015 at 12:08

this review basically criticizes the book and provides its negative points and faults.
a review should provide more detailed information.
