Data Modeling
Data Architecture
- Security enhancements such as folder-level HDFS encryption and data sharing between Hive, Impala, and other components using Apache Sentry
- Apache Spark 1.2
- Impala 2.1 and Hue 3.7
- Apache Flume includes an Apache Kafka channel
Data Storage
Gregory Steulet published a comprehensive performance comparison of different MySQL versions, including MariaDB and Percona. See “MySQL versions performance comparison”.
See the MapR blog article “What Kind of Hive Table is Best for Your Data?” by Jim Bates for workload-specific options to improve Hive performance. The comparison covers different storage formats (text file, RCFile, ORC) and compression types. The scripts used for the comparison are available.
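To make the format and compression options concrete, here is a minimal sketch that generates HiveQL `CREATE TABLE ... AS SELECT` statements, one per storage format. It is not taken from the article's scripts. The table names `events_raw` and `events` and the helper functions are hypothetical, and the ORC codecs listed are simply the values Hive accepts for the `orc.compress` table property.

```python
from typing import List, Optional

# Storage formats named above, written as Hive STORED AS keywords.
FORMATS = ["TEXTFILE", "RCFILE", "ORC"]

# Values Hive accepts for the "orc.compress" table property.
ORC_CODECS = ["NONE", "ZLIB", "SNAPPY"]


def ctas_statement(source: str, base: str, fmt: str,
                   orc_compress: Optional[str] = None) -> str:
    """Build one CREATE TABLE ... AS SELECT statement copying `source` into `fmt`."""
    target = f"{base}_{fmt.lower()}"
    props = ""
    if fmt == "ORC" and orc_compress is not None:
        # ORC compression is chosen per table through TBLPROPERTIES.
        target += f"_{orc_compress.lower()}"
        props = f' TBLPROPERTIES ("orc.compress"="{orc_compress}")'
    return f"CREATE TABLE {target} STORED AS {fmt}{props} AS SELECT * FROM {source};"


def build_statements(source: str, base: str) -> List[str]:
    """One statement per format; ORC gets one statement per compression codec."""
    statements = []
    for fmt in FORMATS:
        if fmt == "ORC":
            for codec in ORC_CODECS:
                statements.append(ctas_statement(source, base, fmt, codec))
        else:
            statements.append(ctas_statement(source, base, fmt))
    return statements


if __name__ == "__main__":
    # Hypothetical source table and target prefix, for illustration only.
    for stmt in build_statements("events_raw", "events"):
        print(stmt)
```

Running the same queries against each generated table gives a rough like-for-like comparison. Which format comes out ahead depends on the workload, which is the point the article makes.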
Data Flow
Data Tools
Data Visualization
The PDF article “An Economist’s Guide to Visualizing Data” by Jonathan A. Schwabish in the Journal of Economic Perspectives [28(1): 209-234] gives a consolidated overview of data visualization:
- various diagram types (line chart, clutterplot, pie chart, etc.)
- good and bad visualization design
- visualization tools / resources
“Das Jahr 2014 in der Neuen Züricher Zeitung” shows the NZZ newspaper's article headlines as interactive bubbles, grouped by category and by time.