Vector Database – What, Why, and How
In today's data-driven world, vector databases are designed to handle complex, high-dimensional data. This article describes vector databases, including use cases and an example with the PostgreSQL extension pgvector. What is a vector database? A vector...
How to Be Useful: Unpacking Arnold Schwarzenegger’s Secrets to Success
Did you know that the man who conquered bodybuilding, Hollywood, and the political arena believes that his multifaceted success boils down to just seven principles? Yes, Arnold Schwarzenegger, in his book "Be Useful: Seven Tools for Life," distills the essence of his...
Data visualization with Flourish
Flourish is a data visualization and storytelling platform that helps data enthusiasts understand and communicate complex data. With a wide range of customizable templates and interactive features, Flourish makes it easy to create beautiful and engaging visualizations...
Predictions about data for 2023 and beyond
End of the year: it’s the time for predictions. Let’s have a look at some predictions regarding data. There are many predictions for Machine Learning, Deep Learning, and AI - explainability, professionalisation, and...
Data Vault and Star Schema with PlantUML: Entity Relationship Diagram as Code
Entity Relationship Diagram as code means developers use the same tools for creating the diagrams - or documentation in general - as for coding. Documentation includes more than just source code and some comments. If the documentation is textual and not binary,...
Materialization examples of Data Engineering with dbt
dbt offers several materialization options to create ETL/ELT processes. The article shows and compares various approaches to using dbt for ETL/ELT. A previous post contains an introduction to dbt: Data Engineering with dbt – first steps using PostgreSQL and...
Data Engineering with dbt – first steps using PostgreSQL and Oracle
dbt is a Data Engineering tool supporting version control with CI/CD for transformations and materialization. The approach with dbt differs from tools like SSIS, Data Factory, and Informatica. The developer models the target tables/views and the transformations. dbt uses...
PostgreSQL application_name
PostgreSQL application_name can be set in the connection string. The view pg_stat_activity shows the application_name, which helps identify sessions. The article shows how to set application_name and how to benefit from it. It is highly recommended to set the...
PostgreSQL columnar extension cstore_fdw
PostgreSQL columnar extension cstore_fdw is a storage extension suited for OLAP-/DWH-style queries and data-intensive applications. Columnar analytical databases have unique characteristics compared to row-oriented data access. Many commercial products exist:...
PostgreSQL partitioning guide
PostgreSQL partitioning is a powerful feature when dealing with huge tables. Partitioning allows breaking a table into smaller chunks, aka partitions. Logically, there appears to be only one table when accessing the data, but physically there are several partitions....
Anonymization techniques and data privacy
Anonymization techniques are essential for data analytics and for test/dev databases. Anonymization and pseudonymization are very different but often confused. GDPR no longer applies to anonymized data. GDPR is still applicable for pseudonymized data that can be...
Log-based Change Data Capture - lessons learnt
My article on Medium summarizes experiences from various projects with log-based change data capture (CDC). There are many use cases for which CDC is beneficial. Some DBs even have CDC functionality integrated without requiring a separate tool. The article first...