TDWI Munich 2025 once again proved why it is an excellent event for data, analytics, and AI professionals. Below you’ll find my personal notes and reflections from three packed days in the “Data Universe”.
Generative AI: The Urgency and Opportunity
Stephen Brobst set the tone during his opening keynote with a provocative claim: “Employees not using Generative AI should get fired.” The hyperbole landed; the message was clear. Brobst argued that:
- Generative AI is now the table-stakes productivity layer. Opting out is no longer a viable career strategy.
- The real differentiator is metadata. Robust metadata and a framework for Minimum Viable Governance (think key construction, quality metrics) are at the heart of trusted AI.
- He forecast a wave of data-product business models that do not yet exist, a textbook case of FOMO driving investment decisions.
LLMs vs. classical ML
A later session compared XGBoost with proprietary and open-source LLMs for tabular prediction tasks. Results were eye-opening: the LLMs matched (and sometimes beat) XGBoost with less training data. Even more exciting, TabPFN turned out to be promising for small tabular datasets.
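For anyone who wants to try this kind of comparison themselves, here is a minimal sketch. It assumes the open-source xgboost and tabpfn packages and a small scikit-learn toy dataset; it is not the benchmark shown in the session.

```python
# Minimal sketch: comparing XGBoost and TabPFN on a small tabular dataset.
# Assumes `pip install xgboost tabpfn scikit-learn`; not the benchmark from the session.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Classical gradient-boosting baseline
xgb = XGBClassifier(n_estimators=200, max_depth=4)
xgb.fit(X_train, y_train)
print("XGBoost accuracy:", accuracy_score(y_test, xgb.predict(X_test)))

# TabPFN: a pretrained transformer for small tabular problems, exposed via a sklearn-style API
tabpfn = TabPFNClassifier()
tabpfn.fit(X_train, y_train)
print("TabPFN accuracy:", accuracy_score(y_test, tabpfn.predict(X_test)))
```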
My hands-on RAG lab
From my perspective, the best approach is to experiment, learn, and stay open to new methods and technologies. In that spirit, I prepared a notebook for my “Hands-On RAG: Semantic Search with Python” lab. Participants used it to build an index and a retrieval-augmented generation (RAG) pipeline (the notebook is available in my GitHub repository and accompanies my article From Raw Text to Ready Answers — A Technical Deep-Dive into Retrieval-Augmented Generation (RAG)). The image below shows a code snippet from the notebook.
Participants moved through each core step of building a RAG system with an example URL:
- Crawling and cleaning a website to collect unstructured text data.
- Chunking the cleansed text into structured blocks – by far the most critical part of a RAG system.
- Embedding the chunks into vectors using an open-source embedding model.
- Vector storing these embeddings efficiently in a vector database, enabling fast similarity searches.
- Semantic searching for relevant information based on meaning, not just keywords.
- Prompting and retrieval to generate context-aware answers based on an augmented query (a condensed sketch of these steps follows this list).
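The condensed sketch below runs through the same steps. It is not the lab notebook itself: it assumes requests, beautifulsoup4 and sentence-transformers with the all-MiniLM-L6-v2 model, uses a placeholder URL, and replaces a real vector database with a plain in-memory NumPy matrix.

```python
# Condensed sketch of the lab steps (not the original notebook).
# Assumes: pip install requests beautifulsoup4 sentence-transformers numpy
import numpy as np
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

# 1) Crawl and clean a single page into raw text
url = "https://example.com"  # placeholder URL
html = requests.get(url, timeout=10).text
text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

# 2) Chunk the cleansed text into overlapping word blocks
def chunk(words, size=200, overlap=40):
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

chunks = chunk(text.split())

# 3) Embed the chunks with an open-source embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

# 4) "Vector store": here just an in-memory matrix; use a real vector database in production
# 5) Semantic search: cosine similarity equals the dot product on normalized vectors
def search(query, top_k=3):
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vectors @ query_vector
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]

# 6) Prompting: augment the user question with the retrieved context
question = "What is this page about?"
context = "\n\n".join(search(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# Send `prompt` to the LLM client of your choice to generate the answer.
```

Swapping the in-memory matrix for a real vector database and adding evaluation and monitoring is exactly where the professionalization mentioned below begins.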
While it’s easy to get started with RAG, true professionalization following software-engineering methods is required to ensure trustworthy, production-grade results.

Complexity Reduction: Back to Basics Amidst AI Hype
As systems grow ever more complex, the basics remain crucial—even as GenAI dominates headlines. The conference reinforced the importance of following standards, many of which are well-known but still often ignored:
- Prioritize automation and data-engine thinking over brittle, manual pipelines. Generate 80 % of the ETL from metadata templates and hand-code only the 20 % of edge cases: fewer DAGs, fewer bugs (see the templating sketch after this list).
- Business Intelligence (BI) isn’t dead. Sticking to reporting and visualization standards, combined with automated data-quality testing (a small example follows the list), lets you plug GenAI into BI workflows without chaos.
- Standardize data models across the core layer and the data marts, for example for date and timestamp handling: clean up dates and times early in your pipeline, always keep both the original and the transformed value, and use closed-open intervals where possible (see the timestamp sketch below). There is much more to say about data modeling.
- Delta loading / Change Data Capture (CDC) slashes I/O costs and runtime, especially in the cloud.
- Treat your orchestration layer like business-critical infrastructure, and treat security and free-and-open-source software (FOSS) as first-class citizens.
- Simple Markdown + diagrams-as-code (e.g., Mermaid) keeps tribal knowledge discoverable.
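To make the “generate ETL from metadata” idea tangible, here is a minimal sketch using plain Python string templating. The table and column names are invented for illustration; a real setup would read the metadata from a repository and use a proper template engine.

```python
# Minimal sketch: generating repetitive ETL statements from metadata instead of hand-coding them.
# The metadata entries below are invented for illustration.
TEMPLATE = """
INSERT INTO {target} ({columns})
SELECT {columns}
FROM {source}
WHERE load_date >= '{load_date}';
""".strip()

tables = [  # in practice this comes from a metadata repository, not a hard-coded list
    {"source": "staging.orders",    "target": "core.orders",    "columns": "order_id, customer_id, amount"},
    {"source": "staging.customers", "target": "core.customers", "columns": "customer_id, name, country"},
]

for meta in tables:
    print(TEMPLATE.format(load_date="2025-06-25", **meta))
```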
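And here is a taste of automated data-quality testing with nothing but pandas; the columns and rules are invented, and dedicated frameworks offer far richer checks and reporting.

```python
# Minimal sketch: automated data-quality checks on a DataFrame before it reaches the BI layer.
# Column names and rules are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount":   [10.0, 25.5, 7.9],
    "country":  ["DE", "AT", "CH"],
})

checks = {
    "order_id is unique":     df["order_id"].is_unique,
    "amount is non-negative": bool((df["amount"] >= 0).all()),
    "country is populated":   bool(df["country"].notna().all()),
}

failed = [name for name, ok in checks.items() if not ok]
if failed:
    raise ValueError(f"Data-quality checks failed: {failed}")
```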
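Finally, the timestamp advice in code form: keep the raw value next to the parsed one and filter with closed-open intervals, which also makes incremental (delta) loads unambiguous. The column names and the Europe/Berlin time zone are assumptions for illustration.

```python
# Minimal sketch: keep the original timestamp string next to the parsed value
# and filter with a closed-open interval [start, end). Column names are invented.
import pandas as pd

raw = pd.DataFrame({
    "event_ts_raw": ["2025-06-24 08:15:00", "2025-06-25 09:30:00", "2025-06-26 10:45:00"],
})

# Keep both: the original string and the cleaned, timezone-aware timestamp
raw["event_ts"] = pd.to_datetime(raw["event_ts_raw"]).dt.tz_localize("Europe/Berlin")

# Closed-open interval: start is included, end is excluded, so consecutive loads never overlap
start = pd.Timestamp("2025-06-24", tz="Europe/Berlin")
end = pd.Timestamp("2025-06-26", tz="Europe/Berlin")
delta_load = raw[(raw["event_ts"] >= start) & (raw["event_ts"] < end)]
print(delta_load)
```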
My analogy: Driving a car feels easier thanks to lane-keeping, parking and fatigue-detection assistants. Analytics, however, is hurtling down the Autobahn while new tools spawn by the hour. The unchecked proliferation of tools, self-service options, and so much more leads to technical sprawl, making analytics harder, not easier. We need the equivalent of an emergency brake assist for analytics complexity. Start from use cases and business value instead of the technical sexiness of tools.
Data Governance — Move Fast and Follow the Rules
The EU AI Act was a hot topic, introducing a risk-based framework that compels teams to document model purpose, data provenance, and human oversight from day one. This regulation is forcing organizations to adopt more rigorous governance practices, ensuring AI is developed and deployed responsibly.
Metadata and data catalogs remain central, but the discussion is still too technical and not focused enough on user benefits. Metadata should enable business value, not become an end in itself. Or why not simply feed tables and metadata into a RAG system!?
FinOps—financial operations in the cloud—was another key theme. Cloud platforms can surprise teams with unexpectedly high costs, especially when tools are misused. While scaling resources in the cloud is easy (just slide the CPU/RAM bar to the right), this can quickly become unsustainable both financially and environmentally. FinOps discipline is now essential for any data-driven organization.
Final Thoughts
AI (and GenAI in particular) is no longer optional, and the pace of change is accelerating. Generative AI is driving both excitement and anxiety, pushing organizations to experiment, learn, and professionalize their approaches. Yet, amidst the innovation, foundational best practices in data management, complexity reduction, and governance remain as relevant as ever.
Ready to try out Retrieval-Augmented Generation?
👉 Clone the full RAG lab notebook from my GitHub and start building your own semantic search pipeline today.