In his article, “Databases in 2024: A Year in Review,” Andy Pavlo analyzes the state of databases over the past year, reflecting on the triumphs, struggles, and significant trends that shaped the industry. The database market, always intensely competitive, saw a wave of major licensing changes, corporate acquisitions, and shifts in strategic direction. Andy’s annual roundup covers the most significant moments, from licensing wars to massive acquisitions, and highlights the fierce competition between some of the most prominent players in the database ecosystem. This article summarizes Andy’s observations.
Licensing and Open Source: A Year of Backlash and Reversals
The issue of licensing continued to dominate the database landscape, especially as cloud vendors like Amazon began hosting popular open-source DBMSs, profiting off systems that were initially developed and maintained by other companies. The rise of dual licenses aimed at limiting cloud vendors from reselling the software without contributing to its development was one of the year’s most contentious stories. Companies like MongoDB had led the charge with this model, and 2024 saw some significant moves, most notably with Redis and Elasticsearch.
Redis Ltd., aiming for an IPO and consolidation of control over the Redis ecosystem, shocked the open-source community in March by switching from the permissive BSD-3 license to a dual license that included the Redis Source Available License (RSAL) and MongoDB’s Server Side Public License (SSPL). The backlash was swift, with several forks of the original BSD-3 code arising, including Valkey, which was supported by major players like Amazon, Google, and Oracle. The community responded not just with discontent but with action. In December 2024, Redis’ creator, Salvatore Sanfilippo (antirez), announced that he was rejoining Redis Ltd., signaling a desire to mend the rift caused by these aggressive licensing moves.
Elasticsearch, the flagship text-search DBMS from Elastic N.V., had taken a similar path in 2021 by switching to a dual-license model. In 2024, however, Elastic reversed course and added the OSI-approved AGPL as a licensing option alongside the Elastic License and MongoDB’s SSPL. Note that the AGPL is a copyleft license rather than a permissive one, but unlike the other two it is recognized as open source. This shift came after Amazon launched its OpenSearch fork of Elasticsearch in 2021, which gained significant traction. Despite the reversal, Amazon’s OpenSearch remained the dominant open-source option, highlighting the power imbalance between independent software vendors (ISVs) and cloud giants.
These licensing battles underscored the challenges faced by open-source database systems in a world where cloud vendors can easily monetize the work of others. Yet companies like MongoDB, Neo4j, and CockroachDB made similar license changes without major disruption, suggesting that the perception of unfairness played a crucial role in the backlash against Redis and Elasticsearch.
Databricks vs. Snowflake: A Billion-Dollar Battle
The rivalry between Databricks and Snowflake escalated to new heights in 2024. This classic database showdown, once focused on performance benchmarks, has evolved into a broader ecosystem war. The battle now encompasses everything from large-scale machine learning models to data management infrastructure.
In March, Databricks made a bold move by investing $10 million to build its own massive LLM (DBRX), which it positioned as a leader in enterprise AI tasks. Snowflake fired back with its own Arctic LLM, which it claimed outperformed DBRX on SQL generation tasks at a lower development cost. The war over AI models, however, was just one front. Behind the scenes, the two companies competed for dominance in the data cataloging space, and both pursued Tabular, the company founded by the original creators of the Iceberg project. Databricks thwarted Snowflake’s bid by acquiring Tabular for $2 billion, a move that extended the conflict further. Databricks even went so far as to open-source its Unity Catalog, challenging Snowflake’s proprietary offerings.
The rivalry highlighted that data infrastructure is now about more than raw database performance. The ecosystem surrounding data management, from data ingestion to tooling, is a critical factor in determining success. For consumers, this intense competition should ultimately lead to better products and services, though whether it will result in lower prices remains to be seen.
DuckDB: The Unsung Hero of 2024
In the world of analytical databases, DuckDB emerged as an unexpected favorite. Known for its portability and ease of use, DuckDB has become the go-to choice for running OLAP (Online Analytical Processing) queries on a smaller scale. Its integration into popular systems, particularly PostgreSQL, proved a game-changer. Four PostgreSQL extensions that integrate DuckDB were released in 2024, making it easier to run analytical queries from within PostgreSQL and enhancing support for geospatial data and advanced analytics.
The rise of DuckDB is a testament to how database systems are evolving to meet modern requirements. By offering a streamlined, efficient way to perform high-performance analytics without needing a full-fledged data warehouse, DuckDB has filled an important gap in the analytics market. This shift could disrupt existing database systems, including the likes of ClickHouse and Redshift, which had previously been the go-to choices for analytical workloads.
My Take: Looking Ahead
Looking ahead, it’s clear that the database industry is at a crossroads. The rise of alternative databases like DuckDB, the increasing consolidation of key players, and the constant back-and-forth over licensing models all point to a future where innovation and competition will continue to shape the market. For now, database professionals and organizations must stay agile, as the landscape is bound to keep shifting in unexpected ways.
The integration of artificial intelligence and machine learning into database management systems is another trend to watch. Databricks and Snowflake’s ongoing investments in AI and large language models (LLMs) show just how much the database industry is overlapping with AI-driven technologies. AI’s potential to automate database optimization, enhance query performance, and even assist with data governance will only become more pronounced in the years to come. With these advancements, databases may become “self-managing” in ways we’ve only begun to imagine, lowering the burden on database administrators and developers while improving efficiency and performance.
The role of vector databases and their use in AI-driven applications—especially in natural language processing (NLP) and recommendation systems—will also grow. DuckDB, which has proven itself in the analytical space, could pivot toward becoming an even more crucial part of AI/ML workflows, particularly as its integrations with other systems like PostgreSQL continue to evolve. If we see more innovations in this space, expect an increased focus on hybrid models that bridge OLAP and AI/ML capabilities, further blurring the lines between traditional analytics and the AI tools that are becoming essential in every industry.
We may also see transactional/analytical hybrid databases become the norm. Companies are increasingly demanding systems that can handle both OLTP and OLAP workloads without requiring separate infrastructure, leading to new solutions built from the ground up to serve both purposes. Iceberg and Hudi, which began as open-source projects for managing large-scale data lakes, could evolve into mainstream tools that bring transactional capabilities to analytic workflows. Expect more investments in these hybrid systems, as organizations aim to reduce complexity and increase performance in data-heavy applications.
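For readers less familiar with the OLTP/OLAP distinction, here is a minimal sketch of the two workload shapes that a hybrid system would have to serve from a single store. It uses Python's built-in sqlite3 module purely as a stand-in; SQLite is not a hybrid engine, and the `orders` table and the helper names `place_order` and `revenue_by_region` are hypothetical, invented for this illustration.

```python
import sqlite3

# In-memory SQLite is only a stand-in here; it is not a hybrid (HTAP) engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def place_order(conn, region, amount):
    """OLTP-style work: a short transaction that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            (region, amount),
        )
    return cur.lastrowid


def revenue_by_region(conn):
    """OLAP-style work: scan the whole table and aggregate it."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


# Hypothetical sample data: three small transactional writes.
for region, amount in [("eu", 10.0), ("us", 25.0), ("eu", 5.0)]:
    place_order(conn, region, amount)

print(revenue_by_region(conn))  # {'eu': 15.0, 'us': 25.0}
```

The writes are many small, latency-sensitive transactions, while the report reads every row once. Traditionally these run on separate systems tuned for each pattern, which is the duplication that hybrid designs aim to remove.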