Lakehouse 13

7/5/2023

IBM yesterday unveiled watsonx.data, a new data lakehouse offering for cloud and on-prem that will use object storage and Apache Iceberg, an open table format. Big Blue launched two other offerings in the new watsonx family yesterday at its annual THINK conference: watsonx.ai and watsonx.governance. Together, the three watsonx components represent IBM’s latest push into the enterprise AI market. Lakehouses have proliferated in recent years as companies look to combine the massive scalability of cloud-based object storage with the proven data management and governance capabilities of traditional data warehouses running on analytics databases. Instead of ungovernable data swamps, the lakehouse is designed to bring order to data, but without the storage limitations posed by data warehouses.

When it becomes generally available in July, IBM’s new watsonx.data lakehouse will run on-prem and in the IBM Cloud and AWS. While IBM didn’t specify this in its announcement, the offering is assumed to utilize IBM’s own flavor of object storage, which it obtained with its 2015 acquisition of Cleversafe for $1.5 billion. Watsonx.data will also incorporate Apache Iceberg, the increasingly popular open table format that emerged from Netflix and Apple to address data consistency and correctness issues that arose with the reliance on Apache Hive in the early days of Hadoop-based data lakes. By bringing support for ACID transactions to data, Iceberg enables customers to bring multiple compute engines to bear on data residing in a lake or lakehouse. To that end, IBM foresees Presto and Apache Spark being two of the first data engines to run in its watsonx.data lakehouse.
