What is a Data Lakehouse?

Data lakehouse architecture combines the benefits of data lakes and data warehouses by adding table metadata to files in object storage. This metadata gives data lakes features that are typical of a data warehouse but generally lacking in a data lake: time travel, ACID transactions, better pruning, and schema enforcement. However, like any architecture, an open data lakehouse comes with trade-offs. Storing data in an open table format can greatly improve interoperability, but it can also mean more overhead for tool version compatibility and upgrades, more challenging FinOps with disparate billing, variable performance, limited concurrency support, and disparate governance controls and auditing across many tools.
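
To make the role of table metadata more concrete, here is a minimal Python sketch of the idea. It is a toy model, not how Apache Iceberg or Snowflake actually implement tables: the names ToyTable, DataFile, append, and scan are hypothetical, the object-storage paths are made up, and rows are kept in memory instead of in real files. It assumes every table has an integer "id" column and shows three things: schema enforcement on write, time travel through a list of immutable snapshots, and pruning with per-file min/max statistics.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """Metadata for one data file (hypothetical; rows are kept inline for the sketch)."""
    path: str      # made-up object-storage key
    rows: tuple    # stands in for the file's contents
    min_id: int    # column statistics used for pruning
    max_id: int


@dataclass
class ToyTable:
    """Toy table metadata layer: a schema plus a list of immutable snapshots."""
    schema: dict
    # Snapshot 0 is the empty table; each commit appends a new snapshot.
    snapshots: list = field(default_factory=lambda: [()])

    def append(self, path, rows):
        """Validate rows against the schema, then commit a new snapshot."""
        if not rows:
            raise ValueError("nothing to append")
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match the schema")
            for column, expected_type in self.schema.items():
                if not isinstance(row[column], expected_type):
                    raise TypeError(f"column {column!r} expects {expected_type.__name__}")
        ids = [row["id"] for row in rows]
        new_file = DataFile(path, tuple(rows), min(ids), max(ids))
        # The commit is a single step: readers see the old or the new snapshot.
        self.snapshots.append(self.snapshots[-1] + (new_file,))
        return len(self.snapshots) - 1  # id of the new snapshot

    def scan(self, id_value, snapshot_id=None):
        """Return rows with the given id and the number of files actually read."""
        files = self.snapshots[-1 if snapshot_id is None else snapshot_id]
        # Pruning: skip files whose min/max range cannot contain id_value.
        candidates = [f for f in files if f.min_id <= id_value <= f.max_id]
        matches = [r for f in candidates for r in f.rows if r["id"] == id_value]
        return matches, len(candidates)


table = ToyTable(schema={"id": int, "name": str})
first = table.append("s3://example-bucket/part-0.parquet",
                     [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
second = table.append("s3://example-bucket/part-1.parquet",
                      [{"id": 10, "name": "c"}])

current_rows, files_read = table.scan(10)               # latest snapshot
old_rows, _ = table.scan(10, snapshot_id=first)         # time travel
```

In this sketch the lookup for id 10 reads only one of the two files, and the same lookup against the earlier snapshot returns nothing because that row had not been committed yet. Real table formats store this metadata as files alongside the data and handle concurrent writers, which the toy model leaves out.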

DATA LAKE FEATURES

Separation of storage and compute
Virtually unlimited scale data repository
Mixed data types: structured, semi-structured and unstructured
Choice of languages for processing (but not always SQL)
Process data in-place
Direct access to raw source data

DATA WAREHOUSE FEATURES

Strong data governance, access to data only through the platform
High performance & concurrency support
No need to inventory or ingest data
ACID transactions
Direct access to curated data
Version history, time travel

Both data lakes and data warehouses are big data repositories. The difference between a data lake and a data warehouse lies in how they handle compute and storage. Snowflake's Data Cloud can be used to build and adapt various architecture patterns that align with the needs of different use cases. Snowflake lets customers ingest data into a managed repository, in what’s commonly referred to as a data warehouse architecture, and also lets them read and write data in cloud object storage, functioning as a data lake query engine. Regardless of the pattern, Snowflake adheres to core tenets of strong security, governance, performance, and simplicity.

DATA LAKEHOUSE FEATURES

In addition to the features above, Snowflake also provides the following features for a data lakehouse pattern:

Fully managed table format
Apache Iceberg table format
Polyglot, multi-cluster compute engine
Cost-effective performance for high concurrency

SNOWFLAKE DATA CLOUD

A data platform is not restricted to a single architectural pattern. Instead, it should support many architecture patterns for many functions and workloads, including:

Collaboration
Analytics
Data exploration
Data engineering for ingestion and transformation of data
AI and ML
Data application development and operation

A flexible platform like Snowflake allows you to use traditional business intelligence tools and newer, more advanced technologies devoted to artificial intelligence, machine learning, data science, and applications. It’s a single platform that can be used to power multiple types of workloads.

See Snowflake’s capabilities for yourself. To give it a test drive, sign up for a free trial.