For many of today’s advanced analytics and data science use cases, it’s crucial to have a data serving architecture that can present data for querying moments after the data has been generated. Whether it’s the massive volumes of time-sensitive data related to consumer behavior on an ecommerce website, banking or credit card transactions, quality control metrics, or logistics tracking data, certain use cases rely on data with a very short “best by” date. Acting quickly allows organizations to fully benefit from the data they collect. Lambda architecture is a popular deployment model with a unique configuration designed for this type of rapid data serving. In this post, we’ll explain how it works and the benefits and drawbacks of using this data architecture.
What Is Lambda Architecture?
Lambda architecture is a data processing deployment model that consists of a traditional batch data pipeline and a fast streaming data pipeline for handling real-time data. In addition to the batch and speed layers, Lambda architecture also includes a data serving layer for responding to user queries. This hybrid approach is designed to harness enormous volumes of rapidly created data, enabling businesses to make use of data more quickly.
How Does Lambda Architecture Work?
Lambda architecture is complex. Its dual layers operate in tandem to make data available more quickly than the traditional batch processing approach. Here’s how the magic happens.
Data sources
Lambda architecture is used to make real-time data quickly available for querying. In this data serving model, data flows into the system continuously from a variety of sources, and each new record is fed into the batch and speed layers simultaneously.
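As a rough illustration of feeding both layers at once, here is a minimal Python sketch. The `Layer` class, the `dispatch` function, and the sample order events are hypothetical names invented for this example; a production system would typically rely on a message broker rather than an in-memory loop.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Layer:
    """Hypothetical stand-in for the intake of a batch or speed layer."""
    name: str
    received: list[dict[str, Any]] = field(default_factory=list)

    def ingest(self, event: dict[str, Any]) -> None:
        self.received.append(event)


def dispatch(event: dict[str, Any], layers: list[Layer]) -> None:
    """Fan one incoming event out to every layer."""
    for layer in layers:
        # Each layer gets its own copy so one layer cannot alter another's data.
        layer.ingest(dict(event))


batch = Layer("batch")
speed = Layer("speed")
for event in [{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": 5.5}]:
    dispatch(event, [batch, speed])
```

After the loop, both layers hold the same two events, which is the property the rest of the architecture depends on.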
Batch layer
In the batch layer, all of the incoming data is stored and then processed into batch views to ready it for indexing. This layer serves two important purposes. First, it manages the master data set, which is immutable and append-only, preserving a trusted historical record of the incoming data from all sources. Second, it precomputes the batch views.
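The two jobs of the batch layer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MasterDataset`, `compute_batch_view`, and the per-customer total used as the batch view are hypothetical, and real deployments would use distributed storage and a batch processing engine.

```python
from collections import defaultdict


class MasterDataset:
    """Append-only record store: records can be added, never changed or removed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Store each record as a tuple so it cannot be modified afterwards.
        self._records.append(tuple(record))

    def scan(self):
        # Return an immutable snapshot of the full history.
        return tuple(self._records)


def compute_batch_view(master):
    """Precompute a batch view: total amount per customer over all history."""
    totals = defaultdict(float)
    for customer, amount in master.scan():
        totals[customer] += amount
    return dict(totals)


master = MasterDataset()
master.append(("alice", 20.0))
master.append(("bob", 5.0))
master.append(("alice", 2.5))
batch_view = compute_batch_view(master)
```

Because the master data set is never modified in place, the batch view can always be rebuilt from scratch if a bug is found in the view logic.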
Data serving layer
The data serving layer receives the batch views from the batch layer on a predefined schedule. This layer also receives the near real-time views streaming in from the speed layer. Here, the batch views are indexed to make them available for querying. As one indexing job is running, the serving layer queues newly arriving data for inclusion in the next indexing run.
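The queueing behavior described above might look like the following Python sketch. The `ServingLayer` class and its method names are assumptions made for illustration, and the in-memory dictionary stands in for a real index.

```python
from collections import deque


class ServingLayer:
    """Hypothetical serving layer that indexes queued data in discrete runs."""

    def __init__(self):
        self.index = {}          # data that is currently queryable
        self._queue = deque()    # arrivals waiting for the next indexing run
        self._in_progress = []   # records claimed by the running job

    def receive(self, key, value):
        self._queue.append((key, value))

    def start_indexing(self):
        # Claim everything queued so far; later arrivals wait for the next run.
        self._in_progress = list(self._queue)
        self._queue.clear()

    def finish_indexing(self):
        for key, value in self._in_progress:
            self.index[key] = value
        self._in_progress = []


serving = ServingLayer()
serving.receive("alice", 22.5)
serving.receive("bob", 5.0)
serving.start_indexing()
serving.receive("carol", 3.0)   # arrives mid-job, so it waits for the next run
serving.finish_indexing()
after_first_run = dict(serving.index)

serving.start_indexing()
serving.finish_indexing()
after_second_run = dict(serving.index)
```

In this sketch, the record for `carol` only becomes queryable after the second run completes.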
Speed layer
By design, the batch layer has high latency, typically delivering batch views to the serving layer only once or twice per day. The job of the speed layer is to narrow the gap between when the data is created and when it’s available for querying. The speed layer does this by indexing all of the data in the serving layer’s current indexing job as well as all the data that’s arrived since the most recent indexing job began. After the serving layer completes an indexing job, the data included in that job is no longer needed in the speed layer and is deleted.
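A simplified Python sketch of this retention rule follows. It assumes every event carries an increasing sequence number so the speed layer can tell which events a completed indexing job covered; `SpeedLayer`, `expire_through`, and the sample events are hypothetical.

```python
class SpeedLayer:
    """Hypothetical speed layer that keeps only data not yet indexed for serving."""

    def __init__(self):
        self._events = []   # (sequence number, customer, amount)

    def ingest(self, seq, customer, amount):
        self._events.append((seq, customer, amount))

    def realtime_view(self):
        """Total amount per customer over the events currently retained."""
        totals = {}
        for _, customer, amount in self._events:
            totals[customer] = totals.get(customer, 0.0) + amount
        return totals

    def expire_through(self, last_indexed_seq):
        """Delete events that a completed indexing job has already covered."""
        self._events = [e for e in self._events if e[0] > last_indexed_seq]


speed = SpeedLayer()
speed.ingest(1, "alice", 10.0)
speed.ingest(2, "bob", 4.0)
# An indexing job starts here and covers sequence numbers 1 and 2.
speed.ingest(3, "alice", 1.0)
speed.ingest(4, "bob", 2.0)
view_during_job = speed.realtime_view()

# The job completes, so events 1 and 2 are no longer needed in the speed layer.
speed.expire_through(2)
view_after_job = speed.realtime_view()
```

While the job is running, the speed layer still answers for all four events; once it finishes, only events 3 and 4 remain.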
Querying
Since queryable data is stored in both the serving and speed layers, queries must be submitted to both, with the results merged before being presented to end users.
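For an additive metric such as a running total, the merge step can be as simple as the Python sketch below. The function name `merged_query` and the sample views are assumptions for illustration, and the example presumes the two views never cover the same event twice.

```python
def merged_query(customer, batch_view, realtime_view):
    """Combine the indexed batch result with the speed layer's recent result."""
    # Assumes an additive metric and no overlap between the two views.
    return batch_view.get(customer, 0.0) + realtime_view.get(customer, 0.0)


batch_view = {"alice": 22.5, "bob": 5.0}       # from the serving layer's index
realtime_view = {"alice": 1.0, "carol": 3.0}   # from the speed layer

alice_total = merged_query("alice", batch_view, realtime_view)
bob_total = merged_query("bob", batch_view, realtime_view)
carol_total = merged_query("carol", batch_view, realtime_view)
```

Metrics that are not additive, such as distinct counts or medians, need a merge function designed for that metric.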
Benefits of Lambda Architecture’s Data Serving
Synthesizing the capabilities of a traditional batch pipeline and real-time stream pipeline, Lambda architecture’s data serving technique offers the best of both systems.
Serverless management
Lambda architecture can be built on managed, serverless cloud services. In that case, there’s no server software to install, update, or maintain. As a bonus, there’s very little danger of data loss even in the event of a system crash. That’s because the batch layer manages all historical data using fault-tolerant distributed storage, so views can be recomputed from the master data set.
Access data in real time
Traditional batch processing isn’t designed to accommodate streaming data such as transactional data. But Lambda’s inclusion of a fast streaming data pipeline and data serving layer makes it possible to query data as it’s being created.
Scalability
The distributed nature of Lambda architecture allows it to scale to meet current business needs. Flexible, cloud-based storage doesn’t rely on the predefined computing resources inherent in on-premises server setups.
Drawbacks of Lambda Architecture
Lambda architecture is a creative way to access real-time and near real-time data. But it comes with limitations.
Logic duplication
Supporting two separate code bases for the batch and streaming layers requires additional time and resources to maintain. Because the same business logic lives in two distinct layers, ongoing maintenance becomes complex and time-consuming, and debugging efforts are complicated.
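To make the duplication concrete, the Python sketch below writes the same metric twice: once as a batch function and once as an incremental update. Both `batch_total` and `StreamingTotal` are hypothetical examples, not code from any particular framework.

```python
def batch_total(amounts):
    """Batch-layer version: recompute the total from the full history."""
    return sum(amounts)


class StreamingTotal:
    """Speed-layer version: maintain the same total one event at a time."""

    def __init__(self):
        self.total = 0.0

    def update(self, amount):
        self.total += amount


amounts = [20.0, 5.0, 2.5]

streaming = StreamingTotal()
for amount in amounts:
    streaming.update(amount)

# Both code paths must agree; any change in business rules has to be made twice.
results_match = batch_total(amounts) == streaming.total
```

Even for a metric this small, a rule change such as excluding refunds would have to be implemented and tested in both places.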
Batch processing inefficiencies
For some use cases, the need to reprocess data in every batch cycle is highly inefficient. Running batch and speed layers in parallel requires dedicating additional time and computing resources.
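A back-of-the-envelope Python calculation shows how reprocessing cost can grow. It assumes, purely for illustration, that each batch cycle recomputes its views from the entire master data set and that the same number of records arrives every cycle; `records_processed` is a hypothetical helper.

```python
def records_processed(records_per_cycle, cycles):
    """Compare cumulative work for full recomputation vs. incremental updates."""
    # Full recomputation: cycle c rereads everything accumulated so far.
    full = sum(records_per_cycle * c for c in range(1, cycles + 1))
    # Incremental processing: each record is read exactly once.
    incremental = records_per_cycle * cycles
    return full, incremental


full, incremental = records_processed(records_per_cycle=1000, cycles=30)
```

With 1,000 records per cycle over 30 cycles, full recomputation reads 465,000 records in total, compared with 30,000 for incremental processing.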
Complexity
Lambda architecture relies on many moving pieces, and the complexity of operating this data serving model presents numerous challenges. With multiple components each running different software, maintenance burdens are high. Intensive processing requirements involve complex coding, and data modeled using this architecture requires extensive effort to reorganize or migrate.
Snowflake and Lambda
Lambda architecture has been broadly adopted as a way to run separate layers of traditional batch and streaming data pipelines. However, industry innovations in the last couple of years have made it possible to unify streaming and batch pipelines. Apache Beam, for example, can serve as an abstraction layer that handles both kinds of pipelines and their processing. Snowflake’s Snowpipe and Snowpipe Streaming can also eliminate some of the need for Lambda by removing the boundary between streaming and batch data. Data can be ingested in both fashions using unified data pipelines in a single system, without setting up complex pipelines and architecture.
That said, there are still some scenarios where Lambda is worth considering, including when transformation and merging are serverless and there is no need for ETL pre-processing. In these scenarios, Lambda can help simplify the architecture. There is no one-size-fits-all approach, so work with your team to find the solution that best fits your business needs.