Long Live Data Products! Understand the 4 Stages of the Data Product Lifecycle
“The future is what you make it,” declared Sheila Jordan, Chief Digital Technology Officer at Honeywell, in the opening keynote of this year’s Snowflake Summit. One way Honeywell “makes the future” is by developing innovative data products. For example, by combining its inflation index with its pricing application, Honeywell was able to consider inflation-adjusted pricing, which had a significant impact on the business.
Snowflake’s World of Data Collaboration, the theme of Snowflake Summit, was a world of data products—extending beyond data to include AI models and applications. That broader focus highlights how the data is used—whatever form the data product takes—and the value it delivers.
Yes, the product can be the data itself, but the focus lies in its use. This expanded scope reflects the shift to data product thinking and, more importantly, data product doing.
A data product is built to be used and reused. As a result, data product teams shift the focus of data and analytics from time-bound projects and one-off deliverables to a continuous product lifecycle. That’s not to say that there won’t be “projects.” Ideally, however, the data products coming out of these projects will live on to be used and reused multiple times, across multiple use cases and for different business units or functions, internally and externally. These data products deliver value and help make the future of the business.
The data product lifecycle includes the following stages:
- Discovery
- Design
- Development
- Deployment
Let’s take a look at what they entail and the roles that lead and support each of them:
Discovery starts the process. In an earlier blog post, I discussed how the notion of a product emerges from a business need. Sometimes the business stakeholders articulate that need; other times a data team might present an idea to demonstrate how the data could be used. In either case, the process of discovery continues to capture the product requirements, including data sources; the specific data requirements such as quality, formats or frequency of refresh; and also analytics and engineering requirements. The data product manager will guide the process with input from a business analyst or subject matter expert, and a data solution architect will begin sketching out the requirements with the data analyst.
The discovery process includes a close look around the organization to determine whether the required data products already exist, either as a whole or in part. A mature organization will have a data catalog or marketplace through which data teams can discover existing data products, or potential components of the products they will go on to build. The data analyst, with their knowledge of existing data products, takes the lead here.
As part of this upfront due diligence, the data product manager will determine a request’s alignment with business goals, and will estimate potential benefits to be weighed against estimated costs. A prioritization matrix can help formalize this process. In a recent webinar, Miguel Morgado, Head of Data Products at OneWeb, described using such a quadrant to identify which products move forward and which don’t. High-cost products that deliver high value are OK. High-cost but low-value products are terminated. Others require a little more review.
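To make that quadrant concrete, here is a minimal Python sketch of the prioritization logic described above. The decision labels and score thresholds are illustrative assumptions, not part of OneWeb’s actual process; in practice the scores would come from the data product manager’s benefit and cost estimates.

```python
from enum import Enum

class Decision(Enum):
    PROCEED = "proceed"
    TERMINATE = "terminate"
    REVIEW = "review"

def prioritize(estimated_value: float, estimated_cost: float,
               value_threshold: float = 7.0, cost_threshold: float = 7.0) -> Decision:
    """Place a proposed data product in a simple value/cost quadrant.

    Thresholds and 0-10 scoring are illustrative assumptions only.
    """
    high_value = estimated_value >= value_threshold
    high_cost = estimated_cost >= cost_threshold
    if high_cost and high_value:
        return Decision.PROCEED      # high value justifies the high cost
    if high_cost and not high_value:
        return Decision.TERMINATE    # expensive and low value: stop here
    return Decision.REVIEW           # everything else gets a closer look

# Example: a costly but high-value product moves forward
print(prioritize(estimated_value=9, estimated_cost=8))  # Decision.PROCEED
```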
Another aspect of the discovery process includes rationalization of requirements, coordinating needs across business units to determine overlapping needs and prevent duplication of effort. This coordination increases the efficiency of the process by distilling larger data products into common requirements and eventually component parts that can be reused across multiple business units. A data product council or steering committee, with representation from the business and functional units, drives this coordination process.
Design adds shape to the concept. The next step is to translate requirements into product specifications, starting with the form of the data product (are you going for sand, glass or lamp?), and including data models, application design and UI/UX design if necessary. The design process drills into the needs of the end users to ensure that the product meets those needs. The data product or solution architect would be joined by a data engineer to flesh out the pipeline needs, potentially with additional input from a UI/UX designer. The UI/UX designer would be responsible for the front end through which the end users of this product will engage with the data. Unless the data product is delivered to a skilled data scientist who “only wants the data,” there will likely be some aspect of user experience involved.
As part of the design process, the architect will collaborate with the operations team and platform architect to inform infrastructure decisions and ensure the data products can be deployed and scaled. This step includes estimating both the storage and compute needs of the various options. For example, should certain tables be materialized to reduce recurring compute costs? A complete review of costs should also be included in the design phase.
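As a rough illustration of that materialization question, the sketch below compares the recurring cost of recomputing a result on demand with the cost of storing and periodically refreshing a materialized table. All prices and volumes are made-up placeholder figures, not Snowflake rates.

```python
def monthly_cost_on_demand(query_runs_per_month: int, compute_cost_per_run: float) -> float:
    """Cost of recomputing the result every time it is queried."""
    return query_runs_per_month * compute_cost_per_run

def monthly_cost_materialized(storage_gb: float, storage_cost_per_gb: float,
                              refreshes_per_month: int, refresh_compute_cost: float) -> float:
    """Cost of keeping a materialized table: storage plus periodic refresh compute."""
    return storage_gb * storage_cost_per_gb + refreshes_per_month * refresh_compute_cost

# Illustrative numbers only: a result queried 500 times a month vs. a daily refresh
on_demand = monthly_cost_on_demand(query_runs_per_month=500, compute_cost_per_run=0.40)
materialized = monthly_cost_materialized(storage_gb=50, storage_cost_per_gb=0.025,
                                          refreshes_per_month=30, refresh_compute_cost=0.40)
print(f"on demand: ${on_demand:.2f}, materialized: ${materialized:.2f}")
# on demand: $200.00, materialized: $13.25 -> materializing wins for this access pattern
```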
Development brings the product to life. In the development process, the team builds the data products to specifications, including all relevant governance policies. If the data has specific access, usage and security requirements, these must be built into the data product.
Several recent announcements at Snowflake Summit 2023 highlighted new features relevant to building more sophisticated data products. Snowflake Native App Framework, currently in public preview on AWS, is now available for developers to build and test Snowflake Native Apps. In addition, a set of new capabilities improves the developer experience: Snowpark ML APIs for more efficient model development, currently in public preview; Snowpark Model Registry for scalable MLOps, currently in private preview; and several Streamlit in Snowflake advancements to turn models into interactive apps, soon in public preview.
Finally, not to be overlooked are the metadata and documentation required to ensure the product can easily be used. These elements contribute to adherence to the FAIR principles of data (a brief code sketch follows the list below):
- Findable: The first step in (re)using data is to find it. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata is essential for automatic discovery of data sets and services.
- Accessible: Once the users find the required data, they must be able to access it, possibly including authentication and authorization.
- Interoperable: The data product should be easy to combine with other data, or be embedded into applications or workflows for analysis, storage and processing.
- Reusable: The ultimate goal is to optimize the reuse of data. To achieve this, all aspects of the data product should be documented and well described.
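To make these principles concrete, here is a minimal, hypothetical sketch of the kind of metadata record that could accompany a data product. The field names are illustrative assumptions, not a Snowflake or catalog-specific schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataProductMetadata:
    """Hypothetical metadata record published alongside a data product."""
    # Findable: identifiers, descriptions and tags that catalogs can index
    name: str
    description: str
    tags: list[str] = field(default_factory=list)
    # Accessible: how a consumer requests and authenticates access
    access_endpoint: str = ""
    access_policy: str = "request-and-approve"
    # Interoperable: formats and schemas that let it combine with other data
    data_format: str = "parquet"
    schema_reference: str = ""
    # Reusable: ownership, licensing, refresh cadence and documentation
    owner: str = ""
    license: str = ""
    refresh_frequency: str = "daily"
    documentation_url: str = ""

# Illustrative instance echoing the Honeywell-style pricing example
inflation_index = DataProductMetadata(
    name="inflation_adjusted_price_index",
    description="Inflation index joined with pricing data for scenario analysis",
    tags=["pricing", "inflation", "finance"],
    owner="pricing-analytics-team",
)
```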
In this stage of the process, the product team expands to include application developers and data science/machine learning engineers.
Deployment delivers the product to the end user. A lot has been written about DataOps and deployment, but let’s focus on the need to publish and distribute data products, including code, data and metadata. Last year, Snowflake Marketplace dropped the “data” from its name because it now delivers even more. All types of data products can be deployed and delivered via Snowflake Marketplace, to be discovered either publicly or by specific Snowflake customers. By creating a “listing,” data and application publishers can define exactly what they offer—metadata, data tables, business logic, full applications or any combination of these—as well as specify to whom, at what price, for what time period, and for which purpose.
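Purely as an illustration (this is not the Snowflake Marketplace API), a listing can be thought of as a small descriptor that captures those dimensions—what is offered, to whom, at what price, for what period and for which purpose:

```python
# Hypothetical descriptor; keys and values are illustrative assumptions only
listing = {
    "title": "Inflation-Adjusted Pricing Data",
    "contents": ["metadata", "data tables", "business logic"],    # what is offered
    "audience": ["specific accounts"],                            # to whom (public or named customers)
    "pricing": {"model": "subscription", "amount_usd_per_month": 500},
    "term": {"start": "2023-07-01", "end": "2024-06-30"},         # for what time period
    "permitted_use": "internal analytics only",                   # for which purpose
}
```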
A new Snowflake feature that facilitates deployment and delivery is Cross-Cloud Auto-Fulfillment, which allows data product owners to extend reach while controlling costs. The product might initially be deployed on a single cloud in a single region, such as AWS/US West or Azure/UK. However, should a potential customer request access in another cloud region or geography, the product is replicated to that additional cloud and the request automatically fulfilled. When the data product is no longer required, the replication can be canceled.
But deployment doesn’t mean you’re done. Data products can live on. Usage metrics are captured and—along with other feedback—used to inform future versions. Or not. If usage falls off, data products might need to be completely revamped or fully retired. Some data leaders use the “scream test,” measuring how many people would scream if the product didn’t exist, to inform end-of-life decisions.
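A lightweight version of that end-of-life check can be sketched as follows; the lookback window and decline threshold are assumptions an organization would tune for itself.

```python
def end_of_life_signal(monthly_active_users: list[int],
                       lookback_months: int = 3,
                       decline_threshold: float = 0.5) -> str:
    """Flag a data product for review when usage falls off.

    Compares recent average usage to the prior period; thresholds are
    illustrative and would be tuned per organization.
    """
    if len(monthly_active_users) < 2 * lookback_months:
        return "insufficient history"
    recent = monthly_active_users[-lookback_months:]
    prior = monthly_active_users[-2 * lookback_months:-lookback_months]
    recent_avg = sum(recent) / lookback_months
    prior_avg = sum(prior) / lookback_months
    if prior_avg == 0:
        return "no prior usage"
    if recent_avg / prior_avg < decline_threshold:
        return "candidate for revamp or retirement (run the scream test)"
    return "healthy"

print(end_of_life_signal([120, 115, 130, 60, 40, 25]))  # usage has fallen off sharply
```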
Through these stages data products are brought to life or given a new life. Explore more about how Snowflake enables data application development here. Or learn how Snowflake Marketplace enables you to discover, evaluate and purchase data, data services and applications from some of the world’s leading data and solution providers here.
Long live data products!