Product and Technology

Build and Manage ML Features for Production-Grade Pipelines with Snowflake Feature Store


When scaling data science and ML workloads, organizations frequently encounter challenges in building large, robust production ML pipelines. Common issues include redundant efforts between development and production teams, as well as inconsistencies between the features used in training and those in the serving stack, which can lead to decreased performance. Many teams turn to feature stores to create a centralized repository that maintains a consistent and up-to-date set of ML features. However, this often introduces the complexity of managing additional infrastructure for feature authoring, building and maintaining update pipelines, and establishing workflows to access consistent and fresh features. As a result, teams often end up spending more time than expected on makeshift or customized solutions.

Today we are announcing the general availability of the Snowflake Feature Store. This native solution lives on the same platform as your end-to-end workflows in Snowflake ML, with seamless integration to your data, features and models, so that large-scale ML pipelines can be productionized easily and efficiently. The Feature Store helps you eliminate redundancy and duplication of pipelines, ensuring that you have updated, consistent and accurate features available with enterprise-grade security and governance. 

Key capabilities of Snowflake Feature Store are:

  • Easy authoring of common feature transformations in Python or SQL

  • Automated and efficient feature refresh on new data from both batch and streaming sources

  • Simple API for retrieving time-consistent features using ASOF JOIN and generating training datasets

  • Fine-grained role-based access control (RBAC) and governance

  • Support for user-maintained feature pipelines in tools such as dbt

  • Full integration with Model Registry and other Snowflake ML capabilities 

  • Centralized view of features and entities from the Snowsight UI for easy search and discoverability

  • Built-in end-to-end ML Lineage (preview feature) 

Snowflake Feature Store is fully integrated with Snowflake Model Registry and other Snowflake ML capabilities to enable a complete end-to-end ML development and operations solution in Snowflake. A high-level schematic of this workflow is shown below:

Diagram showing how Snowflake Feature Store is fully integrated with Snowflake Model Registry and other Snowflake ML capabilities.

Customers productionize MLOps with Snowflake Feature Store

Many customers are already using Feature Store in their ML workflows across various industries and use cases. 

Scene+ is a Canadian loyalty program that uses Snowflake Feature Store on large data sets with notable performance improvements from its previous solution.


“Scene+ leverages machine learning to deliver relevant member experiences across our properties. This requires working with a vast amount of data. Leveraging the straightforward Snowflake Feature Store drove a 66% reduction in processing time; we can join the model universe with the features with just four blocks of code. Previous methods required writing extensive Python scripts, input files and additional dependency scripts.”

Aasma John
Data Science Manager at Scene+

In the retail industry, Feature Store is also being implemented by our partner Kubrick to productionize models that improve customer experience.


“A global luxury fashion and lifestyle company partnered with Kubrick to implement Snowflake Feature Store to deploy models that enhance customer experience and personalization for the holiday season. This MLOps solution led to a 25% reduction in lead time to production and speed improvements of 3 to 10 times.”

Bavandeep Malhi
Technical Delivery Lead at Kubrick

We’re also seeing Feature Store used in gen AI use cases. Stride, a leader in remote, online, and in-person learning, partnered with phData to implement Snowflake Feature Store in a RAG app that provides accurate, safe assistance to students and teachers.

Using Snowflake Feature Store

A simplified ML workflow powered by Feature Store is depicted below:

Diagram of a simplified ML workflow powered by Snowflake Feature Store

Let’s look at the main components of this. 

Creating Feature Stores

You can easily create a Feature Store, or connect to an existing one, by providing a Snowpark session, database name, schema name and default warehouse. Under the hood, a Feature Store is simply a schema in a Snowflake database.

from snowflake.ml.feature_store import FeatureStore, CreationMode

fs = FeatureStore(
    session=session,
    database="MY_DB",
    name="MY_FEATURE_STORE",
    default_warehouse="MY_WH",
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
)

Creating Feature Views

Feature Views are the primary abstraction in a Feature Store. They consist of a collection of logically related features that are computed and maintained on the same schedule. In the Snowflake Feature Store, Feature Views can be created from any source data (e.g., tables, views, shares) by using Snowpark dataframes or SQL transformations. Columns in the source tables (or views) or data transformation dataframe are recognized as features. Additionally, each Feature View must be associated with an Entity, which defines the join keys used for feature lookup at training or inference time, and, optionally, a timestamp column for capturing changes in feature values over time.

Define an Entity:

from snowflake.ml.feature_store import Entity

entity = Entity(
    name="CUSTOMER",
    join_keys=["CUSTOMER_ID"],
    desc="customer entity"
)
fs.register_entity(entity)

Define a Feature View:

from snowflake.ml.feature_store import FeatureView

managed_fv = FeatureView(
    name="Customer_Order_History_Features",
    entities=[entity],
    feature_df=my_df,               # a Snowpark DataFrame
    timestamp_col="ts",             # optional timestamp column name in the dataframe
    refresh_freq="5 minutes",       # optional parameter specifying how often feature view refreshes
    desc="features about customer order history"  # optional description string
)

registered_fv: FeatureView = fs.register_feature_view(
    feature_view=managed_fv,
    version="1"
)

feature_df is a Snowpark DataFrame object containing your feature definition. Snowpark provides helper functions that make it easy to define many common feature transformations. For example, the code snippet below specifies 3-month and 6-month aggregations of customer order sum and count over a 1-day sliding window.

def custom_column_naming(input_col, agg, window):
    return f"{agg}_{input_col}_{window.replace('-', 'past_')}"

my_df = customer_orders_df.analytics.time_series_agg(
    aggs={"ORDER_TOTAL": ["SUM", "COUNT"]},
    windows=["-3MM", "-6MM"],
    sliding_interval="1D",
    group_by=["CUSTOMER_ID"],
    time_col="ORDER_DATE",
    col_formatter=custom_column_naming
)
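Since col_formatter is a plain Python function, its output can be checked in isolation, without a Snowflake session. With the inputs above it produces column names like the following:

```python
# The column-name formatter is pure Python, so the generated names can
# be verified directly before registering the Feature View:
def custom_column_naming(input_col, agg, window):
    return f"{agg}_{input_col}_{window.replace('-', 'past_')}"

print(custom_column_naming("ORDER_TOTAL", "SUM", "-3MM"))    # SUM_ORDER_TOTAL_past_3MM
print(custom_column_naming("ORDER_TOTAL", "COUNT", "-6MM"))  # COUNT_ORDER_TOTAL_past_6MM
```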

timestamp_col is the name of a timestamp column that is used to join with a table containing the required entity keys for training to retrieve point-in-time correct feature values.  

A key benefit of Snowflake Feature Store is its use of Dynamic Tables to automate and abstract away the complexity of managing data and feature engineering pipelines and backfills. In many feature store solutions, the user is responsible for creating all the data and feature engineering logic to perform the initial population and subsequent updates of feature values. These steps then need to be scheduled and managed manually outside of the feature store.

In a Snowflake-managed Feature View, all of this is handled declaratively. You define the logic to compute features across all history using DataFrames or SQL, and Snowflake handles the incrementalization of that declarative logic. To use these managed Feature Views, simply specify refresh_freq, which defines how frequently features are refreshed and therefore how up to date they are relative to their source tables. Snowflake-managed Feature Views can be monitored from the Snowsight UI via the new Feature Store support.

While in most cases you will want to use such managed Feature Views, there may be scenarios where you want to use feature pipelines, maintained by you, that run using external tools. In this case, create a Feature View by omitting the refresh_freq. This creates user-maintained Feature Views that are computed at retrieval time. 
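As a sketch, a user-maintained Feature View over a table populated by an external dbt pipeline might look like the snippet below. The table name MY_DB.DBT_OUTPUT.CUSTOMER_FEATURES is hypothetical, and fs, entity and session are assumed to exist as in the earlier snippets:

```python
from snowflake.ml.feature_store import FeatureView

# Hypothetical table kept up to date by an external dbt pipeline
external_df = session.table("MY_DB.DBT_OUTPUT.CUSTOMER_FEATURES")

user_maintained_fv = FeatureView(
    name="Customer_Features_From_Dbt",
    entities=[entity],
    feature_df=external_df,
    timestamp_col="TS",
    # No refresh_freq: Snowflake does not manage refreshes, so feature
    # values are computed from the source at retrieval time
    desc="customer features maintained by an external dbt pipeline",
)

fs.register_feature_view(feature_view=user_maintained_fv, version="1")
```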

Generating training data 

A key purpose of feature stores is to simplify generation of consistent training data sets. Feature Store provides APIs to generate training data in two formats, depending on your workflow. In either case, Snowflake Feature Store handles retrieval of point-in-time correct values, using the timestamp and the ASOF JOIN function to join features from multiple views efficiently and at scale, yielding time-consistent results.

Snowflake Dataset is a new schema-level object specially designed for machine learning workflows. Snowflake Datasets hold collections of data organized into versions, where each version holds a materialized snapshot of your data with guaranteed immutability, efficient data access and interoperability with popular deep learning frameworks, such as PyTorch and TensorFlow. Datasets can be conveniently created from Feature Store as shown below:

my_dataset = fs.generate_dataset(
    name="CUSTOMER_ORDER_DATASET",
    spine_df=MySourceEntityKeyDataFrame,
    features=[customer_order_features, customer_demographic_features],
    version="v1",                               # optional
    spine_timestamp_col="TS",                   # optional
    spine_label_cols=["LABEL1", "LABEL2"],      # optional
    include_feature_view_timestamp_col=False,   # optional
    desc="customer order dataset for training a customer lifetime value model",  # optional
)
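Once created, a Dataset version can be read back for model training. As a brief sketch (assuming my_dataset from the snippet above), the Dataset read API exposes connectors such as:

```python
# Read the dataset version back for training (assumes `my_dataset`
# returned by fs.generate_dataset() above).

# Materialize as a pandas DataFrame for classic ML libraries
pdf = my_dataset.read.to_pandas()

# Stream batches into PyTorch via a DataPipe
train_pipe = my_dataset.read.to_torch_datapipe(
    batch_size=32,
    shuffle=True,
    drop_last_batch=True,
)
```

Check the Dataset documentation for the full set of connectors, including TensorFlow and Snowpark DataFrame readers.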

Training data can also be created as a Snowpark DataFrame for training with classic ML Libraries, such as scikit-learn or Snowpark ML, or to load into external machine learning frameworks:

training_set = fs.generate_training_set(
    spine_df=MySourceDataFrame,
    features=[customer_order_features, customer_demographic_features],
    save_as="training_data_20240101",           # optional
    spine_timestamp_col="TS",                   # optional
    spine_label_cols=["LABEL1", "LABEL2"],      # optional
    include_feature_view_timestamp_col=False,   # optional
)

Similarly, Feature Store supports retrieving feature data directly for model inference using retrieve_feature_values, enabling production-ready incremental batch inference pipelines to be authored and scheduled easily.
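A minimal sketch of such an inference retrieval, assuming fs and the feature views from the earlier snippets, with new_customers_df as a hypothetical spine DataFrame of entity keys to score:

```python
# Look up current feature values for the entities to be scored
inference_df = fs.retrieve_feature_values(
    spine_df=new_customers_df,  # entity keys (e.g., CUSTOMER_ID) to score
    features=[customer_order_features, customer_demographic_features],
)
# inference_df can then be passed to a model loaded from the Model Registry
```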

Discovering and exploring features

Feature Store is available within the Snowsight UI and can be used to conveniently browse, search and manage feature views and their versions, underlying entities, individual feature columns and associated feature metadata.
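The same information is also available programmatically. A brief sketch using the fs handle from earlier:

```python
# Browse registered entities and feature views
fs.list_entities().show()
fs.list_feature_views().show()

# Fetch a specific feature view version for reuse
fv = fs.get_feature_view(
    name="Customer_Order_History_Features",
    version="1",
)
```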

Screenshot of Feature Store available within the Snowsight UI.

Governance 

Snowflake Feature Store uses standard database objects, like schemas, dynamic tables and views. Snowflake object tagging is used to denote these database objects as belonging to a Feature Store, and to maintain the relationships between them. Standard Snowflake RBAC is used to control access to the Feature Store and the objects within. In a typical Feature Store implementation, two roles are commonly defined: Producers and Consumers. Producers can create and modify Feature Views. Consumers can read Feature Views. Refer to this page for more details about privileges of each role. We also provide a simple utility API and a SQL script to easily configure these roles. Feature publishers can also share features within and across accounts using Snowflake Data Sharing.

Snowflake ML includes built-in ML Lineage capabilities (in preview) with integration to Feature Store, which allows you to visualize the lineage of all ML artifacts in your pipeline, such as source data tables, feature views, data sets and ML models, along with all the data lineage and governance that Snowflake Horizon Catalog provides. 

Screenshot of Snowflake ML lineage capabilities integrated with Feature Store.

Getting started

The Snowflake Feature Store is generally available to all Enterprise Edition (or higher) customers, and you can get started today with an introductory quickstart. For additional details, more end-to-end examples and the API reference, visit our documentation.
