Predict Known Categorical Outcomes with Snowflake Cortex ML Classification, Now in Public Preview

Today, enterprises are focused on enhancing decision-making with the power of AI and machine learning (ML). But the complexity of ML models and data science techniques often leaves behind organizations without data scientists or with limited data science resources. And for those organizations with strong data analyst resources, complex ML models and frameworks may seem overwhelming, potentially preventing them from driving faster, higher-quality insights.

That’s why Snowflake Cortex ML Functions were developed: to abstract away the complexity of ML frameworks and algorithms, automate much of the data science process, and democratize ML for everyone. 

These functions make activities such as data quality monitoring through anomaly detection, or retail sales forecasting through time series forecasting, faster, easier and more robust — especially for data analysts, data engineers, and citizen data scientists.

As a continuation of this suite of functions, Snowflake Cortex ML Classification is now in public preview. It enables data analysts to categorize data into predefined classes or labels, and both binary classification (two classes) and multi-class classification (more than two classes) are supported. All of this can be done with a simple SQL command, for use cases such as lead scoring or churn prediction. 

How ML Classification works

Imagine you are a data analyst on a marketing team and want to ensure your team takes quick action on the highest-priority sales leads, optimizing the value from investments in sales and marketing. 

With ML Classification, you can easily classify certain leads as having a higher likelihood to convert, and thus give them a higher priority for follow-up. And for those with a low likelihood to convert, your marketing team can choose to nurture those or contact them less frequently.

ML Classification can be accomplished in two simple steps: First, train a machine learning model using your CRM data for all leads you’ve pursued in the past and labeled as either “Converted” or “Not converted.” Then, use that model to classify your new set of leads as likely to convert or not. 

When you generate your Snowflake ML Classification predictions, you’ll get not only the predicted “class” (likely to convert vs. not likely), but also the probability of that prediction. That way, you can prioritize outreach and marketing to leads that have the highest probability of converting — even within all leads that are likely to convert. 

Here’s how to use Classification with just a few lines of SQL:

-- Train a model on all historical leads. 
CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION my_lead_model(
	INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'historical_leads'),
	TARGET_COLNAME => 'CONVERT'
);

-- Generate predictions.
CREATE TABLE my_predictions AS SELECT
my_lead_model!PREDICT(object_construct(*)) as prediction 
FROM new_leads;

The above SQL generates an ML model you can use repeatedly to assess whether new leads are likely to convert. It also generates a table of predictions that includes not only the expected class (likely to convert vs. not likely) but also the probability of each class.
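If your leads table carries columns that should not act as model features, such as identifiers or contact details, one option is to train from a view instead of the raw table. The sketch below is illustrative only: the view name and the excluded columns (lead_id, contact_email) are hypothetical and are not part of the original example.

```sql
-- Hypothetical columns: replace lead_id and contact_email with the
-- non-feature columns in your own table.
CREATE OR REPLACE VIEW historical_leads_training AS
SELECT * EXCLUDE (lead_id, contact_email)
FROM historical_leads;

-- Train on the view rather than the base table.
CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION my_lead_model(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'historical_leads_training'),
    TARGET_COLNAME => 'CONVERT'
);
```

If you train this way, the rows you pass to PREDICT should carry the same feature columns as the view.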

If you’re interested in pulling out just the predicted class and probability of that class, you can use the following SQL to parse the results:

-- Assumes the CONVERT labels are stored as 1 (converted) and 0 (not converted).
CREATE OR REPLACE TABLE my_predictions AS SELECT
prediction:class as convert_or_not,
prediction['probability']['1'] as convert_probability
FROM
(SELECT my_lead_model!PREDICT(object_construct(*)) as prediction
FROM new_leads);
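To act on those probabilities, it helps to keep each lead's original columns next to its score and sort by likelihood to convert. This is a minimal sketch, assuming the CONVERT labels are stored as 1 and 0 so that the probability of converting sits under the key '1'. The cutoff of 100 leads is arbitrary.

```sql
-- Score every new lead, keep its original columns, and list the
-- 100 leads most likely to convert.
SELECT
    scored.*,
    scored.prediction:class::STRING AS convert_or_not,
    -- Assumption: the positive label is stored as 1.
    scored.prediction['probability']['1']::FLOAT AS convert_probability
FROM (
    SELECT
        new_leads.*,
        my_lead_model!PREDICT(OBJECT_CONSTRUCT(*)) AS prediction
    FROM new_leads
) AS scored
-- Snowflake sorts NULLs first in descending order unless told otherwise.
ORDER BY convert_probability DESC NULLS LAST
LIMIT 100;
```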

To support your assessment of the model (“Is this good enough for my team to use?”) and understanding of the model (“What parts of the data I’ve trained the model on are most useful to the model?”), this classification function produces evaluation metrics and feature importance data. 

-- Get evaluation metrics
CALL my_lead_model!SHOW_EVALUATION_METRICS();
CALL my_lead_model!SHOW_GLOBAL_EVALUATION_METRICS();
CALL my_lead_model!SHOW_CONFUSION_MATRIX();


-- Get feature importances
CALL my_lead_model!SHOW_FEATURE_IMPORTANCE();
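
If you want to compare feature importance across retrained models, you could save the output of a call to a table. The sketch below uses Snowflake's RESULT_SCAN and LAST_QUERY_ID functions. It assumes the result of the CALL can be read back this way and that both statements run back to back in the same session. The table name is hypothetical.

```sql
-- Run the call, then immediately capture its result set.
CALL my_lead_model!SHOW_FEATURE_IMPORTANCE();

-- Hypothetical table name for the saved snapshot.
CREATE OR REPLACE TABLE my_lead_model_feature_importance AS
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```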

ML Classification can be used for other use cases as well, such as churn prediction. For example, customers classified as having a high likelihood to churn can be targeted with special offers, personalized communication or other retention efforts.

The two problems we describe above, churn prediction and lead scoring, are binary classification problems, where the value we’re predicting takes on just two values. This classification function can also solve multi-class problems, where the value we’re predicting takes on three or more values. For example, say your marketing team segments customers into three groups (Bronze, Silver, and Gold) based on their purchasing habits and their demographic and psychographic characteristics. This classification function could help you bucket new customers and prospects into those three value-based segments with ease.

-- Train a model on all existing customers. 
CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION my_marketing_model(
	INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'customers'),
	TARGET_COLNAME => 'value_grouping'
);

-- Generate predictions for prospects.
CREATE TABLE my_value_predictions AS SELECT
my_marketing_model!PREDICT(object_construct(*)) as prediction 
FROM prospects;

-- Parse results: keep the predicted class and the probability of that class.
CREATE TABLE my_predictions_parsed AS SELECT
prediction:class as value_grouping,
GET(prediction:probability, prediction:class::STRING) as probability
FROM my_value_predictions;
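
As a quick sanity check on the multi-class output, you might count how many prospects land in each predicted segment and how confident the model is on average. This is a minimal sketch that reads the my_value_predictions table created above. The output column names are illustrative.

```sql
-- Summarize predictions per segment (Bronze, Silver, Gold).
SELECT
    prediction:class::STRING AS value_grouping,
    COUNT(*) AS prospect_count,
    -- Probability the model assigned to the class it predicted.
    AVG(GET(prediction:probability, prediction:class::STRING)::FLOAT) AS avg_probability
FROM my_value_predictions
GROUP BY 1
ORDER BY prospect_count DESC;
```

A segment with a low average probability may deserve a closer look at the confusion matrix before the team relies on it.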

How Faraday uses Snowflake Cortex ML Classification

Faraday, a customer behavior prediction platform, has been using ML Classification during private preview. For Faraday, having classification models right next to their customers’ Snowflake data accelerates their use of next-generation AI/ML and drives value for their customers.

“Snowflake Cortex ML Functions allow our data engineering team to run complex ML models where our customers' data lives. This provides us out-of-the-box data science resources and means we don't have to move our customers' data to run this analysis,” said Seamus Abshere, Co-Founder and CTO at Faraday. “The public release of Cortex ML Classification is a big unlock; it disrupts a long tradition of separating data engineering and data science.”

What’s next?

To continue improving the ML Classification experience, we plan to release support for text and timestamps in training and prediction data. We are also continuously improving the amount of data that can be used in training and prediction and the speed of training and prediction – as well as model accuracy.

Not only do we want to put AI and ML in the hands of all data analysts and data engineers, but we want to empower business users, too. That’s why the Snowflake Cortex UI is now in private preview. 

This clickable user interface helps our Snowflake customers discover Snowflake Cortex functions from Snowsight and guides users through the process of selecting data, setting parameters and scheduling recurring training and prediction for AI and ML models — all through an easy-to-use interface. 

To learn more about Snowflake Cortex ML functions, visit Snowflake documentation or try out this Quickstart.
