How PepsiCo Gains Actionable Insights Using the Data Cloud
Quickly identifying and responding to changes in consumer buying behavior is vital for a consumer packaged goods (CPG) company, and PepsiCo is a master at using data to do just that. We recently sat down with Vaibhav Kulkarni, PepsiCo’s Head of Engineering and Data Science, for an episode of our podcast, Rise of the Data Cloud, where Kulkarni described how the company uses the Data Cloud to power faster, smarter decision-making across the organization.
PepsiCo is one of the world’s largest CPG companies. Its portfolio spans a wide range of foods and beverages, including Frito-Lay, Gatorade, Pepsi-Cola, Quaker, Tropicana, and SodaStream, and includes 23 brands that generate more than $1 billion in revenue. Kulkarni works in PepsiCo’s ecommerce organization, which operates like a startup within the CPG giant. His engineering and data science teams are responsible for building multiple machine learning-powered data products that are used across global PepsiCo sectors. He also leads the infrastructure engineering team responsible for building scalable data infrastructure, tools, and capabilities in the cloud.
PepsiCo started its migration from an on-premises infrastructure to the cloud a few years ago, seeing benefits in security, a reduced risk of data loss, and an easier way to quickly scale its services and applications. Kulkarni’s ecommerce division has been using the Snowflake Data Cloud for over two years. “Snowflake offers a very unique solution with its architecture, with separate compute and storage layers,” he said. “Apart from improving performance and speed and lowering costs, with Snowflake, you’re ready for a multi-cloud environment.”
Teams at PepsiCo use multiple public clouds for different purposes. With the Snowflake Data Cloud, Kulkarni says his team can set up multi-cloud replication with “a couple of clicks.”
One of his major projects is the ROI Engine, a tool based on machine learning models that evaluates and measures the effectiveness of PepsiCo’s marketing and advertising campaigns. The tool is already in use in North America, and PepsiCo plans to roll it out globally to enable smarter marketing decision-making and boost its return on investment. The ROI Engine helps PepsiCo marketers answer questions about the performance of different media campaigns and recommends where to place marketing spend.
As part of the ROI Engine, the team collects information from 60+ data sources in North America. They store this data within the Data Cloud. The data scientists then use this data set to generate actionable insights. Thanks to Snowflake Secure Data Sharing, teams can quickly share a database or a table with other teams in the organization. “Those teams don’t need to worry about building the data pipelines; they can always get access to the latest, up-to-date data at any given time,” Kulkarni said. “Having all these data sets together in one place is very beneficial.”
The ROI Engine has already allowed PepsiCo to increase its digital penetration in certain brands by double digits, according to Kulkarni.
Recently, PepsiCo also started using Snowflake Data Marketplace to acquire and share third-party data. The company has used a data set about COVID-19 infection rates, and it is exploring other data sets about foot traffic and weather.
Looking ahead, Kulkarni is optimistic about the emergence of advanced AI applications across a variety of industries and predicts an increased need for all kinds of data technologists. “As the data space continues to evolve, there are so many roles and opportunities in this field for data scientists, data analysts, data engineers, machine learning engineers, and full-stack machine learning engineers,” he said.
He anticipates that more companies will see the benefits of reducing their data transfer costs and development time using Snowflake’s data sharing capabilities. “Companies can share and ingest data without needing to build big ETL pipelines,” Kulkarni said. “You can directly consume the data, since it’s already been loaded and prepared.”
Rise of the Data Cloud is a podcast hosted by award-winning author and journalist Steve Hamm. For each episode, he speaks with a data leader to learn how they leverage the cloud to manage, share, and analyze data to drive business growth, fuel innovation, and disrupt their industries. You can listen to more episodes here.