CUSTOMER STORIES
S&P Global Saves Time and Money While Scaling Machine Learning
With Snowflake, S&P Global has increased efficiency and scaled ML pipelines, ultimately equipping customers with richer credit insights to inform their financial decisions.
KEY RESULTS:
75%
Time savings by moving from PySpark on Databricks to Snowflake ML
200M+
Webpages mined for data to enhance risk assessment
Industry
Financial Services
Location
New York, New York
Not so risky business
S&P Global powers the markets of the future. As the world's foremost provider of credit ratings, benchmarks and analytics in the global capital and commodity markets, S&P equips the world’s leading organizations with the essential intelligence they need to make confident decisions about the road ahead.
A division of S&P Global, S&P Global Market Intelligence integrates financial and industry data, analytics, research and news to help corporations identify risk and reward opportunities. RiskGauge™, an S&P Global Market Intelligence product, allows customers to assess and monitor credit risk of their counterparty exposures. These insights are especially vital for clients that have exposure to small and midsize enterprises, which — unlike large public companies — have less financial information readily available to fully understand financial and business risks.
To efficiently store, mine and curate massive amounts of data about these millions of private companies, S&P Global relies on Snowflake. Building a scalable ML pipeline with Snowflake ML helps S&P Global deliver timely, accurate risk reports and expand coverage to include even more small to midsize enterprises.
Story Highlights
- Scalable ML pipeline for advanced data mining: S&P Global loads terabytes of web data into Snowflake ML and builds models to efficiently mine business attributes that help customers better understand risk.
- Faster data processing on millions of URLs: With Snowpark-optimized warehouses, S&P Global runs advanced ML models on millions of website URLs in minutes, a 75% time savings compared to its previous solution.
- Streamlined operations for greater productivity and scale: Thanks to Snowflake’s fully managed service, S&P has eliminated manual configurations and complex platform management, helping the team scale to meet the data demands of today and tomorrow, all in a single, unified environment.
Scaling data processing, simplifying workflows and reducing costs
To build its risk reports and analysis, S&P Global uses advanced ML models to source terabytes of data from millions of enterprises’ websites. Initially, S&P Global stored raw web crawler data in object storage and used multiple data science technologies for data cleaning and model hosting. However, S&P Global quickly abandoned this approach due to concerns about data movement, runtime performance, infrastructure costs and complexity.
“Our technology workhorses have to be really performant,” says Ganesh Nagarathnam, S&P’s Chief Architect and Head of Machine Learning Engineering and Analytics. “We do not want to move data into one place for compute then have results in a separate space. We need one unified engine.”
For near-infinite scalability, S&P Global’s Market Intelligence team turned to Snowflake. “I wanted to bring the data and compute together to enable mining within hours — not days — due to the sheer size of the information being collected,” says Moody Hadi, Head of Risk Solutions New Product Development at S&P Global. “We selected Snowflake and Snowpark for the scale to massively process big data and produce insights synchronized with the way companies generally update their public information.”
By moving to Snowflake, S&P Global benefits from a fully managed service, which has allowed the team to scale resources efficiently without manual configurations or downtime while also enhancing both performance and availability for data processing. “Snowflake provides us with a unified platform that allows us to easily scale up and down without changing the code, providing the speed we need for AI-related projects while keeping our compute costs within budget,” Hadi says.
S&P Global now loads both structured and unstructured web-crawled data into Snowflake and applies business attributes and firmographic mining models built with Snowpark. These AI custom models then curate the business data, ultimately feeding S&P Global’s credit models within their RiskGauge™ reports.
For S&P Global, governance and security are easier now, too. Prior to Snowflake, S&P faced significant challenges in managing runtime dependencies for its models — and downloading and validating these dependencies was a time-consuming process. But thanks to Snowflake’s easy access to popular Python libraries for flexible model development, S&P has eliminated the need for additional tuning and manual security checks, enhancing security while ensuring models and dependencies are high-quality and vetted.
“With Snowflake in our quiver, we can now develop enterprise-wide solutions quickly with a small team while driving costs down for highly computational intensive AI tasks.”
Moody Hadi
Mining structured and unstructured data in minutes
Handling terabytes of data weekly, S&P needed a platform that could process large data sets more efficiently than its previous technology. With Snowflake ML, S&P Global now collocates data and compute more easily for faster parallel processing across multiple algorithms and data mining models, including custom models, natural language toolkits and named entity recognition. Running advanced ML workloads in parallel helps accelerate extraction of organizational data, such as business name, snippets of business activities, company location details, industry classification, announcements and news sentiment.
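To make the idea of pattern-based attribute mining concrete, here is a minimal Python sketch. It is not S&P Global's implementation: the regular expression, the `EXTRACTORS` registry, the function names and the sample page are all hypothetical, and a production pipeline would run far richer models inside the data platform rather than in a local script.

```python
import re

# Hypothetical pattern for US-style "City, ST 12345" fragments.
# Real location mining would need far more robust rules or models.
LOCATION_RE = re.compile(
    r"\b([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*), ([A-Z]{2}) (\d{5})\b"
)


def extract_locations(text):
    """Return every city/state/ZIP triple found in the page text."""
    return [
        {"city": city, "state": state, "zip": zip_code}
        for city, state, zip_code in LOCATION_RE.findall(text)
    ]


# Hypothetical registry: each attribute gets its own extractor, so new
# mining models can be added without changing the driver code.
EXTRACTORS = {"locations": extract_locations}


def mine_page(text):
    """Run every registered extractor over one page of text."""
    return {name: extractor(text) for name, extractor in EXTRACTORS.items()}


# Made-up sample page, keyed by a placeholder URL.
pages = {
    "https://example.com/contact": "Acme Tools, 12 Main Street, Springfield, IL 62701."
}
results = {url: mine_page(text) for url, text in pages.items()}
print(results)
```

Because each page is mined independently of the others, work of this shape can be spread across many workers, which is the property that makes parallel processing effective for this kind of extraction.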
Snowpark Container Services helps S&P Global optimize data costs by selectively using graphics processing units (GPUs) for compute-intensive use cases. “Snowpark and Snowflake effortlessly handles all our big data processing, which includes 15 heavy ML models all competing with each other,” Hadi says.
While using other data science and AI platforms often added significant overhead costs and processing delays, switching to Snowflake has simplified the team’s entire workflow while consolidating storage and compute in a single unified environment. Now, the team enjoys streamlined operations with automated platform management and seamless upgrades — and no longer has to deal with complex data transfers. These improvements have reduced downtime, manual configuration and cumbersome maintenance while allowing the team to enhance performance, decrease complexity and gain more control over their infrastructure.
S&P Global’s data mining engines have performed consistently well since switching to Snowflake. For example, a pattern-based algorithm that extracts business locations processes several million webpages in just minutes, while a natural language processing model extracts business names and attributes from 4.4 million URLs in a speedy 27 minutes. In runtime testing, S&P Global found that leveraging a Snowpark-optimized warehouse delivered a 75% time savings compared to using PySpark on Databricks.
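The figures above can be put in perspective with some quick arithmetic. The Python sketch below uses only the numbers quoted in this story; the function names are illustrative, and the baseline runtime is a hypothetical derived by assuming the 75% savings applies to the 27-minute job, which the story does not state.

```python
def throughput_per_second(items, minutes):
    """Average items processed per second over a run."""
    return items / (minutes * 60)


def speedup(savings_fraction):
    """Speedup factor implied by a fractional time savings."""
    return 1 / (1 - savings_fraction)


def baseline_minutes(new_minutes, savings_fraction):
    """Runtime before the savings, since new = baseline * (1 - savings)."""
    return new_minutes / (1 - savings_fraction)


# 4.4 million URLs in 27 minutes, as reported for the NLP model.
print(round(throughput_per_second(4_400_000, 27)))  # about 2716 URLs per second

# A 75% time savings means the work takes a quarter of the time.
print(speedup(0.75))  # 4.0

# Hypothetical: if the 75% figure applied to the 27-minute run,
# the equivalent earlier run would have taken 108 minutes.
print(baseline_minutes(27, 0.75))  # 108.0
```

In other words, a 75% time savings corresponds to a fourfold speedup, and the reported NLP run averages roughly 2,700 URLs per second.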
With Snowflake, S&P Global keeps pace with evolving market data while maintaining proper historical context. According to Nagarathnam, “A huge benefit of Snowflake is having the entire history of audits — including data that’s been changed or deleted — so we can compare the probability of default scores to previous calculations.”
200 million webpages, 5 million enterprises — and unlimited potential
S&P Global uses Snowflake to store, mine and curate data from at least 200 million website URLs and 5 million small to midsize enterprises. Doubling coverage by year’s end will provide corporate clients with even richer credit insights. “Since Snowflake is so scalable, we can use the same pipeline we built to gather data from another 5 million companies,” Hadi says. “This will help us achieve our goal to cover 100% of U.S. small businesses for credit analysis. Combining our AI capabilities on entities that have different levels of data veracity allows us to triangulate information on them in an efficient manner.”
Mining additional business attributes — web traffic and corporate change data are next — and combining them with other data sets will help S&P Global Market Intelligence deliver even more value to its customers. “We augment primary information extracted from websites, so customers know they’re getting the best quality data from us. That’s the key differentiator,” Nagarathnam says.
An innovative future invested in gen AI
Scaling S&P Global’s use of Snowpark Container Services is an important priority as the company pursues a growing number of AI and ML use cases. According to Nagarathnam, “We are looking heavily at Snowpark Container Services because we want to have a GPU-based environment for our workloads.”
Using LLMs could make it easier for S&P Global to streamline data extraction and classification, as well as generate company descriptions and varying insights on what could potentially create credit transitions for client-facing RiskGauge™ reports. The team also recently enabled Snowflake Notebooks on Container Runtime for scalable AI/ML development, which opens up new opportunities for data exploration and experimentation moving forward.
S&P Global's listings are available on Snowflake Marketplace.