Snowflake's Single Platform Improves Performance, Advances Mission Criticality and Analytics, and Supports More Data Types
The world is undergoing a remarkable transformation fueled by data. Because of technology limitations, organizations have accumulated silos across their data infrastructure to support various workloads, languages, tools, and formats. These silos can have major consequences: greater operational burden, security vulnerabilities, increased total cost of ownership, incomplete insights, and reduced agility.
This is where Snowflake’s single, unified platform comes in, helping break down silos and simplify architectures. At Summit 2023, we announced a series of new advancements to the platform that help customers break down silos: improved performance, more visibility and control over spend, enhanced governance, more advanced analytics, expanded business continuity capabilities, innovations around Apache Iceberg, the ability to get more value from unstructured data with large language models (LLMs), and the extension of ML-powered capabilities to more analysts. This post summarizes those new capabilities.
Continuously improving customers’ price for performance
Snowflake’s most important value is to “put customers first.” We’re focused on delivering continuous innovations with nearly every product release to improve performance and efficiency, and many of these platform improvements are rolled out automatically to customers with no action or effort required on their part.
This is why we’re introducing the new Snowflake Performance Index (SPI), an aggregate index for measuring improvements in Snowflake performance experienced by customers over time. From when we began tracking the SPI on August 25, 2022, through April 30, 2023, query duration improved by 15 percent for customers’ stable workloads in Snowflake.* This is one of the many ways Snowflake is helping customers get more value from the platform.
Search Optimization (SO) Service speeds up query performance by quickly finding the needle in the haystack: returning a small number of rows from large tables. We opened SO to accommodate more data types, including VARIANT, ARRAY, OBJECT, and GEOGRAPHY, and we’re expanding the service to support more use cases in general availability, such as speeding up substring searches in text columns and working with other performance features like Query Acceleration Service.
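As a minimal sketch (table and column names are illustrative), search optimization can be enabled per method and per column, including substring search on text:

```sql
-- Enable search optimization for equality lookups on a VARIANT column
-- and for substring searches on a text column
ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(payload), SUBSTRING(message);

-- Needle-in-a-haystack predicates like this can then avoid full scans
SELECT * FROM events WHERE message LIKE '%connection timeout%';
```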
Low-latency TOP-K analytics enables customers to retrieve only the most relevant results from a large result set by rank. Additional pruning features, now GA, reduce the need to scan entire data sets, enabling faster searches.
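TOP-K pruning targets queries that combine ORDER BY with LIMIT; a minimal sketch (illustrative names):

```sql
-- With TOP-K pruning, Snowflake can skip micro-partitions that cannot
-- contain any of the top 10 scores, instead of scanning the whole table
SELECT player_id, score
FROM leaderboard
ORDER BY score DESC
LIMIT 10;
```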
To help customers more easily analyze the structure of expensive queries and identify operators that cause performance problems, Programmatic Access to Query Profile will soon be generally available.
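For example, the GET_QUERY_OPERATOR_STATS table function returns per-operator statistics for a completed query; a minimal sketch:

```sql
-- Inspect operator-level statistics for the most recent query in the
-- session to find expensive joins, scans, or exploding row counts
SELECT operator_id, operator_type, operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));
```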
Learn more about the continuous performance improvements we make to the platform on an ongoing basis.
Gain more visibility and control over your Snowflake spend
We announced three new features to help users gain better visibility and control over their Snowflake spend, while maximizing their existing resources and driving more cost predictability.
First, our new warehouse utilization feature (in private preview) gives customers a single metric to help them better estimate capacity, size warehouses appropriately, and optimize their warehouse spend.
Snowflake’s new per-query cost attribution feature (private preview coming soon) gives users the ability to attribute warehouse spend to individual queries. For example, if a centralized team runs Snowflake for several departments with separate billing, say HR, Finance, and IT, that central team can now see how many Snowflake credits each department is using. This helps with chargeback scenarios, in which centralized teams need to charge back different departments for the credits they’ve actually used on Snowflake.
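Because the feature is still in private preview, its exact interface is not public; the sketch below assumes a hypothetical Account Usage view (QUERY_ATTRIBUTION_HISTORY is an assumed name, as are its columns) and departments that set QUERY_TAG on their sessions:

```sql
-- Hypothetical sketch: sum attributed compute credits per department,
-- assuming per-query credits are exposed in an Account Usage view
-- (view and column names are assumptions, not the documented interface)
SELECT query_tag AS department,
       SUM(credits_attributed_compute) AS credits_used
FROM snowflake.account_usage.query_attribution_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY department
ORDER BY credits_used DESC;
```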
We also announced that Budgets will be in public preview soon to give users even more control. A Budget defines a spending limit for a specific time interval on the compute costs for a group of Snowflake objects. Budgets help customers monitor warehouse and serverless usage, including the usage of automatic clustering, materialized views, search optimization, and more. When the spending limit is projected to be exceeded, a daily reminder email will be sent.
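A minimal sketch based on the preview Budgets interface (the API may change before public preview; object names are illustrative):

```sql
-- Create a budget, set a monthly spending limit in credits, and attach
-- a warehouse so its compute usage counts against the limit
CREATE SNOWFLAKE.CORE.BUDGET marketing_budget();
CALL marketing_budget!SET_SPENDING_LIMIT(500);
CALL marketing_budget!ADD_RESOURCE(
  SYSTEM$REFERENCE('WAREHOUSE', 'MARKETING_WH', 'SESSION', 'APPLYBUDGET'));
```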
Supporting mission criticality with enhanced native data governance, new Snowflake UIs, a growing compliance footprint, and updated cross-cloud business continuity
At Snowflake, we’re committed to providing best-in-class native data governance features for customers entrusting our platform with their data. These customers span many countries around the world and, as such, we’ve expanded classification capabilities to support UK-, Australia-, and Canada-based data (in private preview).
Customers can also now more easily manage sensitive and personally identifiable information (PII) through an enhanced user experience. The Classification UI (in private preview) provides an intuitive Snowsight workflow to classify and tag tables in the desired schema, while the Data Governance UI (in GA soon) offers an at-a-glance summary of tagged and protected assets in Snowsight, with workflows to take action.
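Classification is also available programmatically; a minimal sketch (object names are illustrative) that classifies a table, auto-applies system tags, and then reviews them:

```sql
-- Classify all columns of a table and automatically apply the
-- resulting semantic/privacy category tags
CALL SYSTEM$CLASSIFY('hr_db.public.employees', {'auto_tag': true});

-- Review which tags were applied to which columns
SELECT *
FROM TABLE(hr_db.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
  'hr_db.public.employees', 'table'));
```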
We are further expanding our data governance capabilities with native data quality monitoring (private preview coming soon) through out-of-the-box metrics for data freshness, volume, accuracy, and common statistics, along with the ability to define your own custom metrics. Snowflake delivers these building blocks for data quality monitoring that our partners can further leverage and extend.
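Since the private preview interface is not yet public, the following is only a hypothetical sketch of what attaching an out-of-the-box freshness metric could look like (syntax and object names are assumptions and may differ from the actual feature):

```sql
-- Hypothetical sketch: schedule data quality checks on a table and
-- attach a built-in freshness metric to its update timestamp column
ALTER TABLE orders SET DATA_METRIC_SCHEDULE = '60 MINUTE';
ALTER TABLE orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.FRESHNESS ON (updated_at);
```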
Aside from native data governance innovations, we are also constantly working to expand our compliance footprint. Most notably, Snowflake launched the Government & Education Data Cloud industry offering in June and has obtained StateRAMP High authorization on AWS GovCloud. To help federal, state, and local agencies meet security and compliance standards, Snowflake now supports regulated workloads such as Criminal Justice Information Services (CJIS).
Snowgrid is a uniquely differentiated cross-cloud technology layer that interconnects your business’ ecosystems across regions and clouds so you can operate at global scale. Snowgrid powers Snowflake’s cross-cloud business continuity capabilities and we are excited to announce that Account Replication is now generally available. This feature expands replication beyond databases to account metadata and integrations, making business continuity turnkey. Snowflake users can now recover their account and client connections in seconds, at virtually any scale, when paired with Client Redirect.
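Account replication is configured through failover groups; a minimal sketch (account and object names are illustrative) that replicates a database along with account-level objects and integrations on a schedule:

```sql
-- Replicate databases, account metadata, and integrations to a
-- secondary account every 10 minutes for failover
CREATE FAILOVER GROUP my_failover_group
  OBJECT_TYPES = DATABASES, ROLES, USERS, WAREHOUSES, INTEGRATIONS
  ALLOWED_DATABASES = sales_db
  ALLOWED_INTEGRATION_TYPES = API INTEGRATIONS
  ALLOWED_ACCOUNTS = myorg.dr_account
  REPLICATION_SCHEDULE = '10 MINUTE';
```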
To simplify and streamline the user experience for cross-cloud business continuity, customers can set up, configure, and monitor account replications through an intuitive UI (public preview coming soon). This UI allows them to manage replication sources, destinations, objects to be replicated, and timings.
With the replication of Stages, Snowpipe, COPY (ingestion), and directory tables soon in public preview, customers will be able to replicate entire ETL pipelines to protect against a regional outage. This means customers can fail over pipelines, and Snowflake guarantees idempotent loads.
Snowflake users can now also replicate Streams and Tasks (in GA), which are often used together to build modern data pipelines. Thousands of Snowflake customers develop powerful data transformation pipelines every single day. With the ability to replicate Streams and Tasks, those pipelines now also work seamlessly on secondary Snowflake accounts.
Advanced analytics with new support of GEOMETRY, new financial services capabilities, and fast SQL functions
At Snowflake, we’re committed to customer convenience, flexibility, and efficiency, and we’re showing this through our advancements in analytics.
We’ve made significant investments as part of our effort to become the leading platform for geospatial data. Whether location data is stored in a spherical (GEOGRAPHY) or planar (GEOMETRY) representation, or even contains invalid shapes, customers can now process all of these types of vector geospatial data in GA. We are also announcing the public preview of transformations between spatial reference systems for GEOMETRY objects, which enable reprojections from one mapping system to another.
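A minimal sketch of reprojection (coordinates and SRIDs are illustrative): parse a planar geometry with its source SRID, then transform it with ST_TRANSFORM:

```sql
-- Reproject a point from UTM zone 33N (EPSG:32633) to WGS 84 (EPSG:4326)
SELECT ST_TRANSFORM(
         TO_GEOMETRY('POINT(389866 5819003)', 32633),
         4326) AS wgs84_point;
```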
Furthermore, we’re continuously improving our SQL capabilities to make coding more efficient, save time, and improve accuracy through new functions. We introduced several SQL improvements (in GA), including SELECT * enhancements, MIN_BY / MAX_BY, GROUP BY ALL, and banker’s rounding. In particular, banker’s rounding helps reduce cumulative errors in financial analysis, catering to the specific requirements of financial professionals.
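A few illustrative examples of these conveniences (table and column names are assumptions):

```sql
-- GROUP BY ALL: group by every non-aggregated column in the SELECT list
SELECT region, product, SUM(amount) AS total_sales
FROM sales
GROUP BY ALL;

-- MAX_BY: return one column's value at the row where another peaks
SELECT MAX_BY(product, amount) AS biggest_sale_product FROM sales;

-- SELECT * enhancements: star with exclusions
SELECT * EXCLUDE (internal_notes) FROM sales;

-- Banker's rounding: ties round to the nearest even digit
SELECT ROUND(2.5, 0, 'HALF_TO_EVEN');  -- returns 2 rather than 3
```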
Updated Apache Iceberg support with more simplicity, better performance
Apache Iceberg continues to grow in popularity as the industry standard for open table formats. Because of its leading ecosystem of diverse adopters, contributors, and commercial offerings, Iceberg helps prevent storage lock-in and eliminates the need to move or copy tables between different systems, which often translates to lower compute and storage costs for your overall data stack.
We announced at Summit 2023 that we’re unifying External Tables for Iceberg and Native Iceberg Tables into one table type: the Iceberg Table (private preview coming soon). Customers get the simplicity of a single Iceberg table type, with options to specify the catalog implementation and far fewer performance tradeoffs. Managed Iceberg Tables allow full read/write from Snowflake and use Snowflake as the catalog, from which external engines can easily read. Unmanaged Iceberg Tables connect Snowflake to an external catalog to read Iceberg Tables. We’re also adding an easy, low-cost way to convert an unmanaged Iceberg Table into a managed one, so customers can onboard without rewriting entire tables.
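Because the unified Iceberg Table is still in private preview, the exact syntax may change; the sketch below only illustrates the direction described above (volume, catalog, and table names are assumptions):

```sql
-- Managed Iceberg Table: Snowflake acts as the catalog and allows
-- full read/write; data lives in customer-managed storage
CREATE ICEBERG TABLE managed_orders (id INT, amount NUMBER)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_iceberg_volume'
  BASE_LOCATION = 'orders/';

-- Unmanaged Iceberg Table: Snowflake reads via an external catalog
-- integration (for example, AWS Glue)
CREATE ICEBERG TABLE unmanaged_orders
  CATALOG = 'glue_catalog_integration'
  EXTERNAL_VOLUME = 'my_iceberg_volume'
  CATALOG_TABLE_NAME = 'orders';
```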
While query performance depends on the efficiency of the underlying Parquet files, our testing has shown that unmanaged Iceberg Tables perform more than 2x better than External Tables, and managed Iceberg Tables perform close to internal tables using Snowflake’s table format.
Integrating data stored on-premises
Even as companies continue shifting data to the cloud, many organizations still store data on-premises or in private cloud environments for a variety of reasons. Some of that data may be unsuitable for migration to the public cloud, or it may still be mid-migration; either way, these organizations want to manage all their data from one place, regardless of where it is stored. Consolidating and accessing data from disparate sources is crucial for holistic data insights and governance.
Generally available soon, External Tables and Stages for on-premises storage help bridge this gap. Customers can use Snowflake to access data in S3-compatible storage devices while getting the ease of use, elasticity, unified governance, resilience, and connectivity of Snowflake’s platform. Use cases include performing analytics on data lakes with External Tables, simplified ingestion of on-premises files into tables in the cloud, or even using Snowpark Python, Java, or Scala to process files stored externally. For more information, including a list of supported storage providers and our public test suite, please read the product documentation.
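A minimal sketch (bucket, endpoint, credentials, and file format are illustrative): create an external stage over an S3-compatible on-premises endpoint, then query the staged files directly:

```sql
-- Stage pointing at an S3-compatible storage device on-premises
CREATE STAGE onprem_stage
  URL = 's3compat://my-bucket/data/'
  ENDPOINT = 'storage.example.internal'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

-- Query the staged files in place
SELECT t.$1, t.$2
FROM @onprem_stage (FILE_FORMAT => 'my_csv_format') t;
```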
Introducing a built-in LLM with Document AI
Nearly every business has unstructured data in the form of documents, but the path to valuable analytical insights from those files has been either limited to machine learning (ML) experts or siloed away from all other data. Building on our native support for unstructured data, Snowflake’s built-in Document AI (in private preview) makes it easier for organizations to understand and extract value from documents using natural language.
Document AI leverages a purpose-built, multimodal LLM. By natively integrating this model within the Snowflake platform, organizations can easily extract content, such as invoice amounts or contractual terms, from documents securely stored in Snowflake, and fine-tune results using a visual interface and natural language. Data engineers and developers can also perform inference by programmatically calling the built-in or fine-tuned models, such as in pipelines with Streams and Tasks or in applications.
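Because Document AI is in private preview, the exact interface may change; as a hypothetical sketch (model, stage, and column names are assumptions), programmatic inference over staged documents could look like this:

```sql
-- Hypothetical sketch: run a fine-tuned Document AI model over every
-- file in a stage via its directory table
SELECT relative_path,
       invoice_model!PREDICT(
         GET_PRESIGNED_URL(@invoice_stage, relative_path), 1) AS extracted_fields
FROM DIRECTORY(@invoice_stage);
```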
Making ML accessible via SQL
As data volumes continue to grow, ML algorithms can help analysts extract more accurate insights from that data faster, but programming knowledge gaps and complex compute infrastructure requirements often prevent analysts from adopting ML.
This is why we’re improving our single platform with ML-powered functions (in public preview). With these functions, analysts can now uncover insights and generate predictions through familiar SQL, capabilities that were previously accessible only to those with ML skill sets. The functions now available in public preview include the following (a sketch of the forecasting function follows the list):
- Forecasting: Build more reliable time series forecasts with automated handling of seasonality, missing values, and more.
- Anomaly detection: Identify outliers and trigger alerts for further action.
- Contribution Explorer: Quickly identify the dimensions and values contributing to the change of a given metric across two user-defined time intervals.
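A minimal sketch of the forecasting function using the SNOWFLAKE.ML interface (view and column names are illustrative):

```sql
-- Train a forecasting model on a view of daily revenue, then project
-- the next 14 days with seasonality handled automatically
CREATE SNOWFLAKE.ML.FORECAST sales_forecast(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'daily_sales'),
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'revenue');

CALL sales_forecast!FORECAST(FORECASTING_PERIODS => 14);
```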
ML can now be adopted more broadly to improve the speed and quality of day-to-day business decisions. This capability removes the complexity of ML frameworks through familiar SQL functions available directly through Snowflake or through integrations with BI/analytics tools like Sigma Computing.
Learn more on-demand
To learn more about these innovations, visit the Summit 2023 page.
*Based on internal Snowflake data from August 25, 2022 to April 30, 2023. To calculate SPI, we identify a group of customer workloads that are stable and comparable in both number of queries and data processed over the period presented. Reductions in query duration resulted from a combination of factors, including hardware and software improvements and customer optimizations.