Snowflake Cortex LLM Functions Move to General Availability with New LLMs, Improved Retrieval and Enhanced AI Safety

Snowflake Cortex, a fully managed service that provides access to industry-leading large language models (LLMs), is now generally available. You can use these LLMs in select regions directly via LLM Functions on Cortex, bringing generative AI securely to your governed data. Your team can focus on building AI applications while we handle model optimization and GPU infrastructure to deliver cost-effective performance.

Here is the full set of updates released today as part of our mission to provide efficient, user-friendly and trusted generative AI: 

  • More high-performing LLMs: In addition to high-performing LLMs from Mistral AI and Google, Cortex now also supports the Snowflake Arctic, Llama 3 (both 8B and 70B) and Reka Core LLMs. All of these foundation models are available via the single, easy-to-use COMPLETE function, which is now generally available.
  • Efficient RAG and semantic search using Arctic embed: The EMBED and vector distance functions are now in public preview and include support for Arctic embed, the world’s best practical text-embedding model for retrieval.
  • Enhanced AI Safety with Llama Guard: As part of a collaboration with Meta, Llama Guard, an LLM-based input-output safeguard model, comes natively integrated with Snowflake Arctic, and will soon be available for use with other models in Cortex LLM Functions.

The combination of these updates continues to unlock value across industries, with two use cases in particular:

  • Text analytics and generation: A major video-hosting platform struggled to convert free users to paid subscriptions due to limited insights from their text-based user interactions. By running semantic search tasks with the embedding and vector distance functions, they were able to define target segments more clearly. Then, using foundation models in the COMPLETE function, they created personalized emails. Because these LLM operations run effortlessly as batch operations in Snowflake, the team runs this pipeline continuously to increase free-to-paid conversions (see the sketch after this list).
  • Document chatbots: A multinational, sustainable energy tech company struggled to foster collaboration among geographically dispersed technical teams. Their knowledge was contained in more than 700,000 pages of private R&D documents. Using a RAG-based architecture that combines Cortex and Streamlit in Snowflake, the team built a document chatbot. This new application streamlines knowledge sharing, reduces information search time and increases overall productivity.
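Here is a rough sketch of the batch pattern in the first use case. The user_messages table and its columns are hypothetical, and the function and model names (EMBED_TEXT_768, snowflake-arctic-embed-m, VECTOR_COSINE_SIMILARITY) follow the public-preview documentation and may evolve:

    -- Hypothetical schema: user_messages(user_id, message_text).
    -- Step 1: score each message against a target-segment description.
    WITH scored AS (
        SELECT
            user_id,
            message_text,
            VECTOR_COSINE_SIMILARITY(
                SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', message_text),
                SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
                    'questions about premium features, usage limits or pricing')
            ) AS similarity
        FROM user_messages
    )
    -- Step 2: draft a personalized email for every close match, in one batch.
    SELECT
        user_id,
        SNOWFLAKE.CORTEX.COMPLETE(
            'snowflake-arctic',
            'Write a short, friendly email inviting this user to upgrade. ' ||
            'Their recent question was: ' || message_text
        ) AS email_draft
    FROM scored
    WHERE similarity > 0.7;  -- arbitrary cutoff chosen for illustration

Because both steps are ordinary SQL over a table, the whole pipeline can be scheduled like any other Snowflake job.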

Expanded flexibility for high-performing LLMs with support for Snowflake Arctic, Llama 3 and Reka models

As state-of-the-art models continue to advance, customers need flexibility to quickly and securely test and evaluate models to get the best results for their use case. This is why Snowflake Cortex is adding support for: 

  • Snowflake Arctic: Snowflake's efficient and truly open model for the enterprise. Arctic excels at enterprise tasks such as SQL generation, coding and instruction following. Available under an Apache 2.0 license, it provides ungated access to weights and code. Snowflake customers with a payment method on file can access Snowflake Arctic for free until June 3, 2024; daily limits apply.
  • Meta Llama 3 8B and 70B: The Llama 3 models are powerful open source models that demonstrate impressive performance on a wide range of natural language processing tasks, outperforming several previous state-of-the-art models. These models provide improved capabilities, such as reasoning, code generation and instruction following, along with increased diversity in model responses.
  • Reka Core: This LLM is a state-of-the-art multimodal model with a comprehensive understanding of images, videos and audio, along with text. Currently, Snowflake Cortex supports the text modality, with multimodal support expected in the near future. 

Using any of these models against your data is as simple as changing the model name in the COMPLETE function, which is available in both SQL and Python.
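A minimal sketch in SQL (model identifiers such as 'snowflake-arctic' and 'reka-core' follow the launch names, and availability varies by region):

    -- Swapping models changes only the first argument to COMPLETE.
    SELECT SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic', 'Summarize RAG in one sentence.') AS arctic_answer;
    SELECT SNOWFLAKE.CORTEX.COMPLETE('reka-core', 'Summarize RAG in one sentence.') AS reka_answer;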

Efficient RAG and semantic search using Arctic embed

To accurately answer business questions using LLMs, companies must augment pretrained models with their data. Retrieval Augmented Generation (RAG) is a popular solution to this problem, as it incorporates factual, real-time data into the LLM generation.

Snowflake customers can now effortlessly test and evaluate RAG-oriented use cases, such as document chat experiences, with our fully integrated solution. Arctic embed is now available as an option in the Cortex EMBED function. The EMBED and vector distance functions, along with VECTOR as a native data type in Snowflake, are currently in public preview, with general availability coming soon. With all of this built natively into the Snowflake platform, there is no need to set up, maintain and govern a separate vector store. This cohesive experience accelerates the path from idea to implementation and broadens the range of use cases organizations can support. 
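As a minimal sketch of that flow, assuming a hypothetical raw_chunks table of pre-chunked documents and the public-preview names for the embedding function and VECTOR type:

    -- Hypothetical source: raw_chunks(chunk_id, chunk_text).
    -- Embed once on load; the result is stored as a native VECTOR column.
    CREATE OR REPLACE TABLE doc_chunks AS
    SELECT
        chunk_id,
        chunk_text,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', chunk_text) AS chunk_vec
    FROM raw_chunks;

    -- At question time: retrieve the nearest chunks, then ground the answer.
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'snowflake-arctic',
        'Answer using only this context:\n' || LISTAGG(chunk_text, '\n') ||
        '\nQuestion: How do I rotate an API key?'
    ) AS answer
    FROM (
        SELECT chunk_text
        FROM doc_chunks
        ORDER BY VECTOR_COSINE_SIMILARITY(
            chunk_vec,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
                'How do I rotate an API key?')
        ) DESC
        LIMIT 4
    );

Because the embeddings live in an ordinary Snowflake table, the same governance and access controls apply to them as to any other data.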

Ready to build your own document chatbot in Snowflake? Try this step-by-step quickstart. 

We continue to develop more advanced and efficient retrieval so that enterprises can securely talk to their data, and we do so in a way that is open and collaborative in order to push the industry forward. With this approach in mind, we open sourced Arctic embed, the world’s best practical text-embedding model for retrieval, and recently announced a partnership with the University of Waterloo to continue evolving retrieval benchmarks. 

Enhanced AI safety with Llama Guard in Snowflake Arctic

At Snowflake, we prioritize high safety standards for gen AI applications. As part of our ongoing collaboration with Meta, Llama Guard, an LLM-based input-output safeguard model, is natively integrated into Snowflake Arctic to proactively filter potentially harmful content from LLM prompts and responses, with availability expanding soon to other models in Cortex LLM Functions. Llama Guard has been instruction-tuned on a data set collected by Meta, and on existing benchmarks it matches or exceeds the performance of currently available content-moderation tools. The model can identify a specific set of safety risks in LLM prompts and classify the responses LLMs generate to those prompts. Combined with Llama Guard, Snowflake Arctic minimizes objectionable content in your gen AI applications, ensuring a safer user experience for all.

Elevating enterprise AI: LLMs rooted in security and trust

Data security is key to building production-grade generative AI applications. Snowflake is committed to industry-leading standards of data security and privacy so that enterprise customers can protect their most valuable asset, their data, throughout the AI lifecycle, from ingestion to inference. This high security bar applies to all of Cortex: whether you use a task-specific function, such as SUMMARIZE, or a foundation model from Snowflake, Mistral AI, Meta or any other provider, the following is always true:

  • Snowflake does not use Customer Data to train any LLM to be used across customers.
  • LLMs run inside Snowflake. Data never leaves the Snowflake service boundary or gets shared with any third-party provider.
  • Role-based access control (RBAC) can be used to manage access to Cortex LLM Functions.
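For example, the launch documentation gates these functions behind the SNOWFLAKE.CORTEX_USER database role. A sketch (analyst_role is a hypothetical role name):

    -- By default the CORTEX_USER database role is granted to PUBLIC.
    -- Restrict usage to a specific role instead:
    REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER FROM ROLE PUBLIC;
    GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst_role;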

You can find more details in our AI Trust and Safety FAQ and our AI Security Framework white paper.

Pricing and Availability

Snowflake Cortex LLM Functions incur compute costs based on the number of tokens processed. Refer to the consumption table for each function’s cost in credits per million tokens. The capability is available in select regions; refer to the region feature matrix for more details.
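As a purely hypothetical illustration of the math (the rate below is invented, not an actual price): a function billed at 0.5 credits per million tokens that processes 20 million tokens of prompts and responses would consume 0.5 × 20 = 10 credits.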

Getting Started

Snowflake Cortex enables organizations to expedite delivery of generative AI applications with LLMs while keeping their data in the Snowflake security and governance perimeter. Try it out for yourself!

Want to network with peers and learn from other industry and Snowflake experts about how to use the latest generative AI features? Join us at Snowflake Data Cloud Summit in San Francisco from June 3–6.

