Why a Solid Data Foundation Is the Key to Successful Gen AI

Think back just a few years ago when most enterprises were either planning or just getting started on their cloud journeys. The pandemic hit and, virtually overnight, the need to radically change ways of working pushed those cloud journeys into overdrive. Cost-effective adaptability was essential. And the companies that could scale up or scale down quickly were the ones that navigated the pandemic successfully. Migrating to the cloud made that possible.

Today, game-changing benefits of generative AI are creating a renewed impetus to act just as fast and decisively. This time it’s all about ensuring that the data and the platform where it’s processed are ready for the new AI models. 

But there’s still a long way to go in an environment where the volume, velocity and complexity of data and data types are constantly increasing. By 2025, it’s estimated that 7 petabytes of data will be generated every day, compared with “just” 2.3 petabytes daily in 2021. And it’s not just any type of data. The majority of it (an estimated 80%) is now unstructured data such as images, videos and documents, a resource from which enterprises are still not getting much value.

A big gap between aspiration and reality

In this data-rich world, organizations understand that their ability to compete from now on will rest on the availability, veracity and accessibility of the data they need. At present, however, while 83% of Accenture’s clients say that real-time data is going to be crucial for competitive advantage over the next two years, just 31% say that they’re managing that data effectively. 

In other words, there’s a big gap between aspiration and reality. And as the need to securely share data — both within and beyond the enterprise — becomes mission critical, the ability to manage and create robust and trusted data pipelines is key. Yet today, 55% of enterprises say they can’t trace the lineage of their data from source to endpoint. And with structured and unstructured data held across multiple silos in many different cloud-based and on-premises locations, it’s a huge challenge. But it’s one enterprises have to solve to remain competitive.

Our research supports this. We’ve found that the highest-performing companies are 2.4x more likely to store their data in a specialized, modern data platform in the cloud. Key actions that set them apart? Breaking down data silos, removing duplication, creating trusted data products, reducing the cost of data rework, ensuring more timely insights and cross-functional use cases, and improving user adoption.

Realizing the value of proprietary data

The greatest value from large-scale machine learning (ML) and generative AI will be realized when companies can rely on their own data to deliver the unique insights and recommendations that will fundamentally move the performance needle. Then they’ll be able to go from interacting with a generic internet-trained chatbot to generating highly relevant content that leverages up-to-date and potentially confidential enterprise information. 

Companies that have real control over their data can put the technology to much more targeted and valuable use. Think, for example, about a life sciences business using a model narrowly trained on its proprietary trial and product data to predict the likelihood of a drug’s success much more accurately, efficiently and quickly than its competitors.

Many modern enterprises have far-flung operations, products and value chains that generate data globally and in a federated way. In order to build more targeted, discrete models like the one in the example above, they need to find a way for teams to share and access data stored on multiple clouds in secure and governed environments. 

The ideal solution is to enable usage of the primary, most up-to-date data, without having to copy it from one place to another, all while meeting relevant regulatory requirements, which will continue to evolve with AI. 

This approach can avoid significant and unnecessary data storage costs, of course, as well as prevent the creation of yet more data silos. But it’s also the vital means through which to enable strong governance and security by preserving, for example, fine-grained data-access controls. Finally, seamless access — via a trusted virtual “clean room” — to valuable data sets controlled by third parties opens up entirely new opportunities for value creation.
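
To make this concrete, here is a minimal sketch of what fine-grained, in-place access control and governed sharing can look like on a platform such as Snowflake. It is illustrative only: the connection details, role, database, table and share names (a clinical-trials dataset shared with a partner account) are hypothetical assumptions, and the SQL is issued from Python via the snowflake-connector-python library.

```python
# Illustrative sketch only: account, role, database, table and share names are hypothetical.
# Requires: pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",        # hypothetical account identifier
    user="governor@example.com",
    authenticator="externalbrowser",  # or key-pair / password auth
    role="DATA_GOVERNOR",
    warehouse="GOVERNANCE_WH",
)
cur = conn.cursor()

# 1. Fine-grained access: a row access policy filters rows at query time,
#    so consumers only ever see the slices they are entitled to.
cur.execute("""
    CREATE OR REPLACE ROW ACCESS POLICY clinical.trials.region_policy
      AS (region STRING) RETURNS BOOLEAN ->
      CURRENT_ROLE() = 'GLOBAL_ANALYST' OR region = 'EU'
""")
cur.execute("""
    ALTER TABLE clinical.trials.trial_results
      ADD ROW ACCESS POLICY clinical.trials.region_policy ON (region)
""")

# 2. Sharing without copying: a secure share grants a partner account read access
#    to the governed table in place; no data is duplicated or exported.
cur.execute("CREATE SHARE IF NOT EXISTS partner_share")
cur.execute("GRANT USAGE ON DATABASE clinical TO SHARE partner_share")
cur.execute("GRANT USAGE ON SCHEMA clinical.trials TO SHARE partner_share")
cur.execute("GRANT SELECT ON TABLE clinical.trials.trial_results TO SHARE partner_share")
cur.execute("ALTER SHARE partner_share ADD ACCOUNTS = partner_org.partner_account")

conn.close()
```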

Prioritizing data security and governance

How can companies do all this, moving fast and staying safe at the same time? A comprehensive data foundation, with security and governance baked in at the digital core, is non-negotiable. This foundation must allow every team to trust all the data they use, whether it is proprietary to the enterprise or from other sources, including ecosystem partners.

And this foundation has to control access to data in more complex configurations than ever before. One of the many exciting things about gen AI is its power to democratize access to insights that were previously available only to AI specialists and data scientists. But lowering the barriers also raises the risks. Security and governance gain even more prominence.

So what comes next?

Many enterprises, but by no means all, have successfully tackled phase one of the data challenge: making structured data shareable across corporate lines and with third parties. The second phase, being able to trust the explosion of unstructured, high-velocity streaming information, is still a work in progress for the majority. The third phase, harnessing bespoke large language models (LLMs) and larger-scale ML models tuned or trained with this data, is just now emerging.

Particularly crucial to the second phase is engendering trust in data. This requires a data platform that can bring all the necessary pieces of compute to the data and make them available within the same governance boundary. With our partners at Snowflake, that’s something we help clients to achieve. By providing controls at the data layer and across clouds, Snowflake’s platform enables processing to happen next to the data. This means people enterprise-wide know their AI models are using trusted data every time. Without that assurance, there’s always the risk that models will provide faulty insights.
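
As an illustration of what “bringing compute to the data” can look like in practice, the hedged sketch below uses the Snowpark Python API, where DataFrame operations are compiled to SQL and executed inside Snowflake rather than pulling data out to a client. The connection parameters, table and column names are hypothetical; any row access or masking policies on the table continue to apply to whatever the query touches.

```python
# Illustrative sketch: connection parameters, table and column names are hypothetical.
# Requires: pip install snowflake-snowpark-python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

connection_parameters = {
    "account": "myorg-myaccount",
    "user": "analyst@example.com",
    "authenticator": "externalbrowser",
    "role": "GLOBAL_ANALYST",
    "warehouse": "ANALYTICS_WH",
}
session = Session.builder.configs(connection_parameters).create()

# The DataFrame is a lazy query plan; no data leaves the platform at this point.
trials = session.table("CLINICAL.TRIALS.TRIAL_RESULTS")

summary = (
    trials
    .filter(col("PHASE") == 3)                          # filter pushed down to the data
    .group_by(col("COMPOUND_ID"))
    .agg(avg(col("EFFICACY_SCORE")).alias("AVG_EFFICACY"))
)

# Execution happens next to the data, inside the same governance boundary,
# so access controls are enforced on every row the query reads.
summary.show()
```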

And for phase three, democratizing and extending the benefits of industry-leading AI and LLMs, what’s needed is a way for everyone, not just AI specialists, to access and use these cutting-edge technologies and to apply all their trusted data to train and prompt both custom-built and open source LLMs.
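
For a flavour of what that democratization can look like, the sketch below calls an LLM function directly in SQL on governed data, in the style of Snowflake Cortex’s COMPLETE function. The database, table, column and model names are illustrative assumptions, not a prescription; the point is that the prompt can draw on confidential enterprise text without that data ever leaving the governed platform.

```python
# Illustrative sketch: database, table, column and model names are assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="analyst@example.com",
    authenticator="externalbrowser",
    role="GLOBAL_ANALYST",
    warehouse="ANALYTICS_WH",
)
cur = conn.cursor()

# The LLM call runs inside the platform's governance boundary, so confidential
# report text can be used in prompts without being copied to an external service.
cur.execute("""
    SELECT doc_id,
           SNOWFLAKE.CORTEX.COMPLETE(
             'mistral-large',
             CONCAT('Summarize the key findings in this trial report: ', report_text)
           ) AS summary
    FROM clinical.trials.trial_reports
    LIMIT 10
""")
for doc_id, summary in cur.fetchall():
    print(doc_id, summary)

conn.close()
```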

Investing in a cloud data platform

Whatever stage your organization has reached or is aiming for, investing today in a modern data platform for your digital core is a “no regrets” investment. Identify areas of the business with the highest value potential and invest in optimizing how you manage and secure the data pipelines that feed them. 

We are increasingly seeing our clients invest in this as a top priority. Generative AI and ML capabilities are rapidly becoming the crucial differentiator for companies across industries. In this world, every business needs to democratize access to these capabilities and ensure the data they use is trusted. 

Provided they can do this, they’ll secure a competitive edge by standing out in three key ways:

  1. Ensuring all their business teams can use AI in everyday analytics within seconds.
  2. Accelerating delivery of innovation, with technical users able to build and deploy AI apps in just a few minutes.
  3. Keeping all their data and models secure and governed.

