Kafka is a streaming platform originally developed at LinkedIn, now maintained by the Apache Software Foundation, and written in Scala and Java. Since it was open-sourced in 2011, Kafka has evolved from a messaging queue into a robust event streaming platform. It provides a high-throughput, low-latency platform for handling real-time data feeds and, with Kafka Connect, can integrate with external systems for data import and export.
Kafka stores key-value messages that come from arbitrarily many processes called producers. The data is organized into named streams called "topics", each of which is divided into "partitions" for parallelism and scalability.
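The key of a message determines which partition it lands in, which is what preserves per-key ordering. Below is a minimal sketch of that idea in Python, using a CRC32 hash as a simplified stand-in for Kafka's actual default partitioner (which hashes the serialized key with murmur2); `choose_partition` is a hypothetical helper, not a Kafka API.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index deterministically.

    Simplified stand-in for Kafka's default partitioner: the real one
    uses a murmur2 hash of the serialized key, but the property shown
    here is the same -- equal keys always map to the same partition,
    so messages for one key stay in order within that partition.
    """
    return zlib.crc32(key) % num_partitions

# All messages with the same key go to the same partition.
assert choose_partition(b"user-42", 6) == choose_partition(b"user-42", 6)
```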
Kafka supports both regular and compacted topic types. Regular topics can be configured with a space bound or a retention time: when a partition exceeds the space bound, or when records age beyond the retention time, Kafka deletes that data to free up storage. With compacted topics, Kafka instead retains only the most recent message for each key, eventually discarding older messages that share the same key; the latest message per key is never deleted.
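Log compaction is easy to model as a pure function over an in-memory log: walk the records in offset order, remember the last value seen for each key, and emit the survivors in offset order. This is a simplified sketch of the semantics, not Kafka's actual implementation (which compacts log segments in the background); `compact` is a hypothetical helper name.

```python
def compact(log):
    """Keep only the most recent record per key, preserving the log
    order (offset order) of the surviving records -- a simplified
    model of Kafka log compaction semantics."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        # Later records with the same key overwrite earlier ones.
        latest[key] = (offset, value)
    # Emit the survivors sorted by their original offset.
    return [(key, value)
            for key, (offset, value) in sorted(latest.items(),
                                               key=lambda kv: kv[1][0])]

log = [("k1", "v1"), ("k2", "v2"), ("k1", "v3")]
compact(log)  # -> [("k2", "v2"), ("k1", "v3")]
```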
Snowflake and Kafka
The Snowflake Connector for Kafka reads data from one or more Apache Kafka topics and loads it into a Snowflake table. Increasingly, organizations are finding that they need to process data as soon as it becomes available, and there has been growing demand for separating storage and compute. Together, Kafka and Snowflake move streaming data into a cloud data platform. Users can then take the unstructured data from Snowflake and use an ELT tool like Matillion to convert it to structured data and conduct advanced analytics with machine learning.
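The connector runs as a Kafka Connect sink and is driven by a properties file. The fragment below is a hedged sketch of such a configuration using documented property names from the Snowflake connector; all values (topic names, account URL, database, schema, key material) are hypothetical placeholders that would need to be replaced with real ones.

```json
{
  "name": "snowflake-sink",
  "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
  "tasks.max": "2",
  "topics": "events",
  "snowflake.url.name": "myaccount.snowflakecomputing.com:443",
  "snowflake.user.name": "kafka_connector_user",
  "snowflake.private.key": "<private-key-here>",
  "snowflake.database.name": "ANALYTICS",
  "snowflake.schema.name": "RAW",
  "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
}
```

With a configuration along these lines, each record's content lands in a VARIANT column in the target table, which is what enables the downstream ELT step described above.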