With the announcement of S3-native streams (Freight clusters), here is a commentary on Confluent's strategy regarding object storage, streaming, and an open data architecture.
https://lnkd.in/dTg6DKVy
Jack Vanlightly Honestly, I really enjoyed the writing.
But I do disagree with all the Tableflow stuff, which makes me wonder how much R&D has taken place beyond the transactional properties.
From a practical perspective:
1. Tableflow moves the offloading we have historically done with Flink into tiered storage directly. Yes, it's more convenient, but it's again either batch OR stream, not batch AND stream.
2. The stream/table duality itself confirms this: it's either batch or stream; there is no unification, and you need to convert between them.
3. Iceberg itself can't support streaming, so you are offloading from tiered storage (which by nature adds extra latency), running compaction, only to stream back (where Iceberg has many limitations by design), and more.
4. At the same time (you might have a different view), I have seen this Kafka-table-Kafka implementation in practice, but in reality there was almost zero market demand for such a use case (especially once latency is introduced).
All these fancy things look to me like a variation of the Lambda architecture (Confluent advocated Kappa in the first place); the only difference (no matter what wrapper you add on top) is that the batch layer uses Iceberg.
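The stream/table duality mentioned in point 2 can be sketched in plain Python (an illustration of the concept only, not Tableflow or Kafka Streams code): a table is a fold over a changelog stream, and a stream is the diff between table snapshots. Converting in either direction is an explicit step, which is the point — there is no single abstraction that is both at once.

```python
def stream_to_table(changelog):
    """Fold a changelog stream of (key, value) events into a table.

    A value of None acts as a tombstone and deletes the key."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table

def table_to_stream(old, new):
    """Diff two table snapshots back into a changelog stream."""
    events = []
    for key, value in new.items():
        if old.get(key) != value:
            events.append((key, value))   # insert or update
    for key in old:
        if key not in new:
            events.append((key, None))    # delete (tombstone)
    return events

changelog = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
table = stream_to_table(changelog)        # {"a": 3}
replay = table_to_stream({}, table)       # [("a", 3)]
```

Note that `table_to_stream({}, table)` only recovers the *latest* state as events; the intermediate history ("a", 1) and the life of "b" are gone, which is one reason round-tripping Kafka → table → Kafka loses information and adds latency.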
Databricks named a Leader in the 2024 #ForresterWave for Data Lakehouses, with the highest scores in both strategy and current offering categories. Databricks is the pioneer of the lakehouse, and the standard for data architectures!
Learn why we were named a leader in the full report.
Databricks is named the #1 Leader in the 2024 #ForresterWave for Data Lakehouses, with the highest scores in both the Strategy and Current Offerings categories! As the pioneer of the Lakehouse architecture, it's truly thrilling to see Databricks set the standard for next gen architectures that support any data use case.
Learn more in the full report.
This paper set out to "discover" a set of design principles and rules for Cloud-based Big Data platforms in complex, heterogeneous environments. The design scope comprises Big Data's significance, challenges, and architectural impacts. Using a methodology we call Reverse Engineered Design Science Research (REDSR), artifacts from leading vendors were used to elicit essential and common design principles and rules. We conclude that there is little to choose between major cloud vendor architectures.
#bigdata #digitalplatforms
Stephen Wingreen Purna Naga Sai Mannava
📈📊 Increasing evidence suggests that Kafka is evolving into a new form of data lake. ❓ Why is that❓
🔹 For starters, Kafka has all the data lake properties!
🔹 Kafka also has the potential to serve as the new data lake in production.
🤔 What do you think: Will Kafka replace existing data lake management frameworks?
📣👇 Check out what Yingjun Wu, CEO at RisingWave Labs, has to say about this: https://lnkd.in/g7xGprWp #dataprocessing #kafka #datalake #streamprocessing #datalakehouse
Good video explaining what an open data architecture actually is and how its next-generation capabilities solve many challenging problems for enterprises.
“Discover the power of WatsonX.data, a data store built on an open data lakehouse. See how the solution can help address data management challenges such as reducing data warehouse costs and unifying data across any hybrid cloud and on-premises environments.”
#openlakehousearchitecture #watsonx.data #data https://lnkd.in/dxWwTUnc
Forrester named Databricks a Leader with the highest score in both the strategy and current offering categories among all vendors, with 5/5 scores across 19 criteria.
intelia are specialised implementation partners of #Databricks. Get in touch today to find out how we can help your organisation take advantage of its capabilities and drive more value from your data.
#intelia #databricks #data #datastrategy https://lnkd.in/g5SNiUvQ
Databricks named a Leader in the 2024 #ForresterWave for Data Lakehouses!
As pioneers of the lakehouse data architecture, we’re pleased to be placed highest in both the strategy and current offering categories. Read the report to see our cited strengths in several areas, including:
• Security & Governance
• GenAI/LLM
• Data storage and formats
• Data ingestion/pipeline
• Data models
"The heart of the #datamesh beats in real-time with #apachekafka"
If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
The data mesh addresses the most common weaknesses of the traditional centralized #datalake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable.
Learn how the de facto standard for data streaming, Apache #kafka, plays a crucial role in building a data mesh and how it complements (not replaces!) data lakes, #datawarehouse, and other data platforms:
https://lnkd.in/emgZNmsn
Senior Data Engineer | Data Architect | Data Science | Data Mesh | Data Governance | 4x Databricks certified | 2x AWS certified | 1x CDMP certified | Medium Writer | Turning Data into Business Growth | Nuremberg, Germany
𝗗𝗮𝘁𝗮 𝗟𝗮𝗸𝗲𝗵𝗼𝘂𝘀𝗲 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀: 𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀 𝗳𝗼𝗿 𝗦𝘂𝗰𝗰𝗲𝘀𝘀
The transition to data lakehouses presents significant challenges and opportunities for organizations across various industries. Traditional data warehouses and data lakes often fall short in meeting the demands of modern businesses due to limitations in agility, scalability, integration, and governance. However, data lakehouses offer a solution by providing unified platforms with advanced AI capabilities to address these shortcomings and accelerate analytical use cases.
𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀
𝗚𝗲𝗻𝗔𝗜 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗼𝗻: As genAI capabilities continue to evolve, organizations must carefully assess solutions that offer foundational genAI capabilities, such as natural language query and data intelligence, to simplify lakehouse development and deployment.
𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗲𝗱 𝗘𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲: End-to-end integration is crucial for accelerating analytical use cases. Organizations should seek lakehouse vendors that provide integrated solutions encompassing streaming, transformation, workload management, integration, governance, and security.
𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Large and complex data warehouses often require months to set up and configure. Look for vendors offering solutions with deep integration with table formats, built-in automated performance optimization, advanced workload management, and parallel data processing to ensure performance at the speed of business.
𝗘𝗺𝗯𝗿𝗮𝗰𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗹𝗮𝗸𝗲𝗵𝗼𝘂𝘀𝗲𝘀 𝗰𝗮𝗻 𝗿𝗲𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻𝗶𝘇𝗲 𝗵𝗼𝘄 𝗼𝗿𝗴𝗮𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻𝘀 𝗺𝗮𝗻𝗮𝗴𝗲 𝗮𝗻𝗱 𝗱𝗲𝗿𝗶𝘃𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲𝗶𝗿 𝗱𝗮𝘁𝗮. By addressing challenges related to genAI integration, integrated experiences, and performance optimization, organizations can unlock the full potential of data lakehouses to drive innovation and competitiveness in today's data-driven landscape.
#DataLakehouses #DataManagement #AI #Analytics #BigData #CloudComputing #DataIntegration #DataScience #DigitalTransformation #MachineLearning #Technology #BusinessIntelligence #ForresterWave #DataEngineering #GenerativeAI