
LakeSoul Cloud-Native Lakehouse

Having passed National Innovation and Information Technology certification, LakeSoul can serve as a digital foundation on which enterprises build a unified real-time data center and data asset base. LakeSoul is also a knowledge base for the AI 2.0 large-model era, supporting the management of massive multimodal data, integrated large-model training, and RAG applications.

Leading technical concept and architecture design

Traditional data architectures suffer from slow response, high cost, an inability to unify real-time and batch data, and difficulty scaling. LakeSoul provides lakehouse storage designed to solve these problems. It offers high-concurrency, high-throughput reads and writes together with complete warehouse management capabilities on the cloud, and exposes them to various computing engines in a general way.

Incremental writes and Upsert updates are supported

LakeSoul provides efficient Merge on Read and Upsert functions to improve the flexibility and performance of data ingestion.
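To make the idea concrete, here is a minimal pure-Python sketch of how Upsert and Merge on Read fit together in general. It is a conceptual illustration only, not LakeSoul's implementation or API: the function names `upsert` and `merge_on_read`, the `id` primary key, and the in-memory lists standing in for data files are all assumptions made for this example.

```python
from typing import Dict, List

Row = Dict[str, object]


def upsert(delta_files: List[List[Row]], rows: List[Row]) -> None:
    """Write path: append the batch as a new delta file.

    No existing file is rewritten, which is what keeps upserts cheap.
    """
    delta_files.append(list(rows))


def merge_on_read(base: List[Row], delta_files: List[List[Row]],
                  key: str = "id") -> List[Row]:
    """Read path: merge base rows with delta files by primary key.

    Deltas are applied oldest to newest, so the latest value of each
    column wins. Rows with a new key are inserted.
    """
    merged: Dict[object, Row] = {row[key]: dict(row) for row in base}
    for delta in delta_files:
        for row in delta:
            merged.setdefault(row[key], {}).update(row)
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical data: a base file plus two upsert batches.
base = [
    {"id": 1, "name": "alice", "city": "Paris"},
    {"id": 2, "name": "bob", "city": "Rome"},
]
deltas: List[List[Row]] = []
upsert(deltas, [{"id": 2, "city": "Berlin"}])                   # update one column
upsert(deltas, [{"id": 3, "name": "carol", "city": "Oslo"}])    # insert a new row

result = merge_on_read(base, deltas)
```

In this sketch the write cost is proportional to the size of the incoming batch, while the merge cost is paid by the reader. That is the general trade-off behind Merge on Read designs.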

Real-time Lakehouse

Supports streaming and batch writes and snapshot reads. Flink CDC brings multiple data sources into the lake in real time, and streaming incremental reads and computation enable an end-to-end real-time data warehouse.
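The difference between a snapshot read and an incremental read can be sketched with a versioned commit log. The following is a simplified, hypothetical model in plain Python. The `log` structure, the function names, and the `id` key are assumptions for illustration and do not reflect LakeSoul's actual storage layout or API.

```python
from typing import Dict, List, Tuple

# Each commit is (version, rows written in that commit); the log is
# assumed to be ordered by ascending version.
Commit = Tuple[int, List[dict]]


def snapshot_read(log: List[Commit], version: int, key: str = "id") -> List[dict]:
    """Return the table state as of `version` (later commits are ignored)."""
    state: Dict[object, dict] = {}
    for commit_version, rows in log:
        if commit_version > version:
            break
        for row in rows:
            state[row[key]] = row  # newer commits overwrite older rows
    return [state[k] for k in sorted(state)]


def incremental_read(log: List[Commit], after_version: int) -> List[dict]:
    """Return only the rows written after `after_version`."""
    return [row for v, rows in log if v > after_version for row in rows]


# Hypothetical commit history.
log: List[Commit] = [
    (1, [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]),
    (2, [{"id": 2, "amount": 25}]),
    (3, [{"id": 3, "amount": 30}]),
]

snapshot_v2 = snapshot_read(log, 2)
changes_since_v1 = incremental_read(log, 1)
```

A streaming consumer would remember the last version it processed and call the incremental read repeatedly, while a batch job would read one fixed snapshot.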

Open

It integrates with computing engines such as Spark, Flink, and Presto, and fully supports data and AI workloads such as ETL, OLAP, and AI model training.

Efficient and extensible Catalog metadata service

Catalog information is stored in a PostgreSQL database, which improves metadata scalability and transaction concurrency.

Concurrent writes and ACID transactions

LakeSoul implements concurrency control that sustains high write concurrency and automatically detects and handles conflicts, ensuring data consistency.
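One common way to combine concurrent writers with automatic conflict detection is optimistic concurrency control. The sketch below illustrates that general technique in plain Python. It is not LakeSoul's actual commit protocol: the `MetadataStore` class, the partition-overlap conflict rule, and the retry helper are assumptions for this example, and an in-process lock stands in for a database transaction.

```python
import threading
from typing import List, Set


class CommitConflict(Exception):
    """Raised when a concurrent commit touched the same partitions."""


class MetadataStore:
    """Toy stand-in for a transactional metadata service."""

    def __init__(self) -> None:
        self._lock = threading.Lock()      # stands in for a DB transaction
        self.version = 0
        self.commits: List[Set[str]] = []  # partitions touched per commit

    def commit(self, read_version: int, partitions: Set[str]) -> int:
        """Commit atomically, or fail if a newer commit overlaps."""
        with self._lock:
            # Check every commit made since the writer read the table.
            for touched in self.commits[read_version:]:
                if touched & partitions:
                    raise CommitConflict(sorted(touched & partitions))
            self.commits.append(set(partitions))
            self.version += 1
            return self.version


def write_with_retry(store: MetadataStore, partitions: Set[str],
                     max_retries: int = 3) -> int:
    """Re-read the latest version and retry when a conflict is detected."""
    for _ in range(max_retries):
        read_version = store.version
        try:
            return store.commit(read_version, partitions)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


store = MetadataStore()
v1 = store.commit(0, {"date=2024-01-01"})
v2 = store.commit(0, {"date=2024-01-02"})   # disjoint partitions: no conflict

try:
    store.commit(0, {"date=2024-01-01"})    # stale read, same partition
    conflict_detected = False
except CommitConflict:
    conflict_detected = True

v3 = write_with_retry(store, {"date=2024-01-01"})
```

Writers that touch disjoint partitions never block each other here. Only overlapping commits based on a stale version are rejected and retried.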

Unified stream-batch table storage

Rich application scenarios that meet a variety of business requirements and help unlock business value

Fast real-time data ingestion into the lake

Flink CDC ingests data from the source in real time, with no T+1 import and no Kafka deployment required.

Example: real-time ingestion of an online database for report analysis

With only the relevant configuration, such as the online data sources, a whole-database synchronization and real-time ingestion task can be started. It automatically detects new tables and synchronizes table schema changes without manual operation and maintenance. Online data is updated in the lakehouse in real time. BI reports and large-screen dashboards connect seamlessly and update in real time, so key business indicators are available at any time to support business decisions.

Real-time Report Analysis

Based on unified stream-batch updates, data extraction, transformation, and development are completed in SQL, simplifying the ETL and data analysis process.

RAG Intelligent Expert System

LakeSoul provides a native Python interface that integrates with mainstream AI frameworks such as PyTorch, so data can be read directly for training and inference. It also supports large-model training and RAG applications.


AI Expert Consultation

Join the community and share data intelligence