Multimodal & AI Computing
Built on LakeSoul's integrated Data+AI lakehouse framework, LakeInsight stores multimodal data (video, audio, images, text, etc.) alongside structured data in a single lakehouse. Deep integration with AI compute engines closes the loop from data ingestion through AI model training and inference.
Multimodal Data Processing
LakeInsight supports storing and managing diverse unstructured data types within the lakehouse:
(1) Video Data Processing
- Parse video files frame-by-frame into lakehouse tables, with each frame containing image binary data, timestamps, frame indices, and other metadata
- Query incrementally and filter by frame index, time range, and other criteria to extract specific video segments for downstream analysis
- Integration with AI models for object detection, action recognition, and other video analysis tasks
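The frame-per-row layout and time-range filtering described above can be sketched in plain Python. This is a minimal illustration only: the `FrameRow` class, its column names, and the `frames_in_range` helper are hypothetical and are not LakeInsight's or LakeSoul's actual schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FrameRow:
    """One row of a hypothetical frame table (column names are assumptions)."""
    video_path: str
    frame_index: int
    timestamp_ms: int  # offset from the start of the video
    image: bytes       # encoded frame bytes, e.g. a JPEG payload


def frames_in_range(rows: Iterable[FrameRow], start_ms: int, end_ms: int) -> List[FrameRow]:
    """Return frames with start_ms <= timestamp_ms < end_ms, ordered by frame index."""
    selected = [r for r in rows if start_ms <= r.timestamp_ms < end_ms]
    return sorted(selected, key=lambda r: r.frame_index)


# Synthetic 25 fps clip: frame i sits at i * 40 ms; image payloads are placeholders.
rows = [FrameRow("clip.mp4", i, i * 40, b"") for i in range(100)]

# Extract the segment covering the second second of the clip.
segment = frames_in_range(rows, start_ms=1000, end_ms=2000)
segment_indices = [r.frame_index for r in segment]
```

In a real deployment the same predicate would presumably be pushed down as a filter on the lakehouse table rather than evaluated over an in-memory list.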
(2) Audio Data Processing
- Store audio files as structured tables in the lakehouse, including audio sample data, sample rate, channel information, and other metadata
- Query and compute statistics by path, duration, and other dimensions
- Integration with speech recognition, voiceprint recognition, and other AI models for audio data analysis
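As a rough illustration of audio metadata stored as table rows and aggregated by path, the sketch below derives each file's duration from its sample count and sample rate, then sums durations per directory. The `AudioRow` fields and the `total_duration_by_dir` helper are assumptions made for this example, not the product's real schema.

```python
import posixpath
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AudioRow:
    """One row of a hypothetical audio table (column names are assumptions)."""
    path: str
    sample_rate: int  # samples per second (Hz)
    channels: int
    num_samples: int  # samples per channel

    @property
    def duration_s(self) -> float:
        # Duration follows directly from sample count and sample rate.
        return self.num_samples / self.sample_rate


def total_duration_by_dir(rows: Iterable[AudioRow]) -> Dict[str, float]:
    """Sum audio duration in seconds, grouped by the parent directory of each path."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[posixpath.dirname(row.path)] += row.duration_s
    return dict(totals)


rows = [
    AudioRow("calls/a.wav", 16000, 1, 32000),    # 2 seconds
    AudioRow("calls/b.wav", 8000, 1, 24000),     # 3 seconds
    AudioRow("music/c.flac", 44100, 2, 441000),  # 10 seconds
]
durations = total_duration_by_dir(rows)
```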
(3) Image & Text Data
- Store images and text data as binary blobs or file path references
- Leverage LakeSoul's Schema Evolution to flexibly extend metadata fields for multimodal data management needs
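The effect of extending metadata fields can be sketched as follows, assuming that an added column reads back as null for rows written before the change (the usual behavior for additive schema changes). The `read_with_schema` helper and the column names are hypothetical and do not represent LakeSoul's API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def read_with_schema(rows: List[Row], columns: List[str]) -> List[Row]:
    """Project each row onto `columns`, filling fields the row lacks with None."""
    return [{c: row.get(c) for c in columns} for row in rows]


schema_v1 = ["id", "image_path"]
schema_v2 = schema_v1 + ["caption"]  # additive change: one new metadata field

# Rows written under the old schema have no caption; newer rows do.
old_rows = [{"id": 1, "image_path": "imgs/cat.png"}]
new_rows = [{"id": 2, "image_path": "imgs/dog.png", "caption": "a dog"}]

# Reading everything under the evolved schema yields a uniform set of columns.
table = read_with_schema(old_rows + new_rows, schema_v2)
```

Here the image is kept as a file path reference; storing it as a binary blob would simply swap the `image_path` string for a `bytes` column.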
AI Compute Capabilities
(1) AI Engine Integration
- Support for mainstream AI and machine learning compute engines including PyTorch, Pandas, and Spark MLlib
- LakeSoul's native Python Reader and PyTorch Dataset interface enable direct data reading from lakehouse tables for model training, eliminating data export and format conversion steps
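To illustrate the idea of feeding table rows straight into training, here is a minimal sketch of a map-style dataset that follows the `__len__` / `__getitem__` protocol used by PyTorch's `torch.utils.data.Dataset`. It deliberately avoids both PyTorch and LakeSoul's real Python API: `TableDataset`, `batches`, and the in-memory rows are stand-ins invented for this example.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Sample = Tuple[List[float], float]


class TableDataset:
    """Map-style dataset over table rows (an in-memory stand-in for a lakehouse reader)."""

    def __init__(self, rows: Sequence[Dict[str, float]], feature_cols: List[str], label_col: str):
        self._rows = rows
        self._feature_cols = feature_cols
        self._label_col = label_col

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx: int) -> Sample:
        # Split one row into a feature vector and its label.
        row = self._rows[idx]
        return [row[c] for c in self._feature_cols], row[self._label_col]


def batches(dataset: TableDataset, batch_size: int) -> Iterator[List[Sample]]:
    """Yield consecutive batches; the last batch may be smaller than batch_size."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


rows = [{"x1": float(i), "x2": float(i * 2), "y": float(i % 2)} for i in range(5)]
dataset = TableDataset(rows, feature_cols=["x1", "x2"], label_col="y")
batch_sizes = [len(b) for b in batches(dataset, batch_size=2)]
```

Because the class exposes only `__len__` and `__getitem__`, the same shape of object could in principle be handed to a PyTorch `DataLoader` once the row source is replaced by a real table reader.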
(2) Intelligent Coding Assistant (OpenChamber)
- Built-in AI coding assistant supporting natural language interaction to generate SQL statements
- Automatically analyzes data source schema information and generates SQL code for data ingestion, analysis, and modeling based on user intent
- Generated code can be published as a production task with one click, which takes it through the approval and deployment workflow
(3) Python Development Environment
- CodeServer-based online IDE with Jupyter Notebook interactive development support
- Built-in Conda environment management for creating isolated Python environments, each with its own dependencies
- Completed Python tasks can be published as approval tasks for scheduled or streaming execution
Data + AI Integration
LakeInsight follows LakeSoul's "Data+AI Integration" design philosophy:
- Unified Storage: Structured and unstructured data share a single lakehouse storage system, eliminating data silos
- Unified Compute: Spark/Flink batch and stream computing and AI training run on the same platform, so no cross-system data migration is needed
- Unified Governance: Multimodal data and AI models are governed through unified metadata management, access control, and lineage tracking