One-Stop Data Intelligence Development Environment
LakeInsight provides a web-based, one-stop data intelligence development environment that unifies stream processing, batch processing, AI development, and multimodal data processing on a single platform. Data engineers, AI developers, and business analysts can complete the entire development lifecycle, from data ingestion to model deployment and metric publishing, within a single workspace.
Unified Development Experience
(1) Multi-Language Development Support
- SQL-based data querying and modeling, compatible with Flink SQL and Spark SQL
- Python, Java, and Scala support for data processing, AI training, and custom business logic
- Jupyter Notebook for interactive exploratory analysis and AI model experimentation
- One-stop service covering the full development lifecycle: development, testing, and deployment
(2) Unified Streaming, Batch, AI & Multimodal Platform
- Manage Flink real-time streaming jobs, Spark batch jobs, and Python AI training tasks within a single workspace
- Streaming jobs (Flink) for real-time CDC synchronization, real-time metric computation, and streaming feature engineering
- Batch jobs (Spark) for large-scale offline modeling, historical data backfill, and periodic reports
- Python tasks support PyTorch, pandas, and other AI and data libraries, and read lakehouse data directly for model training
- Multimodal data (video, audio, images, text) queried and processed alongside structured tables in the same IDE
(3) Online IDE Development Environment
- CodeServer-based online editor with syntax highlighting, auto-completion, and syntax checking
- Built-in Conda virtual environment management for project-level dependency isolation
- Interactive SQL editor with real-time query result preview
- Collaborative development supporting multiple users working on data modeling tasks simultaneously
Security & Multi-Tenancy
- Enterprise single sign-on (SSO) integration
- Development and production environment isolation to prevent development errors from affecting production data
- Data domain partitioning with fine-grained read, write, and execute permission isolation
- Role-based access control (RBAC) ensuring business and data security across workspaces
- Custom roles with flexible module-level permission configuration
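To make the permission model above concrete, here is a minimal sketch of role-based checks over data domains. It is not LakeInsight's API: the `Permission`, `Role`, and `User` types, the role names, and the domain names are all hypothetical and chosen only to illustrate how read, write, and execute permissions might combine across custom roles.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Permission(Flag):
    """Permission bits mirroring the read, write, and execute split."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


@dataclass
class Role:
    """A custom role: maps a data domain name to the permissions it grants."""
    name: str
    grants: dict[str, Permission] = field(default_factory=dict)


@dataclass
class User:
    name: str
    roles: list[Role] = field(default_factory=list)

    def can(self, domain: str, needed: Permission) -> bool:
        """Return True if the union of the user's roles covers `needed`."""
        granted = Permission.NONE
        for role in self.roles:
            granted |= role.grants.get(domain, Permission.NONE)
        return (granted & needed) == needed


# Hypothetical roles and data domains, for illustration only.
analyst = Role("analyst", {"sales": Permission.READ})
engineer = Role(
    "engineer",
    {"sales": Permission.READ | Permission.WRITE | Permission.EXECUTE},
)

alice = User("alice", [analyst])
bob = User("bob", [analyst, engineer])

print(alice.can("sales", Permission.READ))
print(alice.can("sales", Permission.WRITE))
print(bob.can("sales", Permission.WRITE | Permission.EXECUTE))
print(bob.can("finance", Permission.READ))
```

In this sketch a domain that no role mentions grants nothing, so access is denied by default.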
Task Publishing & Operations
- One-click publishing of completed development tasks to production with configurable settings
- Approval workflow: tasks must pass administrator review before online deployment
- Real-time task status monitoring with start/stop control, log viewing, and anomaly alerts
- Task version management with traceable history and rollback capability
- Adjustable compute resources (CPU, memory) and cluster configuration
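The approval and rollback behavior listed above can be pictured as a small state machine with an append-only version history. The sketch below is an assumption about how such a flow could be modeled, not the platform's implementation; the `Task` and `Status` names, the status values, and the choice to record a rollback as a new history entry are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    """Review state of the most recent submission (hypothetical values)."""
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    ONLINE = "online"
    REJECTED = "rejected"


@dataclass
class Task:
    name: str
    status: Status = Status.DRAFT
    versions: list[str] = field(default_factory=list)  # published, oldest first
    pending: Optional[str] = None

    def submit(self, definition: str) -> None:
        """Queue a new definition for administrator review."""
        self.pending = definition
        self.status = Status.PENDING_REVIEW

    def review(self, approved: bool) -> None:
        """Publish the pending definition if approved, otherwise discard it."""
        if self.status is not Status.PENDING_REVIEW or self.pending is None:
            raise ValueError("nothing is waiting for review")
        if approved:
            self.versions.append(self.pending)
            self.status = Status.ONLINE
        else:
            self.status = Status.REJECTED
        self.pending = None

    def rollback(self, version_number: int) -> None:
        """Republish an earlier version (1-based) as a new history entry."""
        if not 1 <= version_number <= len(self.versions):
            raise ValueError("unknown version")
        self.versions.append(self.versions[version_number - 1])

    @property
    def live(self) -> Optional[str]:
        """The definition currently in production, if any."""
        return self.versions[-1] if self.versions else None


task = Task("daily_sales_report")  # hypothetical task name
task.submit("SELECT 1")
task.review(approved=True)
task.submit("SELECT 2")
task.review(approved=True)
task.rollback(1)
print(task.live, len(task.versions))
```

Appending on rollback, rather than deleting later versions, keeps the full history traceable.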
Platform Management
- Workspace management: isolated workspaces for different users and projects
- Separate development and production cluster configurations for production stability
- Resource monitoring: real-time monitoring and alerting for cluster resources and compute tasks
- Task scheduling: Cron-based periodic scheduling for batch jobs, 24/7 continuous operation for streaming jobs