One-Stop Data Intelligence Development Environment

LakeInsight provides a web-based, one-stop data intelligence development environment that unifies stream computing, batch computing, AI development, and multimodal data processing on a single platform. Data engineers, AI developers, and business analysts can complete the entire development lifecycle, from data ingestion to model deployment and metric publishing, within a single workspace.

Unified Development Experience

(1) Multi-Language Development Support

  • SQL-based data query and modeling development, compatible with Flink SQL and Spark SQL
  • Python, Java, and Scala support for data processing, AI training, and custom business logic
  • Jupyter Notebook for interactive exploratory analysis and AI model experimentation
  • One-stop service covering the full development lifecycle: development, testing, and deployment

(2) Unified Streaming, Batch, AI & Multimodal Platform

  • Manage Flink real-time streaming jobs, Spark batch jobs, and Python AI training tasks within a single workspace
  • Streaming jobs (Flink) for real-time CDC synchronization, real-time metric computation, and streaming feature engineering
  • Batch jobs (Spark) for large-scale offline modeling, historical data backfill, and periodic reports
  • Python tasks support PyTorch, Pandas, and other AI and data libraries, and read lakehouse data directly for model training
  • Multimodal data (video, audio, images, text) queried and processed alongside structured tables in the same IDE
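
To make the distinction between the streaming and batch job types above concrete, the following is a minimal, framework-free sketch of the same windowed metric computed in both styles. It does not use Flink, Spark, or any LakeInsight API; the event layout, the 60-second window, and every name in it are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event layout: (epoch_seconds, metric_key, amount)
Event = Tuple[int, str, float]
WINDOW_SECONDS = 60  # assumed tumbling-window size


def window_start(ts: int) -> int:
    """Align a timestamp to the start of its tumbling window."""
    return ts - ts % WINDOW_SECONDS


def batch_totals(events: Iterable[Event]) -> Dict[Tuple[int, str], float]:
    """Batch style: scan a bounded dataset once and return the final totals."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, key, amount in events:
        totals[(window_start(ts), key)] += amount
    return dict(totals)


class StreamingTotals:
    """Streaming style: keep running state and update it one event at a time."""

    def __init__(self) -> None:
        self.state: Dict[Tuple[int, str], float] = defaultdict(float)

    def on_event(self, event: Event) -> float:
        ts, key, amount = event
        slot = (window_start(ts), key)
        self.state[slot] += amount
        # The current value can be published immediately as a real-time metric.
        return self.state[slot]


# Illustrative sample data, not taken from any real workload.
EVENTS = [(0, "orders", 10.0), (30, "orders", 5.0), (65, "orders", 2.5), (70, "refunds", 1.0)]
```

Feeding `EVENTS` through `StreamingTotals.on_event` one at a time ends in the same state that `batch_totals(EVENTS)` returns in a single pass, which is why a historical backfill and a live job can share one metric definition.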

(3) Online IDE Development Environment

  • CodeServer-based online editor with syntax highlighting, auto-completion, and syntax checking
  • Built-in Conda virtual environment management for project-level dependency isolation
  • Interactive SQL editor with real-time query result preview
  • Collaborative development supporting multiple users working on data modeling tasks simultaneously

Security & Multi-Tenancy

  • Enterprise single sign-on (SSO) integration
  • Development and production environment isolation to prevent development errors from affecting production data
  • Data domain partitioning with fine-grained read, write, and execute permission isolation
  • Role-based access control (RBAC) ensuring business and data security across workspaces
  • Custom roles with flexible module-level permission configuration
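
The permission model listed above (data domains, read/write/execute permissions, and roles) can be sketched roughly as follows. This is an illustrative model only, not LakeInsight's actual implementation; the class names, the `allowed` method, and the domain names used with it are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Flag
from typing import Dict, List


class Permission(Flag):
    READ = 1
    WRITE = 2
    EXECUTE = 4


@dataclass
class Role:
    name: str
    # Maps a data domain name to the permissions this role grants in it.
    grants: Dict[str, Permission] = field(default_factory=dict)


@dataclass
class User:
    name: str
    roles: List[Role] = field(default_factory=list)

    def allowed(self, domain: str, needed: Permission) -> bool:
        """Return True if the user's roles together grant every needed permission."""
        granted = Permission(0)
        for role in self.roles:
            granted |= role.grants.get(domain, Permission(0))
        return (granted & needed) == needed
```

A custom role is then just another `Role` instance with its own `grants` mapping, for example `Role("analyst", {"sales": Permission.READ})`. A user with only that role can read the assumed `sales` domain but cannot write to it or touch any other domain.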

Task Publishing & Operations

  • One-click publishing of completed development tasks to production with configurable settings
  • Approval workflow: tasks must pass administrator review before online deployment
  • Real-time task status monitoring with start/stop control, log viewing, and anomaly alerts
  • Task version management with traceable history and rollback capability
  • Adjustable compute resources (CPU, memory) and cluster configuration
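
The approval and rollback behavior described above can be pictured as a small state model. The sketch below is an assumption about how such a workflow might be represented, not the platform's real API; `Task`, `TaskVersion`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskVersion:
    number: int
    definition: str
    approved: bool = False


@dataclass
class Task:
    name: str
    versions: List[TaskVersion] = field(default_factory=list)
    live: Optional[int] = None  # version number currently in production

    def submit(self, definition: str) -> TaskVersion:
        """Record a new version; it stays unapproved until reviewed."""
        version = TaskVersion(len(self.versions) + 1, definition)
        self.versions.append(version)
        return version

    def approve(self, number: int) -> None:
        """Stand-in for the administrator review step."""
        self.versions[number - 1].approved = True

    def deploy(self, number: int) -> None:
        """Publish a version, refusing anything that has not passed review."""
        version = self.versions[number - 1]
        if not version.approved:
            raise PermissionError(f"version {number} has not passed review")
        self.live = number

    def rollback(self) -> None:
        """Return to the most recent approved version before the live one."""
        if self.live is None:
            raise RuntimeError("nothing is deployed")
        earlier = [v.number for v in self.versions if v.approved and v.number < self.live]
        if not earlier:
            raise RuntimeError("no earlier approved version to roll back to")
        self.live = earlier[-1]
```

Because every submitted version is kept in `versions`, the history stays traceable, and `rollback` only has to move the `live` pointer rather than rebuild anything.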

Platform Management

  • Workspace management: isolated workspaces for different users and projects
  • Separate development and production cluster configurations for production stability
  • Resource monitoring: real-time monitoring and alerting for cluster resources and compute tasks
  • Task scheduling: Cron-based periodic scheduling for batch jobs and 24/7 continuous operation for streaming jobs
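
For readers unfamiliar with Cron-based scheduling, the sketch below shows how a five-field cron expression can be checked against a point in time. It is a simplified illustration and not the scheduler LakeInsight uses: it supports only `*`, single values, comma lists, ranges, and `/step`, and it requires all five fields to match, whereas classic cron treats day-of-month and day-of-week as alternatives when both are restricted. The function names are hypothetical.

```python
from datetime import datetime
from typing import Set

# (lowest, highest) value for: minute, hour, day of month, month, day of week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one cron field such as '*/15', '1-5', or '0,30' into a set of ints."""
    values: Set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls on a minute selected by `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    # Cron counts Sunday as 0; isoweekday() counts Sunday as 7, hence the modulo.
    actual = [moment.minute, moment.hour, moment.day, moment.month,
              moment.isoweekday() % 7]
    return all(
        value in expand_field(text, low, high)
        for text, (low, high), value in zip(fields, FIELD_RANGES, actual)
    )
```

With this helper, `cron_matches("0 2 * * *", datetime(2024, 1, 15, 2, 0))` is true for a nightly 02:00 batch job. A streaming job needs no such expression because it runs continuously once started.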