Agentic AI Native Capabilities

LakeInsight builds an Agentic AI core layer on top of the lakehouse foundation. Its three major engines (Semantic Agent, AI Coding Agent, and Intelligent Query Agent) deliver AI-native capabilities across the full chain, from automatic extraction of semantics from code to intelligent development and natural language querying, and together form a self-reinforcing closed loop.

LakeInsight Semantic Agent

The Semantic Agent is LakeInsight's AI-native semantic infrastructure, automatically extracting structured semantic information from business code without manual configuration:

(1) Automated Semantic Extraction Pipeline

  • Code Parsing: Scans SQL and data processing code used in warehouse modeling, ETL, and metric computation, identifying table structures, field definitions, and computation logic
  • Lineage Extraction: Automatically resolves field-level dependency relationships and transformation chains, building full-chain data lineage graphs
  • Metric & Terminology Recognition: Extracts business metric calculation logic and naming conventions from code, forming a unified business glossary
  • Knowledge Graph Construction: Organizes extracted semantic information into a structured lakehouse knowledge graph encompassing tables, fields, metrics, terms, and their relationships
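
The lineage step above can be illustrated with a deliberately small sketch. It assumes plain `INSERT INTO ... SELECT ... FROM ... JOIN ...` statements and resolves lineage only at table level, whereas the Semantic Agent is described as working at field level. The regular expressions, the function name `extract_table_lineage`, and the example tables are all hypothetical; a real extractor would presumably rely on a full SQL parser.

```python
import re
from collections import defaultdict

# Toy patterns: they only cover simple INSERT ... SELECT statements and would
# misread constructs such as EXTRACT(day FROM ts) or subqueries.
TARGET_RE = re.compile(r"\binsert\s+(?:into|overwrite\s+table)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(sql_script: str) -> dict[str, set[str]]:
    """Map each target table to the set of source tables that feed it."""
    lineage: dict[str, set[str]] = defaultdict(set)
    for statement in sql_script.split(";"):
        target = TARGET_RE.search(statement)
        if target is None:
            continue  # not a write statement, nothing to record
        sources = set(SOURCE_RE.findall(statement))
        sources.discard(target.group(1))  # ignore self-references
        lineage[target.group(1)].update(sources)
    return dict(lineage)


# Hypothetical ETL statement used only for illustration.
example_sql = """
INSERT INTO dws.daily_revenue
SELECT o.order_date, SUM(o.amount) AS revenue
FROM dwd.orders o
JOIN dim.customers c ON o.customer_id = c.id
GROUP BY o.order_date;
"""
lineage = extract_table_lineage(example_sql)
```

Here `lineage` maps `dws.daily_revenue` to the two tables it reads from, which is the kind of edge a lineage graph would store.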

(2) Self-Reinforcing Closed Loop

  • New code generated by upper-layer Agents automatically flows back into the business code layer
  • Semantic Agent performs continuous incremental parsing, keeping the semantic layer automatically updated as development progresses
  • The entire semantic layer stays current without manual maintenance
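
One plausible way to keep parsing incremental is to re-parse only the files whose content has changed since the last scan. The sketch below assumes content hashing as the change detector; the source does not say how LakeInsight detects changes, and the function name and file paths are hypothetical.

```python
import hashlib


def changed_files(sources: dict[str, str], seen_hashes: dict[str, str]) -> list[str]:
    """Return paths whose content changed since the last scan and record the new hashes."""
    changed = []
    for path, content in sources.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if seen_hashes.get(path) != digest:
            seen_hashes[path] = digest
            changed.append(path)
    return changed


seen: dict[str, str] = {}
repo = {"etl/orders.sql": "SELECT 1", "etl/users.sql": "SELECT 2"}
first_scan = changed_files(repo, seen)      # both files are new
repo["etl/orders.sql"] = "SELECT 1 AS one"  # e.g. code written back by an Agent
second_scan = changed_files(repo, seen)     # only the edited file is reported
```

Only the paths returned by each scan would need to be handed to the parsing and lineage steps again.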

AI Agent Core Engines

LakeInsight provides three major AI Agents covering data querying, code development, and diagnostics:

(1) Intelligent Query Agent

  • Natural language interaction that automatically converts user questions into precise SQL queries
  • Retrieves field meanings and metric definitions via the Semantic Retrieval API, ensuring query results align with business definitions
  • Automatically infers JOIN paths for cross-table queries, and resolves time ranges and aggregation granularity for ambiguous questions
  • Designed for business analysts, enabling self-service analytics without understanding underlying table structures
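
JOIN path inference can be pictured as a shortest-path search over a graph whose nodes are tables and whose edges are known join conditions. The following is a minimal sketch under that assumption, using breadth-first search; the join graph, table names, and `find_join_path` are hypothetical and not taken from LakeInsight.

```python
from collections import deque

# Hypothetical join graph: each edge is a join condition between two tables.
JOIN_EDGES = {
    ("orders", "customers"): "orders.customer_id = customers.id",
    ("orders", "order_items"): "orders.id = order_items.order_id",
    ("order_items", "products"): "order_items.product_id = products.id",
}


def find_join_path(start: str, goal: str) -> list[str]:
    """Breadth-first search for the shortest chain of join conditions from start to goal."""
    neighbors: dict[str, list[tuple[str, str]]] = {}
    for (left, right), condition in JOIN_EDGES.items():
        neighbors.setdefault(left, []).append((right, condition))
        neighbors.setdefault(right, []).append((left, condition))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for next_table, condition in neighbors.get(table, []):
            if next_table not in visited:
                visited.add(next_table)
                queue.append((next_table, path + [condition]))
    raise ValueError(f"no join path between {start} and {goal}")


path = find_join_path("customers", "products")
```

A question that touches `customers` and `products` would thus be answered by joining through `orders` and `order_items`, without the analyst naming either intermediate table.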

(2) AI Coding Agent

  • Built-in, ready-to-use development skills for LakeSoul, Flink, and Spark
  • Generates data ingestion, modeling, and publishing code from natural language descriptions of development requirements
  • Generated code automatically aligns with existing metric definitions and business terminology, avoiding redundant definitions
  • Supports one-click publishing of validated code as production tasks

(3) Diagnostic & Repair Agent

  • Real-time monitoring of task execution status with automatic error log parsing
  • Intelligent root cause analysis combining lineage graphs and semantic information
  • Provides fix recommendations or automatic code repair, reducing operational costs
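
As an illustration of how a lineage graph can narrow down a root cause, the sketch below walks upstream from a failed task and keeps only the failed tasks that have no failed parent. This is an assumed heuristic, not the documented behavior of the Diagnostic & Repair Agent; the task names and status values are hypothetical.

```python
def root_cause_candidates(failed_task: str,
                          upstream: dict[str, list[str]],
                          status: dict[str, str]) -> list[str]:
    """Walk upstream from a failed task and return the failed tasks with no failed parent."""
    roots: list[str] = []
    stack, visited = [failed_task], set()
    while stack:
        task = stack.pop()
        if task in visited:
            continue
        visited.add(task)
        failed_parents = [p for p in upstream.get(task, []) if status.get(p) == "failed"]
        if failed_parents:
            stack.extend(failed_parents)  # the failure was probably inherited
        else:
            roots.append(task)            # nothing upstream failed: likely origin
    return sorted(roots)


# Hypothetical task lineage (task -> direct upstream tasks) and run statuses.
upstream = {
    "report_revenue": ["agg_daily_revenue"],
    "agg_daily_revenue": ["clean_orders", "dim_customers"],
    "clean_orders": ["ingest_orders"],
}
status = {
    "report_revenue": "failed", "agg_daily_revenue": "failed",
    "clean_orders": "failed", "dim_customers": "ok", "ingest_orders": "ok",
}
candidates = root_cause_candidates("report_revenue", upstream, status)
```

In this example the error logs of `clean_orders` would be the first place to look, since every other failure sits downstream of it.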

MCP Server Standardized Interfaces

  • Standards-based interfaces, exposed via the Model Context Protocol (MCP), covering the full pipeline
  • Encapsulates lakehouse operations, code development, and task management capabilities as standardized tools callable by Agents
  • Supports seamless integration with external AI assistants and ecosystem tools
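
MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below only builds such a request; the tool name `lookup_metric_definition` and its argument are hypothetical placeholders, since the actual tool catalog of the LakeInsight MCP Server is not listed here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and argument, not documented LakeInsight tools.
message = build_tool_call("lookup_metric_definition", {"metric": "daily_revenue"})
```

An Agent or external assistant would send a message of this shape over whichever MCP transport the server exposes and read the tool result from the matching response.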

Unified Semantic Retrieval API

The Unified Semantic Retrieval API is the platform's shared access point for semantic information. It provides upper-layer Agents and users with:

  • Field Meaning Lookup: Returns business meaning, data type, and associated metrics for given table and field names
  • Metric Definition Lookup: Retrieves metric computation logic, data sources, and downstream dependencies
  • Lineage Query: Traces complete upstream and downstream dependency chains for fields
  • Knowledge Graph Retrieval: Graph-based search of semantic relationships between tables, fields, metrics, and terms
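
To make the lookups above concrete, here is a small in-memory stand-in for two of them: field meaning lookup and a downstream lineage query. The class names, method names, and sample data are assumptions made for illustration; the real API's endpoints and response format are not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldInfo:
    meaning: str
    data_type: str
    metrics: tuple[str, ...] = ()


@dataclass
class SemanticStore:
    """Hypothetical in-memory stand-in for the retrieval API."""
    fields: dict[tuple[str, str], FieldInfo] = field(default_factory=dict)
    downstream: dict[str, list[str]] = field(default_factory=dict)

    def lookup_field(self, table: str, column: str) -> FieldInfo:
        """Field Meaning Lookup: business meaning, data type, associated metrics."""
        return self.fields[(table, column)]

    def trace_downstream(self, qualified_field: str) -> list[str]:
        """Lineage Query: every field reachable downstream, in discovery order."""
        seen: list[str] = []
        pending = list(self.downstream.get(qualified_field, []))
        while pending:
            current = pending.pop(0)
            if current in seen:
                continue
            seen.append(current)
            pending.extend(self.downstream.get(current, []))
        return seen


# Sample data invented for the example.
store = SemanticStore(
    fields={("dwd.orders", "amount"): FieldInfo(
        "Order amount in the order currency", "DECIMAL(18,2)", ("daily_revenue",))},
    downstream={
        "dwd.orders.amount": ["dws.daily_revenue.revenue"],
        "dws.daily_revenue.revenue": ["ads.revenue_report.revenue"],
    },
)
info = store.lookup_field("dwd.orders", "amount")
impacted = store.trace_downstream("dwd.orders.amount")
```

Callers see only the lookup methods, which mirrors the idea that consumers do not need to know how the knowledge graph behind them is built.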

All user roles (data engineers, business analysts, and AI model developers) access precise semantics through the unified retrieval interface, without needing to understand how the underlying knowledge graph is built.