Processing video data in LakeInsight
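1. Read the video file and extract information with Daft
For this demo's Python environment, select the video environment in conda.
Make sure the LakeSoul metadata database does not already contain a table with the same name.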
import logging
# Suppress all log output (CRITICAL and below) to keep the notebook output clean
logging.disable(logging.CRITICAL)
Use daft.read_video_frames to extract the key frames from the video.
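As a rough mental model of how is_key_frame=True and sample_interval_seconds=1.0 might combine, the sketch below keeps only key frames and at most one frame per interval. It is a hypothetical pure-Python illustration over made-up (timestamp, is_key_frame) pairs; sample_key_frames is not a Daft function, and Daft's actual selection rule may differ.

```python
from typing import Iterable, List, Tuple


def sample_key_frames(frames: Iterable[Tuple[float, bool]], interval: float) -> List[float]:
    """Return timestamps of key frames, keeping at most one per `interval` seconds.

    `frames` are (timestamp_seconds, is_key_frame) pairs in presentation order.
    Hypothetical illustration only; this is not Daft's implementation.
    """
    kept: List[float] = []
    next_allowed = float("-inf")  # earliest timestamp the next kept frame may have
    for ts, is_key in frames:
        if not is_key:
            continue  # drop non-key frames
        if ts >= next_allowed:
            kept.append(ts)
            next_allowed = ts + interval
    return kept


frames = [(0.0, True), (0.5, True), (0.9, False), (1.2, True), (1.8, True), (2.4, True)]
print(sample_key_frames(frames, 1.0))  # [0.0, 1.2, 2.4]
```

The non-key frame at 0.9 s is dropped outright, and the key frames at 0.5 s and 1.8 s are skipped because they fall within one second of the previously kept frame.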
import daft
from daft import col, DataType
from daft.functions import encode_image
video_path = "/home/coder/work-dir/video_demo/data/v_Basketball_g01_c01.avi"
# Use the Ray runner for Daft; noop_if_initialized=True avoids an error if a runner is already set
daft.set_runner_ray(noop_if_initialized=True)
df = daft.read_video_frames(
    path=video_path,
    image_height=480,
    image_width=640,
    is_key_frame=True,
    sample_interval_seconds=1.0,
)
# Tag each frame with its source video, then encode the frames as JPEG bytes for storage
df = (
    df.with_column("video_path", daft.lit(video_path))
    .with_column("data", encode_image(col("data"), "JPEG"))
)
df.show()
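Encoding the frames to JPEG before writing them keeps the stored rows small. The arithmetic below shows the size of one uncompressed frame at the 640×480 resolution requested above, assuming 8-bit RGB (3 bytes per pixel), which this notebook does not state explicitly. raw_frame_bytes is a hypothetical helper, and the 10-frame clip is an assumed example.

```python
def raw_frame_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size in bytes of one uncompressed frame (defaults assume 8-bit RGB)."""
    return width * height * channels * bytes_per_channel


per_frame = raw_frame_bytes(640, 480)
print(per_frame)       # 921600
# Hypothetical: a clip that yields 10 sampled frames (e.g. ~10 s at one frame per second)
print(per_frame * 10)  # 9216000
```

Roughly 0.9 MB per raw frame adds up quickly across many videos, which is why compressed JPEG bytes are a reasonable storage format for the data column.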
2. Write to the LakeSoul multimodal lakehouse
Create a LakeSoul table with create_table, then write the frames into it.
from lakesoul.metadata import create_table
from lakesoul.ray import LakeSoulDatasink
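# Derive the table schema from the Daft DataFrame as a PyArrow schema
schema = df.schema().to_pyarrow_schema()
create_table(
    "video_frames_table",
    table_schema=schema,
    table_path="/home/coder/work-dir/video_demo/video_frames_table",
)
# Convert the Daft DataFrame to a Ray Dataset and write it through the LakeSoul datasink
ds = df.to_ray_dataset()
sink = LakeSoulDatasink("video_frames_table")
ds.write_datasink(sink)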
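3. Read the written data back from the lakehouse and display it
Read the LakeSoul table with ray.data.read_lakesoul(),
then convert the ray.data.Dataset into a Daft DataFrame,
and decode the key-frame data column back into images.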
import ray
import lakesoul.ray  # needed for ray.data.read_lakesoul below
df = ray.data.read_lakesoul("video_frames_table").to_daft()
from daft import col
from daft.functions import decode_image
df = df.with_column(
    "data",
    decode_image(col("data"))
)
# Move the data column to the front
cols = df.column_names
df = df.select(
    "data",
    *[c for c in cols if c != "data"]
)
df.show()
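The select call that moves data to the front is a plain reordering of the column-name list. The sketch below factors that into a small hypothetical helper, move_to_front (not part of Daft), that also fails loudly if the column is missing; the frame_time column name in the example is made up.

```python
from typing import List


def move_to_front(columns: List[str], name: str) -> List[str]:
    """Return column names with `name` first, keeping the others in their original order."""
    if name not in columns:
        raise KeyError(name)
    return [name] + [c for c in columns if c != name]


# "frame_time" is a hypothetical column name used only for illustration
print(move_to_front(["video_path", "data", "frame_time"], "data"))
# ['data', 'video_path', 'frame_time']
```

In the notebook it could be used as df.select(*move_to_front(df.column_names, "data")), since select accepts column names as strings.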