feat: video generation skill

hp0912 2026-04-06 17:38:51 +08:00
parent a4ebe79c2e
commit 5395f54c9b
5 changed files with 624 additions and 0 deletions


@ -30,6 +30,13 @@ MYSQL_PASSWORD=houhou
<wechat-robot-image-url>image URL 4</wechat-robot-image-url>
```
**To send videos, print the following to the console**
```
<wechat-robot-video-url>video URL 1</wechat-robot-video-url>
<wechat-robot-video-url>video URL 2</wechat-robot-video-url>
```
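As a rough illustration of the tag format above, a skill script could build the console output like this. The helper name `format_video_tags` and the example URLs are hypothetical; only the `<wechat-robot-video-url>` tag comes from this README.

```python
def format_video_tags(video_urls):
    """Wrap each non-empty URL in a <wechat-robot-video-url> tag, one per line."""
    lines = []
    for url in video_urls:
        url = url.strip()
        if url:
            lines.append(f"<wechat-robot-video-url>{url}</wechat-robot-video-url>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(format_video_tags(["https://example.com/a.mp4", "https://example.com/b.mp4"]))
```

Blank entries are skipped so that an empty URL never produces an empty tag.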
**Images can also be sent by calling the Agent API**
```
@ -43,3 +50,17 @@ MYSQL_PASSWORD=houhou
}
```
**Videos can also be sent by calling the Agent API**
```
[POST] http://127.0.0.1:{ROBOT_WECHAT_CLIENT_PORT}/api/v1/robot/message/send/video/url
Request body:
{
  "to_wxid": "{{ROBOT_FROM_WX_ID}}",
  "video_urls": ["{{videourl}}"]
}
```
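A minimal sketch of calling this endpoint from Python with only the standard library. The function names are illustrative, and the response is returned as raw text because its format is not described here; the URL path and the body fields are the ones shown above.

```python
import json
import urllib.request


def build_send_video_request(client_port, to_wxid, video_urls):
    """Build (but do not send) the POST request for the video-send endpoint."""
    url = f"http://127.0.0.1:{client_port}/api/v1/robot/message/send/video/url"
    body = {"to_wxid": to_wxid, "video_urls": [u for u in video_urls if u]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_video(client_port, to_wxid, video_urls, timeout=60):
    """Send the request and return the raw response body as text."""
    req = build_send_video_request(client_port, to_wxid, video_urls)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

In a skill script, `client_port` and `to_wxid` would typically come from the `ROBOT_WECHAT_CLIENT_PORT` and `ROBOT_FROM_WX_ID` environment variables.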


@ -0,0 +1,116 @@
---
name: video-generation
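description: "AI video generation tool. Use it when the user wants to generate a video, do text-to-video or image-to-video, animate an image, or generate a video from a specified first and last frame. Supports generating from text alone, from 1 image as the first frame, or from 2 images as the first and last frames."
argument-hint: "Requires prompt; optional model, file_paths, ratio, resolution, duration. At most 2 file_paths."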
---
# Video Generation Skill
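## Description
This is an AI video generation skill covering two common scenarios:
- Text-to-video: the user provides only a text description.
- Image-to-video: the user provides 1 first-frame image, or 2 images for the first and last frames, combined with a prompt to generate the video.
The current implementation uses the Jimeng (即梦) video API and reads `base_url`, `sessionid`, and related values from the drawing configuration stored in the database. After a video is generated, the script calls the robot client API directly to send it and no longer prints the fixed XML video tags.
## Trigger conditions
- The user wants to generate a video, make a short clip, or bring a picture to life.
- The user says things like "generate a video" (「生成一个视频」), "make a video" (「做个视频」), "turn this picture into a video" (「把这张图做成视频」), "the first frame is this picture" (「首帧是这张图」), or "use this picture as the last frame" (「尾帧用这张图」).
- The user mentions "text-to-video" (「文生视频」), "image-to-video" (「图生视频」), "first and last frame video" (「首帧尾帧视频」), or "AI video generation" (「AI 视频生成」).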
## Input parameters
```json
{
  "type": "object",
  "properties": {
    "prompt": {
      "type": "string",
      "description": "Extract the video prompt from the user's input text, but do not modify the prompt."
    },
    "model": {
      "type": "string",
      "description": "Video model, optional, defaults to none.",
      "enum": [
        "none",
        "jimeng-video-seedance-2.0",
        "jimeng-video-3.5-pro",
        "jimeng-video-veo3",
        "jimeng-video-veo3.1",
        "jimeng-video-sora2",
        "jimeng-video-3.0-pro",
        "jimeng-video-3.0",
        "jimeng-video-3.0-fast"
      ],
      "default": "none"
    },
    "file_paths": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of image addresses used as the first and last frames, optional. 0 items means text-to-video, 1 item means image-to-video with a first frame, 2 items means image-to-video with first and last frames. At most 2."
    },
    "ratio": {
      "type": "string",
      "description": "Video aspect ratio, optional, defaults to 4:3.",
      "default": "4:3"
    },
    "resolution": {
      "type": "string",
      "description": "Video resolution, optional, defaults to 720p.",
      "default": "720p"
    },
    "duration": {
      "type": "integer",
      "description": "Video duration in seconds, optional, defaults to 5.",
      "default": 5
    }
  },
  "required": ["prompt"],
  "additionalProperties": false
}
```
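The constraints in this schema can be checked before the script is invoked. The sketch below is a hand-written check rather than a JSON Schema validator; the function name `validate_params` is hypothetical. The non-empty prompt, the 2-item limit, and the positive duration are not expressed in the schema itself; they are taken from the skill's validation rules and the script.

```python
ALLOWED_KEYS = {"prompt", "model", "file_paths", "ratio", "resolution", "duration"}


def validate_params(params):
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    # additionalProperties is false, so unknown keys are rejected.
    unknown = set(params) - ALLOWED_KEYS
    if unknown:
        errors.append(f"unknown keys: {sorted(unknown)}")
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt must be a non-empty string")
    file_paths = params.get("file_paths", [])
    if not isinstance(file_paths, list) or not all(isinstance(p, str) for p in file_paths):
        errors.append("file_paths must be a list of strings")
    elif len(file_paths) > 2:
        errors.append("file_paths accepts at most 2 items")
    duration = params.get("duration", 5)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or duration <= 0:
        errors.append("duration must be a positive integer")
    return errors
```

Returning a list instead of raising lets the caller report every problem at once.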
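The corresponding command-line arguments are:
- `--prompt <prompt>` required
- `--model <model name>` optional
- `--file_paths <image address>` optional, may be passed 0 to 2 times
- `--ratio <ratio>` optional
- `--resolution <resolution>` optional
- `--duration <seconds>` optional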
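One way to map a parameter object onto these flags is sketched below. The helper name `build_argv` is an assumption for illustration; the flag names are the ones listed above, and `--file_paths` is repeated once per image.

```python
def build_argv(params):
    """Convert a parameter dict into the list of CLI arguments the script accepts."""
    argv = ["--prompt", params["prompt"]]
    for key in ("model", "ratio", "resolution"):
        value = params.get(key)
        if value:
            argv += [f"--{key}", str(value)]
    # --file_paths is repeated once per image rather than taking a list.
    for path in params.get("file_paths", []):
        argv += ["--file_paths", path]
    if params.get("duration"):
        argv += ["--duration", str(params["duration"])]
    return argv
```

Passing the result as a list, for example `subprocess.run(["python3", "video-generation/scripts/video_generation.py", *build_argv(params)])`, avoids shell quoting problems with prompts that contain spaces or quotes.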
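## Installing dependencies
- On first run the script automatically creates a virtual environment and installs the dependencies; no manual step is needed.
- To reinstall manually, run: `python3 video-generation/scripts/bootstrap.py`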
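## Execution steps
1. Trigger this skill when the user wants to generate a video.
2. Extract `prompt` from the user's input without rewriting the prompt itself.
3. Optionally extract `model`, `file_paths`, `ratio`, `resolution`, and `duration` from the context.
4. If the user does not explicitly specify a model, use the default `jimeng-video-3.0-fast`.
5. Run the script from the repository root, for example:
```bash
python3 video-generation/scripts/video_generation.py --prompt 'Sunset at the beach, camera slowly pushing in' --file_paths 'https://example.com/start.jpg'
```
6. After generating the video, the script automatically calls the client API `POST http://127.0.0.1:{ROBOT_WECHAT_CLIENT_PORT}/api/v1/robot/message/send/video/url` to send the video to the user, and prints 「视频发送成功」 ("video sent successfully") on success.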
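## Validation rules
- `prompt` must not be empty.
- `file_paths` may contain at most 2 items.
- Only Jimeng video models are currently supported.
- If AI drawing is disabled in the database or the Jimeng configuration is unavailable, the script stops and prints an explicit message.
## Reply requirements
- On success, the script prints 「视频发送成功」 ("video sent successfully"), which means the video has already been sent through the client API and the AI agent needs no further handling.
- On failure, return the specific error message printed by the script.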
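The reply rule can be sketched as a small function over the script's captured output. The name `build_reply` is hypothetical, and treating the last non-empty line as the error message is an assumption about how the output is laid out; the success marker is the literal line the script prints.

```python
SUCCESS_MARKER = "视频发送成功"  # literal success line printed by the script


def build_reply(output):
    """Return (ok, message) from the script's combined stdout/stderr text."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if SUCCESS_MARKER in lines:
        # The video was already delivered; nothing more needs to be sent.
        return True, ""
    # Assumption: the last non-empty line carries the error text.
    return False, lines[-1] if lines else "no output from script"
```

An agent would stay silent when `ok` is true and relay `message` otherwise.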


@ -0,0 +1,128 @@
#!/usr/bin/env python3
from __future__ import annotations

import hashlib
import subprocess
import sys
import traceback
from pathlib import Path

# Route stderr to stdout so the caller sees all output in one stream.
sys.stderr = sys.stdout


def _skill_root_from(script_dir: Path) -> Path:
    return script_dir.parent


def _venv_dir(script_dir: Path) -> Path:
    return _skill_root_from(script_dir) / ".venv"


def _venv_python(venv_dir: Path) -> Path:
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def _stamp_file(venv_dir: Path) -> Path:
    return venv_dir / ".req_hash"


def _file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _deps_up_to_date(requirements_file: Path, venv_dir: Path) -> bool:
    # Dependencies are current when the stamp matches the requirements hash.
    stamp = _stamp_file(venv_dir)
    if not stamp.is_file():
        return False
    return stamp.read_text().strip() == _file_hash(requirements_file)


def _write_stamp(requirements_file: Path, venv_dir: Path) -> None:
    _stamp_file(venv_dir).write_text(_file_hash(requirements_file))


def _ensure_venv(venv_dir: Path, venv_python: Path) -> int:
    # Create the skill's virtual environment if its interpreter is missing.
    if venv_python.is_file():
        return 0
    sys.stdout.write(f"未检测到技能虚拟环境,正在创建: {venv_dir}\n")
    command = [
        sys.executable,
        "-m",
        "venv",
        str(venv_dir),
    ]
    try:
        subprocess.run(command, check=True, stdout=sys.stdout, stderr=sys.stdout)
    except subprocess.CalledProcessError as exc:
        sys.stdout.write(f"创建虚拟环境失败,退出码: {exc.returncode}\n")
        return exc.returncode or 1
    return 0


def main() -> int:
    script_dir = Path(__file__).resolve().parent
    requirements_file = script_dir / "requirements.txt"
    venv_dir = _venv_dir(script_dir)
    venv_python = _venv_python(venv_dir)
    if not requirements_file.is_file():
        sys.stdout.write(f"未找到依赖文件: {requirements_file}\n")
        return 1
    ensure_result = _ensure_venv(venv_dir, venv_python)
    if ensure_result != 0:
        return ensure_result
    # Skip installation when requirements.txt has not changed.
    if _deps_up_to_date(requirements_file, venv_dir):
        sys.stdout.write("依赖已是最新,跳过安装\n")
        return 0
    # Upgrade pip inside the virtual environment first.
    command = [
        str(venv_python),
        "-m",
        "pip",
        "install",
        "--upgrade",
        "pip",
    ]
    try:
        subprocess.run(command, check=True, stdout=sys.stdout, stderr=sys.stdout)
    except subprocess.CalledProcessError as exc:
        sys.stdout.write(f"升级 pip 失败,退出码: {exc.returncode}\n")
        return exc.returncode or 1
    # Install the skill's dependencies.
    command = [
        str(venv_python),
        "-m",
        "pip",
        "install",
        "-r",
        str(requirements_file),
    ]
    try:
        subprocess.run(command, check=True, stdout=sys.stdout, stderr=sys.stdout)
    except subprocess.CalledProcessError as exc:
        sys.stdout.write(f"安装依赖失败,退出码: {exc.returncode}\n")
        return exc.returncode or 1
    _write_stamp(requirements_file, venv_dir)
    sys.stdout.write(f"依赖安装完成,当前技能虚拟环境: {venv_dir}\n")
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except SystemExit:
        raise
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise SystemExit(1)


@ -0,0 +1 @@
pymysql


@ -0,0 +1,358 @@
#!/usr/bin/env python3
from __future__ import annotations

import argparse
import json
import os
import subprocess
import sys
import traceback
import urllib.request
from pathlib import Path

# Route stderr to stdout so the caller sees all output in one stream.
sys.stderr = sys.stdout

SUPPORTED_MODELS = {
    "jimeng-video-seedance-2.0",
    "jimeng-video-3.5-pro",
    "jimeng-video-veo3",
    "jimeng-video-veo3.1",
    "jimeng-video-sora2",
    "jimeng-video-3.0-pro",
    "jimeng-video-3.0",
    "jimeng-video-3.0-fast",
}

DEFAULT_MODEL = "jimeng-video-3.0-fast"
DEFAULT_RATIO = "4:3"
DEFAULT_RESOLUTION = "720p"
DEFAULT_DURATION = 5


def _skill_root() -> Path:
    script_dir = Path(__file__).resolve().parent
    return script_dir.parent


def _skill_venv_python() -> Path:
    venv_dir = _skill_root() / ".venv"
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def _run_bootstrap() -> None:
    bootstrap = Path(__file__).resolve().parent / "bootstrap.py"
    result = subprocess.run([sys.executable, str(bootstrap)])
    if result.returncode != 0:
        raise SystemExit(result.returncode)


def _ensure_skill_venv_python() -> None:
    # Create the virtual environment on first run, then re-exec inside it.
    venv_python = _skill_venv_python()
    if not venv_python.is_file():
        _run_bootstrap()
        venv_python = _skill_venv_python()
        if not venv_python.is_file():
            sys.stdout.write("bootstrap 后仍未找到虚拟环境\n")
            raise SystemExit(1)
    venv_dir = _skill_root() / ".venv"
    # Resolve both sides so a symlinked prefix cannot cause a re-exec loop.
    if Path(sys.prefix).resolve() == venv_dir.resolve():
        return
    os.execv(str(venv_python), [str(venv_python), str(Path(__file__).resolve()), *sys.argv[1:]])


_ensure_skill_venv_python()

try:
    import pymysql  # type: ignore # noqa: E402
except ModuleNotFoundError:
    # The venv exists but lacks dependencies: install them and restart.
    _run_bootstrap()
    os.execv(sys.executable, [sys.executable, str(Path(__file__).resolve()), *sys.argv[1:]])


def _mysql_connect():
    host = os.environ.get("MYSQL_HOST", "127.0.0.1")
    port = int(os.environ.get("MYSQL_PORT", "3306"))
    user = os.environ.get("MYSQL_USER", "root")
    password = os.environ.get("MYSQL_PASSWORD", "")
    # The database name is taken from ROBOT_CODE.
    database = os.environ.get("ROBOT_CODE", "")
    if not database:
        raise RuntimeError("环境变量 ROBOT_CODE 未配置")
    return pymysql.connect(
        host=host,
        port=port,
        user=user,
        password=password,
        database=database,
        charset="utf8mb4",
        connect_timeout=10,
        read_timeout=30,
    )


def _query_one(conn, sql: str, params: tuple = ()) -> dict | None:
    # Return the first row as a column-name dict, or None if there is no row.
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        columns = [desc[0] for desc in cur.description] if cur.description else []
        row = cur.fetchone()
    finally:
        # Close the cursor even when the query raises.
        cur.close()
    if row is None:
        return None
    return dict(zip(columns, row))


def load_drawing_settings(conn, from_wx_id: str) -> tuple[bool, dict]:
    # Start from the global settings.
    gs = _query_one(conn, "SELECT image_ai_enabled, image_ai_settings FROM global_settings LIMIT 1")
    enabled = False
    settings_json: dict = {}
    if gs:
        if gs.get("image_ai_enabled") is not None:
            enabled = bool(gs["image_ai_enabled"])
        raw = gs.get("image_ai_settings")
        if raw:
            if isinstance(raw, (bytes, bytearray)):
                raw = raw.decode("utf-8")
            if isinstance(raw, str) and raw.strip():
                settings_json = json.loads(raw)
    # Chat-room or friend settings override the global ones when present.
    if from_wx_id.endswith("@chatroom"):
        override = _query_one(
            conn,
            "SELECT image_ai_enabled, image_ai_settings FROM chat_room_settings WHERE chat_room_id = %s LIMIT 1",
            (from_wx_id,),
        )
    else:
        override = _query_one(
            conn,
            "SELECT image_ai_enabled, image_ai_settings FROM friend_settings WHERE wechat_id = %s LIMIT 1",
            (from_wx_id,),
        )
    if override:
        if override.get("image_ai_enabled") is not None:
            enabled = bool(override["image_ai_enabled"])
        raw = override.get("image_ai_settings")
        if raw:
            if isinstance(raw, (bytes, bytearray)):
                raw = raw.decode("utf-8")
            if isinstance(raw, str) and raw.strip():
                settings_json = json.loads(raw)
    return enabled, settings_json


def _resolve_jimeng_config(settings_json: dict) -> dict:
    # Prefer the nested "JiMeng" section; otherwise use the settings as-is.
    jimeng_config = settings_json.get("JiMeng")
    if isinstance(jimeng_config, dict) and jimeng_config:
        return jimeng_config
    if isinstance(settings_json, dict):
        return settings_json
    return {}


def _normalize_session_ids(raw: object) -> list[str]:
    # Accept a single string or a list of strings; strip surrounding whitespace.
    if isinstance(raw, str):
        return [raw.strip()] if raw.strip() else []
    if isinstance(raw, list):
        return [item.strip() for item in raw if isinstance(item, str) and item.strip()]
    return []


def _http_post_json(url: str, body: dict, headers: dict, timeout: int = 300) -> dict:
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send_videos(from_wx_id: str, video_urls: list[str]) -> None:
    client_port = os.environ.get("ROBOT_WECHAT_CLIENT_PORT", "").strip()
    if not client_port:
        raise RuntimeError("环境变量 ROBOT_WECHAT_CLIENT_PORT 未配置")
    send_url = f"http://127.0.0.1:{client_port}/api/v1/robot/message/send/video/url"
    send_body = {
        "to_wxid": from_wx_id,
        "video_urls": [url for url in video_urls if url],
    }
    _http_post_json(send_url, send_body, {"Content-Type": "application/json"}, timeout=60)


def call_jimeng_video(
    config: dict,
    prompt: str,
    model: str,
    file_paths: list[str],
    ratio: str,
    resolution: str,
    duration: int,
) -> list[str]:
    base_url = str(config.get("base_url", "")).rstrip("/")
    session_ids = _normalize_session_ids(config.get("sessionid", []))
    if not base_url or not session_ids:
        raise RuntimeError("即梦视频配置缺少 base_url 或 sessionid")
    body = {
        "model": model or DEFAULT_MODEL,
        "prompt": prompt,
        "ratio": ratio or DEFAULT_RATIO,
        "resolution": resolution or DEFAULT_RESOLUTION,
        "duration": duration or DEFAULT_DURATION,
        "response_format": "url",
    }
    if file_paths:
        body["file_paths"] = file_paths
    resp = _http_post_json(
        f"{base_url}/v1/videos/generations",
        body,
        {
            "Content-Type": "application/json",
            # Multiple session ids are joined with commas in one bearer token.
            "Authorization": f"Bearer {','.join(session_ids)}",
        },
        timeout=300,
    )
    # Collect every non-empty URL from the response's "data" list.
    urls: list[str] = []
    for item in resp.get("data", []):
        if isinstance(item, dict):
            url = item.get("url")
            if isinstance(url, str) and url.strip():
                urls.append(url.strip())
    return urls


def _parse_cli_params(argv: list[str]) -> dict:
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--prompt", default="")
    parser.add_argument("--model", default="")
    # --file_paths may be repeated; each occurrence appends one entry.
    parser.add_argument("--file_paths", action="append", default=[])
    parser.add_argument("--ratio", default="")
    parser.add_argument("--resolution", default="")
    parser.add_argument("--duration", type=int, default=0)
    namespace, unknown = parser.parse_known_args(argv)
    if unknown:
        raise ValueError(f"存在不支持的参数: {' '.join(unknown)}")
    return {
        "prompt": namespace.prompt,
        "model": namespace.model,
        "file_paths": [path for path in namespace.file_paths if path.strip()],
        "ratio": namespace.ratio,
        "resolution": namespace.resolution,
        "duration": namespace.duration,
    }


def main() -> int:
    if len(sys.argv) < 2:
        sys.stdout.write("缺少输入参数\n")
        return 1
    try:
        params = _parse_cli_params(sys.argv[1:])
    except ValueError as exc:
        sys.stdout.write(f"参数格式错误: {exc}\n")
        return 1
    prompt = params.get("prompt", "").strip()
    if not prompt:
        sys.stdout.write("缺少视频提示词\n")
        return 1
    # An empty model or "none" falls back to the default model.
    model = params.get("model", "").strip()
    if not model or model == "none":
        model = DEFAULT_MODEL
    if model not in SUPPORTED_MODELS:
        sys.stdout.write("不支持的 AI 视频模型\n")
        return 1
    file_paths = params.get("file_paths", [])
    if len(file_paths) > 2:
        sys.stdout.write("file_paths 最多只能传 2 个\n")
        return 1
    ratio = params.get("ratio", "").strip() or DEFAULT_RATIO
    resolution = params.get("resolution", "").strip() or DEFAULT_RESOLUTION
    duration = params.get("duration", 0) or DEFAULT_DURATION
    if duration <= 0:
        sys.stdout.write("duration 必须大于 0\n")
        return 1
    from_wx_id = os.environ.get("ROBOT_FROM_WX_ID", "").strip()
    if not from_wx_id:
        sys.stdout.write("环境变量 ROBOT_FROM_WX_ID 未配置\n")
        return 1
    try:
        conn = _mysql_connect()
    except Exception as exc:
        sys.stdout.write(f"数据库连接失败: {exc}\n")
        return 1
    try:
        enabled, settings_json = load_drawing_settings(conn, from_wx_id)
    except Exception as exc:
        sys.stdout.write(f"加载绘图配置失败: {exc}\n")
        return 1
    finally:
        try:
            conn.close()
        except Exception:
            pass
    if not enabled:
        sys.stdout.write("AI 生成视频未开启\n")
        return 0
    jimeng_config = _resolve_jimeng_config(settings_json)
    if not isinstance(jimeng_config, dict) or not jimeng_config:
        sys.stdout.write("未找到即梦视频配置\n")
        return 1
    if jimeng_config.get("enabled") is False:
        sys.stdout.write("即梦视频未开启\n")
        return 0
    try:
        video_urls = call_jimeng_video(
            jimeng_config,
            prompt,
            model,
            file_paths,
            ratio,
            resolution,
            duration,
        )
    except Exception as exc:
        sys.stdout.write(f"调用即梦生成视频接口失败: {exc}\n")
        return 1
    if not video_urls:
        sys.stdout.write("未生成任何视频\n")
        return 1
    try:
        send_videos(from_wx_id, video_urls)
        # Success marker that the skill documentation tells the agent to expect.
        sys.stdout.write("视频发送成功\n")
    except Exception as exc:
        sys.stdout.write(f"发送视频失败: {exc}\n")
        return 1
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except SystemExit:
        raise
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise SystemExit(1)