# 可视化实践 **Repository Path**: feng-weixia/visualization-practice ## Basic Information - **Project Name**: 可视化实践 - **Description**: No description available - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-11-09 - **Last Updated**: 2025-11-27 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # MM-StoryAgent This repo is the official implementation of "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio". ## Introduction MM-StoryAgent is a multi-agent framework that employs LLMs and diverse expert tools across several modalities to produce expressive storytelling videos. It hightlights in the following aspects: * MM-StoryAgent designs a reliable and **customizable** workflow. Users can define their own expert tools to improve the generation quality of each component. * MM-StoryAgent writes **high-quality** stories based on the input story setting, in a multi-agent, multi-stage pipeline. * Agents of all modalities (image, speech, sound, music) generated corresponding assets are composed to an **immersive** storytelling video.
Framework
Besides, we provide a story topic list and story evaluation criteria for further story writing evaluation. ## News * Aug 16, 2024: The initial version of MM-StoryAgent was released. ## Demo Video The demo video is available:
MM-StoryAgent demo
## Installation Install the required dependencies and install this repo as a package: ```bash pip install -r requirements.txt pip install -e . ``` ## Quickstart MM-StoryAgent can be called by configuration files: ```bash python run.py -c configs/mm_story_agent.yaml ``` Each agent is called in the following format: ```yaml story_writer: # agent name tool: qa_outline_story_writer # name registered in the definition cfg: # parameters for initializing the agent instance max_conv_turns: 3 ... params: # parameters for calling the agent instance story_topic: "Time Management: A child learning how to manage their time effectively." ... ``` The customization of new agents can refer to [music_agent.py](mm_story_agent/modality_agents/music_agent.py#L42). The agent class should implement `__init__` and `call` to work properly, like the following: ```python from typing import Dict from mm_story_agent.base import register_tool @register_tool("my_speech_agent") class MySpeechAgent: def __init__(self, cfg: Dict): # For example, the agent need `attr1` and `attr2` for initilization self.attr1 = cfg.attr1 self.attr2 = cfg.attr2 ... def call(self, params: Dict): # For example, calling the agent needs `voice` and `speed` parameters voice = params["voice"] speed = params["speed"] ... ``` Then the agent can be called by simply modifying the configuration like: ```yaml speech_generation: tool: my_speech_agent cfg: attr1: val1 attr2: val2 params: voice: en_female speed: 1.0 ``` ## Evaluation Data The evaluation topics are provided in [story_topics.json](story_eval/story_topics.json). Evaluation rubrics and prompts are also provided accordingly. ### Story Content Evaluation We use GPT-4 to automatically evaluate the story quality according to several aspects. Our story writing agent is compared with directly prompting LLM to write stories. Evaluation scores show the advantage of our multi-agent, multi-stage story writing pipeline. | Rubric Grading | | Attractiveness | Warmth | Education | Average | |---------------------------|--------------|----------------|--------|-----------|---------| | **Topic 1: Self-growing** | Direct | 3.68 | 4.42 | 4.84 | 4.31 | | | Story Agent | 4.1 | 4.5 | 4.80 | **4.47**| | **Topic 2: Family & Friendship** | Direct | 3.94 | 5.0 | 4.72 | 4.55 | | | Story Agent | 4.36 | 4.8 | 4.92 | **4.69**| | **Topic 3: Environments** | Direct | 4.0 | 4.62 | 4.92 | 4.51 | | | Story Agent | 4.44 | 4.68 | 4.86 | **4.66**| | **Topic 4: Knowledge Learning** | Direct | 4.46 | 4.14 | 4.86 | 4.49 | | | Story Agent | 4.84 | 4.52 | 4.90 | **4.75**| | **All** | Direct | 4.02 | 4.55 | 4.84 | 4.47 | | | Story Agent | 4.44 | 4.63 | 4.87 | **4.65**| ## Citation