# ComfyUI-vidflows

**Repository Path**: rick-ren001/ComfyUI-vidflows

## Basic Information

- **Project Name**: ComfyUI-vidflows
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-13
- **Last Updated**: 2026-05-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ComfyUI Workflow: Sora2-Alike Full Loop Video

Open-source video generation pipeline in ComfyUI that creates coherent multi-shot video narratives with synchronized dialogue, inspired by OpenAI's Sora 2.

Generate complete video stories from minimal input: a character description, a reference photo, and the number of scenes. The workflow orchestrates multiple AI models through a 4-LLM pipeline that handles dialogue writing, cinematography, animation direction, and voice casting automatically.

---

## 🎬 Workflows Included

This repository contains three workflows:

### 1. **Google Veo 3 Pipeline** ⚡ (Long-Form Video Generation)

**File**: `workflow-full-loop-Google.json`

![Workflow Preview](03.png)

![Demo](03.gif)

**⚠️ Note**: This workflow uses **paid APIs** (Google Veo 3, Imagen 3, etc.), not open-source models. **Video generation costs can be very high.**

A simplified pipeline for generating longer videos with Google's latest models inside ComfyUI. It follows the same core approach but relies on proprietary APIs for higher quality and longer clips.

### 2. **Sora2-ComfyUI** (Open-Source Character Videos)

**File**: `workflow-full-loop-Sora2-ComfyUI.json`

![Workflow Preview](01.png)

![Demo](01.gif)

Character-driven multi-shot video generation with automated dialogue and perfect lip-sync. **100% open-source models**.

### 3. **Story Creator** (Narrative Sequences with Music)

**File**: `workflow-full-loop-Storycreator.json`

![Workflow Preview](02.png)

![Demo](02.gif)

Text-to-story generation with custom styles, music synchronization, and complete narrative control. **Open-source based**.

---

## ✨ Key Features

- **4-LLM Pre-Production Pipeline**: Scriptwriter → Cinematographer → Animation Director → Voice Casting
- **Scene Consistency**: Last-frame continuation keeps consecutive shots visually coherent
- **AI Cinematography**: A custom-trained `Next Scene` LoRA interprets "Next shot..." prompts to change camera angles
- **Perfect Lip-Sync**: Audio and video are synchronized dynamically by the custom `Audio Duration` node
- **Automated Face Swapping**: Maintains character identity across all generated scenes
- **One-Click Generation**: Complete multi-shot videos from simple inputs

---

## 🛠️ Installation

### Recommended Configuration

- **GPU**: At least an RTX PRO 6000 (96 GB VRAM) to run the full workflow
- **Alternative**: Use [RunPod](https://www.runpod.io/) or a similar cloud GPU service; many online guides cover setting up ComfyUI on RunPod

### Download Workflow & Custom Nodes

1. **Clone or download this repository**:

   ```bash
   git clone https://github.com/lovis93/ComfyUI-Workflow-Sora2Alike-Full-loop-video.git
   cd ComfyUI-Workflow-Sora2Alike-Full-loop-video
   ```

2. **Install ComfyUI-Lovis-Node** (required custom nodes):

   ```bash
   # From the root of the cloned repository, copy the node pack into ComfyUI's custom_nodes directory
   cp -r ComfyUI-Lovis-Node /path/to/ComfyUI/custom_nodes/

   # Install its dependencies (use the same Python environment that runs ComfyUI)
   cd /path/to/ComfyUI/custom_nodes/ComfyUI-Lovis-Node
   pip install -r requirements.txt
   ```

3. **Install the other required custom nodes** via ComfyUI Manager:
   - ComfyUI-WanVideoWrapper
   - ComfyUI-Custom-Scripts (pythongosssss)
   - ComfyUI LayerStyle
   - ComfyUI Easy Use
   - ComfyUI essentials (mb fork)
   - Audio Batch
   - ComfyUI Workflow Encrypt
   - ComfyUI JM MiniMax API
   - Infinite Talk nodes

4. **Load the workflow**:
   - Open ComfyUI
   - Load `workflow-full-loop-Sora2-ComfyUI.json` (main character-based workflow)

### Download Models

**Base Models** (place each file in the matching subfolder of `ComfyUI/models/`, e.g. `diffusion_models/`, `vae/`, `text_encoders/`, `clip_vision/`):

- `qwen_image_distill_full_fp8_e4m3fn.safetensors` — Qwen-Image (text-to-image)
- `qwen_image_edit_2509_fp8_e4m3fn.safetensors` — Qwen-Image-Edit (scene transitions)
- `wan2.2_i2v_high_noise_14B_fp16.safetensors` — Wan 2.2 I2V high noise (video generation)
- `wan2.2_i2v_low_noise_14B_fp16.safetensors` — Wan 2.2 I2V low noise
- `Wan2_1-InfiniTetalk-Single_fp16.safetensors` — Infinite Talk (talking heads)
- `wan_2.1_vae.safetensors` — Wan VAE
- `qwen_image_vae.safetensors` — Qwen VAE
- `qwen_2.5_vl_7b_fp8_scaled.safetensors` — Qwen text encoder
- `umt5_xxl_fp16.safetensors` — UMT5 text encoder
- `clip_vision_h.safetensors` — CLIP Vision

**LoRAs** (place in `ComfyUI/models/loras/`):

- `next-scene_lora_v1-3000.safetensors` — **Next Scene LoRA** (AI Cinematographer)
  - Download: [lovis93/next-scene-qwen-image-lora-2509](https://huggingface.co/lovis93/next-scene-qwen-image-lora-2509)
- `motionpushin-v5-wan-i2v-14b-720p-400.safetensors` — **Motion Push-In LoRA** (camera motion)
  - Download: [lovis93/Motion-Lora-Camera-Push-In-Wan-14B-720p-I2V](https://huggingface.co/lovis93/Motion-Lora-Camera-Push-In-Wan-14B-720p-I2V)
- `Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors` — Wan distill LoRA
- `Qwen-Image-Lightning-4steps-V1.0.safetensors` — Qwen Lightning LoRA
- `Qwen_influencer_style_v1.safetensors` — Style LoRA

---

## 🚀 Usage

### Sora2-ComfyUI (Main Character Workflow)

**Inputs**:

1. **Main Prompt**: Character description (e.g., "Audrey loves ComfyUI, she is a geek")
2. **Reference Photo**: Clear photo of your character
3. **Number of Lines**: How many dialogue lines/scenes to generate (e.g., 4)

#### Configuration

- **Image Size**: 768x512 (faster) or higher for better quality
- **FPS**: 24 or 30

#### Run

Click **Queue Prompt** and wait. The workflow will automatically:

1. Generate dialogue and cinematography via the 4 LLMs
2. Create the initial scene with face swap
3. Loop through the remaining scenes with consistent transitions
4. Synthesize audio and measure its exact duration
5. Animate each scene with lip-sync, matching the clip length to the audio
6. Combine all segments into the final video

### Story Creator

An alternative workflow for creating narrative sequences with more creative control.

**Features**:

- **Text-to-Story**: Automatically generates visual sequences from any story text
- **Custom Styles**: Apply different visual styles and looks to your narrative
- **Music Synchronization**: Generates music automatically timed to the story
- **Same Core Tools**: Built on the same model foundation as the main workflow

**Use Case**: Ideal for complete stories, film sequences, or narrative content with custom aesthetics and synchronized soundtracks.

---

## 🎬 How It Works

### Core Architecture

**Models**:

- `Qwen-Image` — Initial image generation for scene 1
- `Qwen-Image-Edit` — Scene transitions via image editing (scenes 2+)
- `Wan 2.2 I2V` + `Infinite Talk` — Video animation with talking heads
- Custom LoRAs: `Next Scene` (cinematography) + `Motion Push-In` (camera motion)

**Custom Nodes** (included in ComfyUI-Lovis-Node):

- Line Count Node — Counts dialogue lines
- Text to Single Line Node — Text formatting
- Audio Duration Node — Calculates the exact audio duration for lip-sync

### 4-LLM Pre-Processing Pipeline

Before video generation, 4 specialized LLMs process your input:

1. **LLM 1 - Scriptwriter**: Generates N dialogue lines from the character prompt
2. **LLM 2 - Cinematographer**: Creates N image prompts (scenes 2+ are prefixed with "Next shot...")
3. **LLM 3 - Animation Director**: Creates N video animation prompts
4. **LLM 4 - Voice Casting**: Generates a detailed voice profile (via MiniMax)

### Execution Pipeline

**Phase 1 — Scene 1 Initialization**:

1. Generate the initial frame with `Qwen-Image`
2. Automated face swap (reference photo → generated image)
3. Synthesize audio for dialogue line 1
4. Measure the exact audio duration (`Audio Duration` node)
5. Animate with `Wan 2.2 I2V` + `Infinite Talk` (duration = audio length)

**Phase 2 — Iterative Loop (Scenes 2-N)**:

1. Take the **last frame** of the previous video clip
2. Edit the frame with `Qwen-Image-Edit` + `Next Scene` LoRA + a "Next shot..." prompt
3. Result: a new camera angle that preserves the character and environment
4. Automated face swap on the new frame
5. Generate audio → measure duration → animate
6. Repeat until all lines are complete

**Phase 3 — Final Assembly**:

- Concatenate all video segments and audio clips
- Output a single continuous video

### Key Innovation

- **Last-frame continuation**: Each scene starts from the previous scene's final frame
- **"Next shot..." trigger**: Activates the `Next Scene` LoRA to change camera angles
- **Dynamic synchronization**: Video length automatically matches audio duration (no manual timing)

---

## 📋 Included Custom Nodes

### ComfyUI-Lovis-Node

**Line Count Node** (`text/utility`):

- Counts lines in the text input
- Determines the number of scenes to generate

**Text to Single Line Node** (`text/utility`):

- Converts multi-line text to a single line with customizable separators

**Audio Duration Node** (`audio/utility`):

- Extracts the duration of audio data in seconds
- Calculates the frame count from the FPS
- Critical for lip-sync

See `ComfyUI-Lovis-Node/README.md` for detailed documentation.
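The three ComfyUI-Lovis-Node utilities are simple enough to sketch in plain Python. The block below is a hypothetical illustration of the logic described for those nodes, not the node pack's actual source. It assumes that blank lines are ignored when counting, that audio is described by a sample count plus a sample rate, and that frame counts are rounded up so a clip never ends before its audio. `count_lines`, `to_single_line`, `audio_duration`, `frames_for_audio`, and `plan_scenes` are made-up names; `plan_scenes` shows how those numbers could size each iteration of the scene loop (Phase 1 for scene 1, Phase 2 for scenes 2-N), and its "Next shot: " prefix format is also an assumption.

```python
"""Hypothetical sketch of the ComfyUI-Lovis-Node utility logic (not the real node source)."""


def count_lines(text: str) -> int:
    """Line Count sketch: number of non-empty lines (assumes blank lines are skipped)."""
    return sum(1 for line in text.splitlines() if line.strip())


def to_single_line(text: str, separator: str = " ") -> str:
    """Text to Single Line sketch: join stripped, non-empty lines with a separator."""
    return separator.join(line.strip() for line in text.splitlines() if line.strip())


def audio_duration(num_samples: int, sample_rate: int) -> float:
    """Audio Duration sketch: clip length in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def frames_for_audio(num_samples: int, sample_rate: int, fps: int) -> int:
    """Frames needed to cover the clip, rounded up via integer ceiling division."""
    if sample_rate <= 0 or fps <= 0:
        raise ValueError("sample_rate and fps must be positive")
    return -(-num_samples * fps // sample_rate)


def plan_scenes(image_prompts: list[str], sample_counts: list[int],
                sample_rate: int, fps: int) -> list[dict]:
    """Hypothetical planner mirroring Phase 1 (scene 1) and Phase 2 (scenes 2-N)."""
    if len(image_prompts) != len(sample_counts):
        raise ValueError("expected one audio clip per image prompt")
    plan = []
    for i, (prompt, samples) in enumerate(zip(image_prompts, sample_counts), start=1):
        if i == 1:
            source = "Qwen-Image (text-to-image)"
        else:
            source = f"Qwen-Image-Edit on last frame of scene {i - 1}"
            # Scenes 2+ carry the "Next shot" trigger; this prefix format is assumed.
            if not prompt.startswith("Next shot"):
                prompt = "Next shot: " + prompt
        plan.append({
            "scene": i,
            "source": source,
            "prompt": prompt,
            # Video length is driven by the measured audio length.
            "frames": frames_for_audio(samples, sample_rate, fps),
        })
    return plan


if __name__ == "__main__":
    dialogue = "Hi, I'm Audrey.\nI love ComfyUI.\n\nLet's build a video!"
    print(count_lines(dialogue))  # 3
    print(to_single_line(dialogue, " | "))
    print(audio_duration(84_000, 24_000))  # 3.5
    for scene in plan_scenes(["Audrey at her desk", "close-up of her screen"],
                             [84_000, 48_000], sample_rate=24_000, fps=24):
        print(scene)
```

Computing the frame count from the integer sample count, rather than multiplying a floating-point duration by the FPS, keeps exact frame boundaries from being rounded up into an extra frame.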
---

## 🙏 Credits

**Models & Technologies**:

- Qwen-Image / Qwen-Image-Edit — Alibaba Cloud
- Wan 2.2 I2V — Wan AI
- Infinite Talk — Infinite Talk Project
- MiniMax Voice Synthesis — MiniMax AI

**Custom LoRAs**:

- Next Scene LoRA — Trained by [@lovis93](https://huggingface.co/lovis93) on 100+ cinematic transitions
- Motion Push-In LoRA — Trained by [@lovis93](https://huggingface.co/lovis93) on 100+ drone camera clips

**Inspiration**:

- OpenAI Sora 2 — For demonstrating the potential of AI video generation

---

## 📜 License

MIT License — Free to use, modify, and share.

Custom LoRAs: Apache 2.0 License (see the model cards on HuggingFace).

---

## 📞 Contact

- **GitHub**: [lovis93](https://github.com/lovis93)
- **X**: [@OdinLovis](https://x.com/OdinLovis)
- **HuggingFace**: [@lovis93](https://huggingface.co/lovis93)
- **Issues**: [Report bugs or request features](https://github.com/lovis93/ComfyUI-Workflow-Sora2Alike-Full-loop-video/issues)

---

**Built with ❤️ for the open-source AI community**

*Making cinematic AI video generation accessible to everyone.*