This repository is a mirror intended to improve download speeds within China; it is synced once a day. Original repository: https://github.com/freddyaboulton/fastrtc

Gradio WebRTC ⚡️

Stream video and audio in real time with Gradio using WebRTC.

Installation

pip install gradio_webrtc

To use built-in pause detection (see Conversational AI), install the vad extra:

pip install gradio_webrtc[vad]

Examples:

  1. Object Detection from Webcam with YOLOv10 📷
  2. Streaming Object Detection from Video with RT-DETR 🎥
  3. Text-to-Speech 🗣️
  4. Conversational AI 🤖🗣️

Usage

The WebRTC component supports the following four use cases:

  1. Streaming video from the user webcam to the server and back
  2. Streaming video from the server to the client
  3. Streaming audio from the server to the client
  4. Streaming audio from the client to the server and back (conversational AI)

Streaming Video from the User Webcam to the Server and Back

import gradio as gr
from gradio_webrtc import WebRTC


def detection(image, conf_threshold=0.3):
    # ... your detection code here ...
    return image  # placeholder: return the processed frame as a numpy array


with gr.Blocks() as demo:
    image = WebRTC(label="Stream", mode="send-receive", modality="video")
    conf_threshold = gr.Slider(
        label="Confidence Threshold",
        minimum=0.0,
        maximum=1.0,
        step=0.05,
        value=0.30,
    )
    image.stream(
        fn=detection,
        inputs=[image, conf_threshold],
        outputs=[image], time_limit=10
    )

if __name__ == "__main__":
    demo.launch()

  • Set the mode parameter to "send-receive" and modality to "video".
  • The stream event's fn parameter is a function that receives the next frame from the webcam as a numpy array and returns the processed frame also as a numpy array.
  • Numpy arrays are in (height, width, 3) format where the color channels are in RGB format.
  • The inputs parameter should be a list where the first element is the WebRTC component. The only output allowed is the WebRTC component.
  • The time_limit parameter is the maximum time in seconds the video stream will run. If the time limit is reached, the video stream will stop.
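
To make the frame contract above concrete, here is a minimal stand-in for detection that needs no model. It is only a sketch: the name draw_border and the border logic are illustrative and not part of gradio_webrtc, and it assumes frames arrive as uint8 arrays of shape (height, width, 3).

```python
import numpy as np


def draw_border(image: np.ndarray, conf_threshold: float = 0.3) -> np.ndarray:
    """Hypothetical stand-in for a detector: draw a red border on the frame.

    The border thickness scales with conf_threshold so the slider has a
    visible effect. Input and output are (height, width, 3) RGB arrays.
    """
    frame = image.copy()  # avoid mutating the array handed in by the caller
    height, width, _ = frame.shape
    # At least 1 pixel, at most 10% of the shorter side.
    thickness = max(1, int(conf_threshold * 0.1 * min(height, width)))
    red = np.array([255, 0, 0], dtype=frame.dtype)  # RGB order
    frame[:thickness, :] = red
    frame[-thickness:, :] = red
    frame[:, :thickness] = red
    frame[:, -thickness:] = red
    return frame
```

Because it has the same signature as detection, it could be passed as fn to image.stream to check the plumbing before a real model is wired in.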

Streaming Video from the Server to the Client

import gradio as gr
from gradio_webrtc import WebRTC
import cv2

def generation():
    url = "https://download.tsi.telecom-paristech.fr/gpac/dataset/dash/uhd/mux_sources/hevcds_720p30_2M.mp4"
    cap = cv2.VideoCapture(url)
    iterating = True
    while iterating:
        iterating, frame = cap.read()
        if iterating:  # cap.read() returns (False, None) once the video ends
            yield frame
    cap.release()

with gr.Blocks() as demo:
    output_video = WebRTC(label="Video Stream", mode="receive", modality="video")
    button = gr.Button("Start", variant="primary")
    output_video.stream(
        fn=generation, inputs=None, outputs=[output_video],
        trigger=button.click
    )

if __name__ == "__main__":
    demo.launch()

  • Set the mode parameter to "receive" and modality to "video".
  • The stream event's fn parameter is a generator function that yields the next frame from the video as a numpy array.
  • The only output allowed is the WebRTC component.
  • The trigger parameter is the Gradio event that triggers the WebRTC connection. In this case, it is the button click event.
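
If no video file or OpenCV install is at hand, the generator contract described above can be tried with synthetic frames. The sketch below is an assumption-laden example, not library code: synthetic_frames and its parameters are made-up names, and the frames are plain uint8 RGB arrays.

```python
import numpy as np


def synthetic_frames(num_frames: int = 30, height: int = 120, width: int = 160):
    """Yield RGB frames in which a white vertical bar sweeps left to right."""
    bar_width = 8
    for i in range(num_frames):
        frame = np.zeros((height, width, 3), dtype=np.uint8)
        # Position of the bar for this frame, wrapping at the right edge.
        x = (i * bar_width) % width
        frame[:, x : x + bar_width] = 255
        yield frame
```

Since every parameter has a default, this generator should be usable as fn with inputs=None, in the same way as generation.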

Streaming Audio from the Server to the Client

import gradio as gr
import numpy as np
from gradio_webrtc import WebRTC
from pydub import AudioSegment

def generation(num_steps):
    for _ in range(num_steps):
        segment = AudioSegment.from_file("/Users/freddy/sources/gradio/demo/audio_debugger/cantina.wav")
        yield (segment.frame_rate, np.array(segment.get_array_of_samples()).reshape(1, -1))

with gr.Blocks() as demo:
    audio = WebRTC(label="Stream", mode="receive", modality="audio")
    num_steps = gr.Slider(
        label="Number of Steps",
        minimum=1,
        maximum=10,
        step=1,
        value=5,
    )
    button = gr.Button("Generate")

    audio.stream(
        fn=generation, inputs=[num_steps], outputs=[audio],
        trigger=button.click
    )

  • Set the mode parameter to "receive" and modality to "audio".
  • The stream event's fn parameter is a generator function that yields the next audio segment as a tuple of (frame_rate, audio_samples).
  • The numpy array should be of shape (1, num_samples).
  • The outputs parameter should be a list with the WebRTC component as the only element.
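
The (frame_rate, audio_samples) tuple described above can also be produced without an audio file. The following sketch yields a generated tone; sine_segments, the 24 kHz default rate, and the 16-bit sample format are assumptions chosen for illustration rather than requirements stated by the library.

```python
import numpy as np


def sine_segments(num_steps: int, frame_rate: int = 24_000, freq_hz: float = 440.0):
    """Yield num_steps one-second sine tone segments as (frame_rate, samples)."""
    t = np.arange(frame_rate) / frame_rate  # one second of timestamps
    wave = 0.3 * np.sin(2 * np.pi * freq_hz * t)  # amplitude in [-0.3, 0.3]
    # Assumption: 16-bit PCM, which is what pydub's get_array_of_samples
    # returns for a typical 16-bit WAV file.
    samples = (wave * np.iinfo(np.int16).max).astype(np.int16).reshape(1, -1)
    for _ in range(num_steps):
        yield (frame_rate, samples)  # samples has shape (1, num_samples)
```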

Conversational AI

import gradio as gr
import numpy as np
from gradio_webrtc import WebRTC, StreamHandler
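from queue import Empty, Queue


class EchoHandler(StreamHandler):
    def __init__(self) -> None:
        super().__init__()
        self.queue = Queue()

    def receive(self, frame: tuple[int, np.ndarray] | np.ndarray) -> None:
        # Buffer each frame received from the client.
        self.queue.put(frame)

    def emit(self) -> tuple[int, np.ndarray] | np.ndarray | None:
        # Do not block: return None when no frame is ready to send.
        try:
            return self.queue.get_nowait()
        except Empty:
            return None

    def copy(self) -> StreamHandler:
        # Called at the start of each stream so every user gets a fresh handler.
        return EchoHandler()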


with gr.Blocks() as demo:
    with gr.Column():
        with gr.Group():
            audio = WebRTC(
                label="Stream",
                rtc_configuration=None,
                mode="send-receive",
                modality="audio",
            )

        audio.stream(fn=EchoHandler(), inputs=[audio], outputs=[audio], time_limit=15)


if __name__ == "__main__":
    demo.launch()

  • Instead of passing a function to the stream event's fn parameter, pass a StreamHandler implementation. The StreamHandler above simply echoes the audio back to the client.
  • The StreamHandler class has three methods: receive, emit, and copy. The receive method is called when a new frame is received from the client, and the emit method returns the next frame to send to the client. The copy method is called at the beginning of the stream to ensure each user has a unique stream handler.
  • An audio frame is represented as a tuple of (frame_rate, audio_samples) where audio_samples is a numpy array of shape (num_channels, num_samples).
  • You can also specify the audio layout ("mono" or "stereo") in the emit method by returning it as the third element of the tuple. If not specified, the default is "mono".
  • The time_limit parameter is the maximum time in seconds the conversation will run. If the time limit is reached, the audio stream will stop.
  • The emit method SHOULD NOT block. If a frame is not ready to be sent, the method should return None.
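
The non-blocking rule and the per-user copy described above can be checked without a WebRTC connection. The class below is a plain-Python sketch of that contract; EchoState is a hypothetical name and deliberately does not subclass StreamHandler, so it runs without gradio_webrtc installed.

```python
from queue import Empty, Queue


class EchoState:
    """Plain-Python sketch of the receive/emit/copy contract (not a StreamHandler)."""

    def __init__(self) -> None:
        self.queue: Queue = Queue()

    def receive(self, frame) -> None:
        # Called for every incoming frame: just buffer it.
        self.queue.put(frame)

    def emit(self):
        # Never block: hand back the next frame, or None if nothing is ready.
        try:
            return self.queue.get_nowait()
        except Empty:
            return None

    def copy(self) -> "EchoState":
        # A fresh instance per connection, so users never share a queue.
        return EchoState()
```

A frame received by one instance is never emitted by its copy, and calling emit on an empty queue returns None immediately instead of waiting.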

An easy way to get started with Conversational AI is to use the ReplyOnPause stream handler. This will automatically run your function when the speaker has stopped speaking. In order to use ReplyOnPause, the [vad] extra dependencies must be installed.

import gradio as gr
import numpy as np
from gradio_webrtc import WebRTC, ReplyOnPause

def response(audio: tuple[int, np.ndarray]):
    """This function must yield audio frames"""
    ...
    for numpy_array in generated_audio:
        yield (sampling_rate, numpy_array, "mono")


with gr.Blocks() as demo:
    gr.HTML(
    """
    <h1 style='text-align: center'>
    Chat (Powered by WebRTC ⚡️)
    </h1>
    """
    )
    with gr.Column():
        with gr.Group():
            audio = WebRTC(
                label="Stream",
                rtc_configuration=rtc_configuration,  # defined as in the Deployment section
                mode="send-receive",
                modality="audio",
            )
        audio.stream(fn=ReplyOnPause(response), inputs=[audio], outputs=[audio], time_limit=60)


demo.launch(ssr_mode=False)
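
As a starting point for the response function, the sketch below simply echoes the speaker's audio back in short chunks. It is an illustration under assumptions: echo_response and chunk_seconds are made-up names, and the incoming samples are assumed to have shape (1, num_samples) as described for audio frames above.

```python
import numpy as np


def echo_response(audio: tuple[int, np.ndarray], chunk_seconds: float = 0.5):
    """Echo the speaker's audio back in short chunks.

    Assumes audio is (sampling_rate, samples) with samples shaped
    (1, num_samples).
    """
    sampling_rate, samples = audio
    chunk_size = max(1, int(sampling_rate * chunk_seconds))
    # Slice along the sample axis and yield one chunk at a time.
    for start in range(0, samples.shape[1], chunk_size):
        yield (sampling_rate, samples[:, start : start + chunk_size], "mono")
```

Yielding small chunks rather than one long array lets playback begin before the whole reply is ready, which matters more once the echo is replaced by a model that generates audio incrementally.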

Deployment

When deploying in a cloud environment (such as Hugging Face Spaces or EC2), you need to set up a TURN server to relay the WebRTC traffic. The easiest way to do this is to use a service like Twilio.

import os

import gradio as gr
from gradio_webrtc import WebRTC
from twilio.rest import Client

account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
auth_token = os.environ.get("TWILIO_AUTH_TOKEN")

client = Client(account_sid, auth_token)

token = client.tokens.create()

rtc_configuration = {
    "iceServers": token.ice_servers,
    "iceTransportPolicy": "relay",
}

with gr.Blocks() as demo:
    ...
    rtc = WebRTC(rtc_configuration=rtc_configuration, ...)
    ...
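
One way to share a single script between local runs and cloud deployments is to wrap the Twilio lookup in a helper that falls back to None when no credentials are set. This is a sketch, not part of the library: get_rtc_configuration is a hypothetical name, and it assumes that rtc_configuration=None is acceptable locally, as in the EchoHandler example above.

```python
import os


def get_rtc_configuration():
    """Return a Twilio-backed RTC configuration, or None without credentials."""
    account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
    auth_token = os.environ.get("TWILIO_AUTH_TOKEN")
    if not (account_sid and auth_token):
        # No credentials: assume a local run where no TURN relay is needed.
        return None

    # Imported lazily so the twilio package is only required when deploying.
    from twilio.rest import Client

    client = Client(account_sid, auth_token)
    token = client.tokens.create()
    return {
        "iceServers": token.ice_servers,
        "iceTransportPolicy": "relay",
    }
```

The result can then be passed as WebRTC(rtc_configuration=get_rtc_configuration(), ...).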

MIT License

Copyright (c) 2024 Freddy Boulton

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
