# SAM-Audio

![CI](https://github.com/facebookresearch/sam-audio/actions/workflows/ci.yaml/badge.svg)

![model_image](assets/sam_audio_main_model.png)
Segment Anything Model for Audio

SAM-Audio is a foundation model for isolating any sound in audio using text, visual, or temporal prompts. It can separate specific sounds from complex audio mixtures based on natural language descriptions, visual cues from video, or time spans.

## Setup

**Requirements:**

- Python >= 3.10
- CUDA-compatible GPU (recommended)

Install dependencies:

```bash
pip install .
```

## Usage

⚠️ Before using SAM-Audio, please request access to the checkpoints in the SAM-Audio Hugging Face [repo](https://huggingface.co/facebook/sam-audio-large). Once accepted, you need to authenticate in order to download the checkpoints. You can do this by following these [steps](https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication) (e.g. running `hf auth login` after generating an access token).

### Basic Text Prompting

```python
from sam_audio import SAMAudio, SAMAudioProcessor
import torchaudio

model = SAMAudio.from_pretrained("facebook/sam-audio-large")
processor = SAMAudioProcessor.from_pretrained("facebook/sam-audio-large")
model = model.eval().cuda()

file = "