# clip-vit-large-patch14-336

**Repository Path**: hf-models/clip-vit-large-patch14-336

## Basic Information

- **Project Name**: clip-vit-large-patch14-336
- **Description**: A large variant of the CLIP (Contrastive Language-Image Pre-training) model that uses a Vision Transformer (ViT) as its image encoder. The model is pre-trained to associate images with text, closely linking image content to the text labels that describe it.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 1
- **Created**: 2023-10-23
- **Last Updated**: 2025-11-28

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

---
tags:
- generated_from_keras_callback
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
  candidate_labels: playing music, playing sports
  example_title: Cat & Dog
model-index:
- name: clip-vit-large-patch14-336
  results: []
---

# clip-vit-large-patch14-336

This model was trained from scratch on an unknown dataset. No evaluation results are reported.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: None
- training_precision: float32

### Training results

More information needed

### Framework versions

- Transformers 4.21.3
- TensorFlow 2.8.2
- Tokenizers 0.12.1
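The card above gives no usage example, so here is a rough sketch of the matching mechanism the description refers to: CLIP L2-normalizes the image and text embeddings, takes their scaled dot products as logits, and applies a softmax over the candidate labels. The embeddings, the `clip_match` helper, and the `logit_scale` value below are all illustrative stand-ins (real CLIP-ViT-L/14 produces 768-dimensional embeddings and learns its temperature during training); for actual inference you would load the pretrained checkpoint, e.g. via the `transformers` library.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as CLIP does before comparing embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def clip_match(image_emb, text_embs, logit_scale=100.0):
    """Score one image embedding against candidate text embeddings.

    Returns one probability per candidate label, CLIP-style:
    cosine similarity scaled by a temperature, then softmax.
    (Hypothetical helper; not part of any CLIP library API.)
    """
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        t = l2_normalize(t)
        sim = sum(a * b for a, b in zip(img, t))  # cosine similarity
        logits.append(logit_scale * sim)
    return softmax(logits)

# Toy 3-d embeddings standing in for encoder outputs
image = [0.9, 0.1, 0.2]
texts = [
    [0.8, 0.2, 0.1],   # points the same way as the image -> high probability
    [-0.5, 0.9, 0.3],  # points away from the image -> low probability
]
probs = clip_match(image, texts)
```

With the widget example above, the two text embeddings would come from encoding the candidate labels "playing music" and "playing sports", and the image embedding from the cat-and-dog picture; the label whose embedding is most aligned with the image wins.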