Hugging Face Model Mirror / clip-ViT-B-32

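This repository does not declare an open-source license file (LICENSE). Before use, check the project description and the upstream dependencies of its code.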
library_name: sentence-transformers
tags: sentence-transformers, feature-extraction, sentence-similarity
pipeline_tag: sentence-similarity

clip-ViT-B-32

This is the Image & Text model CLIP, which maps text and images to a shared vector space. For applications of the model, see our documentation: SBERT.net - Image Search.

Usage

After installing sentence-transformers (pip install sentence-transformers), the usage of this model is easy:

from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Load the CLIP model
model = SentenceTransformer('clip-ViT-B-32')

# Encode an image
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))

# Encode text descriptions
text_emb = model.encode(['Two dogs in the snow', 'A cat on a table', 'A picture of London at night'])

# Compute cosine similarities between the image and each text description
cos_scores = util.cos_sim(img_emb, text_emb)
print(cos_scores)

See our SBERT.net - Image Search documentation for more examples of how the model can be used for image search, zero-shot image classification, image clustering, and image deduplication.
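As a rough illustration of zero-shot image classification, the sketch below picks, for each image embedding, the label whose text embedding has the highest cosine similarity. It is a minimal example and not the documented SBERT.net recipe: the helper names `cosine_similarity_matrix` and `zero_shot_classify` are hypothetical, and small toy vectors stand in for the embeddings that `model.encode(...)` would return.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = np.atleast_2d(np.asarray(a, dtype=np.float64))
    b = np.atleast_2d(np.asarray(b, dtype=np.float64))
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def zero_shot_classify(image_embs, label_embs, labels):
    """Assign each image the label whose text embedding is most similar."""
    scores = cosine_similarity_matrix(image_embs, label_embs)
    best = scores.argmax(axis=1)
    return [labels[i] for i in best]

# Toy 3-dimensional vectors standing in for real CLIP embeddings (assumption).
# With the real model these would come from model.encode(images) and
# model.encode(label_texts).
label_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
image_embs = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]])

predictions = zero_shot_classify(image_embs, label_embs, ["dog", "cat"])
print(predictions)
```

The same similarity matrix could also serve image search (rank images by their score for one text query) or deduplication (compare image embeddings against each other and flag pairs above a chosen threshold).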

Performance

The following table lists the zero-shot top-1 accuracy on the ImageNet validation set:

Model            Top-1 accuracy (%)
clip-ViT-B-32    63.3
clip-ViT-B-16    68.1
clip-ViT-L-14    75.4
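For readers unfamiliar with the metric, top-1 accuracy is the share of images whose highest-scoring class matches the true class. The sketch below computes it from a matrix of image-to-label similarity scores. The function name `top1_accuracy` and the toy scores are illustrative assumptions; this is not the evaluation script behind the numbers above.

```python
import numpy as np

def top1_accuracy(scores, true_labels):
    """Percentage of rows whose highest-scoring column equals the true label index."""
    scores = np.asarray(scores)
    true_labels = np.asarray(true_labels)
    predicted = scores.argmax(axis=1)
    return float((predicted == true_labels).mean()) * 100.0

# Toy similarity scores: 4 images (rows) scored against 3 class prompts (columns).
scores = [
    [0.9, 0.1, 0.0],  # predicted class 0
    [0.2, 0.7, 0.1],  # predicted class 1
    [0.3, 0.4, 0.3],  # predicted class 1 (true class is 2, so this one is wrong)
    [0.1, 0.2, 0.7],  # predicted class 2
]
true_labels = [0, 1, 2, 2]

accuracy = top1_accuracy(scores, true_labels)
print(f"Top-1 accuracy: {accuracy:.1f}%")
```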

For a multilingual version of the CLIP model that supports 50+ languages, have a look at: clip-ViT-B-32-multilingual-v1

Introduction

clip-ViT-B-32 is an image-text model with strong image and text understanding capabilities, suitable for a variety of cross-modal tasks.