1 Star 0 Fork 0

Hugging Face 数据集镜像/DataComp-12M

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
该仓库未声明开源许可证文件(LICENSE),使用请关注具体项目描述及其代码上游依赖。
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
licenselicense_namelicense_linktask_categorieslanguage
otherapple-asclhttps://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
text-to-imageimage-to-text
en

Dataset Card for DataComp-12M

This dataset contains UIDs of DataComp-12M that is a 12M subset of DataComp-1B-BestPool. Image-text models trained on DataComp-12M are significantly better than on CC-12M/YFCC-15M as well as DataComp-Small/Medium. For details on this dataset and the improved DataCompDR-12M, please visit our MobileCLIP paper. The dataset with the original captions is now available at mlfoundations/DataComp-12M. The UIDs per shards match between mlfoundations/DataComp-12M and apple/DataCompDR-12M.

Dataset Details

Dataset Description

DataCompDR is an image-text dataset and an enhancement to the DataComp dataset. We reinforce the DataComp dataset using our multi-modal dataset reinforcement strategy. In particular, we create DataCompDR-1B and DataCompDR-12M by reinforcing the DataComp-1B (BestPool filtering) and a uniform subset of 12.8M samples, DataCompDR-12M. We have a one-time generation process, the cost of which is amortized over multiple architectures and extensive ablations. We generate 5 synthetic captions per image using the coca_ViT-L-14 model in OpenCLIP, and strong random image augmentations (10 for DataCompDR-1B and 30 for DataCompDR-12M). We compute embeddings of an ensemble of two strong teachers (ViT-L-14 with pretrained weights datacomp_xl_s13b_b90k and openai in OpenCLIP) on augmented images as well as real and synthetic captions. Embeddings are 1536-D concatenations of 2x768-D vectors. One seen sample for DataCompDR is a triplet of one randomly augmented image, one ground-truth caption, and one randomly picked synthetic caption.

  • Curated by: Original data by DataComp and metadata by Apple.
  • License: We distribute our metadata under our license. The original image url-text samples and metadata were released by DataComp under Creative Common CC-BY-4.0 license. The individual images are under their own copyrights.
  • Repository: ml-mobileclip GitHub
  • Paper: MobileCLIP paper
  • Demo: Coming Soon

Uses

Training with DataCompDR shows significant learning efficiency improvement compared to the standard CLIP training. For example, with a single node of 8×A100 GPUs, we achieve 61.7% zero-shot classification on ImageNet-val in approximately one day when training a ViT-B/16 based CLIP from scratch on DataCompDR-12M. Training with DataCompDR-1B sets new state-of-the-art performance on several metrics (Fig. 2) while still using a fraction of the training compute budget compared to previous works. Using DataCompDR, we demonstrate 10x-1000x learning efficiency in comparison to DataComp.

Dataset Structure

- uids.txt: List of 12779520 (65536*195) UIDs, one UID per line.
- uids.npy: List of 12779520 (65536*195) UIDs as a NumPy array of type `numpy.dtype("u8,u8")`.

Citation

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training. (CVPR 2024) Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.

@InProceedings{mobileclip2024,
  author = {Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel},
  title = {MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2024},
}

空文件

简介

Mirror of https://huggingface.co/datasets/apple/DataComp-12M 展开 收起
取消

发行版

暂无发行版

贡献者

全部

近期动态

不能加载更多了
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/hf-datasets/DataComp-12M.git
git@gitee.com:hf-datasets/DataComp-12M.git
hf-datasets
DataComp-12M
DataComp-12M
main

搜索帮助