Hugging Face Dataset Mirror/coco2017

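This repository does not declare an open-source license file (LICENSE). Before use, check the project description and the upstream dependencies of its code.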
README
language: en
pretty_name: COCO2017
size_categories: 100K<n<1M
task_categories: image-to-text
task_ids: image-captioning
tags: coco, image-captioning
dataset_info:
  features:
    name: license, dtype: int64
    name: file_name, dtype: string
    name: coco_url, dtype: string
    name: height, dtype: int64
    name: width, dtype: int64
    name: date_captured, dtype: string
    name: flickr_url, dtype: string
    name: image_id, dtype: int64
    name: ids, sequence: int64
    name: captions, sequence: string
  splits:
    name: train, num_bytes: 64026361, num_examples: 118287
    name: validation, num_bytes: 2684731, num_examples: 5000
  download_size: 30170127
  dataset_size: 66711092

coco2017

Image-text pairs from MS COCO2017.

Data origin

  • Data originates from cocodataset.org
  • While coco-karpathy uses a dense format (with several sentences and sendids per row), coco-karpathy-long uses a long format with one sentence (aka caption) and sendid per row. coco-karpathy-long uses the first five sentences and therefore is five times as long as coco-karpathy.
    • phiyodr/coco2017: One row corresponds to one image with several sentences.
    • phiyodr/coco2017-long: One row corresponds to one sentence (aka caption). There are 5 rows (sometimes more) with the same image details.
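
The relationship between the dense and the long format can be sketched in plain Python. This is only an illustration: `dense_to_long` is a hypothetical helper, the output column names `id` and `caption` are assumptions (the actual column names of phiyodr/coco2017-long are not stated here), and the example row is made up.

```python
from typing import Dict, Iterator


def dense_to_long(row: Dict) -> Iterator[Dict]:
    """Yield one record per caption from a dense row (hypothetical helper)."""
    # Image-level columns are repeated for every caption of the same image.
    image_fields = {k: v for k, v in row.items() if k not in ("ids", "captions")}
    # Assumption: `ids` and `captions` are parallel lists of equal length.
    for caption_id, caption in zip(row["ids"], row["captions"]):
        yield {**image_fields, "id": caption_id, "caption": caption}


# Made-up row for illustration; the values are not taken from the dataset.
example_row = {
    "image_id": 1,
    "file_name": "example.jpg",
    "ids": [10, 11],
    "captions": ["A cat on a sofa.", "A cat sleeping indoors."],
}
long_rows = list(dense_to_long(example_row))
```

Each dense row with n captions yields n long rows that share the same image details, which is why the long variant has roughly five times as many rows.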

Format

DatasetDict({
    train: Dataset({
        features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
        num_rows: 118287
    })
    validation: Dataset({
        features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
        num_rows: 5000
    })
})
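
A quick sanity check of a single row against the column list above might look like the following sketch. `check_row` is a hypothetical helper, not part of the dataset or of the `datasets` library, and it assumes that `ids` and `captions` are parallel lists, which the format above suggests but does not guarantee.

```python
EXPECTED_COLUMNS = [
    "license", "file_name", "coco_url", "height", "width",
    "date_captured", "flickr_url", "image_id", "ids", "captions",
]


def check_row(row: dict) -> list:
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in row]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Assumption: every caption has exactly one matching caption id.
    ids, captions = row.get("ids", []), row.get("captions", [])
    if len(ids) != len(captions):
        problems.append(f"{len(ids)} ids but {len(captions)} captions")
    return problems
```

After loading, a call such as `check_row(dataset["train"][0])` would be expected to return an empty list if the row matches the listed features.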

Usage

  • Download image data and unzip
cd PATH_TO_IMAGE_FOLDER

wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
#wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip # zip not needed: everything you need is in load_dataset("phiyodr/coco2017")

unzip train2017.zip
unzip val2017.zip
  • Load dataset in Python
import os
from datasets import load_dataset
PATH_TO_IMAGE_FOLDER = "COCO2017"

def create_full_path(example):
    """Create the full image path by joining `PATH_TO_IMAGE_FOLDER` (the COCO2017 folder) with `file_name`."""
    example["image_path"] = os.path.join(PATH_TO_IMAGE_FOLDER, example["file_name"])
    return example

dataset = load_dataset("phiyodr/coco2017")
dataset = dataset.map(create_full_path)
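
Since the images are downloaded separately, it can help to confirm that the generated `image_path` values point to real files. The sketch below uses only the standard library; `has_image` and `split_by_availability` are hypothetical helper names, and the code assumes each row already has the `image_path` field added by `create_full_path`.

```python
import os


def has_image(example: dict) -> bool:
    """Return True if the file at example["image_path"] exists on disk."""
    return os.path.isfile(example["image_path"])


def split_by_availability(examples):
    """Separate rows into those with and without an image file on disk."""
    found, missing = [], []
    for example in examples:
        (found if has_image(example) else missing).append(example)
    return found, missing
```

With the loaded dataset, `dataset.filter(has_image)` should keep only rows whose image file is present. If every row ends up missing, the folder layout under `PATH_TO_IMAGE_FOLDER` probably does not match the `file_name` values.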

About

Mirror of https://huggingface.co/datasets/phiyodr/coco2017