COCO2017
Image-text pairs from MS COCO2017.
While `coco-karpathy` uses a dense format (with several sentences and sendids per row), `coco-karpathy-long` uses a long format with one `sentence` (aka caption) and `sendid` per row. `coco-karpathy-long` keeps the first five sentences of each image and is therefore five times as long as `coco-karpathy`.
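The relationship between the dense and the long format can be illustrated in plain Python. This is a minimal sketch, not the code used to build `coco-karpathy-long`. The row keys `sentences`/`sendids`, the toy data and the helper name `dense_to_long` are assumptions chosen to mirror the description above.

```python
from typing import Any, Dict, List


def dense_to_long(rows: List[Dict[str, Any]], max_sentences: int = 5) -> List[Dict[str, Any]]:
    """Expand dense rows (several sentences per row) into long rows (one sentence per row).

    Hypothetical helper: assumes each dense row stores parallel lists under the
    keys "sentences" and "sendids"; only the first `max_sentences` are kept.
    """
    long_rows = []
    for row in rows:
        # Copy every per-image field except the two list-valued columns.
        base = {k: v for k, v in row.items() if k not in ("sentences", "sendids")}
        pairs = list(zip(row["sentences"], row["sendids"]))[:max_sentences]
        for sentence, sendid in pairs:
            long_rows.append({**base, "sentence": sentence, "sendid": sendid})
    return long_rows


# Toy input: two images, the second one with six captions (only five are kept).
dense = [
    {"imgid": 0, "sentences": [f"caption {i}" for i in range(5)], "sendids": list(range(0, 5))},
    {"imgid": 1, "sentences": [f"caption {i}" for i in range(6)], "sendids": list(range(5, 11))},
]
long_format = dense_to_long(dense)
print(len(dense), len(long_format))  # 2 10
```

Because every image contributes exactly five rows once it has at least five sentences, the long table has 5 × (number of images) rows.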
- `phiyodr/coco2017`: one row corresponds to one image with several sentences.
- `phiyodr/coco2017-long`: one row corresponds to one sentence (aka caption), so each image appears in 5 rows (sometimes more) with the same image details.

The `phiyodr/coco2017` dataset has the following structure:

```
DatasetDict({
    train: Dataset({
        features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
        num_rows: 118287
    })
    validation: Dataset({
        features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
        num_rows: 5000
    })
})
```
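A sketch of going from the `phiyodr/coco2017` layout (lists in `ids` and `captions`) to a one-caption-per-row layout like `phiyodr/coco2017-long`. The function takes a batch in the dict-of-lists form that `datasets` passes to `Dataset.map(..., batched=True)` and keeps every caption (5 or more per image). The function name and the output columns `id` and `caption` are hypothetical; the published `-long` dataset may name its columns differently.

```python
from typing import Any, Dict, List

# Columns holding one list entry per caption; all other columns are per-image metadata.
LIST_COLUMNS = ("ids", "captions")


def explode_captions(batch: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Turn a batch with several captions per row into one row per caption.

    Hypothetical helper: the output columns "id" and "caption" are illustrative names.
    """
    meta_columns = [c for c in batch if c not in LIST_COLUMNS]
    out: Dict[str, List[Any]] = {c: [] for c in meta_columns}
    out["id"], out["caption"] = [], []
    for i, (ids, captions) in enumerate(zip(batch["ids"], batch["captions"])):
        for caption_id, caption in zip(ids, captions):
            # Repeat the image-level metadata once for every caption of this image.
            for c in meta_columns:
                out[c].append(batch[c][i])
            out["id"].append(caption_id)
            out["caption"].append(caption)
    return out


# Toy batch with two images (real rows have at least five captions).
batch = {
    "image_id": [1, 2],
    "ids": [[10, 11], [20]],
    "captions": [["a dog", "a brown dog"], ["a cat"]],
}
rows = explode_captions(batch)
print(rows["image_id"], rows["caption"])  # [1, 1, 2] ['a dog', 'a brown dog', 'a cat']
```

With `datasets`, the same function could be applied as `dataset.map(explode_captions, batched=True, remove_columns=["ids", "captions"])`; `remove_columns` is needed because the number of rows changes and the old list columns would otherwise have a mismatched length.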
Download the images into `PATH_TO_IMAGE_FOLDER` and unzip them:

```bash
cd PATH_TO_IMAGE_FOLDER
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
# The annotations zip is not needed: everything you need is in load_dataset("phiyodr/coco2017")
#wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip train2017.zip
unzip val2017.zip
```
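After unzipping, one quick sanity check is to count the JPEG files in each extracted folder and compare them with the `num_rows` of the two splits (118287 and 5000). This sketch assumes the archives unpack into `train2017/` and `val2017/` inside the image folder; `count_images` and `check_images` are hypothetical helpers.

```python
from pathlib import Path

# Expected image counts per extracted folder, taken from the split sizes of the dataset.
EXPECTED = {"train2017": 118287, "val2017": 5000}


def count_images(image_root: str) -> dict:
    """Count *.jpg files directly inside each expected split folder (0 if the folder is missing)."""
    root = Path(image_root)
    counts = {}
    for folder in EXPECTED:
        split_dir = root / folder
        counts[folder] = sum(1 for _ in split_dir.glob("*.jpg")) if split_dir.is_dir() else 0
    return counts


def check_images(image_root: str) -> bool:
    """Print found/expected counts and return True only if every folder matches exactly."""
    counts = count_images(image_root)
    for folder, expected in EXPECTED.items():
        print(f"{folder}: {counts[folder]} / {expected}")
    return counts == EXPECTED
```

For example, `check_images("COCO2017")` should return `True` once both archives are fully extracted into that folder.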
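Then load the captions and add the full path of each image:

```python
import os
from datasets import load_dataset

# Must point to the folder in which train2017.zip and val2017.zip were unzipped.
PATH_TO_IMAGE_FOLDER = "COCO2017"

def create_full_path(example):
    """Create the full path to an image from PATH_TO_IMAGE_FOLDER and its `file_name`."""
    example["image_path"] = os.path.join(PATH_TO_IMAGE_FOLDER, example["file_name"])
    return example

dataset = load_dataset("phiyodr/coco2017")
# Adds an "image_path" column to every split.
dataset = dataset.map(create_full_path)
```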
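If only part of the images was downloaded (for example, only `val2017.zip`), rows whose `image_path` does not point to an existing file can be dropped. The predicate below is a hypothetical helper meant for the `filter` method of `datasets`, which calls it once per example; it only checks that the file exists, not that it is a readable image.

```python
import os


def image_exists(example: dict) -> bool:
    """Keep an example only if the file at example["image_path"] exists on disk."""
    return os.path.isfile(example["image_path"])


# Usage with the DatasetDict built above (requires the `datasets` library):
# dataset = dataset.filter(image_exists)
# print({split: ds.num_rows for split, ds in dataset.items()})
```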