# Image-Captioning-v2 **Repository Path**: passenger134/Image-Captioning-v2 ## Basic Information - **Project Name**: Image-Captioning-v2 - **Description**: 图像中文描述+视觉注意力 - **Primary Language**: Python - **License**: Apache-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 5 - **Forks**: 1 - **Created**: 2019-04-27 - **Last Updated**: 2024-03-15 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 图像中文描述 图像中文描述 + 视觉注意力的 PyTorch 实现。 [Show, Attend, and Tell](https://arxiv.org/pdf/1502.03044.pdf) 是令人惊叹的工作,[这里](https://github.com/kelvinxu/arctic-captions)是作者的原始实现。 这个模型学会了“往哪瞅”:当模型逐词生成标题时,模型的目光在图像上移动以专注于跟下一个词最相关的部分。 ## 依赖 - Python 3.5 - PyTorch 0.4 ## 数据集 使用 AI Challenger 2017 的图像中文描述数据集,包含30万张图片,150万句中文描述。训练集:210,000 张,验证集:30,000 张,测试集 A:30,000 张,测试集 B:30,000 张。 ![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/dataset.png) 下载点这里:[图像中文描述数据集](https://challenger.ai/datasets/caption),放在 data 目录下。 ## 网络结构 ![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/net.png) ## 用法 ### 数据预处理 提取210,000 张训练图片和30,000 张验证图片: ```bash $ python pre-process.py ``` ### 训练 ```bash $ python train.py ``` 可视化训练过程,执行: ```bash $ tensorboard --logdir path_to_current_dir/logs ``` ### 演示 下载 [预训练模型](https://github.com/foamliu/Image-Captioning-v2/releases/download/v1.0/model.85-0.7657.hdf5) 放在 models 目录,然后执行: ```bash $ python demo.py ``` 原图 | 注意力 | |---|---| ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_0.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_1.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_2.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_3.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_4.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_5.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_6.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_7.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_8.jpg) | ||![image](https://github.com/foamliu/Image-Captioning-v2/raw/master/images/out_9.jpg) |