diff --git a/.github/ISSUE_TEMPLATE/-------------other-general-issues.md b/.github/ISSUE_TEMPLATE/-------------other-general-issues.md deleted file mode 100644 index f4e48541a03f15a3cb68dff228104e2dcdae0c06..0000000000000000000000000000000000000000 --- a/.github/ISSUE_TEMPLATE/-------------other-general-issues.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -name: "\U0001F516 其他通用问题 / Other General Issues" -about: 提出其他问题 / Suggest other general issues -title: "[Other General Issues]" -labels: '' -assignees: '' - ---- - -**PaddleDetection team appreciate any suggestion or problem you delivered~** - -## Checklist: - -1. 查找历史相关issue寻求解答/I have searched related issues but cannot get the expected help. -2. 翻阅[FAQ](https://paddledetection.readthedocs.io/FAQ.html) /I have read the [FAQ documentation](https://paddledetection.readthedocs.io/FAQ.html) but cannot get the expected help. - -## 描述问题/Describe the bug -A clear and concise description of what the bug is. - -## 复现/Reproduction - -1. 您使用的命令是?/What command or script did you run? - -```none -请填写命令/A placeholder for the command. -``` -2. 您是否更改过代码或配置文件?您是否理解您所更改的内容?还请您提供所更改的部分代码。/Did you make any modifications on the code or config? Did you understand what you have modified? Please provide the codes that you modified. - -3. 您使用的数据集是?/What dataset did you use? - -4. 请提供您出现的报错信息及相关log。/Please provide the error messages or relevant log information. - -## 环境/Environment -1. 请提供您使用的Paddle和PaddleDetection的版本号/Please provide the version of Paddle and PaddleDetection you use: - -2. 如您在使用PaddleDetection的同时还在使用其他产品,如PaddleServing、PaddleInference等,请您提供其版本号/ Please provide the version of any other related tools/products used, such as the version of PaddleServing and etc: - -3. 请提供您使用的操作系统信息,如Linux/Windows/MacOS /Please provide the OS information, e.g., Linux: - -4. 请问您使用的Python版本是?/ Please provide the version of Python you used. - -5. 请问您使用的CUDA/cuDNN的版本号是?/ Please provide the version of CUDA/cuDNN you used. 
- - -如果您的issue是关于安装或环境,您可以先查询[安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md)尝试解决~ - -If your issue looks like an installation issue / environment issue, -please first try to solve it yourself with the instructions in -https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL.md diff --git a/.github/ISSUE_TEMPLATE/------------feature-request.md b/.github/ISSUE_TEMPLATE/------------feature-request.md deleted file mode 100644 index 7b23ce6925a943fd56981df22c8363ec8c716de7..0000000000000000000000000000000000000000 --- a/.github/ISSUE_TEMPLATE/------------feature-request.md +++ /dev/null @@ -1,43 +0,0 @@ ---- -name: "\U0001F680 新功能需求 / Feature Request" -about: 提出一个新的功能需求或改进建议 / Suggest an improvement for PaddleDetection -title: "[Feature Request]" -labels: '' -assignees: '' - ---- - -## 🚀 新功能/Feature - -PaddleDetection欢迎大家以清晰简洁的语言提出新功能需求。 - -A clear and concise description of the feature proposal. - -## 需求原因&示例/Motivation & Examples - -请描述这个需求的必要性。 - -Tell us why the feature is useful. - -请描述这个需求可实现的具体功能,如果可以,辛苦您提供相关代码实现效果。 - -Describe what the feature would look like, if it is implemented. -Best demonstrated using **code examples** in addition to words. - -## 📣 注意/Note - -PaddleDetection仅添加通用性较高的新功能/特性。 - -We only consider adding new features if they are relevant to many users. - -如果您需要论文中的模型能力,PaddleDetection会优先考虑与目标检测强相关且意义较大的论文。 - -If you request implementation of research papers -- we only consider papers that have enough significance and prevalence in the object detection field. - -比如“让XX功能更快”类似的需求不能作为一个有效需求,需要更具体的描述,如“创建一个具体XX工具/功能,让XX更快”即是一个有效需求。 - -"Make X faster/accurate" is not a valid feature request. "Implement a concrete feature that can make X faster/accurate" can be a valid feature request. - -PaddleDetection感谢您的支持,我们期待您提出新功能需求! - -Thanks for your suggestions! 
diff --git a/.github/ISSUE_TEMPLATE/-----------documentation-improvement.md b/.github/ISSUE_TEMPLATE/-----------documentation-improvement.md deleted file mode 100644 index 3969fd2e0fc6c5e483268b4a85320627497c1c82..0000000000000000000000000000000000000000 --- a/.github/ISSUE_TEMPLATE/-----------documentation-improvement.md +++ /dev/null @@ -1,15 +0,0 @@ ---- -name: "\U0001F4D6 文档优化 / Documentation Improvement" -about: 对现有的文档教程提出修改建议 / Suggest an improvement about existing documentation or tutorials - in PaddleDetection. -title: "[Document Improvement]" -labels: '' -assignees: '' - ---- - -## 📖 文档优化/Documentation Improvement - -**请简单说明文档存在问题/Please provide a concise and brief description of the documentation problem:** - -**请提供有问题的文档部分截图及链接/Please provide the screen shoot and the link of the document:** diff --git a/.github/ISSUE_TEMPLATE/------bug---bug-report.md b/.github/ISSUE_TEMPLATE/------bug---bug-report.md deleted file mode 100644 index d778a03420a16fbf9425710fdb8046b33f445520..0000000000000000000000000000000000000000 --- a/.github/ISSUE_TEMPLATE/------bug---bug-report.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -name: "\U0001F41B 提出Bug / Bug Report" -about: 提出PaddleDetection使用中存在的Bug / Report a bug in PaddleDetection -title: "[BUG]" -labels: '' -assignees: '' - ---- - -**PaddleDetection team appreciate any suggestion or problem you delivered~** - -## Checklist: - -1. 查找历史相关issue寻求解答/I have searched related issues but cannot get the expected help. -2. 翻阅[FAQ](https://paddledetection.readthedocs.io/FAQ.html) /I have read the [FAQ documentation](https://paddledetection.readthedocs.io/FAQ.html) but cannot get the expected help. -3. 确认bug是否在新版本里还未修复/The bug has not been fixed in the latest version. - -## 描述问题/Describe the bug -A clear and concise description of what the bug is. - -## 复现/Reproduction - -1. 您使用的命令是?/What command or script did you run? - -```none -请填写命令/A placeholder for the command. -``` -2. 
您是否更改过代码或配置文件?您是否理解您所更改的内容?还请您提供所更改的部分代码。/Did you make any modifications on the code or config? Did you understand what you have modified? Please provide the codes that you modified. - -3. 您使用的数据集是?/What dataset did you use? - -4. 请提供您出现的报错信息及相关log。/Please provide the error messages or relevant log information. - -## 环境/Environment -1. 请提供您使用的Paddle和PaddleDetection的版本号/Please provide the version of Paddle and PaddleDetection you use: - -2. 如您在使用PaddleDetection的同时还在使用其他产品,如PaddleServing、PaddleInference等,请您提供其版本号/ Please provide the version of any other related tools/products used, such as the version of PaddleServing and etc: - -3. 请提供您使用的操作系统信息,如Linux/Windows/MacOS /Please provide the OS information, e.g., Linux: - -4. 请问您使用的Python版本是?/ Please provide the version of Python you used. - -5. 请问您使用的CUDA/cuDNN的版本号是?/ Please provide the version of CUDA/cuDNN you used. - - -如果您的issue是关于安装或环境,您可以先查询[安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md)尝试解决~ - -If your issue looks like an installation issue / environment issue, -please first try to solve it yourself with the instructions in -https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL.md diff --git a/.github/ISSUE_TEMPLATE/1_bug-report.yml b/.github/ISSUE_TEMPLATE/1_bug-report.yml new file mode 100644 index 0000000000000000000000000000000000000000..41e32cfe941b03073044102b7077f9822cfa380f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/1_bug-report.yml @@ -0,0 +1,73 @@ +name: 🐛 报BUG Bug Report +description: 报告一个可复现的BUG帮助我们修复PaddleDetection。 Report a bug to help us reproduce and fix it. +labels: [type/bug-report, status/new-issue] + +body: +- type: markdown + attributes: + value: | + Thank you for submitting a PaddleDetection Bug Report! 
+ +- type: checkboxes + attributes: + label: 问题确认 Search before asking + description: > + 在向PaddleDetection报bug之前,请先查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)是否报过同样的bug。 + + Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues). + + options: + - label: > + 我已经查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues),没有报过同样bug。I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and found no similar bug report. + required: true + +- type: textarea + id: code + attributes: + label: bug描述 Describe the Bug + description: | + 请清晰简洁的描述这个bug,最好附上bug复现步骤及最小代码集,以便我们可以通过运行代码来重现错误。代码片段需要尽可能简洁,请花些时间去掉不相关的代码以帮助我们有效地调试。我们希望通过复制代码并运行得到与你相同的结果,请避免任何外部数据或包含相关的导入等。如果代码太长,请将可执行代码放到[AIStudio](https://aistudio.baidu.com/aistudio/index)中并将项目设置为公开(或者放到github gist上),请在项目中描述清楚bug复现步骤,在issue中描述期望结果与实际结果。 + + 如果你报告的是一个报错信息,请将完整回溯的报错贴在这里,并使用 ` ```三引号块``` `展示错误信息。 + + + placeholder: | + 请清晰简洁的描述这个bug。A clear and concise description of what the bug is. + + ```python + # 最小可复现代码。 Sample code to reproduce the problem. + ``` + + ```shell + 带有完整回溯的报错信息。 The error message you got, with the full traceback. + ``` + validations: + required: true + +- type: textarea + attributes: + label: 复现环境 Environment + description: 请具体说明复现bug的环境信息,Please specify the software and hardware you used to produce the bug. + placeholder: | + - PaddlePaddle: 2.2.2 + - PaddleDetection: release/2.4 + - Python: 3.8.0 + - CUDA: 10.2 + - CUDNN: 7.6 + validations: + required: false + +- type: checkboxes + attributes: + label: 是否愿意提交PR Are you willing to submit a PR? 
+ description: > + (可选)如果你对修复bug有自己的想法,十分鼓励提交[Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls),共同提升PaddleDetection + + (Optional) We encourage you to submit a [Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls) (PR) to help improve PaddleDetection for everyone, especially if you have a good understanding of how to implement a fix or feature. + options: + - label: Yes I'd like to help by submitting a PR! + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉!Thanks for your contribution 🎉! diff --git a/.github/ISSUE_TEMPLATE/2_feature-request.yml b/.github/ISSUE_TEMPLATE/2_feature-request.yml new file mode 100644 index 0000000000000000000000000000000000000000..dcf9ec4462886c7064315f0fc6ac167dd6c6dbf5 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/2_feature-request.yml @@ -0,0 +1,50 @@ +name: 🚀 新需求 Feature Request +description: 提交一个你对PaddleDetection的新需求。 Submit a request for a new Paddle feature. +labels: [type/feature-request, status/new-issue] + +body: +- type: markdown + attributes: + value: > + #### 你可以在这里提出你对PaddleDetection的新需求,包括但不限于:功能或模型缺失、功能不全或无法使用、精度/性能不符合预期等。 + + #### You could submit a request for a new feature here, including but not limited to: new features or models, incomplete or unusable features, accuracy/performance not as expected, etc. + +- type: checkboxes + attributes: + label: 问题确认 Search before asking + description: > + 在向PaddleDetection提新需求之前,请先查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)是否报过同样的需求。 + + Before submitting a feature request, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues). + + options: + - label: > + 我已经查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues),没有类似需求。I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and found no similar feature requests. 
+ required: true + +- type: textarea + id: description + attributes: + label: 需求描述 Feature Description + description: | + 请尽可能包含任务目标、需求场景、功能描述等信息,全面的信息有利于我们准确评估你的需求。 + Please include as much information as possible, such as mission objectives, requirement scenarios, functional descriptions, etc. Comprehensive information will help us accurately assess your feature request. + value: "1. 任务目标(请描述你正在做的项目是什么,如模型、论文、项目是什么?); 2. 需求场景(请描述你的项目中为什么需要用此功能); 3. 功能描述(请简单描述或设计这个功能)" + validations: + required: true + +- type: checkboxes + attributes: + label: 是否愿意提交PR Are you willing to submit a PR? + description: > + (可选)如果你对新feature有自己的想法,十分鼓励提交[Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls),共同提升PaddleDetection + + (Optional) We encourage you to submit a [Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls) (PR) to help improve PaddleDetection for everyone, especially if you have a good understanding of how to implement a fix or feature. + options: + - label: Yes I'd like to help by submitting a PR! + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉!Thanks for your contribution 🎉! diff --git a/.github/ISSUE_TEMPLATE/3_documentation-issue.yml b/.github/ISSUE_TEMPLATE/3_documentation-issue.yml new file mode 100644 index 0000000000000000000000000000000000000000..4ea08cd5f4b99003d2323e1578bd0456a9dcf848 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/3_documentation-issue.yml @@ -0,0 +1,38 @@ +name: 📚 文档 Documentation Issue +description: 反馈一个官网文档错误。 Report an issue related to https://github.com/PaddlePaddle/PaddleDetection. +labels: [type/docs, status/new-issue] + +body: +- type: markdown + attributes: + value: > + #### 请确认反馈的问题来自PaddlePaddle官网文档:https://github.com/PaddlePaddle/PaddleDetection 。 + + #### Before submitting a Documentation Issue, Please make sure that issue is related to https://github.com/PaddlePaddle/PaddleDetection. 
+ +- type: textarea + id: link + attributes: + label: 文档链接&描述 Document Links & Description + description: | + 请说明有问题的文档链接以及该文档存在的问题。 + Please fill in the link to the document and describe the question. + validations: + required: true + + +- type: textarea + id: error + attributes: + label: 请提出你的建议 Please give your suggestion + description: | + 请告诉我们,你希望如何改进这个文档。或者你可以提个PR修复这个问题。 + Please tell us how you would like to improve this document. Or you can submit a PR to fix this problem. + + validations: + required: false + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉!Thanks for your contribution 🎉! diff --git a/.github/ISSUE_TEMPLATE/4_ask-a-question.yml b/.github/ISSUE_TEMPLATE/4_ask-a-question.yml new file mode 100644 index 0000000000000000000000000000000000000000..af237f516eb333d4c5f33bba4b7dc9c0dec2e30f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/4_ask-a-question.yml @@ -0,0 +1,37 @@ +name: 🙋🏼‍♀️🙋🏻‍♂️提问 Ask a Question +description: 提出一个使用/咨询问题。 Ask a usage or consultation question. 
+labels: [type/question, status/new-issue] + +body: +- type: checkboxes + attributes: + label: 问题确认 Search before asking + description: > + #### 你可以在这里提出一个使用/咨询问题,提问之前请确保: + + - 1)已经百度/谷歌搜索过你的问题,但是没有找到解答; + + - 2)已经在官网查询过[教程文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/GETTING_STARTED_cn.md)与[FAQ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/docs/tutorials/FAQ),但是没有找到解答; + + - 3)已经在[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)中搜索过,没有找到同类issue或issue未被解答。 + + + #### You could ask a usage or consultation question here. Before you start, please make sure: + + - 1) You have searched your question on Baidu/Google, but found no answer; + + - 2) You have checked the [tutorials](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/GETTING_STARTED.md), but found no answer; + + - 3) You have searched [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues), but found no similar issue or the issue has not been answered. + + options: + - label: > + 我已经搜索过问题,但是没有找到解答。I have searched the question and found no related answer. + required: true + +- type: textarea + id: question + attributes: + label: 请提出你的问题 Please ask your question + validations: + required: true diff --git a/.github/ISSUE_TEMPLATE/5_others.yml b/.github/ISSUE_TEMPLATE/5_others.yml new file mode 100644 index 0000000000000000000000000000000000000000..ec2f08ae16098cd8987f3b6bc726d9a28696833a --- /dev/null +++ b/.github/ISSUE_TEMPLATE/5_others.yml @@ -0,0 +1,23 @@ +name: 🧩 其他 Others +description: 提出其他问题。 Report any other non-support related issues. 
+labels: [type/others, status/new-issue] + +body: +- type: markdown + attributes: + value: > + #### 你可以在这里提出任何前面几类模板不适用的问题,包括但不限于:优化性建议、框架使用体验反馈、版本兼容性问题、报错信息不清楚等。 + + #### You can report any issues that are not applicable to the previous types of templates, including but not limited to: enhancement suggestions, feedback on the use of the framework, version compatibility issues, unclear error information, etc. + +- type: textarea + id: others + attributes: + label: 问题描述 Please describe your issue + validations: + required: true + +- type: markdown + attributes: + value: > + 感谢你的贡献 🎉! Thanks for your contribution 🎉! diff --git a/.gitignore b/.gitignore index 2260189a0aa6105c5ead510efe24f27dfb046882..6a98a38b72ef9a59a2fdab266697661cdd1fa136 100644 --- a/.gitignore +++ b/.gitignore @@ -82,3 +82,7 @@ ppdet/version.py # NPU meta folder kernel_meta/ + +# MAC +*.DS_Store + diff --git a/README_cn.md b/README_cn.md index 1b2ebaa39c4ebdbc6d224b9e4ad0cefa1d3eaeb2..e1e3f07d205e566bf27f95c516461a7290743557 100644 --- a/README_cn.md +++ b/README_cn.md @@ -2,84 +2,81 @@

- +

**飞桨目标检测开发套件,端到端地完成从训练到部署的全流程目标检测应用。** +

+ + + + + +

+
-[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) -[![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleDetection.svg)](https://github.com/PaddlePaddle/PaddleDetection/releases) -![python version](https://img.shields.io/badge/python-3.6+-orange.svg) -![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg) +
+
-## 产品动态 - -- 🔥 **2022.3.24:PaddleDetection发布[release/2.4版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4)** - - - 发布高精度云边一体SOTA目标检测模型[PP-YOLOE](config/ppyoloe),全系列多尺度模型,满足不同硬件算力需求,可适配服务器、边缘端GPU及其他服务器端AI加速卡。 - - 发布边缘端和CPU端超轻量SOTA目标检测模型[PP-PicoDet增强版](configs/picodet),提供模型稀疏化和量化功能,便于模型加速,各类硬件无需单独开发后处理模块,降低部署门槛。 - - 发布实时行人分析工具[PP-Human](deploy/pphuman),支持行人跟踪、人流量统计、人体属性识别与摔倒检测四大能力,基于真实场景数据特殊优化,精准识别各类摔倒姿势,适应不同环境背景、光线及摄像角度。 - -- 2021.11.03: PaddleDetection发布[release/2.3版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3) - - - 发布轻量级检测特色模型⚡[PP-PicoDet](configs/picodet),0.99m的参数量可实现精度30+mAP、速度150FPS。 - - - 发布轻量级关键点特色模型⚡[PP-TinyPose](configs/keypoint/tiny_pose),单人场景FP16推理可达122FPS、51.8AP,具有精度高速度快、检测人数无限制、微小目标效果好的优势。 - - - 发布实时跟踪系统[PP-Tracking](deploy/pptracking),覆盖单、多镜头下行人、车辆、多类别跟踪,对小目标、密集型特殊优化,提供人、车流量技术解决方案。 - - - 新增[Swin Transformer](configs/faster_rcnn),[TOOD](configs/tood),[GFL](configs/gfl)目标检测模型。 - - - 发布[Sniper](configs/sniper)小目标检测优化模型,发布针对EdgeBoard优化[PP-YOLO-EB](configs/ppyolo)模型。 - - - 新增轻量化关键点模型[Lite HRNet](configs/keypoint)关键点模型并支持Paddle Lite部署。 - -- 2021.08.10: PaddleDetection发布[release/2.2版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.2) - - - 发布Transformer检测系列模型,包括[DETR](configs/detr), [Deformable DETR](configs/deformable_detr), [Sparse RCNN](configs/sparse_rcnn)。 - - 新增Dark HRNet关键点模型和MPII数据集[关键点模型](configs/keypoint) - - 新增[人头](configs/mot/headtracking21)、[车辆](configs/mot/vehicle)跟踪垂类模型。 - -- 2021.05.20: PaddleDetection发布[release/2.1版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1) - - 新增[关键点检测](configs/keypoint),模型包括HigherHRNet,HRNet。 - - 新增[多目标跟踪](configs/mot)能力,模型包括DeepSORT,JDE,FairMOT。 - - 发布PPYOLO系列模型压缩模型,新增[ONNX模型导出教程](deploy/EXPORT_ONNX_MODEL.md)。 - -## 简介 - -**PaddleDetection**为基于飞桨PaddlePaddle的端到端目标检测套件,内置**190+主流目标检测、实例分割、跟踪、关键点检测**算法,其中包括**服务器端和移动端高精度、轻量级**产业级SOTA模型、冠军方案和学术前沿算法,并提供配置化的网络模块组件、十余种数据增强策略和损失函数等高阶优化支持和多种部署方案,在打通数据处理、模型开发、训练、压缩、部署全流程的基础上,提供丰富的案例及教程,加速算法产业落地应用。 
- -#### 提供目标检测、实例分割、多目标跟踪、关键点检测等多种能力 - -
- -
+## 产品动态 + +- 🔥 **2022.8.09:[YOLO家族全系列模型](https://github.com/nemonameless/PaddleDetection_YOLOSeries)发布** + - 全面覆盖的YOLO家族经典与最新模型: 包括YOLOv3,百度飞桨自研的实时高精度目标检测模型PP-YOLOE,以及前沿检测算法YOLOv4、YOLOv5、YOLOX,MT-YOLOv6及YOLOv7 + - 更强的模型性能:基于各家前沿YOLO算法进行创新并升级,缩短训练周期5~8倍,精度普遍提升1%~5% mAP;使用模型压缩策略实现精度无损的同时速度提升30%以上 + - 完备的端到端开发支持:支持从模型训练、评估、预测到模型量化压缩,部署多种硬件的端到端开发全流程。同时支持不同模型算法灵活切换,一键实现算法二次开发 + +- 🔥 **2022.8.01:发布[PP-TinyPose升级版](./configs/keypoint/tiny_pose/). 在健身、舞蹈等场景的业务数据集端到端AP提升9.1** + - 新增体育场景真实数据,复杂动作识别效果显著提升,覆盖侧身、卧躺、跳跃、高抬腿等非常规动作 + - 检测模型采用[PP-PicoDet增强版](./configs/picodet/README.md),在COCO数据集上精度提升3.1% + - 关键点稳定性增强,新增滤波稳定方式,使得视频预测结果更加稳定平滑 + +- 2022.7.14:[行人分析工具PP-Human v2](./deploy/pipeline)发布 + - 四大产业特色功能:高性能易扩展的五大复杂行为识别、闪电级人体属性识别、一行代码即可实现的人流检测与轨迹留存以及高精度跨镜跟踪 + - 底层核心算法性能强劲:覆盖行人检测、跟踪、属性三类核心算法能力,对目标人数、光线、背景均无限制 + - 极低使用门槛:提供保姆级全流程开发及模型优化策略、一行命令完成推理、兼容各类数据输入格式 + +- 2022.3.24:PaddleDetection发布[release/2.4版本](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4) + - 发布高精度云边一体SOTA目标检测模型[PP-YOLOE](configs/ppyoloe),提供s/m/l/x版本,l版本COCO test2017数据集精度51.6%,V100预测速度78.1 FPS,支持混合精度训练,训练较PP-YOLOv2加速33%,全系列多尺度模型,满足不同硬件算力需求,可适配服务器、边缘端GPU及其他服务器端AI加速卡。 + - 发布边缘端和CPU端超轻量SOTA目标检测模型[PP-PicoDet增强版](configs/picodet),精度提升2%左右,CPU预测速度提升63%,新增参数量0.7M的PicoDet-XS模型,提供模型稀疏化和量化功能,便于模型加速,各类硬件无需单独开发后处理模块,降低部署门槛。 + - 发布实时行人分析工具[PP-Human](deploy/pipeline),支持行人跟踪、人流量统计、人体属性识别与摔倒检测四大能力,基于真实场景数据特殊优化,精准识别各类摔倒姿势,适应不同环境背景、光线及摄像角度。 + - 新增[YOLOX](configs/yolox)目标检测模型,支持nano/tiny/s/m/l/x版本,x版本COCO val2017数据集精度51.8%。 + +- [更多版本发布](https://github.com/PaddlePaddle/PaddleDetection/releases) -#### 应用场景覆盖工业、智慧城市、安防、交通、零售、医疗等十余种行业 +## 简介 -## 特性 +**PaddleDetection**为基于飞桨PaddlePaddle的端到端目标检测套件,内置**30+模型算法**及**250+预训练模型**,覆盖**目标检测、实例分割、跟踪、关键点检测**等方向,其中包括**服务器端和移动端高精度、轻量级**产业级SOTA模型、冠军方案和学术前沿算法,并提供配置化的网络模块组件、十余种数据增强策略和损失函数等高阶优化支持和多种部署方案,在打通数据处理、模型开发、训练、压缩、部署全流程的基础上,提供丰富的案例及教程,加速算法产业落地应用。 +
+ +
+ +## 特性 -- **模型丰富**: 包含**目标检测**、**实例分割**、**人脸检测**等**100+个预训练模型**,涵盖多种**全球竞赛冠军**方案。 +- **模型丰富**: 包含**目标检测**、**实例分割**、**人脸检测**、**关键点检测**、**多目标跟踪**等**250+个预训练模型**,涵盖多种**全球竞赛冠军**方案。 - **使用简洁**:模块化设计,解耦各个网络组件,开发者轻松搭建、试用各种检测模型及优化策略,快速得到高性能、定制化的算法。 - **端到端打通**: 从数据增强、组网、训练、压缩、部署端到端打通,并完备支持**云端**/**边缘端**多架构、多设备部署。 - **高性能**: 基于飞桨的高性能内核,模型训练速度及显存占用优势明显。支持FP16训练, 支持多机训练。 -## 技术交流 +
+ +
+ +## 技术交流 - 如果你发现任何PaddleDetection存在的问题或者是建议, 欢迎通过[GitHub Issues](https://github.com/PaddlePaddle/PaddleDetection/issues)给我们提issues。 -- 欢迎加入PaddleDetection QQ、微信(添加并回复小助手“检测”)用户群 - +- 欢迎加入PaddleDetection QQ、微信用户群(添加并回复小助手“检测”) +
- - + +
-## 套件结构概览 +## 套件结构概览 @@ -100,115 +97,130 @@ @@ -232,7 +245,10 @@
    -
  • Object Detection
  • +
    Object Detection
    • Faster RCNN
    • FPN
    • Cascade-RCNN
    • -
    • Libra RCNN
    • -
    • Hybrid Task RCNN
    • PSS-Det
    • RetinaNet
    • -
    • YOLOv3
    • -
    • YOLOv4
    • +
    • YOLOv3
    • +
    • YOLOv5
    • +
    • MT-YOLOv6
    • +
    • YOLOv7
    • PP-YOLOv1/v2
    • PP-YOLO-Tiny
    • +
    • PP-YOLOE
    • +
    • YOLOX
    • SSD
    • -
    • CornerNet-Squeeze
    • +
    • CenterNet
    • FCOS
    • TTFNet
    • +
    • TOOD
    • +
    • GFL
    • PP-PicoDet
    • DETR
    • Deformable DETR
    • Swin Transformer
    • Sparse RCNN
    • -
    -
  • Instance Segmentation
  • -
      +
    +
    Instance Segmentation +
    • Mask RCNN
    • +
    • Cascade Mask RCNN
    • SOLOv2
    • -
    -
  • Face Detection
  • +
+
Face Detection
    -
  • FaceBoxes
  • BlazeFace
  • -
  • BlazeFace-NAS
  • -
-
  • Multi-Object-Tracking
  • +
    +
    Multi-Object-Tracking
    • JDE
    • FairMOT
    • -
    • DeepSort
    • -
    -
  • KeyPoint-Detection
  • +
  • DeepSORT
  • +
  • ByteTrack
  • +
    +
    KeyPoint-Detection
    • HRNet
    • HigherHRNet
    • -
    +
  • Lite-HRNet
  • +
  • PP-TinyPose
  • +
    +
    Details
    • ResNet(&vd)
    • -
    • ResNeXt(&vd)
    • +
    • Res2Net(&vd)
    • +
    • CSPResNet
    • SENet
    • Res2Net
    • HRNet
    • -
    • Hourglass
    • -
    • CBNet
    • -
    • GCNet
    • +
    • Lite-HRNet
    • DarkNet
    • CSPDarkNet
    • -
    • VGG
    • MobileNetv1/v3
    • +
    • ShuffleNet
    • GhostNet
    • -
    • Efficientnet
    • -
    • BlazeNet
    • -
    +
  • BlazeNet
  • +
  • DLA
  • +
  • HardNet
  • +
  • LCNet
  • +
  • ESNet
  • +
  • Swin-Transformer
  • +
    -
    • Common
    • +
      Common
      • Sync-BN
      • Group Norm
      • DCNv2
      • -
      • Non-local
      • -
      +
    • EMA
    • +
    -
    • KeyPoint
    • +
      KeyPoint
      • DarkPose
      • -
      +
    -
    • FPN
    • +
      FPN
      • BiFPN
      • -
      • BFP
      • +
      • CSP-PAN
      • +
      • Custom-PAN
      • +
      • ES-PAN
      • HRFPN
      • -
      • ACFPN
      • -
      +
    -
    • Loss
    • +
      Loss
      • Smooth-L1
      • GIoU/DIoU/CIoU
      • IoUAware
      • -
      +
    • Focal Loss
    • +
    • CT Focal Loss
    • +
    • VariFocal Loss
    • +
    -
    • Post-processing
    • +
      Post-processing
      • SoftNMS
      • MatrixNMS
      • -
      +
    -
    • Speed
    • +
      Speed
      • FP16 training
      • Multi-machine training
      • -
      +
    +
    Details
    • Resize
    • Lighting
    • @@ -218,12 +230,13 @@
    • Color Distort
    • Random Erasing
    • Mixup
    • +
    • AugmentHSV
    • Mosaic
    • Cutmix
    • Grid Mask
    • Auto Augment
    • Random Perspective
    • -
    +
    -## 模型性能概览 +## 模型性能概览 + +
    + 云端模型性能对比 各模型结构和骨干网络的代表模型在COCO数据集上精度mAP和单卡Tesla V100上预测速度(FPS)对比图。 @@ -246,8 +262,15 @@ - `Cascade-Faster-RCNN`为`Cascade-Faster-RCNN-ResNet50vd-DCN`,PaddleDetection将其优化到COCO数据mAP为47.8%时推理速度为20FPS - `PP-YOLO`在COCO数据集精度45.9%,Tesla V100预测速度72.9FPS,精度速度均优于[YOLOv4](https://arxiv.org/abs/2004.10934) - `PP-YOLO v2`是对`PP-YOLO`模型的进一步优化,在COCO数据集精度49.5%,Tesla V100预测速度68.9FPS +- `PP-YOLOE`是对`PP-YOLO v2`模型的进一步优化,在COCO数据集精度51.6%,Tesla V100预测速度78.1FPS +- [`YOLOX`](configs/yolox)和[`YOLOv5`](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov5)均为基于PaddleDetection复现算法 - 图中模型均可在[模型库](#模型库)中获取 +
    + +
    + 移动端模型性能对比 + 各移动端模型在COCO数据集上精度mAP和高通骁龙865处理器上预测速度(FPS)对比图。
    @@ -259,28 +282,132 @@ - 测试数据均使用高通骁龙865(4\*A77 + 4\*A55)处理器batch size为1, 开启4线程测试,测试使用NCNN预测库,测试脚本见[MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark) - [PP-PicoDet](configs/picodet)及[PP-YOLO-Tiny](configs/ppyolo)为PaddleDetection自研模型,其余模型PaddleDetection暂未提供 -## 文档教程 +
    + +## 模型库 + +
+ 1. 通用检测 + +#### [PP-YOLOE](./configs/ppyoloe)系列 推荐场景:Nvidia V100, T4等云端GPU和Jetson系列等边缘端设备 + +| 模型名称 | COCO精度(mAP) | V100 TensorRT FP16速度(FPS) | 配置文件 | 模型下载 | +|:---------- |:-----------:|:-------------------------:|:-----------------------------------------------------:|:------------------------------------------------------------------------------------:| +| PP-YOLOE-s | 42.7 | 333.3 | [链接](configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | +| PP-YOLOE-m | 48.6 | 208.3 | [链接](configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | +| PP-YOLOE-l | 50.9 | 149.2 | [链接](configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | +| PP-YOLOE-x | 51.9 | 95.2 | [链接](configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | + +#### [PP-PicoDet](./configs/picodet)系列 推荐场景:ARM CPU(RK3399, 树莓派等) 和NPU(比特大陆,晶晨等)移动端芯片和x86 CPU设备 + +| 模型名称 | COCO精度(mAP) | 骁龙865 四线程速度(ms) | 配置文件 | 模型下载 | +|:---------- |:-----------:|:---------------:|:---------------------------------------------------:|:---------------------------------------------------------------------------------:| +| PicoDet-XS | 23.5 | 7.81 | [链接](configs/picodet/picodet_xs_320_coco_lcnet.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | +| PicoDet-S | 29.1 | 9.56 | [链接](configs/picodet/picodet_s_320_coco_lcnet.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | +| PicoDet-M | 34.4 | 17.68 | [链接](configs/picodet/picodet_m_320_coco_lcnet.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | +| PicoDet-L | 36.1 | 25.21 | [链接](configs/picodet/picodet_l_320_coco_lcnet.yml) | 
[下载地址](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | + +#### 前沿检测算法 + +| 模型名称 | COCO精度(mAP) | V100 TensorRT FP16速度(FPS) | 配置文件 | 模型下载 | +|:------------------------------------------------------------------ |:-----------:|:-------------------------:|:------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------:| +| [YOLOX-l](configs/yolox) | 50.1 | 107.5 | [链接](configs/yolox/yolox_l_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | +| [YOLOv5-l](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov5) | 48.6 | 136.0 | [链接](https://github.com/nemonameless/PaddleDetection_YOLOSeries/blob/develop/configs/yolov5/yolov5_l_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) | +| [YOLOv7-l](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov7) | 51.0 | 135.0 | [链接](https://github.com/nemonameless/PaddleDetection_YOLOSeries/blob/develop/configs/yolov7/yolov7_l_300e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) | + +#### 其他通用检测模型 [文档链接](docs/MODEL_ZOO_cn.md) + +
    + +
    + 2. 实例分割 + +| 模型名称 | 模型简介 | 推荐场景 | COCO精度(mAP) | 配置文件 | 模型下载 | +|:----------------- |:------------ |:---- |:--------------------------------:|:---------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------:| +| Mask RCNN | 两阶段实例分割算法 | 云边端 | box AP: 41.4
    mask AP: 37.5 | [链接](configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams) | +| Cascade Mask RCNN | 两阶段实例分割算法 | 云边端 | box AP: 45.7
    mask AP: 39.7 | [链接](configs/mask_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | +| SOLOv2 | 轻量级单阶段实例分割算法 | 云边端 | mask AP: 38.0 | [链接](configs/solov2/solov2_r50_fpn_3x_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) | + +
    + +
    + 3. 关键点检测 + +| 模型名称 | 模型简介 | 推荐场景 | COCO精度(AP) | 速度 | 配置文件 | 模型下载 | +|:------------------------------------------- |:---------------------------------------------------------------- |:---------------------------------- |:----------:|:-----------------------:|:-------------------------------------------------------:|:---------------------------------------------------------------------------------------:| +| HRNet-w32 + DarkPose |
    top-down 关键点检测算法
    输入尺寸384x288
    |
    云边端
    | 78.3 | T4 TensorRT FP16 2.96ms | [链接](configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | +| HRNet-w32 + DarkPose | top-down 关键点检测算法
    输入尺寸256x192 | 云边端 | 78.0 | T4 TensorRT FP16 1.75ms | [链接](configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | +| [PP-TinyPose](./configs/keypoint/tiny_pose) | 轻量级关键点算法
    输入尺寸256x192 | 移动端 | 68.8 | 骁龙865 四线程 6.30ms | [链接](configs/keypoint/tiny_pose/tinypose_256x192.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | +| [PP-TinyPose](./configs/keypoint/tiny_pose) | 轻量级关键点算法
    输入尺寸128x96 | 移动端 | 58.1 | 骁龙865 四线程 2.37ms | [链接](configs/keypoint/tiny_pose/tinypose_128x96.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | + +#### 其他关键点检测模型 [文档链接](configs/keypoint) + +
    + +
    + 4. 多目标跟踪PP-Tracking + +| 模型名称 | 模型简介 | 推荐场景 | 精度 | 配置文件 | 模型下载 | +|:--------- |:------------------------ |:---------------------------------- |:----------------------:|:---------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------:| +| DeepSORT | SDE多目标跟踪算法 检测、ReID模型相互独立 |
    云边端
    | MOT-17 half val: 66.9 | [链接](configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | +| ByteTrack | SDE多目标跟踪算法 仅包含检测模型 | 云边端 | MOT-17 half val: 77.3 | [链接](configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | +| JDE | JDE多目标跟踪算法 多任务联合学习方法 | 云边端 | MOT-16 test: 64.6 | [链接](configs/mot/jde/jde_darknet53_30e_1088x608.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | +| FairMOT | JDE多目标跟踪算法 多任务联合学习方法 | 云边端 | MOT-16 test: 75.0 | [链接](configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | + +#### 其他多目标跟踪模型 [文档链接](configs/mot) + +
    + +
    + 5. Industrial real-time pedestrian analysis tool + + +| Task | End-to-end speed (ms) | Model | Model size | +| :---------: | :-------: | :------: |:------: | +| Pedestrian detection (high accuracy) | 25.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian detection (lightweight) | 16.2ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Pedestrian tracking (high accuracy) | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian tracking (lightweight) | 21.0ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Attribute recognition (high accuracy) | Single person 8.5ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | Object detection: 182M
    Attribute recognition: 86M | +| Attribute recognition (lightweight) | Single person 7.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | Object detection: 182M
    Attribute recognition: 86M | +| Falling detection | Single person 10ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [Keypoint-based action recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking: 182M
    Keypoint detection: 101M
    Keypoint-based action recognition: 21.8M | +| Intrusion recognition | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Fighting recognition | 19.7ms | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| Smoking recognition | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Object detection based on human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object detection: 182M
    Object detection based on human ID: 27M | +| Phone call recognition | Single person ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [Image classification based on human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object detection: 182M
    Image classification based on human ID: 45M | + + +Click a model in the Model column to download it + +For details, see the [documentation](deploy/pipeline) + +
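    + + +## Documentation and tutorials ### Getting started - [Installation](docs/tutorials/INSTALL_cn.md) -- [Data preparation](docs/tutorials/PrepareDataSet.md) -- [Get started with PaddleDetection in 30 minutes](docs/tutorials/GETTING_STARTED_cn.md) +- [Quick start](docs/tutorials/QUICK_STARTED_cn.md) +- [Data preparation](docs/tutorials/data/README.md) +- [End-to-end PaddleDetection workflow](docs/tutorials/GETTING_STARTED_cn.md) +- [Training on custom data](docs/tutorials/CustomizeDataTraining.md) - [FAQ](docs/tutorials/FAQ) ### Advanced tutorials - Configuration - + - [RCNN configuration](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md) - [PP-YOLO configuration](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md) - Model compression (based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)) - + - [Pruning/quantization/distillation tutorial](configs/slim) - [Inference deployment](deploy/README.md) - + - [Model export tutorial](deploy/EXPORT_MODEL.md) - [Paddle Inference deployment](deploy/README.md) - [Python inference deployment](deploy/python) @@ -291,51 +418,44 @@ - [Inference benchmark](deploy/BENCHMARK_INFER.md) - Advanced development - + - [Data processing module](docs/advanced_tutorials/READER.md) - [Adding a new detection model](docs/advanced_tutorials/MODEL_TECHNICAL.md) + - Customization tutorials + - [Object detection](docs/advanced_tutorials/customization/detection.md) + - [Keypoint detection](docs/advanced_tutorials/customization/keypoint_detection.md) + - [Multi-object tracking](docs/advanced_tutorials/customization/pphuman_mot.md) + - [Action recognition](docs/advanced_tutorials/customization/pphuman_action.md) + - [Attribute recognition](docs/advanced_tutorials/customization/pphuman_attribute.md) + +### Courses + +- **[Theory] [7-day object detection camp](https://aistudio.baidu.com/aistudio/education/group/info/1617):** overview of the object detection task, the RCNN family of detection algorithms in detail, the YOLO family of detection algorithms in detail, PP-YOLO optimization strategies and case studies, introduction to and practice with the anchor-free family of algorithms + +- **[Industry practice] [AI Fast Track: industrial-grade object detection technology and applications](https://aistudio.baidu.com/aistudio/education/group/info/23670):** a powerful matrix of object detection algorithms, the real-time pedestrian analysis system PP-Human, a full walkthrough of industrial object detection applications in practice + +- **[Industry focus] 2022.3.26 [7-day smart city course](https://aistudio.baidu.com/aistudio/education/group/info/25620):** urban planning, urban governance, smart government services, traffic management, community governance + +### [Industrial practice tutorials](./industrial_tutorial/README.md) + +- [Intelligent fitness action recognition based on the enhanced PP-TinyPose](https://aistudio.baidu.com/aistudio/projectdetail/4385813) + +- 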
[Fight recognition based on PP-Human](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) + +- [Road litter detection based on the enhanced PP-PicoDet](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) + +- [Communication tower recognition and Android deployment based on PP-PicoDet](https://aistudio.baidu.com/aistudio/projectdetail/3561097) + +- [Visitor flow counting based on FairMOT](https://aistudio.baidu.com/aistudio/projectdetail/2421822) + +- [More examples](./industrial_tutorial/README.md) + +## Applications -## Model zoo - -- General object detection: - - [Model zoo](docs/MODEL_ZOO_cn.md) - - [PP-YOLOE models](configs/ppyoloe/README_cn.md) - - [PP-YOLO models](configs/ppyolo/README_cn.md) - - [PP-PicoDet models](configs/picodet/README.md) - - [Enhanced anchor-free model TTFNet](configs/ttfnet/README.md) - - [Mobile models](static/configs/mobile/README.md) - - [676-class object detection](static/docs/featured_model/LARGE_SCALE_DET_MODEL.md) - - [Practical two-stage model PSS-Det](configs/rcnn_enhance/README.md) - - [Detection models pretrained with semi-supervised knowledge distillation](docs/feature_models/SSLD_PRETRAINED_MODEL.md) -- General instance segmentation - - [SOLOv2](configs/solov2/README.md) -- Rotated box detection - - [S2ANet](configs/dota/README.md) -- [Keypoint detection](configs/keypoint) - - [PP-TinyPose](configs/keypoint/tiny_pose) - - HigherHRNet - - HRNet - - LiteHRNet -- [Multi-object tracking](configs/mot/README.md) - - [PP-Tracking](deploy/pptracking/README.md) - - [DeepSORT](configs/mot/deepsort/README_cn.md) - - [JDE](configs/mot/jde/README_cn.md) - - [FairMOT](configs/mot/fairmot/README_cn.md) -- Vertical domains - - [Pedestrian detection](configs/pedestrian/README.md) - - [Vehicle detection](configs/vehicle/README.md) - - [Face detection](configs/face_detection/README.md) - - [Real-time pedestrian analysis](deploy/pphuman/README.md) -- Competition-winning solutions - - [Objects365 2019 Challenge winning model](static/docs/featured_model/champion_model/CACascadeRCNN.md) - - [Best single model in the Open Images 2019 Object Detection competition](static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) - -## Applications - -- [Automatic Christmas portrait effects generator](static/application/christmas) - [Android fitness app](https://github.com/zhiboniu/pose_demo_android) +- [GUI for the multi-object tracking system](https://github.com/yangyudong2020/PP-Tracking_GUi) -## Recommended third-party tutorials +## Recommended third-party tutorials - 
[Deploying PaddleDetection on Windows (Part 1)](https://zhuanlan.zhihu.com/p/268657833) - [Deploying PaddleDetection on Windows (Part 2)](https://zhuanlan.zhihu.com/p/280206376) @@ -343,15 +463,15 @@ - [Deploying a YOLOv3 safety helmet detection model on Raspberry Pi](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/yolov3_for_raspi.md) - [A complete project with SSD-MobileNetv1: from dataset preparation to Raspberry Pi deployment](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/ssd_mobilenet_v1_for_raspi.md) -## Updates +## Updates For release notes, see the [changelog](docs/CHANGELOG.md) -## License +## License This project is released under the [Apache 2.0 license](LICENSE). -## Contributing +## Contributing Contributions to PaddleDetection are very welcome, and we appreciate your feedback. @@ -359,9 +479,10 @@ - Thanks to [FL77N](https://github.com/FL77N/) for contributing the `Sparse-RCNN` model. - Thanks to [Chen-Song](https://github.com/Chen-Song) for contributing the `Swin Faster-RCNN` model. - Thanks to [yangyudong](https://github.com/yangyudong2020) and [hchhtc123](https://github.com/hchhtc123) for developing the PP-Tracking GUI -- Thanks to [Shigure19](https://github.com/Shigure19) for developing the PP-TinyPose fitness app +- Thanks to Shigure19 for developing the PP-TinyPose fitness app +- Thanks to [manangoel99](https://github.com/manangoel99) for contributing Wandb visualization support -## Citation +## Citation ``` @misc{ppdet2019, diff --git a/README_en.md b/README_en.md index ad02eeeaff1bd594640681c2452e2979c65bc1bf..9fc92960dc4c3b336d12a0a5bce682dd2e101300 100644 --- a/README_en.md +++ b/README_en.md @@ -1,39 +1,85 @@ -English | [简体中文](README_cn.md) +[简体中文](README_cn.md) | English +
    +

    + +

    + +**A High-Efficient Development Toolkit for Object Detection based on [PaddlePaddle](https://github.com/paddlepaddle/paddle)** + +

    + + + + + +

    +
    + +
    + + +
    + +## Product Update -# Product news +- 🔥 **2022.8.09: Release [YOLO series model zoo](https://github.com/nemonameless/PaddleDetection_YOLOSeries)** + - Comprehensive coverage of classic and latest models of the YOLO series: including YOLOv3, Paddle's real-time object detection model PP-YOLOE, and the frontier detection algorithms YOLOv4, YOLOv5, YOLOX, MT-YOLOv6 and YOLOv7 + - Better model performance: upgrades based on the various YOLO algorithms shorten training time by a factor of 5-8, and accuracy generally improves by 1%-5% mAP. The model compression strategy achieves a 30% improvement in speed without precision loss + - Complete end-to-end development support: an end-to-end development pipeline including training, evaluation, inference, model compression and deployment on various hardware. It also supports flexible algorithm switching and efficient customized development -- 2021.11.03: Release [release/2.3](https://github.com/PaddlePaddle/Paddleetection/tree/release/2.3) version. Release mobile object detection model ⚡[PP-PicoDet](configs/picodet), mobile keypoint detection model ⚡[PP-TinyPose](configs/keypoint/tiny_pose),Real-time tracking system [PP-Tracking](deploy/pptracking). Release object detection models, including [Swin-Transformer](configs/faster_rcnn), [TOOD](configs/tood), [GFL](configs/gfl), release [Sniper](configs/sniper) tiny object detection models and optimized [PP-YOLO-EB](configs/ppyolo) model for EdgeBoard. Release mobile keypoint detection model [Lite HRNet](configs/keypoint). -- 2021.08.10: Release [release/2.2](https://github.com/PaddlePaddle/Paddleetection/tree/release/2.2) version. Release Transformer object detection models, including [DETR](configs/detr), [Deformable DETR](configs/deformable_detr), [Sparse RCNN](configs/sparse_rcnn). Release [keypoint detection](configs/keypoint) models, including DarkHRNet and model trained on MPII dataset. 
Release [head-tracking](configs/mot/headtracking21) and [vehicle-tracking](configs/mot/vehicle) multi-object tracking models. -- 2021.05.20: Release [release/2.1](https://github.com/PaddlePaddle/Paddleetection/tree/release/2.1) version. Release [Keypoint Detection](configs/keypoint), including HigherHRNet and HRNet, [Multi-Object Tracking](configs/mot), including DeepSORT,JDE and FairMOT. Release model compression for PPYOLO series models.Update documents such as [EXPORT ONNX MODEL](deploy/EXPORT_ONNX_MODEL.md). +- 🔥 **2022.8.01: Release [PP-TinyPose plus](./configs/keypoint/tiny_pose/). The end-to-end precision improves by 9.1% AP on a dataset + of fitness and dance scenes** + - More sports-scene data was added, and recognition of complex actions is significantly improved, covering actions such as turning sideways, lying down, jumping, and raising legs + - The detection model uses PP-PicoDet plus, and its precision on the COCO dataset is improved by 3.1% mAP + - Keypoint stability is enhanced: a filter-based stabilization method makes video prediction results more stable and smooth. +- 2022.7.14: Release [pedestrian analysis tool PP-Human v2](./deploy/pipeline) + - Four major functions: high-performance, flexible recognition of five complex actions; real-time human attribute recognition; visitor flow statistics; and high-accuracy multi-camera tracking. + - High-performance algorithms: pedestrian detection, tracking and attribute recognition that are robust to the number of targets and to variations in background and lighting. + - Highly flexible: provides a complete introduction to end-to-end development and optimization strategies, simple deployment commands, and compatibility with different input formats. -# Introduction +- 2022.3.24: PaddleDetection released [release/2.4 version](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4) + - Release high-performance SOTA object detection model [PP-YOLOE](configs/ppyoloe). 
It integrates cloud and edge devices and provides S/M/L/X versions. In particular, version L reaches 51.4% accuracy on the COCO test 2017 dataset with an inference speed of 78.1 FPS on a single Tesla V100. It supports mixed precision training, 33% faster than PP-YOLOv2. Its full range of model sizes meets different hardware compute requirements and adapts to servers, edge-device GPUs and other AI accelerator cards on servers. + - Release ultra-lightweight SOTA object detection model [PP-PicoDet Plus](configs/picodet) with a 2% improvement in accuracy and a 63% improvement in CPU inference speed. Add the PicoDet-XS model with 0.7M parameters, and provide model sparsification and quantization for model acceleration. No special post-processing module is required on any hardware, which simplifies deployment. + - Release the real-time pedestrian analysis tool [PP-Human](deploy/pphuman). It has four major functions: pedestrian tracking, visitor flow statistics, human attribute recognition and falling detection. Falling detection is optimized on real-life data and accurately recognizes various types of falling posture. It adapts to different environmental backgrounds, lighting and camera angles. + - Add [YOLOX](configs/yolox) object detection model with nano/tiny/S/M/L/X versions. The X version reaches 51.8% accuracy on the COCO Val2017 dataset. -PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle, which implements varied mainstream object detection, instance segmentation, tracking and keypoint detection algorithms in modular designwhich with configurable modules such as network components, data augmentations and losses, and release many kinds SOTA industry practice models, integrates abilities of model compression and cross-platform high-performance deployment, aims to help developers in the whole end-to-end development in a faster and better way. 
+- [More releases](https://github.com/PaddlePaddle/PaddleDetection/releases) -### PaddleDetection provides image processing capabilities such as object detection, instance segmentation, multi-object tracking, keypoint detection and etc. +## Brief Introduction -
    - +**PaddleDetection** is an end-to-end object detection development kit based on PaddlePaddle. Providing **over 30 algorithms** and **over 250 pre-trained models**, it covers object detection, instance segmentation, keypoint detection and multi-object tracking. In particular, PaddleDetection offers **high-performance & lightweight** industrial SOTA models for **servers and mobile** devices, champion solutions and cutting-edge algorithms. PaddleDetection provides various data augmentation methods, configurable network components, loss functions and other advanced optimization & deployment schemes. In addition to covering the whole process of data processing, model development, training, compression and deployment, PaddlePaddle also provides rich cases and tutorials to accelerate the industrial application of algorithms. + +
    +
    -### Features -- **Rich Models** -PaddleDetection provides rich of models, including **100+ pre-trained models** such as **object detection**, **instance segmentation**, **face detection** etc. It covers a variety of **global competition champion** schemes. +## Features + +- **Rich model library**: PaddleDetection provides over 250 pre-trained models covering **object detection, instance segmentation, face recognition, multi-object tracking**. It covers a variety of **global competition champion** schemes. +- **Simple to use**: Modular design decouples each network component, so developers can easily build and try various detection models and optimization strategies and quickly obtain high-performance, customized algorithms. +- **End-to-end coverage**: PaddlePaddle covers the whole pipeline from data augmentation, model construction and training to compression and deployment. It also supports multi-architecture, multi-device deployment for **cloud and edge** devices. +- **High Performance**: Thanks to its high-performance core, PaddlePaddle has clear advantages in training speed and memory usage. It also supports FP16 training and multi-machine training. + +
    + newstructure +
    Exchanges -- **Production Ready:** -From data augmentation, constructing models, training, compression, depolyment, get through end to end, and complete support for multi-architecture, multi-device deployment for **cloud and edge device**. +- If you have any question or suggestion, please give us your valuable input via [GitHub Issues](https://github.com/PaddlePaddle/PaddleDetection/issues) -- **High Performance:** -Based on the high performance core of PaddlePaddle, advantages of training speed and memory occupation are obvious. FP16 training and multi-machine training are supported as well. + Welcome to join PaddleDetection user groups on QQ, WeChat (scan the QR code, add and reply "D" to the assistant) + +
    + + +
    -#### Overview of Kit Structures +## Kit Structure @@ -54,115 +100,127 @@ Based on the high performance core of PaddlePaddle, advantages of training speed -
      -
    • Object Detection
    • +
      Object Detection
      • Faster RCNN
      • FPN
      • Cascade-RCNN
      • -
      • Libra RCNN
      • -
      • Hybrid Task RCNN
      • PSS-Det
      • RetinaNet
      • -
      • YOLOv3
      • -
      • YOLOv4
      • +
      • YOLOv3
      • PP-YOLOv1/v2
      • PP-YOLO-Tiny
      • +
      • PP-YOLOE
      • +
      • YOLOX
      • SSD
      • -
      • CornerNet-Squeeze
      • +
      • CenterNet
      • FCOS
      • TTFNet
      • +
      • TOOD
      • +
      • GFL
      • PP-PicoDet
      • DETR
      • Deformable DETR
      • Swin Transformer
      • Sparse RCNN
      • -
      -
    • Instance Segmentation
    • -
        +
      +
      Instance Segmentation +
      • Mask RCNN
      • +
      • Cascade Mask RCNN
      • SOLOv2
      • -
      -
    • Face Detection
    • +
    +
    Face Detection
      -
    • FaceBoxes
    • BlazeFace
    • -
    • BlazeFace-NAS
    • -
    -
  • Multi-Object-Tracking
  • +
    +
    Multi-Object-Tracking
    • JDE
    • FairMOT
    • -
    • DeepSort
    • -
    -
  • KeyPoint-Detection
  • +
  • DeepSORT
  • +
  • ByteTrack
  • +
    +
    KeyPoint-Detection
    • HRNet
    • HigherHRNet
    • -
    +
  • Lite-HRNet
  • +
  • PP-TinyPose
  • +
    +
    Details
    • ResNet(&vd)
    • -
    • ResNeXt(&vd)
    • +
    • Res2Net(&vd)
    • +
    • CSPResNet
    • SENet
    • Res2Net
    • HRNet
    • -
    • Hourglass
    • -
    • CBNet
    • -
    • GCNet
    • +
    • Lite-HRNet
    • DarkNet
    • CSPDarkNet
    • -
    • VGG
    • MobileNetv1/v3
    • +
    • ShuffleNet
    • GhostNet
    • -
    • Efficientnet
    • -
    • BlazeNet
    • -
    +
  • BlazeNet
  • +
  • DLA
  • +
  • HardNet
  • +
  • LCNet
  • +
  • ESNet
  • +
  • Swin-Transformer
  • +
    -
    • Common
    • +
      Common
      • Sync-BN
      • Group Norm
      • DCNv2
      • -
      • Non-local
      • -
      +
    • EMA
    • +
    -
    • KeyPoint
    • +
      KeyPoint
      • DarkPose
      • -
      +
    -
    • FPN
    • +
      FPN
      • BiFPN
      • -
      • BFP
      • +
      • CSP-PAN
      • +
      • Custom-PAN
      • +
      • ES-PAN
      • HRFPN
      • -
      • ACFPN
      • -
      +
    -
    • Loss
    • +
      Loss
      • Smooth-L1
      • GIoU/DIoU/CIoU
      • IoUAware
      • -
      +
    • Focal Loss
    • +
    • CT Focal Loss
    • +
    • VariFocal Loss
    • +
    -
    • Post-processing
    • +
      Post-processing
      • SoftNMS
      • MatrixNMS
      • -
      +
    -
    • Speed
    • +
      Speed
      • FP16 training
      • Multi-machine training
      • -
      +
    +
    Details
    • Resize
    • Lighting
    • @@ -172,144 +230,253 @@ Based on the high performance core of PaddlePaddle, advantages of training speed
    • Color Distort
    • Random Erasing
    • Mixup
    • +
    • AugmentHSV
    • Mosaic
    • Cutmix
    • Grid Mask
    • Auto Augment
    • Random Perspective
    • -
    +
    -#### Overview of Model Performance +## Model Performance + +
    + Performance comparison of Cloud models -The relationship between COCO mAP and FPS on Tesla V100 of representative models of each server side architectures and backbones. +The comparison between COCO mAP and FPS on Tesla V100 of representative models of each architectures and backbones.
    -
    - - **NOTE:** - - - `CBResNet stands` for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has highest mAP on COCO as 53.3% +
    - - `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to 20 FPS inference speed when COCO mAP as 47.8% in PaddleDetection models +**Clarification:** - - `PP-YOLO` achieves mAP of 45.9% on COCO and 72.9FPS on Tesla V100. Both precision and speed surpass [YOLOv4](https://arxiv.org/abs/2004.10934) +- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has the highest mAP on COCO, 53.3% +- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized in PaddleDetection to reach 20 FPS inference speed at a COCO mAP of 47.8% +- `PP-YOLO` reaches 45.9% accuracy on the COCO dataset with an inference speed of 72.9 FPS on Tesla V100, surpassing [YOLOv4](https://arxiv.org/abs/2004.10934) in both speed and accuracy +- `PP-YOLO v2` is an optimized version of `PP-YOLO`. It reaches 49.5% accuracy on the COCO dataset with an inference speed of 68.9 FPS on Tesla V100. +- `PP-YOLOE` is an optimized version of `PP-YOLO v2`. It reaches 51.4% accuracy on the COCO dataset with an inference speed of 78.1 FPS on Tesla V100 +- The models in the figure are available in the [model library](#模型库) - - `PP-YOLO v2` is optimized version of `PP-YOLO` which has mAP of 49.5% and 68.9FPS on Tesla V100 + - - All these models can be get in [Model Zoo](#ModelZoo) + +
    + Performance comparison on mobile devices -The relationship between COCO mAP and FPS on Qualcomm Snapdragon 865 of representative mobile side models. +The comparison between COCO mAP and FPS on Qualcomm Snapdragon 865 processor of models on mobile devices. 
    - +
    -**NOTE:** +**Clarification:** + +- Tests were conducted on Qualcomm Snapdragon 865 (4 \*A77 + 4 \*A55) with batch_size=1, 4 threads and the NCNN inference library; for the test script, see [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark) +- [PP-PicoDet](configs/picodet) and [PP-YOLO-Tiny](configs/ppyolo) are self-developed models of PaddleDetection, and other models are not tested yet. + +
    + +## Model libraries + +
    + 1. General detection + +#### PP-YOLOE series Recommended scenarios: Cloud GPU such as Nvidia V100, T4 and edge devices such as Jetson series + +| Model | COCO Accuracy(mAP) | V100 TensorRT FP16 Speed(FPS) | Configuration | Download | +|:---------- |:------------------:|:-----------------------------:|:-------------------------------------------------------:|:----------------------------------------------------------------------------------------:| +| PP-YOLOE-s | 42.7 | 333.3 | [Link](configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | +| PP-YOLOE-m | 48.6 | 208.3 | [Link](configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | +| PP-YOLOE-l | 50.9 | 149.2 | [Link](configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | +| PP-YOLOE-x | 51.9 | 95.2 | [Link](configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | + +#### PP-PicoDet series Recommended scenarios: Mobile chips and x86 CPU devices, such as ARM CPU(RK3399, Raspberry Pi) and NPU(BITMAIN) + +| Model | COCO Accuracy(mAP) | Snapdragon 865 four-thread speed (ms) | Configuration | Download | +|:---------- |:------------------:|:-------------------------------------:|:-----------------------------------------------------:|:-------------------------------------------------------------------------------------:| +| PicoDet-XS | 23.5 | 7.81 | [Link](configs/picodet/picodet_xs_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | +| PicoDet-S | 29.1 | 9.56 | [Link](configs/picodet/picodet_s_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | +| PicoDet-M | 34.4 | 17.68 | 
[Link](configs/picodet/picodet_m_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | +| PicoDet-L | 36.1 | 25.21 | [Link](configs/picodet/picodet_l_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | + +#### Frontier detection algorithm + +| Model | COCO Accuracy(mAP) | V100 TensorRT FP16 speed(FPS) | Configuration | Download | +|:-------- |:------------------:|:-----------------------------:|:--------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------:| +| YOLOX-l | 50.1 | 107.5 | [Link](configs/yolox/yolox_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | +| YOLOv5-l | 48.6 | 136.0 | [Link](https://github.com/nemonameless/PaddleDetection_YOLOv5/blob/main/configs/yolov5/yolov5_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) | + +#### Other general purpose models [doc](docs/MODEL_ZOO_cn.md) + +
    + +
    + 2. Instance segmentation + +| Model | Introduction | Recommended Scenarios | COCO Accuracy(mAP) | Configuration | Download | +|:----------------- |:-------------------------------------------------------- |:--------------------------------------------- |:--------------------------------:|:-----------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:| +| Mask RCNN | Two-stage instance segmentation algorithm |
    Edge-Cloud end
    | box AP: 41.4
    mask AP: 37.5 | [Link](configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams) | +| Cascade Mask RCNN | Two-stage instance segmentation algorithm |
    Edge-Cloud end
    | box AP: 45.7
    mask AP: 39.7 | [Link](configs/mask_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | +| SOLOv2 | Lightweight single-stage instance segmentation algorithm |
    Edge-Cloud end
    | mask AP: 38.0 | [Link](configs/solov2/solov2_r50_fpn_3x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) | + +
    + +
    + 3. Keypoint detection -- All data tested on Qualcomm Snapdragon 865(4\*A77 + 4\*A55) processor with batch size of 1 and CPU threads of 4, and use NCNN library in testing, benchmark scripts is publiced at [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark) -- [PP-PicoDet](configs/picodet) and [PP-YOLO-Tiny](configs/ppyolo) are developed and released by PaddleDetection, other models are not provided in PaddleDetection. +| Model | Introduction | Recommended scenarios | COCO Accuracy(AP) | Speed | Configuration | Download | +|:-------------------- |:--------------------------------------------------------------------------------------------- |:--------------------------------------------- |:-----------------:|:---------------------------------:|:---------------------------------------------------------:|:-------------------------------------------------------------------------------------------:| +| HRNet-w32 + DarkPose |
    Top-down Keypoint detection algorithm
    Input size: 384x288
    |
    Edge-Cloud end
    | 78.3 | T4 TensorRT FP16 2.96ms | [Link](configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml) | [Download](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | +| HRNet-w32 + DarkPose | Top-down Keypoint detection algorithm
    Input size: 256x192 | Edge-Cloud end | 78.0 | T4 TensorRT FP16 1.75ms | [Link](configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) | [Download](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | +| PP-TinyPose | Light-weight keypoint algorithm
    Input size: 256x192 | Mobile | 68.8 | Snapdragon 865 four-thread 6.30ms | [Link](configs/keypoint/tiny_pose/tinypose_256x192.yml) | [Download](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | +| PP-TinyPose | Light-weight keypoint algorithm
    Input size: 128x96 | Mobile | 58.1 | Snapdragon 865 four-thread 2.37ms | [Link](configs/keypoint/tiny_pose/tinypose_128x96.yml) | [Download](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | -## Tutorials +#### Other keypoint detection models [doc](configs/keypoint) -### Get Started +
    -- [Installation guide](docs/tutorials/INSTALL.md) -- [Prepare dataset](docs/tutorials/PrepareDataSet_en.md) -- [Quick start on PaddleDetection](docs/tutorials/GETTING_STARTED.md) +
    + 4. Multi-object tracking PP-Tracking +| Model | Introduction | Recommended scenarios | Accuracy | Configuration | Download | +|:--------- |:------------------------------------------------------------- |:--------------------- |:----------------------:|:-----------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:| +| DeepSORT | SDE Multi-object tracking algorithm, independent ReID models | Edge-Cloud end | MOT-17 half val: 66.9 | [Link](configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | +| ByteTrack | SDE Multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-17 half val: 77.3 | [Link](configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | +| JDE | JDE multi-object tracking algorithm multi-task learning | Edge-Cloud end | MOT-16 test: 64.6 | [Link](configs/mot/jde/jde_darknet53_30e_1088x608.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | +| FairMOT | JDE multi-object tracking algorithm multi-task learning | Edge-Cloud end | MOT-16 test: 75.0 | [Link](configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | -### Advanced Tutorials +#### Other multi-object tracking models [docs](configs/mot) -- Parameter configuration - - [Parameter configuration for RCNN model](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md) - - [Parameter configuration for PP-YOLO model](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md) +
    -- Model Compression(Based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)) - - [Prune/Quant/Distill](configs/slim) +
    + 5. Industrial real-time pedestrian analysis tool PP-Human -- Inference and deployment - - [Export model for inference](deploy/EXPORT_MODEL_en.md) - - [Paddle Inference](deploy/README_en.md) - - [Python inference](deploy/python) - - [C++ inference](deploy/cpp) - - [Paddle-Lite](deploy/lite) - - [Paddle Serving](deploy/serving) - - [Export ONNX model](deploy/EXPORT_ONNX_MODEL_en.md) - - [Inference benchmark](deploy/BENCHMARK_INFER_en.md) - - [Exporting to ONNX and using OpenVINO for inference](docs/advanced_tutorials/openvino_inference/README.md) +| Function \ Model | Object detection | Multi-object tracking | Attribute recognition | Keypoint detection | Action recognition | ReID | +|:------------------------------------ |:-------------------------------------------------------------------------------------- |:-------------------------------------------------------------------------------------- |:-----------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------:|:-----------------------------------------------------------------:|:----------------------------------------------------------------------:| +| Pedestrian Detection | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | | | +| Pedestrian Tracking | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | | +| Attribute Recognition (Image) | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | | | | +| Attribute Recognition (Video) | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | | +| Falling Detection | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | 
[✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | | +| ReID | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | | | | [✅](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | +| **Accuracy** | mAP 56.3 | MOTA 72.0 | mA 94.86 | AP 87.1 | AP 96.43 | mAP 98.8 | +| **T4 TensorRT FP16 Inference speed** | 28.0ms | 33.1ms | Single person 2ms | Single person 2.9ms | Single person 2.7ms | Single person 1.5ms | + +
+
+**Click “ ✅ ” to download**
+
+## Document tutorials
+
+### Introductory tutorials
+
+- [Installation](docs/tutorials/INSTALL_cn.md)
+- [Quick start](docs/tutorials/QUICK_STARTED_cn.md)
+- [Data preparation](docs/tutorials/data/README.md)
+- [Getting Started on PaddleDetection](docs/tutorials/GETTING_STARTED_cn.md)
+- [Customize data training](docs/tutorials/CustomizeDataTraining.md)
+- [FAQ](docs/tutorials/FAQ)
+
+### Advanced tutorials
+
+- Configuration
+
+  - [RCNN Configuration](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md)
+  - [PP-YOLO Configuration](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md)
+
+- Compression based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
+
+  - [Pruning/Quantization/Distillation Tutorial](configs/slim)
+
+- [Inference deployment](deploy/README.md)
+
+  - [Export model for inference](deploy/EXPORT_MODEL.md)
+
+  - [Paddle Inference deployment](deploy/README.md)
+
+  - [Inference deployment with Python](deploy/python)
+  - [Inference deployment with C++](deploy/cpp)
+
+  - [Paddle-Lite deployment](deploy/lite)
+
+  - [Paddle Serving deployment](deploy/serving)
+
+  - [ONNX model export](deploy/EXPORT_ONNX_MODEL.md)
+
+  - [Inference benchmark](deploy/BENCHMARK_INFER.md)
 - Advanced development
-  - [New data augmentations](docs/advanced_tutorials/READER_en.md)
-  - [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL_en.md)
-
-
-## Model Zoo
-
-- Universal object detection
-  - [Model library and baselines](docs/MODEL_ZOO_cn.md)
-  - [PP-YOLO](configs/ppyolo/README.md)
-  - [PP-PicoDet](configs/picodet/README.md)
-  - [Enhanced Anchor Free model--TTFNet](configs/ttfnet/README_en.md)
-  - [Mobile models](static/configs/mobile/README_en.md)
-  - [676 classes of object detection](static/docs/featured_model/LARGE_SCALE_DET_MODEL_en.md)
-  - [Two-stage practical PSS-Det](configs/rcnn_enhance/README_en.md)
-  - [SSLD pretrained models](docs/feature_models/SSLD_PRETRAINED_MODEL_en.md)
-- Universal instance segmentation
-  - [SOLOv2](configs/solov2/README.md)
-- Rotation object detection
-  - [S2ANet](configs/dota/README_en.md)
-- [Keypoint detection](configs/keypoint)
-  - [PP-TinyPose](configs/keypoint/tiny_pose)
-  - HigherHRNet
-  - HRNet
-  - LiteHRNet
-- [Multi-Object Tracking](configs/mot/README.md)
-  - [PP-Tracking](deploy/pptracking/README.md)
-  - [DeepSORT](configs/mot/deepsort/README.md)
-  - [JDE](configs/mot/jde/README.md)
-  - [FairMOT](configs/mot/fairmot/README.md)
-- Vertical field
-  - [Face detection](configs/face_detection/README_en.md)
-  - [Pedestrian detection](configs/pedestrian/README.md)
-  - [Vehicle detection](configs/vehicle/README.md)
-- Competition Plan
-  - [Objects365 2019 Challenge champion model](static/docs/featured_model/champion_model/CACascadeRCNN_en.md)
-  - [Best single model of Open Images 2019-Object Detection](static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md)
-
-## Applications
-
-- [Christmas portrait automatic generation tool](static/application/christmas)
-- [Android Fitness Demo](https://github.com/zhiboniu/pose_demo_android)
-
-## Updates
-
-Updates please refer to [change log](docs/CHANGELOG_en.md) for details.
-
-
-## License
-
-PaddleDetection is released under the [Apache 2.0 license](LICENSE).
-
-
-## Contributing
-
-Contributions are highly welcomed and we would really appreciate your feedback!!
-- Thanks [Mandroide](https://github.com/Mandroide) for cleaning the code and unifying some function interface.
-- Thanks [FL77N](https://github.com/FL77N/) for contributing the code of `Sparse-RCNN` model.
-- Thanks [Chen-Song](https://github.com/Chen-Song) for contributing the code of `Swin Faster-RCNN` model.
-- Thanks [yangyudong](https://github.com/yangyudong2020), [hchhtc123](https://github.com/hchhtc123) for contributing PP-Tracking GUI interface.
-- Thanks [Shigure19](https://github.com/Shigure19) for contributing PP-TinyPose fitness APP.
-
-## Citation
+
+  - [Data processing module](docs/advanced_tutorials/READER.md)
+  - [New object detection models](docs/advanced_tutorials/MODEL_TECHNICAL.md)
+  - Customization
+    - [Object detection](docs/advanced_tutorials/customization/detection.md)
+    - [Keypoint detection](docs/advanced_tutorials/customization/keypoint_detection.md)
+    - [Multiple object tracking](docs/advanced_tutorials/customization/pphuman_mot.md)
+    - [Action recognition](docs/advanced_tutorials/customization/pphuman_action.md)
+    - [Attribute recognition](docs/advanced_tutorials/customization/pphuman_attribute.md)
+
+### Courses
+
+- **[Theoretical foundation] [Object detection 7-day camp](https://aistudio.baidu.com/aistudio/education/group/info/1617):** Overview of object detection tasks, details of the RCNN and YOLO series of object detection algorithms, PP-YOLO optimization strategies and case studies, and an introduction to and practice with the anchor-free series of algorithms
+
+- **[Industrial application] [AI Fast Track industrial object detection technology and application](https://aistudio.baidu.com/aistudio/education/group/info/23670):** Super object detection algorithms, the real-time pedestrian analysis system PP-Human, and a breakdown and practice of industrial object detection applications
+
+- **[Industrial features] 2022.3.26** **[Smart City Industry Seven-Day Class](https://aistudio.baidu.com/aistudio/education/group/info/25620)**: Urban planning, urban governance, smart governance services, traffic management, and community governance.
+
+### [Industrial tutorial examples](./industrial_tutorial/README.md)
+
+- [Intelligent fitness recognition based on PP-TinyPose Plus](https://aistudio.baidu.com/aistudio/projectdetail/4385813)
+
+- [Road litter detection based on PP-PicoDet Plus](https://aistudio.baidu.com/aistudio/projectdetail/3561097)
+
+- [Communication tower detection based on PP-PicoDet and deployment on Android](https://aistudio.baidu.com/aistudio/projectdetail/3561097)
+
+- [Visitor flow statistics based on FairMOT](https://aistudio.baidu.com/aistudio/projectdetail/2421822)
+
+- [More examples](./industrial_tutorial/README.md)
+
+## Applications
+
+- [Fitness app on Android mobile](https://github.com/zhiboniu/pose_demo_android)
+- [PP-Tracking GUI Visualization Interface](https://github.com/yangyudong2020/PP-Tracking_GUi)
+
+## Recommended third-party tutorials
+
+- [Deployment of PaddleDetection for Windows I](https://zhuanlan.zhihu.com/p/268657833)
+- [Deployment of PaddleDetection for Windows II](https://zhuanlan.zhihu.com/p/280206376)
+- [Deployment of PaddleDetection on Jetson Nano](https://zhuanlan.zhihu.com/p/319371293)
+- [How to deploy YOLOv3 model on Raspberry Pi for Helmet detection](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/yolov3_for_raspi.md)
+- [Use SSD-MobileNetv1 for a project -- From dataset to deployment on Raspberry Pi](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/ssd_mobilenet_v1_for_raspi.md)
+
+## Version updates
+
+Please refer to the [release notes](https://github.com/PaddlePaddle/Paddle/wiki/PaddlePaddle-2.3.0-Release-Note-EN) for more details about the updates.
+
+## License
+
+PaddlePaddle is provided under the [Apache 2.0 license](LICENSE).
+
+## Contribute your code
+
+We appreciate your contributions and your feedback!
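+
+- Thanks to [Mandroide](https://github.com/Mandroide) for code cleanup and
+- Thanks to [FL77N](https://github.com/FL77N/) for the `Sparse-RCNN` model
+- Thanks to [Chen-Song](https://github.com/Chen-Song) for the `Swin Faster-RCNN` model
+- Thanks to [yangyudong](https://github.com/yangyudong2020) and [hchhtc123](https://github.com/hchhtc123) for developing the PP-Tracking GUI interface
+- Thanks to Shigure19 for developing the PP-TinyPose fitness APP
+- Thanks to [manangoel99](https://github.com/manangoel99) for the Wandb visualization methods
+
+## Quote
 ```
 @misc{ppdet2019,
diff --git "a/activity/\347\233\264\346\222\255\347\255\224\347\226\221\347\254\254\344\270\200\346\234\237.md" "b/activity/\347\233\264\346\222\255\347\255\224\347\226\221\347\254\254\344\270\200\346\234\237.md"
new file mode 100644
index 0000000000000000000000000000000000000000..f94f0dd09941474558bf9dc6baac0669c8ded9c3
--- /dev/null
+++ "b/activity/\347\233\264\346\222\255\347\255\224\347\226\221\347\254\254\344\270\200\346\234\237.md"
@@ -0,0 +1,125 @@
+# Live Q&A, Session 1
+
+### The full Q&A replay can be downloaded and watched at: https://pan.baidu.com/s/168ouju4MxN5XJEb-GU1iAw (extraction code: 92mw)
+
+## PaddleDetection framework/API questions
+
+#### Q1. Could you explain warmup in detail?
+A1. Warmup is the process of ramping the learning rate from 0 up to the preset learning rate at the start of training. For its settings, see the [source code](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/ppdet/optimizer.py#L156); it can be configured as a number of steps or a number of epochs.
+
+#### Q2. Can pretrained weights still be used if the classes do not match?
+A2. Yes. When the classes do not match, the model automatically skips loading weights whose shapes do not match; the weights that depend on the number of classes are usually in the head layers.
+
+#### Q3. How is nms_eta used? The source code is not very clear about it, and the API documentation does not go into detail either.
+A3. For dense scenes, nms_eta dynamically adjusts the NMS threshold in each round, to avoid filtering out two detection boxes that overlap heavily but belong to different objects. See the [source code](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/detection/multiclass_nms_op.cc#L139) for details. The default is 1, and it usually does not need to be set.
+
+#### Q4. In anchor_cluster.py, is --size the model's input size or the size of the images actually used?
+A4. It is the image size used at actual inference time; you can generally refer to the image_shape setting in TestReader.
+
+#### Q5. Why can predicted coordinates be negative?
+A5. Negative values are possible in the model's algorithm. First check whether the predictions meet expectations. If they look normal, consider adding a clip operation in post-processing to keep the output boxes inside the image; if not, the model was not trained well and needs further troubleshooting or tuning.
+
+#### Q6. For the PaddleDetection face detection model blazeface, load_params has no parameter file during one-click prediction. Where can it be downloaded?
+A6. 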
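The blazeface models can be downloaded from the [model zoo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/face_detection#%E6%A8%A1%E5%9E%8B%E5%BA%93). To deploy one, export the model by following these [steps](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/EXPORT_MODEL.md).
+
+## PP-YOLOE questions
+#### Q1. When training PP-YOLOE, the loss keeps increasing as training goes on. Is this a dataset problem?
+A1. You can check the following aspects:
+
+1. Data: first confirm the dataset is correct, including annotations, classes, and so on.
+2. Hyperparameters: adjust base_lr according to batch_size, following the linear scaling rule; adjust warmup_iters according to the total number of epochs.
+3. Pretrained weights: you can load the officially provided weights pretrained on the COCO dataset.
+4. Network structure: analyze the distribution of the boxes and adjust the DFL parameters accordingly.
+
+#### Q2. Choosing a detection model: how to choose between PicoDet and the PP-YOLO series?
+A2. PicoDet is designed for mobile devices, that is, for low-compute devices such as ARM and x86. PP-YOLO is designed for the server side, such as NVIDIA GPUs and Baidu Kunlun cards. On mobile phones and GPU-less desktops, prefer PicoDet; with high-compute devices such as NVIDIA GPUs, prefer the PP-YOLO series; for scenarios that are not latency-sensitive and care more about accuracy, prefer the PP-YOLO series.
+
+#### Q3. None of the BN-layer parameters in ConvBNLayer use L2Decay, while the other parts of PP-YOLOE-s use an L2Decay of 0.0005 as set in the config file. Is that right?
+A3. The backbone and neck of PP-YOLOE use ConvBNLayer, whose BN layers do not use L2Decay; the other parts use the globally configured L2Decay of 0.0005.
+
+#### Q4. Do the Conv biases in PP-YOLOE also skip decay?
+A4. The Conv layers in the PP-YOLOE backbone and neck have no bias parameter; the Conv biases in the head use the global decay.
+
+#### Q5. When measuring speed, why use PaddleInference instead of loading the model directly and timing it?
+A5. PaddleInference fuses the forward operators of the inference model exported by Paddle, which optimizes speed; actual deployment is also done with PaddleInference.
+
+#### Q6. When deploying the PP-YOLOE series, are the pre- and post-processing the same?
+A6. For deployment, pre-processing for all PP-YOLO series models follows the decode-resize-normalize-permute pipeline. For post-processing, PP-YOLOv2 uses Matrix NMS, while PP-YOLOE uses the ordinary NMS algorithm.
+
+#### Q7. Does PP-YOLOE have any tuning strategies for datasets with small objects and class imbalance?
+A7. For small-object datasets, you can increase the PP-YOLOE input size appropriately and add an attention mechanism to the model; small-object detection based on PP-YOLOE is currently under development. For class imbalance, you can address it through data sampling; PP-YOLOE currently has no optimization specifically for class imbalance.
+
+## PP-Human questions
+#### Q1. Why does pphuman report an error when predicting with an exported 18-keypoint model (rather than the official 17 keypoints)?
+A1. This is caused by the number of points output by the keypoint model not matching the action recognition model. To predict with an 18-point model, besides using an 18-point keypoint model you also need to build your own 18-point action recognition model.
+
+#### Q2. Why is window_size set to 50 in the officially exported model?
+A2. The exported model's setting matches the input data length used in training and prediction. The datasets we mainly use are NTU, real data provided by companies, and so on. When training this model, we ran a statistical analysis of the falling clips in this data, and each action clip lasts roughly 40 to 80 frames. Weighing real-world latency against prediction quality, we chose a value around 50, which on our data can describe a complete action without making the latency too high.
+
+In general, window_size is best chosen according to the actual action and the device. For example, if on some device 50 frames are not enough to contain a complete action, the value needs to be increased; or if some actions are very short and 50 frames include too many unrelated actions, which easily causes misrecognition, the value can be reduced.
+
+
+#### Q3. How to replace the detection, tracking, and keypoint models in PP-Human?
+A3. 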
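All the models we use are exported from models in PaddleDetection. In theory, every model used by PP-Human can be replaced directly, but note that the replacement must have the same pipeline and the same pre- and post-processing.
+
+#### Q4. Data annotation in PP-Human (detection, tracking, keypoints, actions, attributes): recommended annotation tools and annotation steps
+A4. Annotation tools: labelme, labelImg, or cvat for detection; darklabel or cvat for tracking; labelme or cvat for keypoints. Detection annotations can be converted to COCO format with tools/x2coco.py.
+
+#### Q5. How to change the labels in PP-Human (attribute and action recognition)?
+A5. In PPHuman, action recognition is defined as a classification problem over skeleton keypoint sequences; the falling recognition we have open-sourced so far is a binary classification problem. For attributes, training is not yet open and is under development.
+
+#### Q6. Which PP-Human features support a single person, and which support multiple people?
+A6. PP-Human's features are built on one pipeline: detection -> tracking -> specific feature. Currently each specific-feature model processes one person at a time, meaning attributes, actions, and so on belong to each individual person in the image. But with this pipeline every person in the image gets processed, so single-person and multi-person cases are in fact equally supported.
+
+#### Q7. PP-Human support for video stream prediction and serving deployment
+A7. This is currently under development and will be supported in the next release.
+
+#### Q8. When training pphuman on my own dataset and testing after training, how do I change the visualized label? Without changes it is still falling.
+
+A8. The visualization function is at https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/visualize.py#L368 ; just replace action_text with the desired class when visualizing.
+
+#### Q9. Can keypoint detection detect a continuous action, for example checking exercise form?
+A9. Yes, it can be done on top of keypoints. There are different approaches:
+
+1. If you want to judge how well an action conforms to a standard and the action can be described well, you can add hand-written logic on top of the coordinates produced by the keypoint model. We provide an Android fitness app example: https://github.com/zhiboniu/pose_demo_android ; the logic that judges each action can be found in https://github.com/zhiboniu/pose_demo_android/blob/release/1.0/app/src/main/cpp/pose_action.cc .
+
+2. When an action is hard to describe with logic, you can follow the existing falling-detection example and train a model that recognizes fitness actions, though the data collection requirements are higher.
+
+
+#### Q10. For ladders in production environments with occlusion, can keypoint detection be used to judge whether workers climb up and down the ladder in a compliant way?
+A10. It depends on how severe the occlusion is. If occlusion is too heavy, the keypoint detection model degrades badly, which makes the action judgment inaccurate. Also, because keypoint-based approaches discard appearance information, judging only from the person's own motion works in scenes where occlusion is not severe. Conversely, if the ladder itself is an essential element for judging compliance, this approach is not necessarily the best choice.
+
+#### Q11. Isn't keypoint-based action recognition a temporal action recognition?
+A11. It is temporal action recognition. The keypoint coordinates of every frame within a time window are assembled into a temporal keypoint sequence, and the action recognition model then predicts the action class of that sequence.
+
+
+## Detection algorithm questions
+#### Q1. Large images with small objects, and the images at inference are also large: how should they be pre-processed?
+A1. Common ways to handle small objects are tiling the image and increasing the network input size. If you use an anchor-based detector, you can generate anchors by clustering the object sizes; see the [script](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/tools/anchor_cluster.py). Small-object detection based on PP-YOLOE is currently under development.
+
+#### Q2. How to detect large objects, such as invoices?
+A2. If you use an anchor-based detector, you can generate anchors by clustering the object sizes; see the [script](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/tools/anchor_cluster.py). You can also strengthen the deep features to improve large-object detection.
+
+#### Q3. 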
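At prediction time there are a great many predicted boxes, some with confidence even below 0.1. How can such boxes be filtered out? That is, filter out these very low-confidence predictions when training the model, to avoid unnecessary computation at inference and deployment that would hurt inference speed.
+A3. Post-processing has two filters: 1) the top 100 boxes by confidence are kept for NMS; 2) boxes are filtered by the configured threshold. If you can confirm that images contain relatively few objects (fewer than 10), you can lower the top-100 value to 50 or less, which speeds up the NMS computation. Beyond that, adjusting threshold affects the precision and recall of the final detections.
+
+#### Q4. How is the ratio of positive to negative samples usually designed?
+A4. PaddleDetection supports training with negative samples: just set allow_empty: true under TrainDataset. In tests on datasets, a negative-sample ratio of 0.3 gave the most noticeable improvement.
+
+## Compression and deployment questions
+#### Q1. After exporting a PaddleDetection-trained model as an inference model, how should the pre- and post-processing code be written for inference deployment? Are there any reference tutorials?
+A1. Most network models in PaddleDetection currently support C++ inference. Different processing paths serve different purposes; for example, PP-YOLOE speed tests do not include post-processing, and PicoDet has an option for whether to export NMS so that it can support different third-party inference engines.
+
+object_detector.cc implements the pipeline for all detection models. Pre-processing is mostly decode-resize-normalize-permute, and some networks add a padding step. For most models the post-processing is inside the model; picodet provides separate NMS post-processing code.
+
+The inputs of detection models are unified as image, im_shape, and scale_factor. If a model does not use im_shape, the number of outputs is reduced, but the pre-processing pipeline needs no extra development.
+
+#### Q2. About TensorRT acceleration: fp16 does work on a V100, but the timing seems off. On a 1080 Ti, running a single image 1000 times takes 50s, and that is with float32, yet on a V100 with float16 it takes 97.
+A2. The speed of PPYOLOE and other models has been tested on a V100 with TensorRT FP16. For speed testing, check the following:
+
+1. Whether warmup is set correctly during the speed test, so that a long startup time does not skew the measurement.
+2. With TensorRT enabled, generating the engine file takes a long time; you can set use_static to True in https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L745 .
+
+
+#### Q3. PaddleDetection already supports quantization-aware training (online quantization) for some models. If I want to train a different, new model, can I easily use QAT? If not, why are only a limited number of models supported, and why does QAT on other models always run into all sorts of problems?
+A3. 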
PaddleDetection has many models, so QAT configs have been open-sourced only for some of them. Other models also support QAT; the config files just do not cover them. If quantization reports an error, it is usually a configuration problem. For detection models we generally recommend skipping the last conv in the head. To skip quantization for certain layers, set skip_quant; see the [code](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/ppdet/modeling/heads/yolo_head.py#L97).
diff --git a/configs/centernet/_base_/centernet_reader.yml b/configs/centernet/_base_/centernet_reader.yml
index 1f18dca49d1e39c61b9bbc7b5be8bfac7bce5ca4..81af4ab840502da6e738ac667dd0883041ba8992 100644
--- a/configs/centernet/_base_/centernet_reader.yml
+++ b/configs/centernet/_base_/centernet_reader.yml
@@ -30,6 +30,6 @@ TestReader:
   sample_transforms:
     - Decode: {}
     - WarpAffine: {keep_res: True, input_h: 512, input_w: 512}
-    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834]}
+    - NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True}
     - Permute: {}
   batch_size: 1
diff --git a/configs/convnext/README.md b/configs/convnext/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..644d66815660427d2a6cdf587c014d8cb877eb15
--- /dev/null
+++ b/configs/convnext/README.md
@@ -0,0 +1,20 @@
+# ConvNeXt (A ConvNet for the 2020s)
+
+## Model Zoo
+### ConvNeXt on COCO
+
+| Network | Input size | Images/GPU | LR schedule | mAP val 0.5:0.95 | mAP val 0.5 | Params(M) | FLOPs(G) | Download | Config |
+| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| PP-YOLOE-ConvNeXt-tiny | 640 | 16 | 36e | 44.6 | 63.3 | 33.04 | 13.87 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_convnext_tiny_36e_coco.pdparams) | [config](./ppyoloe_convnext_tiny_36e_coco.yml) |
+| YOLOX-ConvNeXt-s | 640 | 8 | 36e | 44.6 | 65.3 | 36.20 | 27.52 | [download](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [config](./yolox_convnext_s_36e_coco.yml) |
+
+
+## Citations
+```
+@Article{liu2022convnet,
+  author = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
+  title = {A ConvNet for the 2020s},
+  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year = {2022},
+}
+```
diff --git a/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml b/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..360a368ec0837033ab408db59aa0d4ea5b7972dd
--- /dev/null
+++ b/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml
@@ -0,0 +1,55 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '../ppyoloe/_base_/ppyoloe_crn.yml',
+  '../ppyoloe/_base_/ppyoloe_reader.yml',
+]
+depth_mult: 0.25
+width_mult: 0.50
+
+log_iter: 100
+snapshot_epoch: 5
+weights: output/ppyoloe_convnext_tiny_36e_coco/model_final
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/convnext_tiny_22k_224.pdparams
+
+
+YOLOv3:
+  backbone: ConvNeXt
+  neck: CustomCSPPAN
+  yolo_head: PPYOLOEHead
+  post_process: ~
+
+ConvNeXt:
+  arch: 'tiny'
+  drop_path_rate: 0.4
+  layer_scale_init_value: 1.0
+  return_idx: [1, 2, 3]
+
+
+PPYOLOEHead:
+  static_assigner_epoch: 12
+  nms:
+    nms_top_k: 10000
+    keep_top_k: 300
+    score_threshold: 0.01
+    nms_threshold: 0.7
+
+
+TrainReader: + batch_size: 16 + + +epoch: 36 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [36] + use_warmup: false + +OptimizerBuilder: + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0005 diff --git a/configs/convnext/yolox_convnext_s_36e_coco.yml b/configs/convnext/yolox_convnext_s_36e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..b41551dee8a2e2793ac09d474c0e7d2a8868299f --- /dev/null +++ b/configs/convnext/yolox_convnext_s_36e_coco.yml @@ -0,0 +1,58 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../yolox/_base_/yolox_cspdarknet.yml', + '../yolox/_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 5 +weights: output/yolox_convnext_s_36e_coco/model_final +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/convnext_tiny_22k_224.pdparams + + +YOLOX: + backbone: ConvNeXt + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [15, 25] # multi-scale range [480*480 ~ 800*800] + +ConvNeXt: + arch: 'tiny' + drop_path_rate: 0.4 + layer_scale_init_value: 1.0 + return_idx: [1, 2, 3] + + +TrainReader: + batch_size: 8 + mosaic_epoch: 30 + + +YOLOXHead: + l1_epoch: 30 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 1000 + score_threshold: 0.001 + nms_threshold: 0.65 + + +epoch: 36 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [36] + use_warmup: false + +OptimizerBuilder: + regularizer: false + optimizer: + type: AdamW + weight_decay: 0.0005 diff --git a/configs/datasets/coco_detection.yml b/configs/datasets/coco_detection.yml index 7a62c3b0b57a5d76c8ed519d3a3940c1b4532c15..291c24874b72bbb92fb2510e754c791a3f06c146 100644 --- a/configs/datasets/coco_detection.yml +++ b/configs/datasets/coco_detection.yml @@ -16,4 +16,5 @@ EvalDataset: TestDataset: !ImageFolder - anno_path: annotations/instances_val2017.json + 
anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/configs/datasets/coco_instance.yml b/configs/datasets/coco_instance.yml index 5eaf76791a94bfd2819ba6dab610fae54b69f26e..b04dbdca955a326ffc5eb13756e73ced83b92309 100644 --- a/configs/datasets/coco_instance.yml +++ b/configs/datasets/coco_instance.yml @@ -16,4 +16,5 @@ EvalDataset: TestDataset: !ImageFolder - anno_path: annotations/instances_val2017.json + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/configs/datasets/dota.yml b/configs/datasets/dota.yml index f9d9395b00d7ed9028396044c407784d251e43e5..5153163d95a8a418a82d3d6d43f6e1f9404ed075 100644 --- a/configs/datasets/dota.yml +++ b/configs/datasets/dota.yml @@ -17,3 +17,4 @@ EvalDataset: TestDataset: !ImageFolder anno_path: trainval_split/s2anet_trainval_paddle_coco.json + dataset_dir: dataset/DOTA_1024_s2anet/ diff --git a/configs/datasets/roadsign_voc.yml b/configs/datasets/roadsign_voc.yml index ddbfc7889e0027d85971c6ab11f3f33adfe8be71..9a081611aa8dafef5d5c6f1af1476cc038db5702 100644 --- a/configs/datasets/roadsign_voc.yml +++ b/configs/datasets/roadsign_voc.yml @@ -3,19 +3,19 @@ map_type: integral num_classes: 4 TrainDataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: train.txt - label_list: label_list.txt - data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + name: VOCDataSet + dataset_dir: dataset/roadsign_voc + anno_path: train.txt + label_list: label_list.txt + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] EvalDataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: valid.txt - label_list: label_list.txt - data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + name: VOCDataSet + dataset_dir: dataset/roadsign_voc + anno_path: 
valid.txt + label_list: label_list.txt + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] TestDataset: - !ImageFolder - anno_path: dataset/roadsign_voc/label_list.txt + name: ImageFolder + anno_path: dataset/roadsign_voc/label_list.txt diff --git a/configs/datasets/visdrone_detection.yml b/configs/datasets/visdrone_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..37feb6e2618ff9d83ce2842a9e581dcfd31efc78 --- /dev/null +++ b/configs/datasets/visdrone_detection.yml @@ -0,0 +1,22 @@ +metric: COCO +num_classes: 10 + +TrainDataset: + !COCODataSet + image_dir: VisDrone2019-DET-train + anno_path: train.json + dataset_dir: dataset/visdrone + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: VisDrone2019-DET-val + anno_path: val.json + # image_dir: test_dev + # anno_path: test_dev.json + dataset_dir: dataset/visdrone + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/visdrone diff --git a/configs/dota/README.md b/configs/dota/README.md index 9a5988a761810600b02cb4e9f1348c6072e02cac..adde27691ef78606d2528b95dee8e30842bfff64 100644 --- a/configs/dota/README.md +++ b/configs/dota/README.md @@ -53,7 +53,7 @@ DOTA数据集中总共有2806张图像,其中1411张图像作为训练集,45 - PaddlePaddle >= 2.1.1 - GCC == 8.2 -推荐使用docker镜像[paddle:2.1.1-gpu-cuda10.1-cudnn7](registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7)。 +推荐使用docker镜像 paddle:2.1.1-gpu-cuda10.1-cudnn7。 执行如下命令下载镜像并启动容器: ``` diff --git a/configs/dota/README_en.md b/configs/dota/README_en.md index e299e0e81808888e947d0e0b1e1423bb5f7fdbea..61eeee7f5c53b7ec4e01c2a68c75f98f9a09bd14 100644 --- a/configs/dota/README_en.md +++ b/configs/dota/README_en.md @@ -64,7 +64,7 @@ To use the rotating frame IOU to calculate the OP, the following conditions must - PaddlePaddle >= 2.1.1 - GCC == 8.2 -Docker images are recommended[paddle:2.1.1-gpu-cuda10.1-cudnn7](registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7)。 +Docker 
images are recommended paddle:2.1.1-gpu-cuda10.1-cudnn7。 Run the following command to download the image and start the container: ``` diff --git a/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml b/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml index 396462a2ff7cf68f7d91a1bb1bb87b8e5a040486..1af6175a931a571f1c6726f0f312591c07489d1d 100644 --- a/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml +++ b/configs/faster_rcnn/_base_/faster_rcnn_swin_reader.yml @@ -30,14 +30,12 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 640, 640] + image_shape: [-1, 3, 640, 640] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [640, 640], keep_ratio: True} + - LetterBoxResize: {target_size: 640} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 shuffle: false drop_last: false diff --git a/configs/fcos/_base_/fcos_r50_fpn.yml b/configs/fcos/_base_/fcos_r50_fpn.yml index 64a275d88023030b2299b0c3932b1c3fc9ce1e34..cd22c229a1192ea384037848be1b7e6edc43741a 100644 --- a/configs/fcos/_base_/fcos_r50_fpn.yml +++ b/configs/fcos/_base_/fcos_r50_fpn.yml @@ -30,7 +30,6 @@ FCOSHead: num_convs: 4 norm_type: "gn" use_dcn: false - num_classes: 80 fpn_stride: [8, 16, 32, 64, 128] prior_prob: 0.01 fcos_loss: FCOSLoss @@ -46,7 +45,6 @@ FCOSLoss: FCOSPostProcess: decode: name: FCOSBox - num_classes: 80 nms: name: MultiClassNMS nms_top_k: 1000 diff --git a/configs/keypoint/README.md b/configs/keypoint/README.md index e750312a0f0c17197ca74032d00d97978298549d..4ca9b07474caca53ca529fc19bd5b239acb742e4 100644 --- a/configs/keypoint/README.md +++ b/configs/keypoint/README.md @@ -1,66 +1,110 @@ 简体中文 | [English](README_en.md) -# KeyPoint模型系列 +# 关键点检测系列模型 +
    + +
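+## Contents
+
+- [Introduction](#简介)
+- [Model recommendations](#模型推荐)
+- [Model Zoo](#模型库)
+- [Quick start](#快速开始)
+  - [Environment installation](#1环境安装)
+  - [Data preparation](#2数据准备)
+  - [Training and testing](#3训练与测试)
+    - [Single-GPU training](#单卡训练)
+    - [Multi-GPU training](#多卡训练)
+    - [Model evaluation](#模型评估)
+    - [Model prediction](#模型预测)
+    - [Model deployment](#模型部署)
+      - [Joint deployment of Top-Down models](#top-down模型联合部署)
+      - [Standalone deployment of Bottom-Up models](#bottom-up模型独立部署)
+      - [Joint deployment with multi-object tracking](#与多目标跟踪模型fairmot联合部署)
+    - [Complete deployment tutorial and demo](#完整部署教程及Demo)
+- [Training on custom data](#自定义数据训练)
+- [BenchMark](#benchmark)
 ## Introduction
-- The KeyPoint part of PaddleDetection closely follows the latest and best algorithms in the field and includes both Top-Down and Bottom-Up approaches to meet different user needs.
+The keypoint detection part of PaddleDetection keeps up with state-of-the-art algorithms and includes both Top-Down and Bottom-Up methods to meet different user needs. Top-Down first detects the objects and then detects the keypoints of each one. Top-Down models are more accurate, but they slow down as the number of objects grows. In contrast, Bottom-Up first detects the points and then groups or connects them to form multiple human pose instances. Bottom-Up speed is fixed and does not slow down as the number of objects grows, but its accuracy is lower.
-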
    - -
+In addition, PaddleDetection provides [PP-TinyPose](./tiny_pose/README.md), a self-developed real-time keypoint detection model optimized for mobile devices.
+## Model recommendations
+### Recommended mobile models
-#### Model Zoo
-COCO dataset
-| Model | Input size | AP(coco val) | Download | Config |
-| :--- | --- | :---: | :---: | --- |
-| HigherHRNet-w32 | 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) |
-| HigherHRNet-w32 | 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) |
-| HigherHRNet-w32+SWAHR | 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |
-| HRNet-w32 | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | [config](./hrnet/hrnet_w32_256x192.yml) |
-| HRNet-w32 | 384x288 | 77.8 | [hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | [config](./hrnet/hrnet_w32_384x288.yml) |
-| HRNet-w32+DarkPose | 256x192 | 78.0 | [dark_hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | [config](./hrnet/dark_hrnet_w32_256x192.yml) |
-| HRNet-w32+DarkPose | 384x288 | 78.3 | [dark_hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | [config](./hrnet/dark_hrnet_w32_384x288.yml) |
-| WiderNaiveHRNet-18 | 256x192 | 67.6(+DARK 68.4) | [wider_naive_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/wider_naive_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml) |
-| LiteHRNet-18 | 256x192 | 66.5 | [lite_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_256x192_coco.yml) |
-| LiteHRNet-18 | 384x288 | 69.7 | [lite_hrnet_18_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_384x288_coco.yml) |
-| LiteHRNet-30 | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml) |
-| LiteHRNet-30 | 384x288 | 72.5 | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml) |
+| Detection model | Keypoint model | Input size | Accuracy on COCO | Average inference time (FP16) | Params (M) | Flops (G) | Model weights | Paddle-Lite deployment model (FP16) |
+| :--- | :--- | :---: | :---: | :---: | --- | :---: | :---: | :---: |
+| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_128x96.yml) | Detection: 192x192; Keypoint: 128x96 | Detection mAP: 29.0; Keypoint AP: 58.1 | Detection: 2.37ms; Keypoint: 3.27ms | Detection: 1.18; Keypoint: 1.36 | Detection: 0.35; Keypoint: 0.08 | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams); [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.nb); [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.nb) |
+| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_256x192.yml) | Detection: 320x320; Keypoint: 256x192 | Detection mAP: 38.5; Keypoint AP: 68.8 | Detection: 6.30ms; Keypoint: 8.33ms | Detection: 1.18; Keypoint: 1.36 | Detection: 0.97; Keypoint: 0.32 | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams); [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.nb); [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.nb) |
+
+
+*For details on using PP-TinyPose, see the [documentation](./tiny_pose/README.md).
+
+### Recommended server-side models
+
+| Detection model | Keypoint model | Input size | Accuracy on COCO | Params (M) | Flops (G) | Model weights |
+| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
+| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_384x288.yml) | Detection: 640x640; Keypoint: 384x288 | Detection mAP: 49.5; Keypoint AP: 77.8 | Detection: 54.6; Keypoint: 28.6 | Detection: 115.8; Keypoint: 17.3 | [Detection](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams); [Keypoint](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) |
+| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_256x192.yml) | Detection: 640x640; Keypoint: 256x192 | Detection mAP: 49.5; Keypoint AP: 76.9 | Detection: 54.6; Keypoint: 28.6 | Detection: 115.8; Keypoint: 7.68 | [Detection](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams); [Keypoint](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) |
+
+
+## Model Zoo
+COCO dataset
+| Model | Approach | Input size | AP(coco val) | Download | Config |
+| :--- | --- | :---: | :---: | --- | --- |
+| HigherHRNet-w32 | Bottom-Up | 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) |
+| HigherHRNet-w32 | Bottom-Up | 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) |
+| HigherHRNet-w32+SWAHR | Bottom-Up | 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |
+| HRNet-w32 | Top-Down | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | [config](./hrnet/hrnet_w32_256x192.yml) |
+| HRNet-w32 | Top-Down | 384x288 | 77.8 | [hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | [config](./hrnet/hrnet_w32_384x288.yml) |
+| HRNet-w32+DarkPose | Top-Down | 256x192 | 78.0 | [dark_hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | [config](./hrnet/dark_hrnet_w32_256x192.yml) |
+| HRNet-w32+DarkPose | Top-Down | 384x288 | 78.3 | [dark_hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | [config](./hrnet/dark_hrnet_w32_384x288.yml) |
+| WiderNaiveHRNet-18 | Top-Down | 256x192 | 67.6(+DARK 68.4) | 
[wider_naive_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/wider_naive_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml) | +| LiteHRNet-18 |Top-Down| 256x192 | 66.5 | [lite_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_256x192_coco.yml) | +| LiteHRNet-18 |Top-Down| 384x288 | 69.7 | [lite_hrnet_18_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_384x288_coco.yml) | +| LiteHRNet-30 | Top-Down|256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml) | +| LiteHRNet-30 |Top-Down| 384x288 | 72.5 | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml) | 备注: Top-Down模型测试AP结果基于GroundTruth标注框 MPII数据集 -| 模型 | 输入尺寸 | PCKh(Mean) | PCKh(Mean@0.1) | 模型下载 | 配置文件 | -| :---- | -------- | :--------: | :------------: | :----------------------------------------------------------: | -------------------------------------------- | -| HRNet-w32 | 256x256 | 90.6 | 38.5 | [hrnet_w32_256x256_mpii.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x256_mpii.pdparams) | [config](./hrnet/hrnet_w32_256x256_mpii.yml) | +| 模型 | 方案| 输入尺寸 | PCKh(Mean) | PCKh(Mean@0.1) | 模型下载 | 配置文件 | +| :---- | ---|----- | :--------: | :------------: | :----------------------------------------------------------: | -------------------------------------------- | +| HRNet-w32 | Top-Down|256x256 | 90.6 | 38.5 | [hrnet_w32_256x256_mpii.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x256_mpii.pdparams) | 
[config](./hrnet/hrnet_w32_256x256_mpii.yml) | + +场景模型 +| 模型 | 方案 | 输入尺寸 | 精度 | 预测速度 |模型权重 | 部署模型 | 说明| +| :---- | ---|----- | :--------: | :--------: | :------------: |:------------: |:-------------------: | +| HRNet-w32 + DarkPose | Top-Down|256x192 | AP: 87.1 (业务数据集)| 单人2.9ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | 针对摔倒场景特别优化,该模型应用于[PP-Human](../../deploy/pipeline/README.md) | + + +我们同时推出了基于LiteHRNet(Top-Down)针对移动端设备优化的实时关键点检测模型[PP-TinyPose](./tiny_pose/README.md), 欢迎体验。 -我们同时推出了针对移动端设备优化的实时关键点检测模型[PP-TinyPose](./tiny_pose/README.md), 欢迎体验。 ## 快速开始 ### 1、环境安装 -​ 请参考PaddleDetection [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL_cn.md)正确安装PaddlePaddle和PaddleDetection即可。 +​ 请参考PaddleDetection [安装文档](../../docs/tutorials/INSTALL_cn.md)正确安装PaddlePaddle和PaddleDetection即可。 ### 2、数据准备 -​ 目前KeyPoint模型支持[COCO](https://cocodataset.org/#keypoints-2017)数据集和[MPII](http://human-pose.mpi-inf.mpg.de/#overview)数据集,数据集的准备方式请参考[关键点数据准备](../../docs/tutorials/PrepareKeypointDataSet_cn.md)。 +​ 目前KeyPoint模型支持[COCO](https://cocodataset.org/#keypoints-2017)数据集和[MPII](http://human-pose.mpi-inf.mpg.de/#overview)数据集,数据集的准备方式请参考[关键点数据准备](../../docs/tutorials/data/PrepareKeypointDataSet.md)。 ​ 关于config配置文件内容说明请参考[关键点配置文件说明](../../docs/tutorials/KeyPointConfigGuide_cn.md)。 - - - 请注意,Top-Down方案使用检测框测试时,需要通过检测模型生成bbox.json文件。COCO val2017的检测结果可以参考[Detector having human AP of 56.4 on COCO val2017 dataset](https://paddledet.bj.bcebos.com/data/bbox.json),下载后放在根目录(PaddleDetection)下,然后修改config配置文件中`use_gt_bbox: False`后生效。然后正常执行测试命令即可。 - +- 请注意,Top-Down方案使用检测框测试时,需要通过检测模型生成bbox.json文件。COCO val2017的检测结果可以参考[Detector having human AP of 56.4 on COCO val2017 dataset](https://paddledet.bj.bcebos.com/data/bbox.json),下载后放在根目录(PaddleDetection)下,然后修改config配置文件中`use_gt_bbox: False`后生效。然后正常执行测试命令即可。 ### 3、训练与测试 -​ **单卡训练:** +#### 
单卡训练 ```shell #COCO DataSet @@ -70,7 +114,7 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/higherhrnet/hi CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml ``` -​ **多卡训练:** +#### 多卡训练 ```shell #COCO DataSet @@ -80,7 +124,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml ``` -​ **模型评估:** +#### 模型评估 ```shell #COCO DataSet @@ -93,7 +137,7 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32 CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --save_prediction_only ``` -​ **模型预测:** +#### 模型预测 ​ 注意:top-down模型只支持单人截图预测,如需使用多人图,请使用[联合部署推理]方式。或者使用bottom-up模型。 @@ -101,22 +145,32 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/hig CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=./output/higherhrnet_hrnet_w32_512/model_final.pdparams --infer_dir=../images/ --draw_threshold=0.5 --save_txt=True ``` -​ **部署预测:** +#### 模型部署 + +##### Top-Down模型联合部署 + +```shell +#导出检测模型 +python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams + +#导出关键点模型 +python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams + +#detector 检测 + keypoint top-down模型联合部署(联合推理只支持top-down方式) +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file=../video/xxx.mp4 --device=gpu +``` + +##### Bottom-Up模型独立部署 ```shell #导出模型 python tools/export_model.py -c 
configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams #部署推理 -#keypoint top-down/bottom-up 单独推理,该模式下top-down模型只支持单人截图预测。 python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5 -python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=gpu --threshold=0.5 - -#detector 检测 + keypoint top-down模型联合部署(联合推理只支持top-down方式) -python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file=../video/xxx.mp4 --device=gpu ``` -​ **与多目标跟踪模型FairMOT联合部署预测:** +##### 与多目标跟踪模型FairMOT联合部署 ```shell #导出FairMOT跟踪模型 @@ -125,18 +179,73 @@ python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.y #用导出的跟踪和关键点模型Python联合预测 python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU ``` + **注意:** - 跟踪模型导出教程请参考`configs/mot/README.md`。 + 跟踪模型导出教程请参考[文档](../mot/README.md)。 + +### 完整部署教程及Demo + + +​ 我们提供了PaddleInference(服务器端)、PaddleLite(移动端)、第三方部署(MNN、OpenVino)支持。无需依赖训练代码,deploy文件夹下相应文件夹提供独立完整部署代码。 详见 [部署文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/README.md)介绍。 + +## 自定义数据训练 + +我们以[tinypose_256x192](./tiny_pose/README.md)为例来说明对于自定义数据如何修改: + +#### 1、配置文件[tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml) + +基本的修改内容及其含义如下: + +``` +num_joints: &num_joints 17 #自定义数据的关键点数量 +train_height: &train_height 256 #训练图片尺寸-高度h +train_width: &train_width 192 #训练图片尺寸-宽度w +hmsize: &hmsize [48, 64] #对应训练尺寸的输出尺寸,这里是输入[w,h]的1/4 +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] 
#关键点定义中左右对称的关键点,用于flip增强。若没有对称结构在 TrainReader 的 RandomFlipHalfBodyTransform 一栏中 flip_pairs 后面加一行 "flip: False"(注意缩紧对齐) +num_joints_half_body: 8 #半身关键点数量,用于半身增强 +prob_half_body: 0.3 #半身增强实现概率,若不需要则修改为0 +upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #上半身对应关键点id,用于半身增强中获取上半身对应的关键点。 +``` + +上述是自定义数据时所需要的修改部分,完整的配置及含义说明可参考文件:[关键点配置文件说明](../../docs/tutorials/KeyPointConfigGuide_cn.md)。 + +#### 2、其他代码修改(影响测试、可视化) +- keypoint_utils.py中的sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07,.87, .87, .89, .89]) / 10.0,表示每个关键点的确定范围方差,根据实际关键点可信区域设置,区域精确的一般0.25-0.5,例如眼睛。区域范围大的一般0.5-1.0,例如肩膀。若不确定建议0.75。 +- visualizer.py中的draw_pose函数中的EDGES,表示可视化时关键点之间的连接线关系。 +- pycocotools工具中的sigmas,同第一个keypoint_utils.py中的设置。用于coco指标评估时计算。 + +#### 3、数据准备注意 +- 训练数据请按coco数据格式处理。需要包括关键点[Nx3]、检测框[N]标注。 +- 请注意area>0,area=0时数据在训练时会被过滤掉。此外,由于COCO的评估机制,area较小的数据在评估时也会被过滤掉,我们建议在自定义数据时取`area = bbox_w * bbox_h`。 + +如有遗漏,欢迎反馈 -### 4、模型单独部署 -​ 我们提供了PaddleInference(服务器端)、PaddleLite(移动端)、第三方部署(MNN、OpenVino)支持。无需依赖训练代码,deploy文件夹下相应文件夹提供独立完整部署代码。 -详见 [部署文档](../../deploy/README.md)介绍。 +## 关键点稳定策略(仅适用于视频数据) +使用关键点算法处理视频数据时,由于预测针对单帧图像进行,在视频结果上往往会有抖动的现象。在一些依靠精细化坐标的应用场景(例如健身计数、基于关键点的虚拟渲染等)上容易造成误检或体验不佳的问题。针对这个问题,在PaddleDetection关键点视频推理中加入了[OneEuro滤波器](http://www.lifl.fr/~casiez/publications/CHI2012-casiez.pdf)和EMA两种关键点稳定方式。实现将当前关键点坐标结果和历史关键点坐标结果结合计算,使得输出的点坐标更加稳定平滑。该功能同时支持在Python及C++推理中一键开启使用。 -## Benchmark -我们给出了不同运行环境下的测试结果,供您在选用模型时参考。详细数据请见[Keypoint Inference Benchmark](./KeypointBenchmark.md)。 +```bash +# 使用Python推理 +python deploy/python/det_keypoint_unite_infer.py \ + --det_model_dir output_inference/picodet_s_320 \ + --keypoint_model_dir output_inference/tinypose_256x192 \ + --video_file test_video.mp4 --device gpu --smooth True + +# 使用CPP推理 +./deploy/cpp/build/main --det_model_dir output_inference/picodet_s_320 \ + --keypoint_model_dir output_inference/tinypose_256x192 \ + --video_file test_video.mp4 --device gpu --smooth True +``` +效果如下: + 
+![](https://user-images.githubusercontent.com/15810355/181733125-3710bacc-2080-47e4-b397-3621a2f0caae.gif) + +## BenchMark + +我们给出了不同运行环境下的测试结果,供您在选用模型时参考。详细数据请见[Keypoint Inference Benchmark](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/keypoint/KeypointBenchmark.md)。 ## 引用 + ``` @inproceedings{cheng2020bottom, title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation}, diff --git a/configs/keypoint/README_en.md b/configs/keypoint/README_en.md index 05e77f66819c26a38a54746eeb9e569f4945b442..f6f049824eb5aa185b767f37648bed429196d913 100644 --- a/configs/keypoint/README_en.md +++ b/configs/keypoint/README_en.md @@ -2,19 +2,63 @@ # KeyPoint Detection Models - +## Content + +- [Introduction](#introduction) +- [Model Recommendation](#model-recommendation) +- [Model Zoo](#model-zoo) +- [Getting Start](#getting-start) + - [Environmental Installation](#1environmental-installation) + - [Dataset Preparation](#2dataset-preparation) + - [Training and Testing](#3training-and-testing) + - [Training on single GPU](#training-on-single-gpu) + - [Training on multiple GPU](#training-on-multiple-gpu) + - [Evaluation](#evaluation) + - [Inference](#inference) + - [Deploy Inference](#deploy-inference) + - [Deployment for Top-Down models](#deployment-for-top-down-models) + - [Deployment for Bottom-Up models](#deployment-for-bottom-up-models) + - [Joint Inference with Multi-Object Tracking Model FairMOT](#joint-inference-with-multi-object-tracking-model-fairmot) + - [Complete Deploy Instruction and Demo](#complete-deploy-instruction-and-demo) +- [Train with custom data](#train-with-custom-data) +- [BenchMark](#benchmark) ## Introduction -- The keypoint detection part in PaddleDetection follows the state-of-the-art algorithm closely, including Top-Down and BottomUp methods, which can meet the different needs of users. 
+The keypoint detection part of PaddleDetection closely follows state-of-the-art algorithms and covers both Top-Down and Bottom-Up methods to meet different user needs. Top-Down methods detect each object first and then detect its keypoints; they are more accurate, but slow down as the number of objects increases. Bottom-Up methods detect all keypoints first and then group or connect them into individual human pose instances; their speed does not depend on the number of objects, but they are less accurate. + +PaddleDetection also provides a self-developed real-time keypoint detection model, [PP-TinyPose](./tiny_pose/README_en.md), optimized for mobile devices.
    +## Model Recommendation + +### Mobile Terminal + + + + +| Detection Model | Keypoint Model | Input Size | Accuracy of COCO | Average Inference Time (FP16) | Params (M) | Flops (G) | Model Weight | Paddle-Lite Inference Model(FP16) | +| :----------------------------------------------------------- | :------------------------------------ | :-------------------------------------: | :--------------------------------------: | :-----------------------------------: | :--------------------------------: | :--------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_128x96.yml) | Detection:192x192
    Keypoint:128x96 | Detection mAP:29.0
    Keypoint AP:58.1 | Detection:2.37ms
    Keypoint:3.27ms | Detection:1.18
    Keypoint:1.36 | Detection:0.35
    Keypoint:0.08 | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams)
    [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.nb)
    [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.nb) | +| [PicoDet-S-Pedestrian](../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [PP-TinyPose](./tiny_pose/tinypose_256x192.yml) | Detection:320x320
    Keypoint:256x192 | Detection mAP:38.5
    Keypoint AP:68.8 | Detection:6.30ms
    Keypoint:8.33ms | Detection:1.18
    Keypoint:1.36 | Detection:0.97
    Keypoint:0.32 | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams)
    [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Detection](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.nb)
    [Keypoint](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.nb) | + + +*Specific documents of PP-TinyPose, please refer to [Document](./tiny_pose/README.md)。 + +### Terminal Server + +| Detection Model | Keypoint Model | Input Size | Accuracy of COCO | Params (M) | Flops (G) | Model Weight | +| :----------------------------------------------------------- | :----------------------------------------- | :-------------------------------------: | :--------------------------------------: | :-----------------------------: | :-----------------------------: | :----------------------------------------------------------: | +| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_384x288.yml) | Detection:640x640
    Keypoint:384x288 | Detection mAP:49.5
    Keypoint AP:77.8 | Detection:54.6
    Keypoint:28.6 | Detection:115.8
    Keypoint:17.3 | [Detection](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)
    [Keypoint](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | +| [PP-YOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) | [HRNet-w32](./hrnet/hrnet_w32_256x192.yml) | Detection:640x640
    Keypoint:256x192 | Detection mAP:49.5
    Keypoint AP:76.9 | Detection:54.6
    Keypoint:28.6 | Detection:115.8
    Keypoint:7.68 | [Detection](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams)
    [Keypoint](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | + + +## Model Zoo -#### Model Zoo COCO Dataset | Model | Input Size | AP(coco val) | Model Download | Config File | | :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------------- | @@ -31,7 +75,6 @@ COCO Dataset | LiteHRNet-30 | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml) | | LiteHRNet-30 | 384x288 | 72.5 | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml) | - Note:The AP results of Top-Down models are based on bounding boxes in GroundTruth. MPII Dataset @@ -40,25 +83,31 @@ MPII Dataset | HRNet-w32 | 256x256 | 90.6 | 38.5 | [hrnet_w32_256x256_mpii.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x256_mpii.pdparams) | [config](./hrnet/hrnet_w32_256x256_mpii.yml) | +Model for Scenes +| Model | Strategy | Input Size | Precision | Inference Speed |Model Weights | Model Inference and Deployment | description| +| :---- | ---|----- | :--------: | :-------: |:------------: |:------------: |:-------------------: | +| HRNet-w32 + DarkPose | Top-Down|256x192 | AP: 87.1 (on internal dataset)| 2.9ms per person |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | Especially optimized for fall scenarios, the model is applied to [PP-Human](../../deploy/pipeline/README.md) | + + We also release [PP-TinyPose](./tiny_pose/README_en.md), a real-time keypoint detection model optimized for mobile devices. Welcome to experience. ## Getting Start -### 1. 
Environmental Installation +### 1.Environmental Installation -​ Please refer to [PaddleDetection Installation Guild](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md) to install PaddlePaddle and PaddleDetection correctly. +​ Please refer to [PaddleDetection Installation Guide](../../docs/tutorials/INSTALL.md) to install PaddlePaddle and PaddleDetection correctly. -### 2. Dataset Preparation +### 2.Dataset Preparation -​ Currently, KeyPoint Detection Models support [COCO](https://cocodataset.org/#keypoints-2017) and [MPII](http://human-pose.mpi-inf.mpg.de/#overview). Please refer to [Keypoint Dataset Preparation](../../docs/tutorials/PrepareKeypointDataSet_en.md) to prepare dataset. +​ Currently, KeyPoint Detection Models support [COCO](https://cocodataset.org/#keypoints-2017) and [MPII](http://human-pose.mpi-inf.mpg.de/#overview). Please refer to [Keypoint Dataset Preparation](../../docs/tutorials/data/PrepareKeypointDataSet_en.md) to prepare the dataset. ​ For a description of the config files, please refer to [Keypoint Config Guide](../../docs/tutorials/KeyPointConfigGuide_en.md). - - Note that, when testing by detected bounding boxes in Top-Down method, We should get `bbox.json` by a detection model. You can download the detected results for COCO val2017 [(Detector having human AP of 56.4 on COCO val2017 dataset)](https://paddledet.bj.bcebos.com/data/bbox.json) directly, put it at the root path (`PaddleDetection/`), and set `use_gt_bbox: False` in config file. +- Note that when testing a Top-Down model with detected bounding boxes, you first need to generate `bbox.json` with a detection model. Alternatively, download the detection results for COCO val2017 [(Detector having human AP of 56.4 on COCO val2017 dataset)](https://paddledet.bj.bcebos.com/data/bbox.json) directly, put the file in the root path (`PaddleDetection/`), and set `use_gt_bbox: False` in the config file. 
-### 3、Training and Testing +### 3.Training and Testing -​ **Training on single gpu:** +#### Training on single GPU ```shell #COCO DataSet @@ -68,7 +117,7 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/higherhrnet/hi CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml ``` -​ **Training on multiple gpu:** +#### Training on multiple GPU ```shell #COCO DataSet @@ -78,7 +127,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml ``` -​ **Evaluation** +#### Evaluation ```shell #COCO DataSet @@ -91,7 +140,7 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32 CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --save_prediction_only ``` -​ **Inference** +#### Inference ​ Note:Top-down models only support inference for a cropped image with single person. If you want to do inference on image with several people, please see "joint inference by detection and keypoint". Or you can choose a Bottom-up model. 
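The note above says Top-Down models expect a crop that contains a single person. As a rough illustration of what that means for a multi-person image, the sketch below grows a detected person box around its center until it matches the keypoint model's input aspect ratio (for example 192x256), which is the usual step before cropping. The helper name `expand_box_to_aspect` and the `padding` factor are assumptions made for this example; this is not PaddleDetection's actual preprocessing code.

```python
def expand_box_to_aspect(box, target_w=192, target_h=256, padding=1.25):
    """Grow an (x1, y1, x2, y2) box around its center so that its
    width/height ratio equals target_w/target_h, then scale it by padding.

    Hypothetical helper for illustration only.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    aspect = target_w / target_h
    if w > aspect * h:
        h = w / aspect   # box is too wide: grow the height
    else:
        w = h * aspect   # box is too tall: grow the width
    w, h = w * padding, h * padding
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


if __name__ == "__main__":
    # A 100x200 person box becomes 150x200 to match the 192:256 (3:4) ratio.
    print(expand_box_to_aspect((100, 50, 200, 250), padding=1.0))
    # (75.0, 50.0, 225.0, 250.0)
```

Each expanded box would then be cropped and resized to the model input size, one crop per person. The joint detection and keypoint deployment script is the supported way to run this end to end.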
@@ -99,22 +148,34 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/hig CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=./output/higherhrnet_hrnet_w32_512/model_final.pdparams --infer_dir=../images/ --draw_threshold=0.5 --save_txt=True ``` -​ **Deploy Inference** +#### Deploy Inference + +##### Deployment for Top-Down models ```shell -#export models -python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams +#Export Detection Model +python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams -#deploy inference -#keypoint inference for a single model of top-down/bottom-up method. In this mode, top-down model only support inference for a cropped image with single person. -python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5 -python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=gpu --threshold=0.5 -#joint inference by detection and keypoint for top-down models. 
+#Export Keypoint Model +python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams + +#Deployment for detector and keypoint, which is only for Top-Down models python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file=../video/xxx.mp4 --device=gpu ``` -​ **joint inference with Multi-Object Tracking model FairMOT** +##### Deployment for Bottom-Up models + +```shell +#Export model +python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams + + +#Keypoint independent deployment, which is only for bottom-up models +python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5 +``` + +##### Joint Inference with Multi-Object Tracking Model FairMOT ```shell #export FairMOT model @@ -123,17 +184,51 @@ python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.y #joint inference with Multi-Object Tracking model FairMOT python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU ``` + **Note:** To export MOT model, please refer to [Here](../../configs/mot/README_en.md). 
-### 4、Deploy standalone +### Complete Deploy Instruction and Demo + +​ We provide standalone deployment for PaddleInference (Server-GPU), PaddleLite (mobile, ARM), and third-party engines (MNN, OpenVINO), which is independent of the training code. For details, see [Deploy-docs](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/README_en.md). + +## Train with custom data + +We take [tinypose_256x192](./tiny_pose/README_en.md) as an example to show how to train with custom data. -​ We provide standalone deploy of PaddleInference(Server-GPU)、PaddleLite(mobile、ARM)、Third-Engine(MNN、OpenVino), which is independent of training codes。For detail, please click [Deploy-docs](../../deploy/README_en.md)。 +#### 1、For configs [tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml) -## Benchmark -We provide benchmarks in different runtime environments for your reference when choosing models. See [Keypoint Inference Benchmark](./KeypointBenchmark.md) for details. +You may need to modify the following fields for your task: + +``` +num_joints: &num_joints 17 #the number of keypoints in your data +train_height: &train_height 256 #the height of the model input +train_width: &train_width 192 #the width of the model input +hmsize: &hmsize [48, 64] #the shape of the model output, usually 1/4 of the input [w, h] +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #left/right symmetric keypoint id pairs, used for the flip transform. 
If your keypoints have no symmetric structure, add a line "flip: False" after flip_pairs in RandomFlipHalfBodyTransform of TrainReader +num_joints_half_body: 8 #the number of half-body keypoints, used for the half_body transform +prob_half_body: 0.3 #the probability of the half_body transform, set to 0 if you don't need it +upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #the keypoint ids of the upper body, used to select the upper-body keypoints in the half_body transform +``` + +For more configs, please refer to 
[KeyPointConfigGuide](../../docs/tutorials/KeyPointConfigGuide_en.md). + +#### 2、Others(used for test and visualization) +- In keypoint_utils.py, set `sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07,.87, .87, .89, .89]) / 10.0`. Each value indicates the variance of a keypoint's location and should reflect how precisely that keypoint can be localized: 0.25-0.5 suits precisely localized keypoints, for example the eyes; 0.5-1.0 suits keypoints that cover a larger area, for example the shoulders. If you are not sure, 0.75 is recommended. +- In visualizer.py, set `EDGES` in the draw_pose function. It defines the lines drawn between keypoints for visualization. +- In your installed pycocotools, set `sigmas` to the same values as in keypoint_utils.py. These are used for COCO evaluation. + +#### 3、Note for data preparation +- The data should follow the COCO format, with keypoints (Nx3) and bounding boxes (N) annotated. +- Make sure `area` > 0 in the annotation files; samples with `area` = 0 are skipped during training. Moreover, due to the evaluation mechanism of COCO, samples with a small area may also be filtered out during evaluation. We recommend setting `area = bbox_w * bbox_h` when customizing your dataset. + + +## BenchMark + +We provide benchmarks in different runtime environments for your reference when choosing models. See [Keypoint Inference Benchmark](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/keypoint/KeypointBenchmark.md) for details. ## Reference + ``` @inproceedings{cheng2020bottom, title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation}, diff --git a/configs/keypoint/tiny_pose/README.md b/configs/keypoint/tiny_pose/README.md index 6d9c5be02d4eccfff00abc611da96296fab2223c..809b0ab10f30f74969e5f4773ddad1e8add2c8b6 100644 --- a/configs/keypoint/tiny_pose/README.md +++ b/configs/keypoint/tiny_pose/README.md @@ -7,12 +7,20 @@ 
    图片来源:COCO2017开源数据集
    +## 最新动态 +- **2022.8.01:发布PP-TinyPose升级版。 在健身、舞蹈等场景的业务数据集端到端AP提升9.1** + - 新增体育场景真实数据,复杂动作识别效果显著提升,覆盖侧身、卧躺、跳跃、高抬腿等非常规动作 + - 检测模型升级为[PP-PicoDet增强版](../../../configs/picodet/README.md),在COCO数据集上精度提升3.1% + - 关键点稳定性增强。新增滤波稳定方式,视频预测结果更加稳定平滑 + + ![](https://user-images.githubusercontent.com/15810355/181733705-d0f84232-c6a2-43dd-be70-4a3a246b8fbc.gif) + ## 简介 PP-TinyPose是PaddleDetecion针对移动端设备优化的实时关键点检测模型,可流畅地在移动端设备上执行多人姿态估计任务。借助PaddleDetecion自研的优秀轻量级检测模型[PicoDet](../../picodet/README.md),我们同时提供了特色的轻量级垂类行人检测模型。TinyPose的运行环境有以下依赖要求: - [PaddlePaddle](https://github.com/PaddlePaddle/Paddle)>=2.2 如希望在移动端部署,则还需要: -- [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)>=2.10 +- [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)>=2.11
@@ -34,28 +42,49 @@ PP-TinyPose: PaddleDetection's real-time keypoint detection optimized for mobile devices
 ## Model zoo
-### Keypoint Detection Model
-| Model | Input Size | AP (COCO Val) | Inference Time for Single Person (FP32) | Inference Time for Single Person (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16) |
-| :---------- | :------: | :-----------: | :-----------------: | :-----------------: | :------------------------------: | :---: | :---: | :---: | :---: |
-| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.tar) |
-| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.tar) |
+### Pipeline performance
+| Single-person model configuration | AP (business dataset) | AP (COCO Val, single person) | Single-person latency (FP32) | Single-person latency (FP16) |
+| :---------------------------------- | :------: | :------: | :---: | :---: |
+| PicoDet-S-Lcnet-Pedestrian-192\*192 + PP-TinyPose-128\*96 | 77.1 (+9.1) | 52.3 (+0.5) | 12.90 ms | 9.61 ms |
+| Multi-person model configuration | AP (business dataset) | AP (COCO Val, multi-person) | 6-person latency (FP32) | 6-person latency (FP16) |
+| :------------------------ | :-------: | :-------: | :---: | :---: |
+| PicoDet-S-Lcnet-Pedestrian-320\*320 + PP-TinyPose-128\*96 | 78.0 (+7.7) | 50.1 (-0.2) | 47.63 ms | 34.62 ms |
-### Pedestrian Detection Model
-| Model | Input Size | mAP (COCO Val) | Average Inference Time (FP32) | Average Inference Time (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16) |
-| :------------------- | :------: | :------------: | :-----------------: | :-----------------: | :---: | :---: | :---: | :---: | :---: |
-| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms | 2.37ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.tar) |
-| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms | 6.30ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.tar) |
+**Notes**
+- The accuracy of the keypoint detection models is measured on the boxes produced by the corresponding pedestrian detection model.
+- The flip operation is removed in accuracy testing, and the detection confidence threshold is 0.5.
+- Speed is measured on a Qualcomm Snapdragon 865 with 4-thread inference on ARMv8.
+- Pipeline speed covers model preprocessing, inference, and postprocessing.
+- Accuracy deltas are relative to the same model combination in the previous version; see **Version history - Pipeline performance** for details.
+- For a fair comparison, images containing more than 6 people (images with exactly 6 are kept) were removed from the multi-person data in accuracy testing.
+### Keypoint Detection Model
+| Model | Input Size | AP (business dataset) | AP (COCO Val) | Params | FLOPs | Inference Time for Single Person (FP32) | Inference Time for Single Person (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16) |
+| :---------- | :------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------------: | :-----------------: | :------------------------------: | :---: | :---: | :---: | :---: |
+| PP-TinyPose | 128*96 | 84.3 | 58.4 | 1.32 M | 81.56 M | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96.zip) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96_fp32.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_128x96_fp16.nb) |
+| PP-TinyPose | 256*192 | 91.0 | 68.3 | 1.32 M | 326.24 M | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192.zip) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192_fp32.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/tinypose_256x192_fp16.nb) |
+### Pedestrian Detection Model
+| Model | Input Size | mAP (COCO Val-Person) | Params | FLOPs | Average Inference Time (FP32) | Average Inference Time (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16) |
+| :------------------- | :------: | :------------: | :------------: | :------------: | :-----------------: | :-----------------: | :---: | :---: | :---: | :---: | :---: |
+| PicoDet-S-Lcnet-Pedestrian | 192*192 | 31.7 | 1.16 M | 170.03 M | 5.24ms | 3.66ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian.zip) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian_fp32.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_192_lcnet_pedestrian_fp16.nb) |
+| PicoDet-S-Lcnet-Pedestrian | 320*320 | 41.6 | 1.16 M | 472.07 M | 13.87ms | 8.94ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian.zip) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian_fp32.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_enhance/picodet_s_320_lcnet_pedestrian_fp16.nb) |
+
 **Notes**
-- Both the keypoint detection models and the pedestrian detection models are trained on `COCO train2017` and `AI Challenger trainset`. The keypoint detection models are tested on `COCO person keypoints val2017`, and the pedestrian detection models on `COCO instances val2017`.
+- Both the keypoint detection models and the pedestrian detection models are trained on `COCO train2017`, `AI Challenger trainset`, and a collected multi-pose scenario dataset. The keypoint detection models are tested on the multi-pose scenario dataset, and the pedestrian detection models on `COCO instances val2017`.
 - The accuracy of the keypoint detection models is measured with ground-truth boxes.
 - Both the keypoint detection models and the pedestrian detection models were trained on 4 GPUs. If you change the number of GPUs or the batch size, adjust the learning rate accordingly as described in the [FAQ](../../../docs/tutorials/FAQ/README.md).
 - Inference speed is measured on a Qualcomm Snapdragon 865 with 4-thread inference on ARMv8.
+## Version history
+
+
+2021 version
+
+
 ### Pipeline performance
 | Single-person model configuration | AP (COCO Val, single person) | Single-person latency (FP32) | Single-person latency (FP16) |
 | :------------------------ | :------: | :---: | :---: |
@@ -76,6 +105,29 @@ PP-TinyPose: PaddleDetection's real-time keypoint detection optimized for mobile devices
 - For testing and deployment of other excellent open-source models, see [here](https://github.com/zhiboniu/MoveNet-PaddleLite).
 - For performance results in more environments, see the [Keypoint Inference Benchmark](../KeypointBenchmark.md).
+
+### Keypoint Detection Model
+| Model | Input Size | AP (COCO Val) | Inference Time for Single Person (FP32) | Inference Time for Single Person (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16) |
+| :---------- | :------: | :-----------: | :-----------------: | :-----------------: | :------------------------------: | :---: | :---: | :---: | :---: |
+| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16_lite.tar) |
+| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16_lite.tar) |
+
+### Pedestrian Detection Model
+| Model | Input Size | mAP (COCO Val-Person) | Average Inference Time (FP32) | Average Inference Time (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16) |
+| :------------------- | :------: | :------------: | :-----------------: | :-----------------: | :---: | :---: | :---: | :---: | :---: |
+| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms | 2.37ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16_lite.tar) |
+| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms | 6.30ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16_lite.tar) |
+
+
+**Notes**
+- Both the keypoint detection models and the pedestrian detection models are trained on `COCO train2017` and `AI Challenger trainset`. The keypoint detection models are tested on `COCO person keypoints val2017`, and the pedestrian detection models on `COCO instances val2017`.
+- The accuracy of the keypoint detection models is measured with ground-truth boxes.
+- Both the keypoint detection models and the pedestrian detection models were trained on 4 GPUs. If you change the number of GPUs or the batch size, adjust the learning rate accordingly as described in the [FAQ](../../../docs/tutorials/FAQ/README.md).
+- Inference speed is measured on a Qualcomm Snapdragon 865 with 4-thread inference on ARMv8.
+
+
+
+
 ## Model training
 In addition to `COCO`, the training set for the keypoint detection and pedestrian detection models is extended with the [AI Challenger](https://arxiv.org/abs/1711.06475) dataset. The keypoint definitions of each dataset are as follows:
 ```
@@ -176,7 +228,7 @@ python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inferen
 ### Deploying on mobile devices
 #### Deploying with the models we provide
 1. Download the `Paddle-Lite Model` files listed in the model zoo to obtain the `.nb` files for the pedestrian detection model and the keypoint detection model.
-2. Prepare the Paddle-Lite runtime environment. You can get a prebuilt library from the [Paddle-Lite prebuilt library download page](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html) without compiling it yourself. For FP16 inference, download the [FP16 prebuilt library](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip)
+2. Prepare the Paddle-Lite runtime environment. You can get a prebuilt library from the [Paddle-Lite prebuilt library download page](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html) without compiling it yourself. For FP16 inference, download the FP16 prebuilt library.
 3. Build the model runtime code; see [Paddle-Lite on-device deployment](../../../deploy/lite/README.md) for detailed steps.
 #### Deploying a model you trained yourself on device
@@ -216,11 +268,14 @@ paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm
 - Adding the `TestReader.fuse_normalize=true` parameter when exporting the model merges the image Normalize operation into the model, which speeds up inference.
 - FP16 inference is faster. To deploy an FP16 model, in addition to the model conversion step you must build a Paddle-Lite inference library with FP16 support; see [Paddle Lite deployment on ARM CPU](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/arm_cpu.html).
+## Keypoint stabilization strategy (video inference only)
+See [Keypoint stabilization strategy](../README.md#关键点稳定策略仅适用于视频数据).
+
 ## Optimization strategies
 TinyPose uses the following strategies to balance speed and accuracy:
 - A lightweight backbone for pose estimation, [wider naive Lite-HRNet](https://arxiv.org/abs/2104.06403).
-- A smaller input size.
+- A smaller input size, to improve overall inference speed.
 - Distribution-Aware coordinate Representation of Keypoints ([DARK](https://arxiv.org/abs/1910.06278)), to improve accuracy on low-resolution heatmaps.
-- Unbiased Data Processing ([UDP](https://arxiv.org/abs/1911.07524)).
-- Augmentation by Information Dropping ([AID](https://arxiv.org/abs/2008.07139v2)).
-- FP16 inference.
+- Unbiased Data Processing ([UDP](https://arxiv.org/abs/1911.07524)), which uses unbiased data encoding and decoding to improve model accuracy.
+- Augmentation by Information Dropping ([AID](https://arxiv.org/abs/2008.07139v2)), which uses information-dropping data augmentation to improve the model's keypoint localization.
+- FP16 inference, for faster model inference.
diff --git a/configs/keypoint/tiny_pose/README_en.md b/configs/keypoint/tiny_pose/README_en.md
index e632c5b1996b71e5414fb8f596a4a497c9160015..0db1520315a34b1721d15c395896b3133d1ffefa 100644
--- a/configs/keypoint/tiny_pose/README_en.md
+++ b/configs/keypoint/tiny_pose/README_en.md
@@ -37,14 +37,14 @@ If you want to deploy it on the mobile devices, you also need:
 ### Keypoint Detection Model
 | Model | Input Size | AP (COCO Val) | Inference Time for Single Person (FP32)| Inference Time for Single Person(FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16)|
 | :------------------------ | :-------: | :------: | :------: |:---: | :---: | :---: | :---: | :---: | :---: |
-| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.tar) |
-| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.tar) |
+| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16_lite.tar) |
+| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16_lite.tar) |
 ### Pedestrian Detection Model
 | Model | Input Size | mAP (COCO Val) | Average Inference Time (FP32)| Average Inference Time (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model(FP32) | Paddle-Lite Model(FP16)|
 | :------------------------ | :-------: | :------: | :------: | :---: | :---: | :---: | :---: | :---: | :---: |
-| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms | 2.37ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.tar) |
-| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms | 6.30ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.tar) |
+| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms | 2.37ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_192_pedestrian.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16_lite.tar) |
+| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms | 6.30ms | [Config](../../picodet/legacy_model/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_lite.tar) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16_lite.tar) |
 **Tips**
diff --git a/configs/mot/DataDownload.md b/configs/mot/DataDownload.md
new file mode 100644
index 0000000000000000000000000000000000000000..d17cadf42631eed066700986d7c97ef04cd2a681
--- /dev/null
+++ b/configs/mot/DataDownload.md
@@ -0,0 +1,38 @@
+# Multi-object tracking dataset download summary
+## Contents
+- [Pedestrian tracking](#行人跟踪)
+- [Vehicle tracking](#车辆跟踪)
+- [Head tracking](#人头跟踪)
+- [Multi-class tracking](#多类别跟踪)
+
+## Pedestrian tracking
+
+| Dataset | Download link | Notes |
+| :-------------| :-------------| :----: |
+| MOT17 | [download](https://dataset.bj.bcebos.com/mot/MOT17.zip) | - |
+| MOT16 | [download](https://dataset.bj.bcebos.com/mot/MOT16.zip) | - |
+| Caltech | [download](https://dataset.bj.bcebos.com/mot/Caltech.zip) | - |
+| Cityscapes | [download](https://dataset.bj.bcebos.com/mot/Cityscapes.zip) | - |
+| CUHKSYSU | [download](https://dataset.bj.bcebos.com/mot/CUHKSYSU.zip) | - |
+| PRW | [download](https://dataset.bj.bcebos.com/mot/PRW.zip) | - |
+| ETHZ | [download](https://dataset.bj.bcebos.com/mot/ETHZ.zip) | - |
+
+## Vehicle tracking
+
+| Dataset | Download link | Notes |
+| :-------------| :-------------| :----: |
+| AICity21 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/aic21mtmct_vehicle.zip) | - |
+
+
+## Head tracking
+
+| Dataset | Download link | Notes |
+| :-------------| :-------------| :----: |
+| HT21 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip) | - |
+
+
+## Multi-class tracking
+
+| Dataset | Download link | Notes |
+| :-------------| :-------------| :----: |
+| VisDrone-MOT | [download](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip) | - |
diff --git a/configs/mot/README.md b/configs/mot/README.md
index 18284eb34c07682d5af98eb26d5e8886adc9bc81..f21d760be587a3aeb8a5395b3f7e95232f8172b6 100644
--- a/configs/mot/README.md
+++ b/configs/mot/README.md
@@ -5,33 +5,39 @@
 ## Contents
 - [Introduction](#简介)
 - [Installing dependencies](#安装依赖)
-- [Model zoo](#模型库)
-- [Dataset preparation](#数据集准备)
+- [Model zoo and model selection](#模型库和选型)
+- [MOT dataset preparation](#MOT数据集准备)
+  - [SDE datasets](#SDE数据集)
+  - [JDE datasets](#JDE数据集)
+  - [Preparing custom datasets](#用户自定义数据集准备)
 - [Citation](#引用)
 ## Introduction
-Current mainstream Multi-Object Tracking (MOT) algorithms follow the Tracking By Detecting paradigm and consist of two parts: Detection + Embedding. The Detection part finds the potential targets in every video frame. The Embedding part assigns the detected targets to the matching existing trajectories and updates them (the ReID re-identification task). Depending on how these two parts are implemented, the algorithms fall into the **SDE** family and the **JDE** family.
-- SDE (Separate Detection and Embedding) algorithms fully separate the Detection and Embedding stages; the most representative is **DeepSORT**. This design lets the system work with any detector and lets the two parts be tuned separately, but because the stages run in series it is slow, which makes building a real-time MOT system challenging.
+Multi-Object Tracking (MOT) locates multiple targets of interest in a given video or image sequence, maintains each individual's ID across consecutive frames, and records its trajectory.
+The current mainstream approach is Tracking By Detecting, which consists of two parts: Detection + Embedding. The Detection part finds the potential targets in every video frame. The Embedding part assigns the detected targets to the matching existing trajectories and updates them (the ReID re-identification task), performing long-term association between objects. Depending on how these two parts are implemented, the algorithms fall into the **SDE** family and the **JDE** family.
+- SDE (Separate Detection and Embedding) algorithms fully separate the Detection and Embedding stages; the most representative is **DeepSORT**. This design lets the system work with any detector and lets the two parts be tuned separately, but because the stages run in series it is slow. Some algorithms, such as **ByteTrack**, reduce latency by not using Embedding features to compute appearance similarity, provided the detector is accurate enough.
 - JDE (Joint Detection and Embedding) algorithms learn Detection and Embedding simultaneously in one shared neural network, with a loss function designed as a multi-task learning objective. Representative algorithms are **JDE** and **FairMOT**. This design balances accuracy and speed and enables high-accuracy real-time multi-object tracking.
-PaddleDetection implements 3 multi-object tracking algorithms from these two families: [DeepSORT](https://arxiv.org/abs/1812.00442) from the SDE family, and [JDE](https://arxiv.org/abs/1909.12605) and [FairMOT](https://arxiv.org/abs/2004.01888) from the JDE family.
+PaddleDetection provides several algorithm implementations from both the SDE and JDE families:
+- SDE
+  - [ByteTrack](./bytetrack)
+  - [DeepSORT](./deepsort)
+- JDE
+  - [JDE](./jde)
+  - [FairMOT](./fairmot)
+  - [MCFairMOT](./mcfairmot)
-### PP-Tracking real-time multi-object tracking system
-In addition, PaddleDetection provides the [PP-Tracking](../../deploy/pptracking/README.md) real-time multi-object tracking system. Built on the PaddlePaddle deep learning framework, PP-Tracking is the industry's first open-source real-time multi-object tracking system, with three main strengths: a rich set of models, broad applicability, and efficient deployment.
-PP-Tracking supports two modes, single-camera tracking (MOT) and multi-target multi-camera tracking (MTMCT). Aimed at the difficulties and pain points of real-world business, it provides multi-object tracking features and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic counting, and cross-camera tracking. Deployment supports API calls and a GUI, the deployment languages are Python and C++, and supported platforms include Linux and NVIDIA Jetson.
-
-### AI Studio public project example
-PP-Tracking provides a public AI Studio project example; for the tutorial, see [PP-Tracking: a hands-on guide to multi-object tracking](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
-
-### Python inference deployment
-PP-Tracking supports Python inference deployment; for the tutorial, see the [PP-Tracking Python deployment guide](../../deploy/pptracking/python/README.md).
+**Notes:**
+ - The original papers of the algorithms above all address single-class multi-object tracking. The PaddleDetection team also supports multi-class multi-object tracking for [ByteTrack](./bytetrack) and FairMOT ([MCFairMOT](./mcfairmot));
+ - [DeepSORT](./deepsort) and [JDE](./jde) support single-class multi-object tracking only;
+ - [DeepSORT](./deepsort) must be run together with additional ReID weights; for [ByteTrack](./bytetrack) ReID weights are optional and are not used by default;
-### C++ inference deployment
-PP-Tracking supports C++ inference deployment; for the tutorial, see the [PP-Tracking C++ deployment guide](../../deploy/pptracking/cpp/README.md).
-### GUI inference deployment
-PP-Tracking provides a simple GUI; for the tutorial, see the [PP-Tracking GUI trial version user guide](https://github.com/yangyudong2020/PP-Tracking_GUi).
+### PP-Tracking, a real-time multi-object tracking system
+The PaddleDetection team provides [PP-Tracking](../../deploy/pptracking), a real-time multi-object tracking system. Built on the PaddlePaddle deep learning framework, it is the industry's first open-source real-time multi-object tracking system, with three main strengths: a rich set of models, broad applicability, and efficient deployment.
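+PP-Tracking supports two modes, single-camera tracking (MOT) and multi-target multi-camera tracking (MTMCT). Aimed at the difficulties and pain points of real-world business, it provides multi-object tracking features and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic counting, and cross-camera tracking. Deployment supports API calls and a GUI, the deployment languages are Python and C++, and supported platforms include Linux and NVIDIA Jetson.
+PP-Tracking uses [FairMOT](./fairmot) for single-camera tracking and [DeepSORT](./deepsort) for cross-camera tracking.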
@@ -43,20 +49,46 @@ PP-Tracking provides a simple GUI; for the tutorial, see [PP-Tracking
 Video source: the public VisDrone and BDD100K datasets
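+#### AI Studio public project example
+For the tutorial, see [PP-Tracking: a hands-on guide to multi-object tracking](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
+
+#### Python inference deployment
+For the tutorial, see the [PP-Tracking Python deployment guide](../../deploy/pptracking/python/README.md).
+
+#### C++ inference deployment
+For the tutorial, see the [PP-Tracking C++ deployment guide](../../deploy/pptracking/cpp/README.md).
+
+#### GUI inference deployment
+For the tutorial, see the [PP-Tracking GUI user guide](https://github.com/yangyudong2020/PP-Tracking_GUi).
+
+
+### PP-Human, a real-time pedestrian analysis tool
+The PaddleDetection team provides [PP-Human](../../deploy/pipeline), a real-time pedestrian analysis tool. Built on the PaddlePaddle deep learning framework, it is the industry's first open-source, industrial-grade real-time pedestrian analysis tool, with three main strengths: a rich set of models, broad applicability, and efficient deployment.
+PP-Human accepts images, single-camera video, and multi-camera video as input. Its features cover multi-object tracking, attribute recognition, behavior analysis, and people counting with trajectory recording. It can be widely applied in smart transportation, smart communities, industrial inspection, and similar fields. It supports server-side deployment and TensorRT acceleration, and runs in real time on a T4 server.
+PP-Human uses [ByteTrack](./bytetrack) for tracking.
+
+![](https://user-images.githubusercontent.com/48054808/173030254-ecf282bd-2cfe-43d5-b598-8fed29e22020.gif)
+
+#### AI Studio public project examples
+PP-Human real-time pedestrian analysis, end-to-end hands-on tutorial: [link](https://aistudio.baidu.com/aistudio/projectdetail/3842982).
+
+PP-Human for smart, fine-grained community management, tutorial: [link](https://aistudio.baidu.com/aistudio/projectdetail/3679564).
+
+
 ## Installing dependencies
 Install the MOT dependencies in one step:
 ```
-pip install lap sklearn motmetrics openpyxl cython_bbox
-or
 pip install -r requirements.txt
+# or install the MOT libraries manually with pip
+pip install lap motmetrics sklearn filterpy
 ```
 **Notes:**
-- To install `cython_bbox` on Windows: `pip install -e git+https://github.com/samson-wang/cython_bbox.git#egg=cython-bbox`. See this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows).
-- Inference requires [ffmpeg](https://ffmpeg.org/ffmpeg.html). On Linux (Ubuntu) you can install it directly with: `apt-get update && apt-get install -y ffmpeg`.
+ - Inference requires [ffmpeg](https://ffmpeg.org/ffmpeg.html). On Linux (Ubuntu) you can install it directly with: `apt-get update && apt-get install -y ffmpeg`.
+
-## Model zoo
+## Model zoo and model selection
 - Base models
     - [ByteTrack](bytetrack/README_cn.md)
     - [DeepSORT](deepsort/README_cn.md)
@@ -71,25 +103,75 @@ pip install -r requirements.txt
 - Cross-camera tracking
     - [Cross-camera tracking](mtmct/README_cn.md)
+### Model selection summary
-## Dataset preparation
-### MOT datasets
-PaddleDetection reproduces [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) using the same MIX dataset as they do, which includes **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are combined for training, and MOT16 is used for evaluation. If you want to use these datasets, please **follow their licenses**.
+The PaddleDetection team's recommendations for model selection are summarized below:
+
+| MOT approach | Classic algorithms | Pipeline | Dataset requirements | Other characteristics |
+| :--------------| :--------------| :------- | :----: | :----: |
+| SDE family | DeepSORT, ByteTrack | Separate: two independent model weights, detection first and then ReID; ReID is optional | Detection and ReID data are relatively independent; without ReID, a pure detection dataset suffices | Detection and ReID can be tuned separately; fairly robust; common in AI competitions |
+| JDE family | FairMOT | Joint: one model weight performs detection and ReID end to end | Must have both detection and ReID annotations | Detection and ReID are trained jointly; harder to tune; weaker generalization |
 **Notes:**
-- MOT datasets are generally used for single-class multi-object tracking. DeepSORT, JDE and FairMOT are all single-class tracking models, and the MIX dataset and its sub-datasets are all single-class pedestrian tracking datasets; they can be seen as pedestrian detection datasets with additional ID annotations.
-- To train domain-specific models for more scenarios, such as vehicles, the domain-specific datasets must also be processed into the same format as the MIX dataset. PaddleDetection also provides domain-specific datasets and models for [vehicle tracking](vehicle/README_cn.md), [head tracking](headtracking21/README_cn.md) and more general [pedestrian tracking](pedestrian/README_cn.md). Custom datasets can be prepared by following the [data preparation guide](../../docs/tutorials/PrepareMOTDataSet_cn.md).
-- The multi-class tracking model is [MCFairMOT](mcfairmot/README_cn.md); the multi-class dataset is an integrated version of the VisDrone dataset. See the [MCFairMOT](mcfairmot/README_cn.md) documentation.
-- The cross-camera tracking model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) cross-camera vehicle tracking dataset. For the dataset and model, see the [cross-camera tracking](mtmct/README_cn.md) documentation.
+ - Because data annotation is costly, consider the **dataset requirements** first when choosing a model. If the dataset has only detection box annotations and no ReID annotations, JDE-family algorithms cannot be trained on it, and the SDE family is recommended instead;
+ - When the detector is accurate enough, SDE-family algorithms can also run without ReID weights for long-term association between objects; see [ByteTrack](bytetrack);
+ - Latency depends to some extent on the parameter count and computation of the model weights. In theory, latency ranks as `SDE without ReID < JDE < SDE with ReID`;
+
+
+
+## MOT dataset preparation
+The PaddleDetection team provides download links for many public or curated datasets; see the [dataset download summary](DataDownload.md) and download what you need.
+
+Following the model selection summary, MOT datasets fall into two kinds: datasets with detection box annotations only, usable only by the SDE family; and datasets with both detection and ReID annotations, usable by both the SDE and JDE families.
-### Dataset directory
-First download image_lists.zip with the following command and extract it into `PaddleDetection/dataset/mot`:
+### SDE datasets
+SDE datasets contain detection annotations only. Custom datasets can be prepared by following the [DET data preparation guide](../../docs/tutorials/data/PrepareDetDataSet.md).
+
+Taking MOT17 as an example, download it and extract it into `PaddleDetection/dataset/mot`:
+```
+wget https://dataset.bj.bcebos.com/mot/MOT17.zip
+
+```
+Then modify the dataset section of the config file as follows:
+```
+num_classes: 1
+
+TrainDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/train_half.json
+    image_dir: images/train
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+    image_dir: images/train
+
+TestDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot/MOT17
+    anno_path: annotations/val_half.json
+```
+
+The dataset directory layout is:
+```
+dataset/mot
+  |——————MOT17
+    |——————annotations
+    |——————images
+```
+
+### JDE datasets
+JDE datasets have both detection and ReID annotations. First download `image_lists.zip` with the following command and extract it into `PaddleDetection/dataset/mot`:
 ```
 wget https://dataset.bj.bcebos.com/mot/image_lists.zip
 ```
-Then quickly download each sub-dataset of the MIX dataset with the following commands and extract them into `PaddleDetection/dataset/mot`:
+Then quickly download the public datasets with the following commands and extract them into `PaddleDetection/dataset/mot` as well:
 ```
+# MIX data, the same dataset used in the JDE and FairMOT papers
 wget https://dataset.bj.bcebos.com/mot/MOT17.zip
 wget https://dataset.bj.bcebos.com/mot/Caltech.zip
 wget https://dataset.bj.bcebos.com/mot/CUHKSYSU.zip
@@ -98,24 +180,17 @@ wget https://dataset.bj.bcebos.com/mot/Cityscapes.zip
 wget https://dataset.bj.bcebos.com/mot/ETHZ.zip
 wget https://dataset.bj.bcebos.com/mot/MOT16.zip
 ```
-
-The final directory layout is:
+The dataset directory layout is:
 ```
 dataset/mot
   |——————image_lists
-            |——————caltech.10k.val
             |——————caltech.all
-            |——————caltech.train
-            |——————caltech.val
             |——————citypersons.train
-            |——————citypersons.val
             |——————cuhksysu.train
-            |——————cuhksysu.val
             |——————eth.train
             |——————mot16.train
             |——————mot17.train
             |——————prw.train
-            |——————prw.val
   |——————Caltech
   |——————Cityscapes
   |——————CUHKSYSU
@@ -125,7 +200,7 @@ dataset/mot
   |——————PRW
 ```
-### Data format
+#### JDE dataset format
 All of these datasets follow the structure below:
 ```
 MOT17
@@ -139,11 +214,20 @@ MOT17
 ```
 [class] [identity] [x_center] [y_center] [width] [height]
 ```
-**Note**:
-- `class` is the class id; both single-class and multi-class data are supported. It counts from `0`, so single-class data uses `0`.
-- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if the box has no `identity` annotation.
-- `[x_center] [y_center] [width] [height]` are the center coordinates and the width and height. Note that they are normalized by the image width/height, so they are floats between 0 and 1.
+ - `class` is the class id; both single-class and multi-class data are supported. It counts from `0`, so single-class data uses `0`.
+ - `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if the box has no `identity` annotation.
+ - `[x_center] [y_center] [width] [height]` are the center coordinates and the width and height. Note that they are normalized by the image width/height, so they are floats between 0 and 1.
+
+
+**Notes:**
+ - The MIX dataset is the dataset used in the original [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) papers. It includes **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are combined for training, and MOT16 is used for evaluation. If you want to use these datasets, please **follow their licenses**.
+ - The MIX dataset and its sub-datasets are all single-class pedestrian tracking datasets; they can be seen as pedestrian detection datasets with additional ID annotations.
+ - For domain-specific models in more scenarios, such as vehicle, pedestrian, and head tracking, the domain-specific datasets must also be processed into the same format as the MIX dataset. See the [dataset download summary](DataDownload.md), [vehicle tracking](vehicle/README_cn.md), [head tracking](headtracking21/README_cn.md), and the more general [pedestrian tracking](pedestrian/README_cn.md).
+ - Custom datasets can be prepared by following the [MOT dataset preparation tutorial](../../docs/tutorials/PrepareMOTDataSet_cn.md).
+
+### Preparing custom datasets
+To prepare a custom dataset, follow the [MOT dataset preparation tutorial](../../docs/tutorials/PrepareMOTDataSet_cn.md).
 ## Citation
 ```
diff --git a/configs/mot/README_en.md b/configs/mot/README_en.md
index e23bc451b36cb100440d47e90351eb6d22879983..eafd462ce782ab19e3f53cc69ebdd25a3d74701e 100644
--- a/configs/mot/README_en.md
+++ b/configs/mot/README_en.md
@@ -49,12 +49,11 @@ PP-Tracking supports GUI predict and deployment. Please refer to this [doc](http
 ## Installation
 Install all the related dependencies for MOT:
 ```
-pip install lap sklearn motmetrics openpyxl cython_bbox
+pip install lap motmetrics sklearn filterpy
 or
 pip install -r requirements.txt
 ```
 **Notes:**
-- Install `cython_bbox` for Windows: `pip install -e git+https://github.com/samson-wang/cython_bbox.git#egg=cython-bbox`. You can refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows).
 - Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first, on Linux(Ubuntu) platform you can directly install it by the following command:`apt-get update && apt-get install -y ffmpeg`.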
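Reviewer note: the JDE label format introduced in the `configs/mot/README.md` changes above (`[class] [identity] [x_center] [y_center] [width] [height]`, with coordinates normalized by the image width and height) can be illustrated with a minimal Python sketch. It parses one label line and converts it to a pixel-space box. The `JDELabel` class and the `parse_jde_label` function are hypothetical names introduced only for this example; they are not part of PaddleDetection, and the sample line and image size are made up.

```python
from dataclasses import dataclass


@dataclass
class JDELabel:
    cls: int        # class id, counted from 0 (0 for single-class data)
    identity: int   # track id in 1..num_identities, or -1 if unlabeled
    x1: float       # left edge in pixels
    y1: float       # top edge in pixels
    w: float        # box width in pixels
    h: float        # box height in pixels


def parse_jde_label(line: str, img_w: int, img_h: int) -> JDELabel:
    """Parse one '[class] [identity] [x_center] [y_center] [width] [height]' line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    # Ids may be stored as floats in some label files, so go through float().
    cls, identity = int(float(fields[0])), int(float(fields[1]))
    xc, yc, w, h = (float(v) for v in fields[2:])
    # Coordinates are normalized by image width/height, so scale them back.
    box_w, box_h = w * img_w, h * img_h
    return JDELabel(cls, identity,
                    xc * img_w - box_w / 2, yc * img_h - box_h / 2,
                    box_w, box_h)


# Hypothetical label line for a 1920x1080 frame.
label = parse_jde_label("0 12 0.5 0.5 0.25 0.5", img_w=1920, img_h=1080)
print(label)
```

Converting to a top-left corner plus width and height is only one possible choice; it is used here because the tracking result files described later in this diff use `x1,y1,w,h`.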
@@ -80,7 +79,7 @@ PaddleDetection implement [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT
 **Notes:**
 - Multi-Object Tracking (MOT) datasets are generally used for single-category tracking. DeepSORT, JDE and FairMOT are single-category MOT models. The 'MIX' dataset and its sub-datasets are also single-category pedestrian tracking datasets. They can be regarded as detection datasets with additional ID ground truth.
-- In order to train the feature models of more scenes, more datasets are also processed into the same format as the MIX dataset. PaddleDetection Team also provides feature datasets and models of [vehicle tracking](vehicle/readme.md), [head tracking](headtracking21/readme.md) and more general [pedestrian tracking](pedestrian/readme.md). User defined datasets can also be prepared by referring to data preparation [doc](../../docs/tutorials/PrepareMOTDataSet.md).
+- In order to train the feature models of more scenes, more datasets are also processed into the same format as the MIX dataset. PaddleDetection Team also provides feature datasets and models of [vehicle tracking](vehicle/README.md), [head tracking](headtracking21/README.md) and more general [pedestrian tracking](pedestrian/README.md). User defined datasets can also be prepared by referring to data preparation [doc](../../docs/tutorials/data/PrepareMOTDataSet.md).
 - The multi-category MOT model is [MCFairMOT](mcfairmot/readme_cn.md), and the multi-category dataset is the integrated version of the VisDrone dataset. Please refer to the doc of [MCFairMOT](mcfairmot/README.md).
 - The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) Multi-Camera Vehicle Tracking dataset. For the dataset and model, refer to the doc of [MTMCT](mtmct/README.md)
diff --git a/configs/mot/bytetrack/README_cn.md b/configs/mot/bytetrack/README_cn.md
index 242319c08332eedc66c6a9f72c85e6f7e6731d87..8e93134f83bac05abfa85ea8551f9571913df0f9 100644
--- a/configs/mot/bytetrack/README_cn.md
+++ b/configs/mot/bytetrack/README_cn.md
@@ -13,18 +13,38 @@
 ## Model zoo
-### ByteTrack results on the MOT-17 half Val Set
+### ByteTrack results on the MOT-17 half Val Set with different detectors
-| Detector training dataset | Detector | Input size | ReID | Detection mAP | MOTA | IDF1 | FPS | Config |
+| Detector training dataset | Detector | Input size | ReID | Detection mAP(0.5:0.95) | MOTA | IDF1 | FPS | Config |
 | :-------- | :----- | :----: | :----:|:------: | :----: |:-----: |:----:|:----: |
 | MOT-17 half train | YOLOv3 | 608x608 | - | 42.7 | 49.5 | 54.8 | - |[Config](./bytetrack_yolov3.yml) |
-| MOT-17 half train | PPYOLOe | 640x640 | - | 52.9 | 50.4 | 59.7 | - |[Config](./bytetrack_ppyoloe.yml) |
-| MOT-17 half train | PPYOLOe | 640x640 |PPLCNet| 52.9 | 51.7 | 58.8 | - |[Config](./bytetrack_ppyoloe_pplcnet.yml) |
+| MOT-17 half train | PP-YOLOE-l | 640x640 | - | 52.9 | 50.4 | 59.7 | - |[Config](./bytetrack_ppyoloe.yml) |
+| MOT-17 half train | PP-YOLOE-l | 640x640 |PPLCNet| 52.9 | 51.7 | 58.8 | - |[Config](./bytetrack_ppyoloe_pplcnet.yml) |
+| **mix_mot_ch** | YOLOX-x | 800x1440| - | 61.9 | 77.3 | 71.6 | - |[Config](./bytetrack_yolox.yml) |
+| **mix_det** | YOLOX-x | 800x1440| - | 65.4 | 84.5 | 77.4 | - |[Config](./bytetrack_yolox.yml) |
 **Notes:**
-- The model weight download links are given by `det_weights` and `reid_weights` in the config file; running the evaluation command downloads them automatically.
-- Training ByteTrack means training a standalone detector on an MOT dataset; inference assembles a tracker on top of it to evaluate MOT metrics, and the standalone detection model can also be evaluated on detection metrics.
-- For export and deployment, the detection model is exported on its own and then run with an assembled tracker; see [PP-Tracking](../../../deploy/pptracking/python/README.md).
+ - For the configs and docs of the detection task, see [detector](detector/)
+
+
+### YOLOX-x ByteTrack(mix_det)
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pp-yoloe-an-evolved-version-of-yolo/multi-object-tracking-on-mot16)](https://paperswithcode.com/sota/multi-object-tracking-on-mot16?p=pp-yoloe-an-evolved-version-of-yolo)
+
+| 网络 | 测试集 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| :---------: | :-------: | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | +| ByteTrack-x| MOT-17 Train | 84.4 | 72.8 | 837 | 5653 | 10985 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | +| ByteTrack-x| MOT-17 Test | 78.4 | 69.7 | 4974 | 37551 | 79524 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | +| ByteTrack-x| MOT-16 Train | 83.5 | 72.7 | 800 | 6973 | 10419 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | +| ByteTrack-x| MOT-16 Test | 77.7 | 70.1 | 1570 | 15695 | 23304 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./bytetrack_yolox.yml) | + + +**注意:** + - 模型权重下载链接在配置文件中的```det_weights```和```reid_weights```,运行```tools/eval_mot.py```评估的命令即可自动下载,```reid_weights```若为None则表示不需要使用,ByteTrack默认不使用ReID权重。 + - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://dataset.bj.bcebos.com/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。 + - **mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集,**mix_det**是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。 + - ByteTrack的训练是单独的检测器训练MOT数据集,推理是组装跟踪器去评估MOT指标,单独的检测模型也可以评估检测指标。 + - ByteTrack的导出部署,是单独导出检测模型,再组装跟踪器运行的,参照[PP-Tracking](../../../deploy/pptracking/python/README.md)。 ## 快速开始 @@ -32,13 +52,20 @@ ### 1. 
训练 通过如下命令一键式启动训练和评估 ```bash -python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp --fleet +python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp +# 或者 +python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml --eval --amp ``` +**注意:** + - ` --eval`是边训练边验证精度;`--amp`是混合精度训练避免溢出,推荐使用paddlepaddle2.2.2版本。 + ### 2. 评估 #### 2.1 评估检测效果 ```bash -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams ``` **注意:** @@ -51,30 +78,41 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetra CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --scaled=True # 或者 CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml --scaled=True +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --scaled=True ``` **注意:** - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True, 默认值是False。 - - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。 + - 
跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。 ### 3. 预测 使用单个GPU通过如下命令预测一个视频,并保存为视频 ```bash -CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --video_file={your video name}.mp4 --scaled=True --save_videos +# 下载demo视频 +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 + +# 使用PPYOLOe行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos +# 或者使用YOLOX行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos ``` **注意:** - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 + - `--save_videos`表示保存可视化视频,同时会保存可视化的图片在`{output_dir}/mot_outputs/`中,`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。 ### 4. 导出预测模型 Step 1:导出检测模型 ```bash -# 导出PPYOLe行人检测模型 +# 导出PPYOLOe行人检测模型 CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +# 或者导出YOLOX行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams ``` Step 2:导出ReID模型(可选步骤,默认不需要) @@ -83,15 +121,17 @@ Step 2:导出ReID模型(可选步骤,默认不需要) CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams ``` -### 4. 用导出的模型基于Python去预测 +### 5. 
用导出的模型基于Python去预测 ```bash -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts +# 或者 +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/yolox_x_24e_800x1440_mix_det/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts ``` + **注意:** - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 - - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 ## 引用 diff --git a/configs/mot/bytetrack/_base_/ht21.yml b/configs/mot/bytetrack/_base_/ht21.yml new file mode 100644 index 0000000000000000000000000000000000000000..8500af3165e1173cc442396ace1af54f09ab810a --- /dev/null +++ b/configs/mot/bytetrack/_base_/ht21.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/train.json + dataset_dir: dataset/mot/HT21 + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/HT21 + +TestDataset: + !ImageFolder + dataset_dir: dataset/mot/HT21 + anno_path: annotations/val_half.json + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: HT21/images/test + keep_ori_im: True # set as True in DeepSORT and ByteTrack 
+ +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/configs/mot/bytetrack/_base_/mix_det.yml b/configs/mot/bytetrack/_base_/mix_det.yml new file mode 100644 index 0000000000000000000000000000000000000000..fbe19bdaa29246919189d5d93a3ea01e3734b52c --- /dev/null +++ b/configs/mot/bytetrack/_base_/mix_det.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/mot/mix_det + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + +TestDataset: + !ImageFolder + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/configs/mot/bytetrack/_base_/mix_mot_ch.yml b/configs/mot/bytetrack/_base_/mix_mot_ch.yml new file mode 100644 index 0000000000000000000000000000000000000000..a19f149301a1d993c552a12e60144f63990d6f4d --- /dev/null +++ b/configs/mot/bytetrack/_base_/mix_mot_ch.yml @@ -0,0 +1,34 @@ +metric: COCO +num_classes: 1 + +# Detection Dataset for training +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/mot/mix_mot_ch + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/train + anno_path: annotations/val_half.json + dataset_dir: dataset/mot/MOT17 + +TestDataset: + !ImageFolder + anno_path: annotations/val_half.json + dataset_dir: 
dataset/mot/MOT17 + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/configs/mot/bytetrack/_base_/mot17.yml b/configs/mot/bytetrack/_base_/mot17.yml index 2efa55546026168c39396c4d51a71428e19a0638..faf47f622d1c2847a9686dfa8d7e48a49c05436c 100644 --- a/configs/mot/bytetrack/_base_/mot17.yml +++ b/configs/mot/bytetrack/_base_/mot17.yml @@ -17,6 +17,7 @@ EvalDataset: TestDataset: !ImageFolder + dataset_dir: dataset/mot/MOT17 anno_path: annotations/val_half.json diff --git a/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml b/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml index c1e7ab8418956810ae6d2788cd4d67b9f2e17775..ef6342fd0e9249acf386b7795cb538b73a26f108 100644 --- a/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml +++ b/configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml @@ -1,4 +1,8 @@ -worker_num: 8 +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -20,17 +24,17 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} - Permute: {} batch_size: 8 TestReader: inputs_def: - image_shape: [3, 640, 640] + image_shape: [3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 
0.225], is_scale: True} - Permute: {} batch_size: 1 @@ -40,17 +44,17 @@ TestReader: EvalMOTReader: sample_transforms: - Decode: {} - - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} - Permute: {} batch_size: 1 TestMOTReader: inputs_def: - image_shape: [3, 640, 640] + image_shape: [3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} - Permute: {} batch_size: 1 diff --git a/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml b/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml new file mode 100644 index 0000000000000000000000000000000000000000..48d4144221f6fa353af90ce3781a21329a566751 --- /dev/null +++ b/configs/mot/bytetrack/_base_/yolox_mot_reader_800x1440.yml @@ -0,0 +1,67 @@ + +input_height: &input_height 800 +input_width: &input_width 1440 +input_size: &input_size [*input_height, *input_width] + +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 1.0 + input_dim: *input_size + degrees: [-10, 10] + scale: [0.1, 2.0] + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: True + mixup_prob: 1.0 + mixup_scale: [0.5, 1.5] + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: *input_size} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 6 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 20 + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: 
[3, 800, 1440] + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 + + +# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT +EvalMOTReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 + +TestMOTReader: + inputs_def: + image_shape: [3, 800, 1440] + sample_transforms: + - Decode: {} + - Resize: {target_size: *input_size, keep_ratio: True} + - Pad: {size: *input_size, fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/configs/mot/bytetrack/bytetrack_ppyoloe.yml b/configs/mot/bytetrack/bytetrack_ppyoloe.yml index 08b7a00d89b79ad1bd1e2753738f22fcc66e657c..5e7ffe07f0f758c641596e90ee0da4c31085fd85 100644 --- a/configs/mot/bytetrack/bytetrack_ppyoloe.yml +++ b/configs/mot/bytetrack/bytetrack_ppyoloe.yml @@ -8,7 +8,7 @@ weights: output/bytetrack_ppyoloe/model_final log_iter: 20 snapshot_epoch: 2 -metric: MOT # eval/infer mode +metric: MOT # eval/infer mode; set to 'COCO' for training mode num_classes: 1 architecture: ByteTrack @@ -33,7 +33,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 static_assigner_epoch: -1 # 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} static_assigner: name: ATSSAssigner diff --git a/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml b/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml index 98ea15f15299b9d550bc4de1a53fe203e7cd61fc..60f81165d5b324943a997dbc26fbe56f249f2ef6 100644 --- a/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml +++ b/configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml @@ -33,7 +33,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 static_assigner_epoch: -1 # 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, 
dfl: 0.5} static_assigner: name: ATSSAssigner diff --git a/configs/mot/bytetrack/bytetrack_yolox.yml b/configs/mot/bytetrack/bytetrack_yolox.yml new file mode 100644 index 0000000000000000000000000000000000000000..2e195c56d00cfc696e93fee4e9f709f123b5dcec --- /dev/null +++ b/configs/mot/bytetrack/bytetrack_yolox.yml @@ -0,0 +1,68 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + 'detector/yolox_x_24e_800x1440_mix_det.yml', + '_base_/mix_det.yml', + '_base_/yolox_mot_reader_800x1440.yml' +] +weights: output/bytetrack_yolox/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +ByteTrack: + detector: YOLOX + reid: None + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams +reid_weights: None + +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%. + # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. 
+ + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.6 + low_conf_thres: 0.2 + min_box_area: 100 + vertical_ratio: 1.6 # for pedestrian diff --git a/configs/mot/bytetrack/bytetrack_yolox_ht21.yml b/configs/mot/bytetrack/bytetrack_yolox_ht21.yml new file mode 100644 index 0000000000000000000000000000000000000000..ea21a87c5ed1ec8297155c80b8e7136e1941c636 --- /dev/null +++ b/configs/mot/bytetrack/bytetrack_yolox_ht21.yml @@ -0,0 +1,68 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + 'detector/yolox_x_24e_800x1440_ht21.yml', + '_base_/ht21.yml', + '_base_/yolox_mot_reader_800x1440.yml' +] +weights: output/bytetrack_yolox_ht21/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +ByteTrack: + detector: YOLOX + reid: None + tracker: JDETracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_ht21.pdparams +reid_weights: None + +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 30000 + keep_top_k: 1000 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%. 
+ # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. + + +# BYTETracker +JDETracker: + use_byte: True + match_thres: 0.9 + conf_thres: 0.7 + low_conf_thres: 0.1 + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian diff --git a/configs/mot/bytetrack/detector/README_cn.md b/configs/mot/bytetrack/detector/README_cn.md index 7bdb095f177fa9365955649841a0a27eda571d7e..98b40bf61b43a170d2026f47d33d3ecbdb46e6da 100644 --- a/configs/mot/bytetrack/detector/README_cn.md +++ b/configs/mot/bytetrack/detector/README_cn.md @@ -12,23 +12,28 @@ | :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: | | DarkNet-53 | YOLOv3 | 608X608 | 40e | ---- | 42.7 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [配置文件](./yolov3_darknet53_40e_608x608_mot17half.yml) | | CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [配置文件](./ppyoloe_crn_l_36e_640x640_mot17half.yml) | +| CSPDarkNet | YOLOX-x(mix_mot_ch) | 800x1440 | 24e | ---- | 61.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_mot_ch.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_mot_ch.yml) | +| CSPDarkNet | YOLOX-x(mix_det) | 800x1440 | 24e | ---- | 65.4 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_det.yml) | **注意:** - - 以上模型均可采用**MOT17-half train**数据集训练,数据集可以从[此链接](https://dataset.bj.bcebos.com/mot/MOT17.zip)下载。 + - 以上模型除YOLOX外采用**MOT17-half train**数据集训练,数据集可以从[此链接](https://dataset.bj.bcebos.com/mot/MOT17.zip)下载。 - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half 
val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip)下载,并解压放在`dataset/mot/MOT17/images/`文件夹下。 + - YOLOX-x(mix_mot_ch)采用**mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集;YOLOX-x(mix_det)采用**mix_det**数据集,是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。 - 行人跟踪请使用行人检测器结合行人ReID模型。车辆跟踪请使用车辆检测器结合车辆ReID模型。 - 用于ByteTrack跟踪时,这些模型的NMS阈值等后处理设置会与纯检测任务的设置不同。 ## 快速开始 -通过如下命令一键式启动训练和评估 +通过如下命令一键式启动训练、评估和导出 ```bash job_name=ppyoloe_crn_l_36e_640x640_mot17half config=configs/mot/bytetrack/detector/${job_name}.yml log_dir=log_dir/${job_name} # 1. training -python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp --fleet +python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp # 2. evaluation -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/${job_name}.pdparams +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=output/${job_name}/model_final.pdparams +# 3. 
export +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=output/${job_name}/model_final.pdparams ``` diff --git a/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml b/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml index 89654f059e603eda24002dfec844f450bd73e8ff..6c770e9bf85e953a30df43faf57c401518b7f6ad 100644 --- a/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml +++ b/configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml @@ -7,6 +7,7 @@ weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final log_iter: 20 snapshot_epoch: 2 + # schedule configuration for fine-tuning epoch: 36 LearningRate: @@ -16,7 +17,7 @@ LearningRate: max_epochs: 43 - !LinearWarmup start_factor: 0.001 - steps: 100 + epochs: 1 OptimizerBuilder: optimizer: @@ -26,9 +27,11 @@ OptimizerBuilder: factor: 0.0005 type: L2 + TrainReader: batch_size: 8 + # detector configuration architecture: YOLOv3 norm_type: sync_bn @@ -63,7 +66,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 static_assigner_epoch: -1 # 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} static_assigner: name: ATSSAssigner diff --git a/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml new file mode 100644 index 0000000000000000000000000000000000000000..bd102a48d1013b9e6399411562b47e1e85e2c2ec --- /dev/null +++ b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_ht21.yml @@ -0,0 +1,80 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
+_BASE_: [ + '../../../yolox/yolox_x_300e_coco.yml', + '../_base_/ht21.yml', +] +weights: output/yolox_x_24e_800x1440_ht21/model_final +log_iter: 20 +snapshot_epoch: 2 + +# schedule configuration for fine-tuning +epoch: 24 +LearningRate: + base_lr: 0.0005 # fine-tune + schedulers: + - !CosineDecay + max_epochs: 24 + min_lr_ratio: 0.05 + last_plateau_epochs: 4 + - !ExpWarmup + epochs: 1 + +OptimizerBuilder: + optimizer: + type: Momentum + momentum: 0.9 + use_nesterov: True + regularizer: + factor: 0.0005 + type: L2 + + +TrainReader: + batch_size: 4 + mosaic_epoch: 20 + +# detector configuration +architecture: YOLOX +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +norm_type: sync_bn +use_ema: True +ema_decay: 0.9999 +ema_decay_type: "exponential" +act: silu +find_unused_parameters: True +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 32] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%. + # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. 
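Review note on `size_range` / `size_stride` in the config above: a minimal sketch of how a sampled size index could map to a training resolution. It assumes the height is `index * size_stride` and the width follows the `input_size` aspect ratio, truncated to a multiple of the stride. The helper name `multiscale_size` is hypothetical and the logic is an illustration, not taken from the PaddleDetection source.

```python
def multiscale_size(index, input_size=(800, 1440), size_stride=32):
    """Map a sampled size index to a (height, width) training resolution.

    Assumption: height = index * stride, and width keeps the input_size
    aspect ratio, truncated down to a whole number of strides.
    """
    base_height, base_width = input_size
    # Integer arithmetic avoids floating-point error in index * (w / h).
    width_steps = index * base_width // base_height
    return size_stride * index, size_stride * width_steps


if __name__ == "__main__":
    for idx in (18, 25):
        print(idx, multiscale_size(idx))
```

Under this assumption, indices 18 and 25 give 576x1024 and 800x1440, the two bounds quoted in the `size_range` comment.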
diff --git a/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml new file mode 100644 index 0000000000000000000000000000000000000000..2585e5a47ac0589f7d673803a5172b42f3b902bc --- /dev/null +++ b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml @@ -0,0 +1,80 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + '../../../yolox/yolox_x_300e_coco.yml', + '../_base_/mix_det.yml', +] +weights: output/yolox_x_24e_800x1440_mix_det/model_final +log_iter: 20 +snapshot_epoch: 2 + +# schedule configuration for fine-tuning +epoch: 24 +LearningRate: + base_lr: 0.00075 # fine-tune + schedulers: + - !CosineDecay + max_epochs: 24 + min_lr_ratio: 0.05 + last_plateau_epochs: 4 + - !ExpWarmup + epochs: 1 + +OptimizerBuilder: + optimizer: + type: Momentum + momentum: 0.9 + use_nesterov: True + regularizer: + factor: 0.0005 + type: L2 + + +TrainReader: + batch_size: 6 + mosaic_epoch: 20 + +# detector configuration +architecture: YOLOX +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +norm_type: sync_bn +use_ema: True +ema_decay: 0.9999 +ema_decay_type: "exponential" +act: silu +find_unused_parameters: True +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed 
while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%. + # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. diff --git a/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml new file mode 100644 index 0000000000000000000000000000000000000000..34678d52b107f92b8c374f12af1b3834f16b9676 --- /dev/null +++ b/configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml @@ -0,0 +1,80 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. +_BASE_: [ + '../../../yolox/yolox_x_300e_coco.yml', + '../_base_/mix_mot_ch.yml', +] +weights: output/yolox_x_24e_800x1440_mix_mot_ch/model_final +log_iter: 20 +snapshot_epoch: 2 + +# schedule configuration for fine-tuning +epoch: 24 +LearningRate: + base_lr: 0.00075 # fine-tune + schedulers: + - !CosineDecay + max_epochs: 24 + min_lr_ratio: 0.05 + last_plateau_epochs: 4 + - !ExpWarmup + epochs: 1 + +OptimizerBuilder: + optimizer: + type: Momentum + momentum: 0.9 + use_nesterov: True + regularizer: + factor: 0.0005 + type: L2 + + +TrainReader: + batch_size: 6 + mosaic_epoch: 20 + +# detector configuration +architecture: YOLOX +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +norm_type: sync_bn +use_ema: True +ema_decay: 0.9999 +ema_decay_type: "exponential" +act: silu +find_unused_parameters: True +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + 
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.7 + # For speed while keeping high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%. + # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. diff --git a/configs/mot/deepsort/README_cn.md b/configs/mot/deepsort/README_cn.md index 8a957d7fad61624bd70f9670182f3d2ad15e5992..08bee2e1e4c173c426c608562a4bcd4334bcc5e7 100644 --- a/configs/mot/deepsort/README_cn.md +++ b/configs/mot/deepsort/README_cn.md @@ -6,7 +6,7 @@ - [简介](#简介) - [模型库](#模型库) - [快速开始](#快速开始) -- [适配其他检测器](适配其他检测器) +- [适配其他检测器](#适配其他检测器) - [引用](#引用) ## 简介 @@ -18,18 +18,18 @@ | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测结果或模型 | ReID模型 |配置文件 | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: | -| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [检测结果](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./reid/deepsort_pcb_pyramid_r101.yml) | +| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./reid/deepsort_pcb_pyramid_r101.yml) | | ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./deepsort_jde_yolov3_pcb_pyramid.yml) | -| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 
| - | [检测结果](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./reid/deepsort_pplcnet.yml) | +| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./reid/deepsort_pplcnet.yml) | | PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./deepsort_jde_yolov3_pplcnet.yml) | ### DeepSORT在MOT-16 Test Set上结果 | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测结果或模型 | ReID模型 |配置文件 | | :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: | -| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [检测结果](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) | [ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./reid/deepsort_pcb_pyramid_r101.yml) | +| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) | [ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./reid/deepsort_pcb_pyramid_r101.yml) | | ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./deepsort_jde_yolov3_pcb_pyramid.yml) | -| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [检测结果](https://dataset.bj.bcebos.com/mot/det_results_dir.zip) 
|[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./reid/deepsort_pplcnet.yml) | +| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./reid/deepsort_pplcnet.yml) | | PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./deepsort_jde_yolov3_pplcnet.yml) | @@ -41,8 +41,8 @@ | MIX | JDE YOLOv3 | PPLCNet | - | 66.3 | 62.1 | - |[配置文件](./deepsort_jde_yolov3_pplcnet.yml) | | MOT-17 half train | YOLOv3 | PPLCNet | 42.7 | 50.2 | 52.4 | - |[配置文件](./deepsort_yolov3_pplcnet.yml) | | MOT-17 half train | PPYOLOv2 | PPLCNet | 46.8 | 51.8 | 55.8 | - |[配置文件](./deepsort_ppyolov2_pplcnet.yml) | -| MOT-17 half train | PPYOLOe | PPLCNet | 52.9 | 56.7 | 60.5 | - |[配置文件](./deepsort_ppyoloe_pplcnet.yml) | -| MOT-17 half train | PPYOLOe | ResNet-50 | 52.9 | 56.7 | 64.6 | - |[配置文件](./deepsort_ppyoloe_resnet.yml) | +| MOT-17 half train | PPYOLOe | PPLCNet | 52.7 | 56.7 | 60.5 | - |[配置文件](./deepsort_ppyoloe_pplcnet.yml) | +| MOT-17 half train | PPYOLOe | ResNet-50 | 52.7 | 56.7 | 64.6 | - |[配置文件](./deepsort_ppyoloe_resnet.yml) | **注意:** 模型权重下载链接在配置文件中的```det_weights```和```reid_weights```,运行验证的命令即可自动下载。 @@ -60,7 +60,7 @@ det_results_dir ``` 对于MOT16数据集,可以下载PaddleDetection提供的一个经过匹配之后的检测框结果det_results_dir.zip并解压: ``` -wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip +wget https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip ``` 如果使用更强的检测模型,可以取得更好的结果。其中每个txt是每个视频中所有图片的检测结果,每行都描述一个边界框,格式如下: ``` @@ -72,7 +72,7 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip - `score`是目标框的得分 - `class_id`是目标框的类别,如果只有1类则是`0` -- 
**方式2**:同时加载检测模型和ReID模型,此处选用JDE版本的YOLOv3,具体配置见`configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml`。加载其他通用检测模型可参照`configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml`进行修改。 +- **方式2**:同时加载检测模型和ReID模型,此处选用JDE版本的YOLOv3,具体配置见`configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml`。加载其他通用检测模型可参照`configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml`进行修改。 ## 快速开始 @@ -80,7 +80,7 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip #### 1.1 评估检测效果 ```bash -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams ``` **注意:** @@ -89,9 +89,12 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/deepsort/detector/ppy #### 1.2 评估跟踪效果 **方式1**:加载检测结果文件和ReID模型,得到跟踪结果 ```bash -CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml --det_results_dir {your detection results} +# 下载PaddleDetection提供的MOT16数据集检测结果文件并解压,如需自己使用其他检测器生成请参照这个文件里的格式 +wget https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip + +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml --det_results_dir det_results_dir # 或者 -CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml --det_results_dir {your detection results} +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml --det_results_dir det_results_dir ``` **方式2**:加载行人检测模型和ReID模型,得到跟踪结果 @@ -115,11 +118,14 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort 使用单个GPU通过如下命令预测一个视频,并保存为视频 ```bash +# 下载demo视频 +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 + # 加载JDE YOLOv3行人检测模型和PCB Pyramid ReID模型,并保存为视频 
-CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml --video_file={your video name}.mp4 --save_videos +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml --video_file=mot17_demo.mp4 --save_videos -# 或者加载PPYOLOv2行人检测模型和PPLCNet ReID模型,并保存为视频 -CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --video_file={your video name}.mp4 --scaled=True --save_videos +# 或者加载PPYOLOE行人检测模型和PPLCNet ReID模型,并保存为视频 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos ``` **注意:** @@ -132,33 +138,34 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsor Step 1:导出检测模型 ```bash # 导出JDE YOLOv3行人检测模型 -CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams +CUDA_VISIBLE_DEVICES=0 python3.7 tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams -# 或导出PPYOLOv2行人检测模型 -CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams +# 或导出PPYOLOE行人检测模型 +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams ``` Step 2:导出ReID模型 ```bash # 导出PCB Pyramid ReID模型 CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c 
configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams + # 或者导出PPLCNet ReID模型 CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams + +# 或者导出ResNet ReID模型 +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_resnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams ``` ### 4. 用导出的模型基于Python去预测 ```bash -# 用导出JDE YOLOv3行人检测模型和PCB Pyramid ReID模型 -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/jde_yolov3_darknet53_30e_1088x608_mix/ --reid_model_dir=output_inference/deepsort_pcb_pyramid_r101/ --video_file={your video name}.mp4 --device=GPU --save_mot_txts - -# 或用导出的PPYOLOv2行人检测模型和PPLCNet ReID模型 -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyolov2_r50vd_dcn_365e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts +# 用导出的PPYOLOE行人检测模型和PPLCNet ReID模型 +python3.7 deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts --threshold=0.5 ``` **注意:** - - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 运行前需要先改动`deploy/pptracking/python/tracker_config.yml`里的tracker为`DeepSORTTracker`。 + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示对每个视频保存一个txt,或`--save_images`表示保存跟踪结果可视化图片。 - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 - - 
`--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 ## 适配其他检测器 @@ -184,7 +191,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort ``` #### 2.加载检测模型和ReID模型去推理: ``` -CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --video_file={your video name}.mp4 --scaled=True --save_videos +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos ``` #### 3.导出检测模型和ReID模型: ```bash @@ -195,7 +202,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid ``` #### 4.使用导出的检测模型和ReID模型去部署: ``` -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/xxx./ --reid_model_dir=output_inference/deepsort_yyy/ --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/xxx./ --reid_model_dir=output_inference/deepsort_yyy/ --video_file=mot17_demo.mp4 --device=GPU --scaled=True --save_mot_txts ``` **注意:** - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 diff --git a/configs/mot/deepsort/_base_/mot17.yml b/configs/mot/deepsort/_base_/mot17.yml index 2efa55546026168c39396c4d51a71428e19a0638..faf47f622d1c2847a9686dfa8d7e48a49c05436c 100644 --- a/configs/mot/deepsort/_base_/mot17.yml +++ b/configs/mot/deepsort/_base_/mot17.yml @@ -17,6 +17,7 @@ EvalDataset: TestDataset: !ImageFolder + dataset_dir: dataset/mot/MOT17 anno_path: annotations/val_half.json diff --git a/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml b/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml index 972a1d16975a587a162ad5354b784dffdd473e60..0af80a7d899f02ac4b66c5191b2616ed1db1aa8e 100644 --- a/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml +++ b/configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml @@ -92,7 +92,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 
static_assigner_epoch: -1 # 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} static_assigner: name: ATSSAssigner diff --git a/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml b/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml index cde2cf23cf4ea81ce67f7dbd08b4fe1fc4b77e15..d9692304b055040bb22c49a2f90e05e4e7ba53eb 100644 --- a/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml +++ b/configs/mot/deepsort/deepsort_ppyoloe_resnet.yml @@ -91,7 +91,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 static_assigner_epoch: -1 # 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} static_assigner: name: ATSSAssigner diff --git a/configs/mot/deepsort/detector/README_cn.md b/configs/mot/deepsort/detector/README_cn.md index 1a984b5c358f2c12e77e3d12396bba22bb46793c..6ebe7de7949d1db3d4c4f72db5ad8147f12e1f3d 100644 --- a/configs/mot/deepsort/detector/README_cn.md +++ b/configs/mot/deepsort/detector/README_cn.md @@ -26,11 +26,11 @@ 通过如下命令一键式启动训练和评估 ```bash -job_name=ppyolov2_r50vd_dcn_365e_640x640_mot17half +job_name=ppyoloe_crn_l_36e_640x640_mot17half config=configs/mot/deepsort/detector/${job_name}.yml log_dir=log_dir/${job_name} # 1. training -python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp --fleet +python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp # 2. 
evaluation -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/${job_name}.pdparams ``` diff --git a/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml b/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml index 4852d6b7e21924156826575dcc10a00cfecb5f80..a0501222c9f35d657826fb525e54bd7f4f663ae4 100644 --- a/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml +++ b/configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml @@ -6,6 +6,7 @@ weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final log_iter: 20 snapshot_epoch: 2 + # schedule configuration for fine-tuning epoch: 36 LearningRate: @@ -15,7 +16,7 @@ LearningRate: max_epochs: 43 - !LinearWarmup start_factor: 0.001 - steps: 100 + epochs: 1 OptimizerBuilder: optimizer: @@ -25,9 +26,11 @@ OptimizerBuilder: factor: 0.0005 type: L2 + TrainReader: batch_size: 8 + # detector configuration architecture: YOLOv3 norm_type: sync_bn @@ -62,7 +65,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 static_assigner_epoch: -1 # 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} static_assigner: name: ATSSAssigner diff --git a/configs/mot/fairmot/README.md b/configs/mot/fairmot/README.md index 25441f21cba40a5e7b26dbbab7627e8bb7097b2f..fbb9daa04e05b1f9848c03ef62f790ebeeee167e 100644 --- a/configs/mot/fairmot/README.md +++ b/configs/mot/fairmot/README.md @@ -86,7 +86,7 @@ PP-tracking provides an AI studio public project tutorial. 
Please refer to this ### Results on MOT-17 Half Set | backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config | | :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | -| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) | +| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) | | DLA-34 + BYTETracker| 1088x608 | 70.3 | 73.2 | 234 | 2176 | 13598 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [config](./fairmot_dla34_30e_1088x608_bytetracker.yml) | **Notes:** diff --git a/configs/mot/fairmot/README_cn.md b/configs/mot/fairmot/README_cn.md index bb22459e858c36414a13c142e38184df3899b7b4..dd5a27874e6c7439222ca9f8648099ca25bf9863 100644 --- a/configs/mot/fairmot/README_cn.md +++ b/configs/mot/fairmot/README_cn.md @@ -82,7 +82,7 @@ PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking ### 在MOT-17 Half上结果 | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | | :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: | -| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) | +| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) | | DLA-34 + BYTETracker| 1088x608 | 70.3 | 73.2 | 234 | 2176 | 13598 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | 
[配置文件](./fairmot_dla34_30e_1088x608_bytetracker.yml) | diff --git a/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml b/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml index 7b668c5687584a65e6895efe26454ca4418c7226..a0ad44a0f9a6ef12d3904f1d78ede896f917a90b 100644 --- a/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml +++ b/configs/mot/fairmot/fairmot_dla34_30e_1088x608_bytetracker.yml @@ -14,8 +14,18 @@ TrainDataset: image_lists: ['mot17.half', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train'] data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide'] +# for MOT evaluation +# If you want to change the MOT evaluation dataset, please modify 'data_root' +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT + JDETracker: use_byte: True match_thres: 0.8 conf_thres: 0.4 low_conf_thres: 0.2 + min_box_area: 200 + vertical_ratio: 1.6 # for pedestrian diff --git a/configs/mot/headtracking21/README_cn.md b/configs/mot/headtracking21/README_cn.md index eafd87d7cbae64ea46bd31682687b8c6b7f7df8a..092dfac6c240949f93d5b4cd75af0ba618e40b55 100644 --- a/configs/mot/headtracking21/README_cn.md +++ b/configs/mot/headtracking21/README_cn.md @@ -11,21 +11,22 @@ ## 模型库 ### FairMOT在HT-21 Training Set上结果 -| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | +| 模型 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | | :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: | -| DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) | -| HRNetv2-W18 | 1088x608 | 57.2 | 58.4 | 30950 | 188260 | 256580 | - | 
[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608_headtracking21.yml) | - +| FairMOT DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) | +| ByteTrack-x | 1440x800 | 64.1 | 63.4 | 4191 | 185162 | 210240 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](../bytetrack/bytetrack_yolox_ht21.yml) | ### FairMOT在HT-21 Test Set上结果 | 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 | | :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: | -| DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) | -| HRNetv2-W18 | 1088x608 | 41.2 | 47.1 | 48809 | 241683 | 204346 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) | +| FairMOT DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) | +| ByteTrack-x | 1440x800 | 72.6 | 61.8 | 5163 | 71235 | 154139 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](../bytetrack/bytetrack_yolox_ht21.yml) | **注意:** - - FairMOT DLA-34使用2个GPU进行训练,每个GPU上batch size为6,训练30个epoch。目前MOTA精度位于MOT官网[Head Tracking 21](https://motchallenge.net/results/Head_Tracking_21)榜单榜首。 - - FairMOT HRNetv2-W18使用4个GPU进行训练,每个GPU上batch size为8,训练30个epoch。 + - FairMOT DLA-34使用2个GPU进行训练,每个GPU上batch 
size为6,训练30个epoch。 + - ByteTrack使用YOLOX-x做检测器,使用8个GPU进行训练,每个GPU上batch size为8,训练30个epoch,具体细节参照[bytetrack](../bytetrack/)。 + - 此处提供PaddleDetection团队整理后的[下载链接](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip),下载后需解压放到`dataset/mot/`目录下,HT-21 Test集的结果需要交到[官网](https://motchallenge.net)评测。 + ## 快速开始 diff --git a/configs/mot/jde/_base_/jde_darknet53.yml b/configs/mot/jde/_base_/jde_darknet53.yml index 73faa52f662e7db24ef40c25c029561225d1a3b8..f5370fc6affa10f33af04185c48d61d5a2f06d98 100644 --- a/configs/mot/jde/_base_/jde_darknet53.yml +++ b/configs/mot/jde/_base_/jde_darknet53.yml @@ -53,4 +53,4 @@ JDETracker: det_thresh: 0.3 track_buffer: 30 min_box_area: 200 - motion: KalmanFilter + vertical_ratio: 1.6 # for pedestrian diff --git a/configs/mot/mcfairmot/README.md b/configs/mot/mcfairmot/README.md index 555aee9fecd5a03c91c5fd3500e5f9b5c6d38e3c..4e595f3900fa89e0789bf98474a2ea40c1f2c633 100644 --- a/configs/mot/mcfairmot/README.md +++ b/configs/mot/mcfairmot/README.md @@ -48,7 +48,7 @@ PP-tracking provides an AI studio public project tutorial. 
Please refer to this | Model | Compression Strategy | Prediction Delay(T4) |Prediction Delay(V100)| Model Configuration File |Compression Algorithm Configuration File | | :--------------| :------- | :------: | :----: | :----: | :----: | | DLA-34 | baseline | 41.3 | 21.9 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)| - | -| DLA-34 | off-line quantization | 37.8 | 21.2 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[Configuration File](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/slim/post_quant/mcfairmot_ptq.yml)| +| DLA-34 | off-line quantization | 37.8 | 21.2 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[Configuration File](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/slim/post_quant/mcfairmot_ptq.yml)| ## Getting Start @@ -122,8 +122,8 @@ CUDA_VISIBLE_DEVICES=0 python3.7 tools/post_quant.py -c configs/mot/mcfairmot/mc @ARTICLE{9573394, author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, - journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, - title={Detection and Tracking Meet Drones Challenge}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, year={2021}, volume={}, number={}, diff --git a/configs/mot/mcfairmot/README_cn.md b/configs/mot/mcfairmot/README_cn.md index 184045584a455cc8b2443a9d5541e12732e625a9..d054a04314977f397e840a4d778770ba9b50d366 100644 --- a/configs/mot/mcfairmot/README_cn.md +++ b/configs/mot/mcfairmot/README_cn.md @@ -47,7 +47,7 @@ PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking | 骨干网络 | 压缩策略 | 预测时延(T4) |预测时延(V100)| 配置文件 |压缩算法配置文件 | | :--------------| :------- | :------: | :----: | :----: | :----: | | DLA-34 | baseline | 41.3 | 21.9 
|[配置文件](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)| - | -| DLA-34 | 离线量化 | 37.8 | 21.2 |[配置文件](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/slim/post_quant/mcfairmot_ptq.yml)| +| DLA-34 | 离线量化 | 37.8 | 21.2 |[配置文件](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[配置文件](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/slim/post_quant/mcfairmot_ptq.yml)| ## 快速开始 @@ -119,8 +119,8 @@ CUDA_VISIBLE_DEVICES=0 python3.7 tools/post_quant.py -c configs/mot/mcfairmot/mc @ARTICLE{9573394, author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin}, - journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, - title={Detection and Tracking Meet Drones Challenge}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + title={Detection and Tracking Meet Drones Challenge}, year={2021}, volume={}, number={}, diff --git a/configs/mot/ocsort/README.md b/configs/mot/ocsort/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1d2d6144a2b4a0360854c1fbd8557d9158ac3608 --- /dev/null +++ b/configs/mot/ocsort/README.md @@ -0,0 +1,101 @@ +简体中文 | [English](README.md) + +# OC_SORT (Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking) + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [快速开始](#快速开始) +- [引用](#引用) + +## 简介 +[OC_SORT](https://arxiv.org/abs/2203.14360)(Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking)。此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。 + +## 模型库 + +### OC_SORT在MOT-17 half Val Set上结果 + +| 检测训练数据集 | 检测器 | 输入尺度 | ReID | 检测mAP | MOTA | IDF1 | FPS | 配置文件 | +| :-------- | :----- | :----: | :----:|:------: | :----: |:-----: |:----:|:----: | +| MOT-17 half train | PP-YOLOE-l | 640x640 | - | 52.9 | 50.1 | 62.6 | - 
|[配置文件](./ocsort_ppyoloe.yml) | +| **mot17_ch** | YOLOX-x | 800x1440| - | 61.9 | 75.5 | 77.0 | - |[配置文件](./ocsort_yolox.yml) | + +**注意:** + - 模型权重下载链接在配置文件中的```det_weights```和```reid_weights```,运行验证的命令即可自动下载,OC_SORT默认不需要```reid_weights```权重。 + - **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://dataset.bj.bcebos.com/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。 + - **mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集,**mix_det**是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。 + - OC_SORT的训练是单独的检测器训练MOT数据集,推理是组装跟踪器去评估MOT指标,单独的检测模型也可以评估检测指标。 + - OC_SORT的导出部署,是单独导出检测模型,再组装跟踪器运行的,参照[PP-Tracking](../../../deploy/pptracking/python)。 + - OC_SORT是PP-Human和PP-Vehicle等Pipeline分析项目跟踪方向的主要方案,具体使用参照[Pipeline](../../../deploy/pipeline)和[MOT](../../../deploy/pipeline/docs/tutorials/pphuman_mot.md)。 + + +## 快速开始 + +### 1. 训练 +通过如下命令一键式启动训练和评估 +```bash +python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp +``` + +### 2. 评估 +#### 2.1 评估检测效果 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml +``` + +**注意:** + - 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。 + +#### 2.2 评估跟踪效果 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/ocsort/ocsort_ppyoloe.yml --scaled=True +# 或者 +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/ocsort/ocsort_yolox.yml --scaled=True +``` +**注意:** + - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True, 默认值是False。 + - 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。 + +### 3. 
预测 + +使用单个GPU通过如下命令预测一个视频,并保存为视频 + +```bash +# 下载demo视频 +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 + +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/ocsort/ocsort_yolox.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos +``` + +**注意:** + - 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。 + - `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。 + + +### 4. 导出预测模型 + +Step 1:导出检测模型 +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams +``` + +### 5. 用导出的模型基于Python去预测 + +```bash +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/yolox_x_24e_800x1440_mix_det/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts +``` +**注意:** + - 运行前需要手动修改`tracker_config.yml`的跟踪器类型为`type: OCSORTTracker`。 + - 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。 + - 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。 + + +## 引用 +``` +@article{cao2022observation, + title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking}, + author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris}, + journal={arXiv preprint arXiv:2203.14360}, + year={2022} +} +``` diff --git a/configs/mot/ocsort/ocsort_ppyoloe.yml b/configs/mot/ocsort/ocsort_ppyoloe.yml new file mode 100644 index 0000000000000000000000000000000000000000..0d81b2d1c0b0def8cd4458a96c6a352e04c16456 --- /dev/null +++ b/configs/mot/ocsort/ocsort_ppyoloe.yml @@ -0,0 +1,75 @@ +# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT. 
+_BASE_: [ + '../bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml', + '../bytetrack/_base_/mot17.yml', + '../bytetrack/_base_/ppyoloe_mot_reader_640x640.yml' +] +weights: output/ocsort_ppyoloe/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode, set 'COCO' can be training mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams +ByteTrack: + detector: YOLOv3 # PPYOLOe version + reid: None + tracker: OCSORTTracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +reid_weights: None + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +# Tracking requires higher quality boxes, so NMS score_threshold will be higher +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: -1 # 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 # 0.01 in original detector + nms_threshold: 0.4 # 0.6 in original detector + + +OCSORTTracker: + det_thresh: 0.4 # 0.6 in yolox ocsort + max_age: 30 + min_hits: 3 + iou_threshold: 0.3 + delta_t: 3 + inertia: 0.2 + vertical_ratio: 0 + min_box_area: 0 + use_byte: False + + +# MOTDataset for MOT evaluation and inference +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT17/images/half + keep_ori_im: True # set as True in DeepSORT and ByteTrack + +TestMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + keep_ori_im: True # set True if save visualization images or video diff --git a/configs/mot/ocsort/ocsort_yolox.yml b/configs/mot/ocsort/ocsort_yolox.yml new file mode 100644 index 
0000000000000000000000000000000000000000..4f05e2d04ce1d83c98e54b35d21217915c5ee8f4 --- /dev/null +++ b/configs/mot/ocsort/ocsort_yolox.yml @@ -0,0 +1,83 @@ +# This is an assembled OC_SORT config (ByteTrack architecture with OCSORTTracker), used in eval/infer mode for MOT. +_BASE_: [ + '../bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml', + '../bytetrack/_base_/mix_det.yml', + '../bytetrack/_base_/yolox_mot_reader_800x1440.yml' +] +weights: output/ocsort_yolox/model_final +log_iter: 20 +snapshot_epoch: 2 + +metric: MOT # eval/infer mode +num_classes: 1 + +architecture: ByteTrack +pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams +ByteTrack: + detector: YOLOX + reid: None + tracker: OCSORTTracker +det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_mot_ch.pdparams +reid_weights: None + +depth_mult: 1.33 +width_mult: 1.25 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + input_size: [800, 1440] + size_stride: 32 + size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8 + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +# Tracking requires higher-quality boxes, so the NMS score_threshold is set higher +YOLOXHead: + l1_epoch: 20 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.1 + nms_threshold: 0.7 + # For more speed while keeping a high mAP, you can set 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%. + # For a high-speed demo, you can set 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop considerably. 
+
+
+OCSORTTracker:
+  det_thresh: 0.6
+  max_age: 30
+  min_hits: 3
+  iou_threshold: 0.3
+  delta_t: 3
+  inertia: 0.2
+  vertical_ratio: 1.6
+  min_box_area: 100
+  use_byte: False
+
+
+# MOTDataset for MOT evaluation and inference
+EvalMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    data_root: MOT17/images/half
+    keep_ori_im: True # set as True in DeepSORT and ByteTrack
+
+TestMOTDataset:
+  !MOTImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: True # set True if save visualization images or video
diff --git a/configs/mot/pedestrian/README_cn.md b/configs/mot/pedestrian/README_cn.md
index eca2963c51872e000b7b9ab0e02770b1fc98b60a..768733db537c5f752bbb56198bad196c68b28602 100644
--- a/configs/mot/pedestrian/README_cn.md
+++ b/configs/mot/pedestrian/README_cn.md
@@ -18,7 +18,7 @@
 | :-------------| :-------- | :------- | :----: | :----: | :----: | :-----: |:------: |
 | PathTrack | DLA-34 | 1088x608 | 44.9 | 59.3 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_pathtrack.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_pathtrack.yml) |
 | VisDrone | DLA-34 | 1088x608 | 49.2 | 63.1 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_visdrone_pedestrian.yml) |
-| VisDrone | HRNetv2-W18| 1088x608 | 40.5 | 54.7 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml) |
+| VisDrone | HRNetv2-W18| 1088x608 | 40.5 | 54.7 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_pedestrian.yml) |
 | VisDrone | HRNetv2-W18| 864x480 | 38.6 | 50.9 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone_pedestrian.yml) |
 | VisDrone | HRNetv2-W18| 576x320 | 30.6 | 47.2 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone_pedestrian.yml) |
@@ -124,8 +124,8 @@ month={Oct},}
 @ARTICLE{9573394,
   author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
-  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
-  title={Detection and Tracking Meet Drones Challenge},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={Detection and Tracking Meet Drones Challenge},
   year={2021},
   volume={},
   number={},
diff --git a/configs/picodet/README.md b/configs/picodet/README.md
index a226cc9a95e91b3e28635023996e201eed08e089..b6562428308b83a4aa49570a7a983a081beace7c 100644
--- a/configs/picodet/README.md
+++ b/configs/picodet/README.md
@@ -1,60 +1,63 @@
-English | [简体中文](README_cn.md)
+简体中文 | [English](README_en.md)
 # PP-PicoDet
 ![](../../docs/images/picedet_demo.jpeg)
-## News
+## 最新动态
-- Released a new series of PP-PicoDet models: **(2022.03.20)**
-  - (1) It was used TAL/Task-aligned-Head and optimized PAN, which greatly improved the accuracy;
-  - (2) Moreover optimized CPU prediction speed, and the training speed is greatly improved;
-  - (3) The export model includes post-processing, and the prediction directly outputs the result, without secondary development, and the migration cost is lower.
+- 发布全新系列PP-PicoDet模型:**(2022.03.20)**
+  - (1)引入TAL及ETA Head,优化PAN等结构,精度提升2个点以上;
+  - (2)优化CPU端预测速度,同时训练速度提升一倍;
+  - (3)导出模型将后处理包含在网络中,预测直接输出box结果,无需二次开发,迁移成本更低,端到端预测速度提升10%-20%。
-### Legacy Model
+## 历史版本模型
-- Please refer to: [PicoDet 2021.10版本](./legacy_model/)
+- 详情请参考:[PicoDet 2021.10版本](./legacy_model/)
-## Introduction
+## 简介
-We developed a series of lightweight models, named `PP-PicoDet`. Because of the excellent performance, our models are very suitable for deployment on mobile or CPU. For more details, please refer to our [report on arXiv](https://arxiv.org/abs/2111.00902).
+PaddleDetection中提出了全新的轻量级系列模型`PP-PicoDet`,在移动端具有卓越的性能,成为全新SOTA轻量级模型。详细的技术细节可以参考我们的[arXiv技术报告](https://arxiv.org/abs/2111.00902)。
-- 🌟 Higher mAP: the **first** object detectors that surpass mAP(0.5:0.95) **30+** within 1M parameters when the input size is 416.
-- 🚀 Faster latency: 150FPS on mobile ARM CPU.
-- 😊 Deploy friendly: support PaddleLite/MNN/NCNN/OpenVINO and provide C++/Python/Android implementation.
-- 😍 Advanced algorithm: use the most advanced algorithms and offer innovation, such as ESNet, CSP-PAN, SimOTA with VFL, etc.
+PP-PicoDet模型有如下特点:
+
+- 🌟 更高的mAP: 第一个在1M参数量之内`mAP(0.5:0.95)`超越**30+**(输入416像素时)。
+- 🚀 更快的预测速度: 网络预测在ARM CPU下可达150FPS。
+- 😊 部署友好: 支持PaddleLite/MNN/NCNN/OpenVINO等预测库,支持转出ONNX,提供了C++/Python/Android的demo。
+- 😍 先进的算法: 我们在现有SOTA算法中进行了创新, 包括:ESNet, CSP-PAN, SimOTA等等。
    -## Benchmark +## 基线 + +| 模型 | 输入尺寸 | mAPval
    0.5:0.95 | mAPval
    0.5 | 参数量
    (M) | FLOPS
    (G) | 预测时延[CPU](#latency)
    (ms) | 预测时延[Lite](#latency)
    (ms) | 权重下载 | 配置文件 | 导出模型 | +| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------------------------------: | :--------------------------------------- | :--------------------------------------- | +| PicoDet-XS | 320*320 | 23.5 | 36.1 | 0.70 | 0.67 | 3.9ms | 7.81ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_320_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-XS | 416*416 | 26.2 | 39.3 | 0.70 | 1.13 | 6.1ms | 12.38ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_416_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-S | 320*320 | 29.1 | 43.4 | 1.18 | 0.97 | 4.8ms | 9.56ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_320_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet.tar) | [w/o 
后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-S | 416*416 | 32.5 | 47.6 | 1.18 | 1.65 | 6.6ms | 15.20ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-M | 320*320 | 34.4 | 50.0 | 3.46 | 2.57 | 8.2ms | 17.68ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_320_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-M | 416*416 | 37.5 | 53.4 | 3.46 | 4.34 | 12.7ms | 28.39ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_416_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-L | 320*320 | 36.1 | 52.0 | 5.80 | 4.20 | 11.5ms | 25.21ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco_lcnet.log) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_320_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-L | 416*416 | 39.4 | 55.7 | 5.80 | 7.10 | 20.7ms | 42.23ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_416_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-L | 640*640 | 42.6 | 59.2 | 5.80 | 16.81 | 62.5ms | 108.1ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_640_coco_lcnet.yml) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet_non_postprocess.tar) | -| Model | Input size | mAPval
    0.5:0.95 | mAPval
    0.5 | Params
    (M) | FLOPS
    (G) | Latency[CPU](#latency)
    (ms) | Latency[Lite](#latency)
    (ms) | Download | Config | -| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------------------------------: | :--------------------------------------- | -| PicoDet-XS | 320*320 | 23.5 | 36.1 | 0.70 | 0.67 | 10.9ms | 7.81ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_320_coco_lcnet.yml) | -| PicoDet-XS | 416*416 | 26.2 | 39.3 | 0.70 | 1.13 | 15.4ms | 12.38ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_416_coco_lcnet.yml) | -| PicoDet-S | 320*320 | 29.1 | 43.4 | 1.18 | 0.97 | 12.6ms | 9.56ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_320_coco_lcnet.yml) | -| PicoDet-S | 416*416 | 32.5 | 47.6 | 1.18 | 1.65 | 17.2ms | 15.20 | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_lcnet.yml) | -| PicoDet-M | 320*320 | 34.4 | 50.0 | 3.46 | 2.57 | 14.5ms | 17.68ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco_lcnet.log) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_320_coco_lcnet.yml) | -| PicoDet-M | 416*416 | 37.5 | 53.4 | 3.46 | 4.34 | 19.5ms | 28.39ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_416_coco_lcnet.yml) | -| PicoDet-L | 320*320 | 36.1 | 52.0 | 5.80 | 4.20 | 18.3ms | 25.21ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_320_coco_lcnet.yml) | -| PicoDet-L | 416*416 | 39.4 | 55.7 | 5.80 | 7.10 | 22.1ms | 42.23ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_416_coco_lcnet.yml) | -| PicoDet-L | 640*640 | 42.3 | 59.2 | 5.80 | 16.81 | 43.1ms | 108.1ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_640_coco_lcnet.yml) |
-Table Notes:
+注意事项:
-- Latency: All our models test on `Intel-Xeon-Gold-6148` CPU with MKLDNN by 10 threads and `Qualcomm Snapdragon 865(4xA77+4xA55)` with 4 threads by arm8 and with FP16. In the above table, test CPU latency on Paddle-Inference and testing Mobile latency with `Lite`->[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite).
-- PicoDet is trained on COCO train2017 dataset and evaluated on COCO val2017. And PicoDet used 4 GPUs for training and all checkpoints are trained with default settings and hyperparameters.
-- Benchmark test: When testing the speed benchmark, the post-processing is not included in the exported model, you need to set `-o export.benchmark=True` or manually modify [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml#L12).
+- 时延测试: 我们所有的模型都在`英特尔酷睿i7 10750H`的CPU 和`骁龙865(4xA77+4xA55)`的ARM CPU上测试(4线程,FP16预测)。上面表格中标有`CPU`的是使用OpenVINO测试,标有`Lite`的是使用[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite)进行测试。
+- PicoDet在COCO train2017上训练,并且在COCO val2017上进行验证。使用4卡GPU训练,并且上表所有的预训练模型都是通过发布的默认配置训练得到。
+- Benchmark测试:测试速度benchmark性能时,导出模型后处理不包含在网络中,需要设置`-o export.benchmark=True` 或手动修改[runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml#L12)。
-#### Benchmark of Other Models
+#### 其他模型的基线
-| Model | Input size | mAP val 0.5:0.95 | mAP val 0.5 | Params (M) | FLOPS (G) | Latency[NCNN](#latency) (ms) |
+| 模型 | 输入尺寸 | mAP val 0.5:0.95 | mAP val 0.5 | 参数量 (M) | FLOPS (G) | 预测时延[NCNN](#latency) (ms) |
 | :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: |
 | YOLOv3-Tiny | 416*416 | 16.6 | 33.1 | 8.86 | 5.62 | 25.42 |
 | YOLOv4-Tiny | 416*416 | 21.7 | 40.2 | 6.06 | 6.96 | 23.69 |
@@ -68,112 +71,118 @@ We developed a series of lightweight models, named `PP-PicoDet`. Because of the
 | YOLOv5n | 640*640 | 28.4 | 46.0 | 1.9 | 4.5 | 40.35 |
 | YOLOv5s | 640*640 | 37.2 | 56.0 | 7.2 | 16.5 | 78.05 |
-- Testing Mobile latency with code: [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark).
+- ARM测试的benchmark脚本来自: [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark)。
-## Quick Start
+## 快速开始
-Requirements:
+依赖包:
-- PaddlePaddle >= 2.2.1
+- PaddlePaddle == 2.2.2
-Installation
+安装
-- [Installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md)
-- [Prepare dataset](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/PrepareDataSet_en.md)
+- [安装指导文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md)
+- [准备数据文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/PrepareDataSet_en.md)
-Training and Evaluation
+训练&评估
-- Training model on single-GPU:
+- 单卡GPU上训练:
 ```shell
 # training on single-GPU
 export CUDA_VISIBLE_DEVICES=0
 python tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
 ```
-If the GPU is out of memory during training, reduce the batch_size in TrainReader, and reduce the base_lr in LearningRate proportionally.
-- Training model on multi-GPU:
+**注意:**如果训练时显存out memory,将TrainReader中batch_size调小,同时LearningRate中base_lr等比例减小。同时我们发布的config均由4卡训练得到,如果改变GPU卡数为1,那么base_lr需要减小4倍。
+
+- 多卡GPU上训练:
 ```shell
 # training on multi-GPU
-export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval
 ```
-- Evaluation:
+**注意:**PicoDet所有模型均由4卡GPU训练得到,如果改变训练GPU卡数,需要按线性比例缩放学习率base_lr。
+
+- 评估:
 ```shell
 python tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
               -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
 ```
-- Infer:
+- 测试:
 ```shell
 python tools/infer.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
               -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams
 ```
-Detail also can refer to [Quick start guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED.md).
+详情请参考[快速开始文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED.md).
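The learning-rate notes in this hunk (the released configs are trained on 4 GPUs, and `base_lr` should change in proportion when the GPU count or `batch_size` changes) can be sketched as a small helper. This is an illustrative sketch only and not part of PaddleDetection: the function name `scale_base_lr` and the reference values in the usage line are assumptions, so read the real `base_lr` and `batch_size` from the config you actually train with.

```python
def scale_base_lr(base_lr, ref_gpus, ref_batch_size, gpus, batch_size):
    """Linearly rescale base_lr from a reference setup to a new one.

    The total batch size is (number of GPUs) * (per-GPU batch_size);
    the learning rate is scaled by the ratio of new to reference totals.
    """
    if min(ref_gpus, ref_batch_size, gpus, batch_size) <= 0:
        raise ValueError("GPU counts and batch sizes must be positive")
    ref_total = ref_gpus * ref_batch_size
    new_total = gpus * batch_size
    return base_lr * new_total / ref_total


# Hypothetical reference setup: 4 GPUs, per-GPU batch_size 64, base_lr 0.32.
# Moving to 1 GPU with the same per-GPU batch size:
print(scale_base_lr(0.32, ref_gpus=4, ref_batch_size=64, gpus=1, batch_size=64))
```

With these assumed values, going from 4 GPUs to 1 at the same per-GPU batch size divides `base_lr` by 4, which is the adjustment the note describes.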
-## Deployment
+## 部署
-### Export and Convert Model
+### 导出及转换模型
-
-1. Export model (click to expand)
+
+1. 导出模型
 ```shell
 cd PaddleDetection
 python tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
               -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams \
-              --output_dir=inference_model
+              --output_dir=output_inference
 ```
+- 如无需导出后处理,请指定:`-o export.benchmark=True`(如果-o已出现过,此处删掉-o)或者手动修改[runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml) 中相应字段。
+- 如无需导出NMS,请指定:`-o export.nms=False`或者手动修改[runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml) 中相应字段。 许多导出至ONNX场景只支持单输入及固定shape输出,所以如果导出至ONNX,推荐不导出NMS。
+
-2. Convert to PaddleLite (click to expand)
+2. 转换模型至Paddle Lite (点击展开)
-- Install Paddlelite>=2.10:
+- 安装Paddlelite>=2.10:
 ```shell
 pip install paddlelite
 ```
-- Convert model:
+- 转换模型至Paddle Lite格式:
 ```shell
 # FP32
-paddle_lite_opt --model_dir=inference_model/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp32
+paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp32
 # FP16
-paddle_lite_opt --model_dir=inference_model/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp16 --enable_fp16=true
+paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp16 --enable_fp16=true
 ```
-3. Convert to ONNX (click to expand)
+3. 转换模型至ONNX (点击展开)
-- Install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) >= 0.7 and ONNX > 1.10.1, for details, please refer to [Tutorials of Export ONNX Model](../../deploy/EXPORT_ONNX_MODEL.md)
+- 安装[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) >= 0.7 并且 ONNX > 1.10.1, 细节请参考[导出ONNX模型教程](../../deploy/EXPORT_ONNX_MODEL.md)
 ```shell
 pip install onnx
-pip install paddle2onnx
+pip install paddle2onnx==0.9.2
 ```
-- Convert model:
+- 转换模型:
 ```shell
 paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
@@ -183,123 +192,117 @@ paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \
             --save_file picodet_s_320_coco.onnx
 ```
-- Simplify ONNX model: use onnx-simplifier to simplify onnx model.
+- 简化ONNX模型: 使用`onnx-simplifier`库来简化ONNX模型。
-  - Install onnx-simplifier >= 0.3.6:
+  - 安装 onnxsim >= 0.4.1:
   ```shell
-  pip install onnx-simplifier
+  pip install onnxsim
   ```
-  - simplify onnx model:
+  - 简化ONNX模型:
   ```shell
-  python -m onnxsim picodet_s_320_coco.onnx picodet_s_processed.onnx
+  onnxsim picodet_s_320_coco.onnx picodet_s_processed.onnx
   ```
    -- Deploy models +- 部署用的模型 -| Model | Input size | ONNX | Paddle Lite(fp32) | Paddle Lite(fp16) | +| 模型 | 输入尺寸 | ONNX | Paddle Lite(fp32) | Paddle Lite(fp16) | | :-------- | :--------: | :---------------------: | :----------------: | :----------------: | -| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_fp16.tar) | -| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_fp16.tar) | -| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_fp16.tar) | -| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_fp16.tar) | -| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_fp16.tar) | -| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_fp16.tar) | -| PicoDet-L | 640*640 | 
[model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_fp16.tar) | -| PicoDet-Shufflenetv2 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_shufflenetv2_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x_fp16.tar) | -| PicoDet-MobileNetv3-large 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_mobilenetv3_large_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x_fp16.tar) | -| PicoDet-LCNet 1.5x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_lcnet_1_5x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x_fp16.tar) | - - -### Deploy - -- PaddleInference demo [Python](../../deploy/python) & [C++](../../deploy/cpp) -- [PaddleLite C++ demo](../../deploy/lite) -- [NCNN C++/Python demo](../../deploy/third_engine/demo_ncnn) -- [MNN C++/Python demo](../../deploy/third_engine/demo_mnn) -- [OpenVINO C++ demo](../../deploy/third_engine/demo_openvino) -- [Android demo(Paddle Lite)](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) - - -Android demo visualization: +| PicoDet-XS | 320*320 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_coco_lcnet.onnx) | 
[model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet_fp16.tar) | +| PicoDet-XS | 416*416 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet_fp16.tar) | +| PicoDet-S | 320*320 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet_fp16.tar) | +| PicoDet-S | 416*416 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_fp16.tar) | +| PicoDet-M | 320*320 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet_fp16.tar) | +| PicoDet-M | 416*416 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postprocessed.onnx) | [( w/o 
后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet_fp16.tar) | +| PicoDet-L | 320*320 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet_fp16.tar) | +| PicoDet-L | 416*416 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet_fp16.tar) | +| PicoDet-L | 640*640 | [( w/ 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postprocessed.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet_fp16.tar) | + +### 部署 + +| 预测库 | Python | C++ | 带后处理预测 | +| :-------- | :--------: | :---------------------: | :----------------: | +| OpenVINO | [Python](../../deploy/third_engine/demo_openvino/python) | [C++](../../deploy/third_engine/demo_openvino)(带后处理开发中) | ✔︎ | +| Paddle Lite | - | [C++](../../deploy/lite) | ✔︎ | +| Android Demo | - | [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) | ✔︎ | +| 
PaddleInference | [Python](../../deploy/python) | [C++](../../deploy/cpp) | ✔︎ | +| ONNXRuntime | [Python](../../deploy/third_engine/demo_onnxruntime) | Coming soon | ✔︎ | +| NCNN | Coming soon | [C++](../../deploy/third_engine/demo_ncnn) | ✘ | +| MNN | Coming soon | [C++](../../deploy/third_engine/demo_mnn) | ✘ | + + + +Android demo可视化:
-## Quantization
+## 量化
-Requirements:
+依赖包:
 - PaddlePaddle >= 2.2.2
-- PaddleSlim >= 2.2.1
+- PaddleSlim >= 2.2.2
-**Install:**
+**安装:**
 ```shell
-pip install paddleslim==2.2.1
+pip install paddleslim==2.2.2
 ```
-
-Quant aware (click to expand)
+
+量化训练
-Configure the quant config and start training:
+开始量化训练:
 ```shell
-python tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
-          --slim_config configs/slim/quant/picodet_s_quant.yml --eval
+python tools/train.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \
+          --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml --eval
 ```
-- More detail can refer to [slim document](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim)
+- 更多细节请参考[slim文档](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim)
-
-Post quant (click to expand)
-
-Configure the post quant config and start calibrate model:
+- 量化训练Model ZOO:
-```shell
-python tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
-          --slim_config configs/slim/post_quant/picodet_s_ptq.yml
-```
-
-- Notes: Now the accuracy of post quant is abnormal and this problem is being solved.
-
-
+| 量化模型 | 输入尺寸 | mAP val 0.5:0.95 | Configs | Weight | Inference Model | Paddle Lite(INT8) |
+| :-------- | :--------: | :--------------------: | :-------: | :----------------: | :----------------: | :----------------: |
+| PicoDet-S | 416*416 | 31.5 | [config](./picodet_s_416_coco_lcnet.yml) | [slim config](../slim/quant/picodet_s_416_lcnet_quant.yml) | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet_quant.pdparams) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant.tar) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant_non_postprocess.tar) | [w/ 后处理](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant.nb) | [w/o 后处理](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant_non_postprocess.nb) |
-## Unstructured Pruning
+## 非结构化剪枝
-Toturial:
+教程:
-Please refer this [documentation](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/pruner/README.md) for details such as requirements, training and deployment.
+训练及部署细节请参考[非结构化剪枝文档](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/legacy_model/pruner/README.md)。
-## Application
+## 应用
-- **Pedestrian detection:** model zoo of `PicoDet-S-Pedestrian` please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/tiny_pose#%E8%A1%8C%E4%BA%BA%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B)
+- **行人检测:** `PicoDet-S-Pedestrian`行人检测模型请参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/tiny_pose#%E8%A1%8C%E4%BA%BA%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B)
-- **Mainbody detection:** model zoo of `PicoDet-L-Mainbody` please refer to [mainbody detection](./application/mainbody_detection/README.md)
+- **主体检测:** `PicoDet-L-Mainbody`主体检测模型请参考[主体检测文档](./legacy_model/application/mainbody_detection/README.md)
 ## FAQ
    -Out of memory error. +显存爆炸(Out of memory error) -Please reduce the `batch_size` of `TrainReader` in config. +请减小配置文件中`TrainReader`的`batch_size`。
    -How to transfer learning. +如何迁移学习 -Please reset `pretrain_weights` in config, which trained on coco. Such as: +请重新设置配置文件中的`pretrain_weights`字段,比如利用COCO上训好的模型在自己的数据上继续训练: ```yaml pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams ``` @@ -307,17 +310,17 @@ pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcne
    -The transpose operator is time-consuming on some hardware. +`transpose`算子在某些硬件上耗时验证 -Please use `PicoDet-LCNet` model, which has fewer `transpose` operators. +请使用`PicoDet-LCNet`模型,`transpose`较少。
    -How to count model parameters. +如何计算模型参数量。 -You can insert below code at [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/engine/trainer.py#L141) to count learnable parameters. +可以将以下代码插入:[trainer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/engine/trainer.py#L141) 来计算参数量。 ```python params = sum([ @@ -329,8 +332,8 @@ print('params: ', params)
    -## Cite PP-PicoDet -If you use PicoDet in your research, please cite our work by using the following BibTeX entry: +## 引用PP-PicoDet +如果需要在你的研究中使用PP-PicoDet,请通过一下方式引用我们的技术报告: ``` @misc{yu2021pppicodet, title={PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices}, diff --git a/configs/picodet/README_cn.md b/configs/picodet/README_en.md similarity index 30% rename from configs/picodet/README_cn.md rename to configs/picodet/README_en.md index 7131200a2e106e50fe71a97eda566a4520bfc5e8..d7d51c7b3b774ce0f68822a7d5084ea8639ada53 100644 --- a/configs/picodet/README_cn.md +++ b/configs/picodet/README_en.md @@ -1,63 +1,60 @@ -简体中文 | [English](README.md) +English | [简体中文](README.md) # PP-PicoDet ![](../../docs/images/picedet_demo.jpeg) -## 最新动态 +## News -- 发布全新系列PP-PicoDet模型:**(2022.03.20)** - - (1)引入TAL及Task-aligned Head,优化PAN等结构,精度大幅提升; - - (2)优化CPU端预测速度,同时训练速度大幅提升; - - (3)导出模型将后处理包含在网络中,预测直接输出box结果,无需二次开发,迁移成本更低。 +- Released a new series of PP-PicoDet models: **(2022.03.20)** + - (1) It was used TAL/ETA Head and optimized PAN, which greatly improved the accuracy; + - (2) Moreover optimized CPU prediction speed, and the training speed is greatly improved; + - (3) The export model includes post-processing, and the prediction directly outputs the result, without secondary development, and the migration cost is lower. -## 历史版本模型 +### Legacy Model -- 详情请参考:[PicoDet 2021.10版本](./legacy_model/) +- Please refer to: [PicoDet 2021.10](./legacy_model/) -## 简介 +## Introduction -PaddleDetection中提出了全新的轻量级系列模型`PP-PicoDet`,在移动端具有卓越的性能,成为全新SOTA轻量级模型。详细的技术细节可以参考我们的[arXiv技术报告](https://arxiv.org/abs/2111.00902)。 +We developed a series of lightweight models, named `PP-PicoDet`. Because of the excellent performance, our models are very suitable for deployment on mobile or CPU. For more details, please refer to our [report on arXiv](https://arxiv.org/abs/2111.00902). 
-PP-PicoDet模型有如下特点:
-
-- 🌟 更高的mAP: 第一个在1M参数量之内`mAP(0.5:0.95)`超越**30+**(输入416像素时)。
-- 🚀 更快的预测速度: 网络预测在ARM CPU下可达150FPS。
-- 😊 部署友好: 支持PaddleLite/MNN/NCNN/OpenVINO等预测库,支持转出ONNX,提供了C++/Python/Android的demo。
-- 😍 先进的算法: 我们在现有SOTA算法中进行了创新, 包括:ESNet, CSP-PAN, SimOTA等等。
+- 🌟 Higher mAP: the **first** object detectors that surpass mAP(0.5:0.95) **30+** within 1M parameters when the input size is 416.
+- 🚀 Faster latency: 150FPS on mobile ARM CPU.
+- 😊 Deploy friendly: support PaddleLite/MNN/NCNN/OpenVINO and provide C++/Python/Android implementation.
+- 😍 Advanced algorithm: use the most advanced algorithms and offer innovation, such as ESNet, CSP-PAN, SimOTA with VFL, etc.
    -## 基线 - -| 模型 | 输入尺寸 | mAPval
    0.5:0.95 | mAPval
    0.5 | 参数量
    (M) | FLOPS
    (G) | 预测时延[NCNN](#latency)
    (ms) | 预测时延[Lite](#latency)
    (ms) | 下载 | 配置文件 | -| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------------------------------: | :--------------------------------------- | -| PicoDet-XS | 320*320 | 23.5 | 36.1 | 0.70 | 0.67 | 10.9ms | 7.81ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_320_coco_lcnet.yml) | -| PicoDet-XS | 416*416 | 26.2 | 39.3 | 0.70 | 1.13 | 15.4ms | 12.38ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_416_coco_lcnet.yml) | -| PicoDet-S | 320*320 | 29.1 | 43.4 | 1.18 | 0.97 | 12.6ms | 9.56ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_320_coco_lcnet.yml) | -| PicoDet-S | 416*416 | 32.5 | 47.6 | 1.18 | 1.65 | 17.2ms | 15.20 | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_lcnet.yml) | -| PicoDet-M | 320*320 | 34.4 | 50.0 | 3.46 | 2.57 | 14.5ms | 17.68ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco_lcnet.log) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_320_coco_lcnet.yml) | -| PicoDet-M | 416*416 | 37.5 | 53.4 | 3.46 | 4.34 | 19.5ms | 28.39ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_416_coco_lcnet.yml) | -| PicoDet-L | 320*320 | 36.1 | 52.0 | 5.80 | 4.20 | 18.3ms | 25.21ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_320_coco_lcnet.yml) | -| PicoDet-L | 416*416 | 39.4 | 55.7 | 5.80 | 7.10 | 22.1ms | 42.23ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_416_coco_lcnet.yml) | -| PicoDet-L | 640*640 | 42.3 | 59.2 | 5.80 | 16.81 | 43.1ms | 108.1ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_640_coco_lcnet.yml) | +## Benchmark +| Model | Input size | mAPval
    0.5:0.95 | mAPval
    0.5 | Params
    (M) | FLOPS
    (G) | Latency[CPU](#latency)
    (ms) | Latency[Lite](#latency)
    (ms) | Weight | Config | Inference Model | +| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: | :----------------------------------------: | :--------------------------------------- | :--------------------------------------- | +| PicoDet-XS | 320*320 | 23.5 | 36.1 | 0.70 | 0.67 | 3.9ms | 7.81ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-XS | 416*416 | 26.2 | 39.3 | 0.70 | 1.13 | 6.1ms | 12.38ms | [model](https://paddledet.bj.bcebos.com/models/picodet_xs_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_xs_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_xs_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_xs_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-S | 320*320 | 29.1 | 43.4 | 1.18 | 0.97 | 4.8ms | 9.56ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet.tar) | [w/o 
postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-S | 416*416 | 32.5 | 47.6 | 1.18 | 1.65 | 6.6ms | 15.20ms | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_s_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-M | 320*320 | 34.4 | 50.0 | 3.46 | 2.57 | 8.2ms | 17.68ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-M | 416*416 | 37.5 | 53.4 | 3.46 | 4.34 | 12.7ms | 28.39ms | [model](https://paddledet.bj.bcebos.com/models/picodet_m_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_m_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_m_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_m_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-L | 320*320 | 36.1 | 52.0 | 5.80 | 4.20 | 11.5ms | 25.21ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) | 
[log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_320_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_320_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_320_coco_lcnet_non_postprocess.tar) | +| PicoDet-L | 416*416 | 39.4 | 55.7 | 5.80 | 7.10 | 20.7ms | 42.23ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_416_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_416_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_416_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_416_coco_lcnet_non_postprocess.tar) | +| PicoDet-L | 640*640 | 42.6 | 59.2 | 5.80 | 16.81 | 62.5ms | 108.1ms | [model](https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams) | [log](https://paddledet.bj.bcebos.com/logs/train_picodet_l_640_coco_lcnet.log) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_l_640_coco_lcnet.yml) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet.tar) | [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_l_640_coco_lcnet_non_postprocess.tar) |
    -注意事项: +Table Notes: -- 时延测试: 我们所有的模型都在英特尔至强6148的CPU(MKLDNN 10线程)和`骁龙865(4xA77+4xA55)`的ARM CPU上测试(4线程,FP16预测)。上面表格中标有`CPU`的是使用Paddle Inference库测试,标有`Lite`的是使用[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite)进行测试。 -- PicoDet在COCO train2017上训练,并且在COCO val2017上进行验证。使用4卡GPU训练,并且上表所有的预训练模型都是通过发布的默认配置训练得到。 -- Benchmark测试:测试速度benchmark性能时,导出模型后处理不包含在网络中,需要设置`-o export.benchmark=True` 或手动修改[runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml#L12)。 +- Latency: All our models test on `Intel core i7 10750H` CPU with MKLDNN by 12 threads and `Qualcomm Snapdragon 865(4xA77+4xA55)` with 4 threads by arm8 and with FP16. In the above table, test CPU latency on Paddle-Inference and testing Mobile latency with `Lite`->[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite). +- PicoDet is trained on COCO train2017 dataset and evaluated on COCO val2017. And PicoDet used 4 GPUs for training and all checkpoints are trained with default settings and hyperparameters. +- Benchmark test: When testing the speed benchmark, the post-processing is not included in the exported model, you need to set `-o export.benchmark=True` or manually modify [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml#L12).
-#### 其他模型的基线
+#### Benchmark of Other Models
-| 模型 | 输入尺寸 | mAP<sup>val</sup><br/>0.5:0.95 | mAP<sup>val</sup><br/>0.5 | 参数量<br/>(M) | FLOPS<br/>(G) | 预测时延[NCNN](#latency)<br/>(ms) |
+| Model | Input size | mAP<sup>val</sup><br/>0.5:0.95 | mAP<sup>val</sup><br/>0.5 | Params<br/>(M) | FLOPS<br/>(G) | Latency[NCNN](#latency)<br/>(ms) |
 | :-------- | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: |
 | YOLOv3-Tiny | 416*416 | 16.6 | 33.1 | 8.86 | 5.62 | 25.42 |
 | YOLOv4-Tiny | 416*416 | 21.7 | 40.2 | 6.06 | 6.96 | 23.69 |
@@ -71,39 +68,38 @@ PP-PicoDet模型有如下特点:
 | YOLOv5n | 640*640 | 28.4 | 46.0 | 1.9 | 4.5 | 40.35 |
 | YOLOv5s | 640*640 | 37.2 | 56.0 | 7.2 | 16.5 | 78.05 |
-- ARM测试的benchmark脚本来自: [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark)。
+- Testing Mobile latency with code: [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark).
-## 快速开始
+## Quick Start
-依赖包:
+Requirements:
-- PaddlePaddle == 2.2.2
+- PaddlePaddle >= 2.2.2
-安装
+Installation
-- [安装指导文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md)
-- [准备数据文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/PrepareDataSet_en.md)
+- [Installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md)
+- [Prepare dataset](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/PrepareDataSet_en.md)
    -训练&评估 +Training and Evaluation -- 单卡GPU上训练: +- Training model on single-GPU: ```shell # training on single-GPU export CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval ``` +If the GPU is out of memory during training, reduce the batch_size in TrainReader, and reduce the base_lr in LearningRate proportionally. At the same time, the configs we published are all trained with 4 GPUs. If the number of GPUs is changed to 1, the base_lr needs to be reduced by a factor of 4. -如果训练时显存out memory,将TrainReader中batch_size调小,同时LearningRate中base_lr等比例减小。 - -- 多卡GPU上训练: +- Training model on multi-GPU: ```shell @@ -112,72 +108,76 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml --eval ``` -- 评估: +- Evaluation: ```shell python tools/eval.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \ -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams ``` -- 测试: +- Infer: ```shell python tools/infer.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \ -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams ``` -详情请参考[快速开始文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED.md). +Detail also can refer to [Quick start guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED.md).
    -## 部署 +## Deployment -### 导出及转换模型 +### Export and Convert Model -
    -1. 导出模型 (点击展开) +
    +1. Export model ```shell cd PaddleDetection python tools/export_model.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \ -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams \ - --output_dir=inference_model + --output_dir=output_inference ``` +- If no post processing is required, please specify: `-o export.benchmark=True` (if -o has already appeared, delete -o here) or manually modify corresponding fields in [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml). +- If no NMS is required, please specify: `-o export.nms=True` or manually modify corresponding fields in [runtime.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/runtime.yml). Many scenes exported to ONNX only support single input and fixed shape output, so if exporting to ONNX, it is recommended not to export NMS. + +
    -2. 转换模型至Paddle Lite (点击展开) +2. Convert to PaddleLite (click to expand) -- 安装Paddlelite>=2.10: +- Install Paddlelite>=2.10: ```shell pip install paddlelite ``` -- 转换模型至Paddle Lite格式: +- Convert model: ```shell # FP32 -paddle_lite_opt --model_dir=inference_model/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp32 +paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp32 # FP16 -paddle_lite_opt --model_dir=inference_model/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp16 --enable_fp16=true +paddle_lite_opt --model_dir=output_inference/picodet_s_320_coco_lcnet --valid_targets=arm --optimize_out=picodet_s_320_coco_fp16 --enable_fp16=true ```
    -3. 转换模型至ONNX (点击展开) +3. Convert to ONNX (click to expand) -- 安装[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) >= 0.7 并且 ONNX > 1.10.1, 细节请参考[导出ONNX模型教程](../../deploy/EXPORT_ONNX_MODEL.md) +- Install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) >= 0.7 and ONNX > 1.10.1, for details, please refer to [Tutorials of Export ONNX Model](../../deploy/EXPORT_ONNX_MODEL.md) ```shell pip install onnx -pip install paddle2onnx +pip install paddle2onnx==0.9.2 ``` -- 转换模型: +- Convert model: ```shell paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \ @@ -187,123 +187,117 @@ paddle2onnx --model_dir output_inference/picodet_s_320_coco_lcnet/ \ --save_file picodet_s_320_coco.onnx ``` -- 简化ONNX模型: 使用`onnx-simplifier`库来简化ONNX模型。 +- Simplify ONNX model: use onnx-simplifier to simplify onnx model. - - 安装 onnx-simplifier >= 0.3.6: + - Install onnxsim >= 0.4.1: ```shell - pip install onnx-simplifier + pip install onnxsim ``` - - 简化ONNX模型: + - simplify onnx model: ```shell - python -m onnxsim picodet_s_320_coco.onnx picodet_s_processed.onnx + onnxsim picodet_s_320_coco.onnx picodet_s_processed.onnx ```
    -- 部署用的模型 +- Deploy models -| 模型 | 输入尺寸 | ONNX | Paddle Lite(fp32) | Paddle Lite(fp16) | +| Model | Input size | ONNX(w/o postprocess) | Paddle Lite(fp32) | Paddle Lite(fp16) | | :-------- | :--------: | :---------------------: | :----------------: | :----------------: | -| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_fp16.tar) | -| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_fp16.tar) | -| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_fp16.tar) | -| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_fp16.tar) | -| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_fp16.tar) | -| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_fp16.tar) | -| PicoDet-L | 640*640 | 
[model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_fp16.tar) | -| PicoDet-Shufflenetv2 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_shufflenetv2_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x_fp16.tar) | -| PicoDet-MobileNetv3-large 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_mobilenetv3_large_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x_fp16.tar) | -| PicoDet-LCNet 1.5x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_lcnet_1_5x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x_fp16.tar) | - - -### 部署 - -- PaddleInference demo [Python](../../deploy/python) & [C++](../../deploy/cpp) -- [PaddleLite C++ demo](../../deploy/lite) -- [NCNN C++/Python demo](../../deploy/third_engine/demo_ncnn) -- [MNN C++/Python demo](../../deploy/third_engine/demo_mnn) -- [OpenVINO C++ demo](../../deploy/third_engine/demo_openvino) -- [Android demo(Paddle Lite)](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) - - -Android demo可视化: +| PicoDet-XS | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_coco_lcnet.onnx) | 
[model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_320_coco_lcnet_fp16.tar) | +| PicoDet-XS | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_xs_416_coco_lcnet_fp16.tar) | +| PicoDet-S | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_coco_lcnet_fp16.tar) | +| PicoDet-S | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_fp16.tar) | +| PicoDet-M | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_coco_lcnet_fp16.tar) | +| PicoDet-M | 416*416 | [( w/ 
postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_coco_lcnet_fp16.tar) | +| PicoDet-L | 320*320 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_coco_lcnet_fp16.tar) | +| PicoDet-L | 416*416 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco_lcnet.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_coco_lcnet_fp16.tar) | +| PicoDet-L | 640*640 | [( w/ postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postprocessed.onnx) | [( w/o postprocess)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco_lcnet.onnx) [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_coco_lcnet_fp16.tar) | + + +### Deploy + +| Infer Engine | Python | C++ | Predict With Postprocess | +| :-------- | :--------: | :---------------------: | :----------------: | +| OpenVINO | [Python](../../deploy/third_engine/demo_openvino/python) | [C++](../../deploy/third_engine/demo_openvino)(postprocess coming soon) | ✔︎ | +| 
Paddle Lite | - | [C++](../../deploy/lite) | ✔︎ | +| Android Demo | - | [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/app/cxx/picodet_detection_demo) | ✔︎ | +| PaddleInference | [Python](../../deploy/python) | [C++](../../deploy/cpp) | ✔︎ | +| ONNXRuntime | [Python](../../deploy/third_engine/demo_onnxruntime) | Coming soon | ✔︎ | +| NCNN | Coming soon | [C++](../../deploy/third_engine/demo_ncnn) | ✘ | +| MNN | Coming soon | [C++](../../deploy/third_engine/demo_mnn) | ✘ | + + +Android demo visualization:
-## 量化
+## Quantization
-依赖包:
+Requirements:
 - PaddlePaddle >= 2.2.2
-- PaddleSlim >= 2.2.1
+- PaddleSlim >= 2.2.2
-**安装:**
+**Install:**
 ```shell
-pip install paddleslim==2.2.1
+pip install paddleslim==2.2.2
 ```
-
-量化训练 (点击展开)
+
+Quant aware
-开始量化训练:
+Configure the quant config and start training:
 ```shell
-python tools/train.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
- --slim_config configs/slim/quant/picodet_s_quant.yml --eval
+python tools/train.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \
+ --slim_config configs/slim/quant/picodet_s_416_lcnet_quant.yml --eval
 ```
-- 更多细节请参考[slim文档](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim)
+- More detail can refer to [slim document](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim)
-
-离线量化 (点击展开)
-
-校准及导出量化模型:
+- Quant Aware Model ZOO:
-```shell
-python tools/post_quant.py -c configs/picodet/picodet_s_320_coco_lcnet.yml \
- --slim_config configs/slim/post_quant/picodet_s_ptq.yml
-```
-
-- 注意: 离线量化模型精度问题正在解决中.
-
-
+| Quant Model | Input size | mAP<sup>val</sup><br/>0.5:0.95 | Configs | Weight | Inference Model | Paddle Lite(INT8) |
+| :-------- | :--------: | :--------------------: | :-------: | :----------------: | :----------------: | :----------------: |
+| PicoDet-S | 416*416 | 31.5 | [config](./picodet_s_416_coco_lcnet.yml) &#124; [slim config](../slim/quant/picodet_s_416_lcnet_quant.yml) | [model](https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet_quant.pdparams) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant.tar) &#124; [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet_quant_non_postprocess.tar) | [w/ postprocess](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant.nb) &#124; [w/o postprocess](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_coco_lcnet_quant_non_postprocess.nb) |
-## 非结构化剪枝
+## Unstructured Pruning
    -教程: +Tutorial: -训练及部署细节请参考[非结构化剪枝文档](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/pruner/README.md)。 +Please refer this [documentation](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/legacy_model/pruner/README.md) for details such as requirements, training and deployment.
    -## 应用 +## Application -- **行人检测:** `PicoDet-S-Pedestrian`行人检测模型请参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/tiny_pose#%E8%A1%8C%E4%BA%BA%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B) +- **Pedestrian detection:** model zoo of `PicoDet-S-Pedestrian` please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/tiny_pose#%E8%A1%8C%E4%BA%BA%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B) -- **主体检测:** `PicoDet-L-Mainbody`主体检测模型请参考[主体检测文档](./application/mainbody_detection/README.md) +- **Mainbody detection:** model zoo of `PicoDet-L-Mainbody` please refer to [mainbody detection](./legacy_model/application/mainbody_detection/README.md) ## FAQ
    -显存爆炸(Out of memory error) +Out of memory error. -请减小配置文件中`TrainReader`的`batch_size`。 +Please reduce the `batch_size` of `TrainReader` in config.
    -如何迁移学习 +How to transfer learning. -请重新设置配置文件中的`pretrain_weights`字段,比如利用COCO上训好的模型在自己的数据上继续训练: +Please reset `pretrain_weights` in config, which trained on coco. Such as: ```yaml pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams ``` @@ -311,17 +305,17 @@ pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcne
    -`transpose`算子在某些硬件上耗时验证 +The `transpose` operator is time-consuming on some hardware. -请使用`PicoDet-LCNet`模型,`transpose`较少。 +Please use the `PicoDet-LCNet` model, which has fewer `transpose` operators.
    -如何计算模型参数量。 +How to count model parameters. -可以将以下代码插入:[trainer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/engine/trainer.py#L141) 来计算参数量。 +You can insert the code below [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/engine/trainer.py#L141) to count learnable parameters. ```python params = sum([ @@ -333,8 +327,8 @@ print('params: ', params)
    -## 引用PP-PicoDet -如果需要在你的研究中使用PP-PicoDet,请通过一下方式引用我们的技术报告: +## Cite PP-PicoDet +If you use PicoDet in your research, please cite our work by using the following BibTeX entry: ``` @misc{yu2021pppicodet, title={PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices}, diff --git a/configs/picodet/_base_/picodet_320_reader.yml b/configs/picodet/_base_/picodet_320_reader.yml index 6b0112469e0f2b827f6addd14ac3a0b6cb42f3c0..7d6500679dba0f06c6238aa8bed4f2fd0ad8bd5b 100644 --- a/configs/picodet/_base_/picodet_320_reader.yml +++ b/configs/picodet/_base_/picodet_320_reader.yml @@ -1,4 +1,8 @@ worker_num: 6 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -18,7 +22,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [320, 320], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: @@ -29,13 +33,10 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 320, 320] + image_shape: [1, 3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [320, 320], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git a/configs/picodet/_base_/picodet_416_reader.yml b/configs/picodet/_base_/picodet_416_reader.yml index f98fe08e1aa312cdd6e59acd0594f19d5c27b7ea..ee4ae98865f7eb58994c0a79964d24e41c697373 100644 --- a/configs/picodet/_base_/picodet_416_reader.yml +++ b/configs/picodet/_base_/picodet_416_reader.yml @@ -1,4 +1,8 @@ worker_num: 6 +eval_height: &eval_height 416 +eval_width: &eval_width 416 
+eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -18,7 +22,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [416, 416], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: @@ -29,13 +33,10 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 416, 416] + image_shape: [1, 3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [416, 416], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git a/configs/picodet/_base_/picodet_640_reader.yml b/configs/picodet/_base_/picodet_640_reader.yml index d90fbeb9770b911ec2dbe3c80c4b5479f7dd53e1..5502026af8b1d0762405db17e655b2b6628dea04 100644 --- a/configs/picodet/_base_/picodet_640_reader.yml +++ b/configs/picodet/_base_/picodet_640_reader.yml @@ -1,4 +1,8 @@ worker_num: 6 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -18,7 +22,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [640, 640], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: @@ -29,13 +33,10 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 640, 640] + image_shape: [1, 3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [640, 640], keep_ratio: False} + - Resize: 
{interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git a/configs/picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml b/configs/picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..bb3d2e9bc923f0ee41c12fbbb7d1a7b91b97d339 --- /dev/null +++ b/configs/picodet/application/pedestrian_detection/picodet_s_192_lcnet_pedestrian.yml @@ -0,0 +1,161 @@ +use_gpu: true +use_xpu: false +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +metric: COCO +num_classes: 1 + +architecture: PicoDet +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_192_lcnet_pedestrian/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + fpn_stride: [8, 16, 32, 64] + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 4 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + +LearningRate: + base_lr: 0.32 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +worker_num: 6 +eval_height: &eval_height 192 +eval_width: &eval_width 192 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [128, 160, 192, 224, 256], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: 
[0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 64 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/configs/picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml b/configs/picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml new file mode 100644 index 0000000000000000000000000000000000000000..91402ba5e6cf8edb587566260c1bb7a202d3be61 --- /dev/null +++ b/configs/picodet/application/pedestrian_detection/picodet_s_320_lcnet_pedestrian.yml @@ -0,0 +1,160 @@ +use_gpu: true +use_xpu: false +log_iter: 20 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. 
+ benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + +metric: COCO +num_classes: 1 + +architecture: PicoDet +pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams +weights: output/picodet_s_320_lcnet_pedestrian/best_model +find_unused_parameters: True +use_ema: true +epoch: 300 +snapshot_epoch: 10 + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 0.75 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 96 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 96 + feat_out: 96 + num_convs: 2 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + feat_in_chan: 96 + fpn_stride: [8, 16, 32, 64] + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 + +LearningRate: + base_lr: 0.32 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +worker_num: 6 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [256, 288, 320, 
352, 384], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 64 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: aic_coco_train_cocoformat.json + dataset_dir: dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' diff --git a/configs/picodet/legacy_model/README.md b/configs/picodet/legacy_model/README.md index f58ebc75be2f9aee557341143b97c3d90de3a459..7821c88be602bfa57cfd4ab36bcaa1040ce8a85c 100644 --- a/configs/picodet/legacy_model/README.md +++ b/configs/picodet/legacy_model/README.md @@ -29,6 +29,23 @@
    +- Deploy models + +| Model | Input size | ONNX | Paddle Lite(fp32) | Paddle Lite(fp16) | +| :-------- | :--------: | :---------------------: | :----------------: | :----------------: | +| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_320_fp16.tar) | +| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_s_416_fp16.tar) | +| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_320_fp16.tar) | +| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_m_416_fp16.tar) | +| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_320_fp16.tar) | +| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_416_fp16.tar) | +| PicoDet-L | 640*640 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco.onnx) | 
[model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_l_640_fp16.tar) | +| PicoDet-Shufflenetv2 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_shufflenetv2_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_shufflenetv2_1x_fp16.tar) | +| PicoDet-MobileNetv3-large 1x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_mobilenetv3_large_1x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_mobilenetv3_large_1x_fp16.tar) | +| PicoDet-LCNet 1.5x | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_lcnet_1_5x_416_coco.onnx) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x.tar) | [model](https://paddledet.bj.bcebos.com/deploy/paddlelite/picodet_lcnet_1_5x_fp16.tar) | + + + ## Cite PP-PicoDet ``` @misc{yu2021pppicodet, diff --git a/configs/picodet/legacy_model/_base_/picodet_320_reader.yml b/configs/picodet/legacy_model/_base_/picodet_320_reader.yml index 2ce5bca6695ac50f25622b7d1704e68a20179b22..4d3f0cbd8648bf2d8ef44cdbf1d2422865a22c94 100644 --- a/configs/picodet/legacy_model/_base_/picodet_320_reader.yml +++ b/configs/picodet/legacy_model/_base_/picodet_320_reader.yml @@ -1,4 +1,8 @@ worker_num: 6 +eval_height: &eval_height 320 +eval_width: &eval_width 320 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -18,7 +22,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [320, 320], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: 
[0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: @@ -29,13 +33,10 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 320, 320] + image_shape: [1, 3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [320, 320], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git a/configs/picodet/legacy_model/_base_/picodet_416_reader.yml b/configs/picodet/legacy_model/_base_/picodet_416_reader.yml index 12070a4be22abe3d0cdb6593b41e1f98658efca2..59433c64534163a454ad7e5a07b71d011119913c 100644 --- a/configs/picodet/legacy_model/_base_/picodet_416_reader.yml +++ b/configs/picodet/legacy_model/_base_/picodet_416_reader.yml @@ -1,4 +1,8 @@ worker_num: 6 +eval_height: &eval_height 416 +eval_width: &eval_width 416 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -18,7 +22,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [416, 416], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: @@ -29,13 +33,10 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 416, 416] + image_shape: [1, 3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [416, 416], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git 
a/configs/picodet/legacy_model/_base_/picodet_640_reader.yml b/configs/picodet/legacy_model/_base_/picodet_640_reader.yml index a931f2a765855790f877e419a9cd46615c43be5e..60904fb6ba77c858a50f1e743e637961c38ccd1f 100644 --- a/configs/picodet/legacy_model/_base_/picodet_640_reader.yml +++ b/configs/picodet/legacy_model/_base_/picodet_640_reader.yml @@ -1,4 +1,8 @@ worker_num: 6 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -18,7 +22,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [640, 640], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: @@ -29,13 +33,10 @@ EvalReader: TestReader: inputs_def: - image_shape: [1, 3, 640, 640] + image_shape: [1, 3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {interp: 2, target_size: [640, 640], keep_ratio: False} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} - batch_transforms: - - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git a/configs/picodet/legacy_model/application/mainbody_detection/README.md b/configs/picodet/legacy_model/application/mainbody_detection/README.md index dc75d9f3ebb860925ea6c981b129087e27f225a7..0408587e62a81dbd97ae9128f59497287da26f5f 100644 --- a/configs/picodet/legacy_model/application/mainbody_detection/README.md +++ b/configs/picodet/legacy_model/application/mainbody_detection/README.md @@ -20,7 +20,7 @@ PP-ShiTu图像识别任务中,训练主体检测模型时主要用到了以下 | LogoDet-3k | 155k | 155k | Logo检测 | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | | RPC | 54k | 54k | 商品检测 | [地址](https://rpc-dataset.github.io/) | 
-在实际训练的过程中,将所有数据集混合在一起。由于是主体检测,这里将所有标注出的检测框对应的类别都修改为 `前景` 的类别,最终融合的数据集中只包含 1 个类别,即前景,数据集定义配置可以参考[mainbody_detection.yml](./mainbody_detection.yml)。 +在实际训练的过程中,将所有数据集混合在一起。由于是主体检测,这里将所有标注出的检测框对应的类别都修改为 `前景` 的类别,最终融合的数据集中只包含 1 个类别,即前景,数据集定义配置可以参考[picodet_lcnet_x2_5_640_mainbody.yml](./picodet_lcnet_x2_5_640_mainbody.yml)。 ### 1.2 模型库
diff --git a/configs/pphuman/README.md b/configs/pphuman/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e583668ab7b722a4def52e579eaa121a980ea354 --- /dev/null +++ b/configs/pphuman/README.md @@ -0,0 +1,35 @@ +简体中文 | [English](README.md) + +# PP-YOLOE Human 检测模型 + +PaddleDetection团队提供了针对行人的基于PP-YOLOE的检测模型,用户可以下载模型进行使用。 +其中整理后的COCO格式的CrowdHuman数据集[下载链接](https://bj.bcebos.com/v1/paddledet/data/crowdhuman.zip),检测类别仅一类 `pedestrian(1)`,原始数据集[下载链接](http://www.crowdhuman.org/download.html)。 + +| 模型 | 数据集 | mAP val 0.5:0.95 | mAP val 0.5 | 下载 | 配置文件 | +|:---------|:-------:|:------:|:------:| :----: | :------:| +|PP-YOLOE-s| CrowdHuman | 42.5 | 77.9 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_36e_crowdhuman.pdparams) | [配置文件](./ppyoloe_crn_s_36e_crowdhuman.yml) | +|PP-YOLOE-l| CrowdHuman | 48.0 | 81.9 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_crowdhuman.pdparams) | [配置文件](./ppyoloe_crn_l_36e_crowdhuman.yml) | + + +**注意:** +- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 `lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)` 调整学习率。 +- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。 + + +# PP-YOLOE 香烟检测模型 +基于PP-YOLOE模型的香烟检测模型,是实现PP-Human中的基于检测的行为识别方案的一环,如何在PP-Human中使用该模型进行吸烟行为识别,可参考[PP-Human行为识别模块](../../deploy/pipeline/docs/tutorials/pphuman_action.md)。该模型检测类别仅包含香烟一类。由于数据来源限制,目前暂无法直接公开训练数据。该模型使用了小目标数据集VisDrone上的权重(参照[visdrone](../visdrone))作为预训练模型,以提升检测效果。 + +| 模型 | 数据集 | mAP val 0.5:0.95 | mAP val 0.5 | 下载 | 配置文件 | +|:---------|:-------:|:------:|:------:| :----: | :------:| +| PP-YOLOE-s | 香烟业务数据集 | 39.7 | 79.5 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [配置文件](./ppyoloe_crn_s_80e_smoking_visdrone.yml) | + + +## 引用 +``` +@article{shao2018crowdhuman, + title={CrowdHuman: A Benchmark for Detecting Human in a Crowd}, + author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian}, + journal={arXiv preprint arXiv:1805.00123}, + year={2018} + } +```
diff --git a/configs/pphuman/ppyoloe_crn_l_36e_crowdhuman.yml b/configs/pphuman/ppyoloe_crn_l_36e_crowdhuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..445fefdc5c1a86c307a5c11b471df1aa95aafe7d --- /dev/null +++ b/configs/pphuman/ppyoloe_crn_l_36e_crowdhuman.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_crowdhuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/crowdhuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/pphuman/ppyoloe_crn_s_36e_crowdhuman.yml b/configs/pphuman/ppyoloe_crn_s_36e_crowdhuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..7be5fe7e72e28c1d1fc9f1d517a95caa796fee76 --- /dev/null +++ b/configs/pphuman/ppyoloe_crn_s_36e_crowdhuman.yml @@ -0,0 +1,55 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_s_36e_crowdhuman/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 1 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/crowdhuman + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/crowdhuman + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml b/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..40a731d4dece54b02948c58e9bbaef60d1d6d9ce --- /dev/null +++ b/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml @@ -0,0 +1,54 @@ +_BASE_: [ + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_80e_smoking_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +TrainReader: + batch_size: 16 + +LearningRate: + base_lr: 0.01 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 80 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + +metric: COCO +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + +TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking
diff --git a/configs/ppvehicle/README.md b/configs/ppvehicle/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5559009056a56b727b516ea942c41f67929f4be4 --- /dev/null +++ b/configs/ppvehicle/README.md @@ -0,0 +1,56 @@ +简体中文 | [English](README.md) + +# PP-YOLOE Vehicle 检测模型 + +PaddleDetection团队提供了针对自动驾驶场景的基于PP-YOLOE的检测模型,用户可以下载模型进行使用,主要包含5个数据集(BDD100K-DET、BDD100K-MOT、UA-DETRAC、PPVehicle9cls、PPVehicle)。其中前3者为公开数据集,后两者为整合数据集。 +- BDD100K-DET具体类别为10类,包括`pedestrian(1), rider(2), car(3), truck(4), bus(5), train(6), motorcycle(7), bicycle(8), traffic light(9), traffic sign(10)`。 +- BDD100K-MOT具体类别为8类,包括`pedestrian(1), rider(2), car(3), truck(4), bus(5), train(6), motorcycle(7), bicycle(8)`,但数据集比BDD100K-DET更大更多。 +- UA-DETRAC具体类别为4类,包括`car(1), bus(2), van(3), others(4)`。 +- PPVehicle9cls数据集整合了BDD100K-MOT和UA-DETRAC,具体类别为9类,包括`pedestrian(1), rider(2), car(3), truck(4), bus(5), van(6), motorcycle(7), bicycle(8), others(9)`。 +- PPVehicle数据集整合了BDD100K-MOT和UA-DETRAC,是将BDD100K-MOT中的`car, truck, bus, van`和UA-DETRAC中的`car, bus, van`都合并为1类`vehicle(1)`后的数据集。 + + +| 模型 | 数据集 | 类别数 | mAP val 0.5:0.95 | 下载链接 | 配置文件 | +|:---------|:---------------:|:------:|:-----------------------:|:---------:| :-----: | +|PP-YOLOE-l| BDD100K-DET | 10 | 35.6 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_bdd100kdet.pdparams) | [配置文件](./ppyoloe_crn_l_36e_bdd100kdet.yml) | +|PP-YOLOE-l| BDD100K-MOT | 8 | 33.7 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_bdd100kmot.pdparams) | [配置文件](./ppyoloe_crn_l_36e_bdd100kmot.yml) | +|PP-YOLOE-l| UA-DETRAC | 4 | 51.4 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_36e_uadetrac.pdparams) | [配置文件](./ppyoloe_crn_l_36e_uadetrac.yml) | +|PP-YOLOE-l| PPVehicle9cls | 9 | 40.0 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle9cls.pdparams) | [配置文件](./mot_ppyoloe_l_36e_ppvehicle9cls.yml) | +|PP-YOLOE-s| PPVehicle9cls | 9 | 35.3 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_s_36e_ppvehicle9cls.pdparams) | [配置文件](./mot_ppyoloe_s_36e_ppvehicle9cls.yml) | +|PP-YOLOE-l| PPVehicle | 1 | 63.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle.pdparams) | [配置文件](./mot_ppyoloe_l_36e_ppvehicle.yml) | +|PP-YOLOE-s| PPVehicle | 1 | 61.3 | [下载链接](https://paddledet.bj.bcebos.com/models/mot_ppyoloe_s_36e_ppvehicle.pdparams) | [配置文件](./mot_ppyoloe_s_36e_ppvehicle.yml) | + +**注意:** +- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 `lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)` 调整学习率。 +- 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。 +- 如需预测出对应类别,可自行修改和添加对应的label_list.txt文件(一行记录一个对应种类),TestDataset中的anno_path为绝对路径,如: +``` +TestDataset: + !ImageFolder + anno_path: label_list.txt # 如不使用dataset_dir,则anno_path即为相对于PaddleDetection主目录的相对路径 + # dataset_dir: dataset/ppvehicle # 如使用dataset_dir,则dataset_dir/anno_path作为新的anno_path +``` +label_list.txt里的一行记录一个对应种类,如下所示: +``` +vehicle +``` + +## 引用 +``` +@InProceedings{bdd100k, + author = {Yu, Fisher and Chen, Haofeng and Wang, 
Xin and Xian, Wenqi and Chen, + Yingying and Liu, Fangchen and Madhavan, Vashisht and Darrell, Trevor}, + title = {BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning}, + booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + month = {June}, + year = {2020} +} + +@article{CVIU_UA-DETRAC, + author = {Longyin Wen and Dawei Du and Zhaowei Cai and Zhen Lei and Ming{-}Ching Chang and + Honggang Qi and Jongwoo Lim and Ming{-}Hsuan Yang and Siwei Lyu}, + title = {{UA-DETRAC:} {A} New Benchmark and Protocol for Multi-Object Detection and Tracking}, + journal = {Computer Vision and Image Understanding}, + year = {2020} +} +``` diff --git a/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle.yml b/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..61df2fcc4b55820d6dca9e4f57ecc1fc02484777 --- /dev/null +++ b/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle.yml @@ -0,0 +1,57 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_l_36e_ppvehicle/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 
0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle9cls.yml b/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle9cls.yml new file mode 100644 index 0000000000000000000000000000000000000000..4cd73b7e244a47fcdb3f64663df8995e1dde3e55 --- /dev/null +++ b/configs/ppvehicle/mot_ppyoloe_l_36e_ppvehicle9cls.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_l_36e_ppvehicle9cls/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 9 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all_9cls.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle.yml b/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..f4f384584c12ae0eff897ebc0fb7f233463ea708 --- /dev/null +++ b/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle.yml @@ -0,0 +1,57 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_s_36e_ppvehicle/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + allow_empty: true + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle9cls.yml b/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle9cls.yml new file mode 100644 index 0000000000000000000000000000000000000000..653ff1a75822f965bfb0a8134f5fa78a309d52b9 --- /dev/null +++ b/configs/ppvehicle/mot_ppyoloe_s_36e_ppvehicle9cls.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/mot_ppyoloe_s_36e_ppvehicle9cls/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +num_classes: 9 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train_all_9cls.json + dataset_dir: dataset/ppvehicle + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TestDataset: + !ImageFolder + anno_path: annotations/val_all_9cls.json + dataset_dir: dataset/ppvehicle + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kdet.yml b/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kdet.yml new file mode 100644 index 0000000000000000000000000000000000000000..921d8b33f17a2a6850cf292769bf51b00a7b1d92 --- /dev/null +++ b/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kdet.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_bdd100kdet/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 10 + +TrainDataset: + !COCODataSet + image_dir: images/100k/train + anno_path: labels/det_20/det_train_cocofmt.json + dataset_dir: dataset/bdd100k + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images/100k/val + anno_path: labels/det_20/det_val_cocofmt.json + dataset_dir: dataset/bdd100k + +TestDataset: + !ImageFolder + anno_path: labels/det_20/det_val_cocofmt.json + dataset_dir: dataset/bdd100k + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kmot.yml b/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kmot.yml new file mode 100644 index 0000000000000000000000000000000000000000..b9d32be10d6cb415f22257bd778aab412420fa8a --- /dev/null +++ b/configs/ppvehicle/ppyoloe_crn_l_36e_bdd100kmot.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_bdd100kmot/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 8 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/train.json + dataset_dir: dataset/bdd100k + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: annotations/val.json + dataset_dir: dataset/bdd100k + +TestDataset: + !ImageFolder + anno_path: annotations/val.json + dataset_dir: dataset/bdd100k + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppvehicle/ppyoloe_crn_l_36e_uadetrac.yml b/configs/ppvehicle/ppyoloe_crn_l_36e_uadetrac.yml new file mode 100644 index 0000000000000000000000000000000000000000..5f3dd59cd9ef9d0c5e2947608f264187f433983c --- /dev/null +++ b/configs/ppvehicle/ppyoloe_crn_l_36e_uadetrac.yml @@ -0,0 +1,56 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 4 +weights: output/ppyoloe_crn_l_36e_uadetrac/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +num_classes: 4 + +TrainDataset: + !COCODataSet + image_dir: train + anno_path: annotations/train.json + dataset_dir: dataset/uadetrac + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val + anno_path: annotations/test.json + dataset_dir: dataset/uadetrac + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/uadetrac + +TrainReader: + batch_size: 8 + +epoch: 36 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/ppyoloe/README.md b/configs/ppyoloe/README.md index c7d4df8131d905779cc1b85c4e759a0321f9c61a..9458420fc1a7b81207bd1b0397176218b84b378a 100644 --- a/configs/ppyoloe/README.md +++ b/configs/ppyoloe/README.md @@ -9,39 +9,64 @@ English | [简体中文](README_cn.md) - [Appendix](#Appendix) ## Introduction -PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular yolo models. PP-YOLOE has a series of models, named s/m/l/x, which are configured through width multiplier and depth multiplier. PP-YOLOE avoids using special operators, such as deformable convolution or matrix nms, to be deployed friendly on various hardware. For more details, please refer to our report. +PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular YOLO models. PP-YOLOE has a series of models, named s/m/l/x, which are configured through width multiplier and depth multiplier. PP-YOLOE avoids using special operators, such as Deformable Convolution or Matrix NMS, to be deployed friendly on various hardware. For more details, please refer to our [report](https://arxiv.org/abs/2203.16250).
    -PP-YOLOE-l achieves 51.4 mAP on COCO test-dev2017 dataset with 78.1 FPS on Tesla V100. While using TensorRT FP16, PP-YOLOE-l can be further accelerated to 149.2 FPS. PP-YOLOE-s/m/x also have excellent accuracy and speed performance, which can be found in [Model Zoo](#Model-Zoo) +PP-YOLOE-l achieves 51.6 mAP on COCO test-dev2017 dataset with 78.1 FPS on Tesla V100. While using TensorRT FP16, PP-YOLOE-l can be further accelerated to 149.2 FPS. PP-YOLOE-s/m/x also have excellent accuracy and speed performance, which can be found in [Model Zoo](#Model-Zoo) PP-YOLOE is composed of following methods: - Scalable backbone and neck - [Task Alignment Learning](https://arxiv.org/abs/2108.07755) - Efficient Task-aligned head with [DFL](https://arxiv.org/abs/2006.04388) and [VFL](https://arxiv.org/abs/2008.13367) -- [SiLU activation function](https://arxiv.org/abs/1710.05941) +- [SiLU(Swish) activation function](https://arxiv.org/abs/1710.05941) ## Model Zoo -| Model | GPU number | images/GPU | backbone | input shape | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | -|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: | -| PP-YOLOE-s | 8 | 32 | cspresnet-s | 640 | 42.7 | 43.1 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | -| PP-YOLOE-m | 8 | 32 | cspresnet-m | 640 | 48.6 | 48.9 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | -| PP-YOLOE-l | 8 | 24 | cspresnet-l | 640 | 50.9 | 51.4 | 78.1 | 149.2 | 
[model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | -| PP-YOLOE-x | 8 | 16 | cspresnet-x | 640 | 51.9 | 52.2 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | +| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box AP<sup>val</sup><br>0.5:0.95 | Box AP<sup>test</sup><br>0.5:0.95 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | +|:------------------------:|:-------:|:-------:|:--------:|:----------:| :-------:| :------------------: | :-------------------: |:---------:|:--------:|:---------------:| :---------------------: | :------: | :------: | +| PP-YOLOE-s | 400 | 8 | 32 | cspresnet-s | 640 | 43.4 | 43.6 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](./ppyoloe_crn_s_400e_coco.yml) | +| PP-YOLOE-s | 300 | 8 | 32 | cspresnet-s | 640 | 43.0 | 43.2 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](./ppyoloe_crn_s_300e_coco.yml) | +| PP-YOLOE-m | 300 | 8 | 28 | cspresnet-m | 640 | 49.0 | 49.1 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](./ppyoloe_crn_m_300e_coco.yml) | +| PP-YOLOE-l | 300 | 8 | 20 | cspresnet-l | 640 | 51.4 | 51.6 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](./ppyoloe_crn_l_300e_coco.yml) | +| PP-YOLOE-x | 300 | 8 | 16 | cspresnet-x | 640 | 52.3 | 52.4 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](./ppyoloe_crn_x_300e_coco.yml) | + + +### Comprehensive Metrics +| Model | Epoch | AP<sup>0.5:0.95</sup> | AP<sup>0.5</sup> | AP<sup>0.75</sup> | AP<sup>small</sup> | AP<sup>medium</sup> | AP<sup>large</sup> | AR<sup>small</sup> | AR<sup>medium</sup> | AR<sup>large</sup> | download | config | +|:----------------------:|:-----:|:---------------:|:----------:|:-------------:| :------------:| :-----------: | :----------: |:------------:|:-------------:|:------------:| :-----: | :-----: | +| PP-YOLOE-s | 400 | 43.4 | 60.0 | 47.5 | 25.7 | 47.8 | 59.2 | 43.9 | 70.8 | 81.9 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](./ppyoloe_crn_s_400e_coco.yml)| +| 
PP-YOLOE-s | 300 | 43.0 | 59.6 | 47.2 | 26.0 | 47.4 | 58.7 | 45.1 | 70.6 | 81.4 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](./ppyoloe_crn_s_300e_coco.yml)| +| PP-YOLOE-m | 300 | 49.0 | 65.9 | 53.8 | 30.9 | 53.5 | 65.3 | 50.9 | 74.4 | 84.7 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](./ppyoloe_crn_m_300e_coco.yml)| +| PP-YOLOE-l | 300 | 51.4 | 68.6 | 56.2 | 34.8 | 56.1 | 68.0 | 53.1 | 76.8 | 85.6 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](./ppyoloe_crn_l_300e_coco.yml)| +| PP-YOLOE-x | 300 | 52.3 | 69.5 | 56.8 | 35.1 | 57.0 | 68.6 | 55.5 | 76.9 | 85.7 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](./ppyoloe_crn_x_300e_coco.yml)| + **Notes:** -- PP-YOLOE is trained on COCO train2017 dataset and evaluated on val2017 & test-dev2017 dataset,Box APtest is evaluation results of `mAP(IoU=0.5:0.95)`. -- PP-YOLOE used 8 GPUs for mixed precision training, if GPU number and mini-batch size is changed, learning rate and iteration times should be adjusted according [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/FAQ). -- PP-YOLOE inference speed is tesed on single Tesla V100 with batch size as 1, CUDA 10.2, CUDNN 7.6.5, TensorRT 6.0.1.8 in TensorRT mode. -- PP-YOLOE inference speed testing uses inference model exported by `tools/export_model.py` with `-o exclude_nms=True` and benchmarked by running `depoly/python/infer.py` with `--run_benchmark`. All testing results do not contains the time cost of data reading and post-processing(NMS), which is same as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet) in testing method. +- PP-YOLOE is trained on COCO train2017 dataset and evaluated on val2017 & test-dev2017 dataset. 
+- The model weights in the Comprehensive Metrics table are **the same as** those in the original Model Zoo, and are evaluated on **val2017**. +- PP-YOLOE used 8 GPUs for mixed precision training. If the **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**. +- PP-YOLOE inference speed is tested on a single Tesla V100 with batch size 1, **CUDA 10.2**, **CUDNN 7.6.5**, and **TensorRT 6.0.1.8** in TensorRT mode. +- Refer to [Speed testing](#Speed-testing) to reproduce the speed testing results of PP-YOLOE. - If you set `--run_benchmark=True`, you should first install these dependencies: `pip install pynvml psutil GPUtil`. + +### Feature Models + +The PaddleDetection team provides configs and weights of various feature detection models based on PP-YOLOE, which users can download for use: + +|Scenarios | Related Datasets | Links| +| :--------: | :---------: | :------: | +|Pedestrian Detection | CrowdHuman | [pphuman](../pphuman) | +|Vehicle Detection | BDD100K, UA-DETRAC | [ppvehicle](../ppvehicle) | +|Small Object Detection | VisDrone | [visdrone](../visdrone) | + + ## Getting Start -### 1. Training +### Training Training PP-YOLOE with mixed precision on 8 GPUs with following command @@ -49,9 +74,12 @@ Training PP-YOLOE with mixed precision on 8 GPUs with following command python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --amp ``` -** Notes: ** use `--amp` to train with default config to avoid out of memeory. +**Notes:** +- use `--amp` to train with the default config to avoid running out of memory. +- PaddleDetection supports multi-machine distributed training; you can refer to the [DistributedTraining tutorial](../../docs/DistributedTraining_en.md). + -### 2. 
Evaluation +### Evaluation Evaluating PP-YOLOE on COCO val2017 dataset in single GPU with following commands: @@ -61,7 +89,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_crn_l_300 For evaluation on COCO test-dev2017 dataset, please download COCO test-dev2017 dataset from [COCO dataset download](https://cocodataset.org/#download) and decompress to COCO dataset directory and configure `EvalDataset` like `configs/ppyolo/ppyolo_test.yml`. -### 3. Inference +### Inference Inference images in single GPU with following commands, use `--infer_img` to inference a single image and `--infer_dir` to inference all images in the directory. @@ -73,56 +101,120 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_30 CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams --infer_dir=demo ``` -### 4. Deployment +### Exporting models -- PaddleInference [Python](../../deploy/python) & [C++](../../deploy/cpp) -- [Paddle-TensorRT](../../deploy/TENSOR_RT.md) -- [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) -- [PaddleServing](https://github.com/PaddlePaddle/Serving) - +For deployment on GPU or speed testing, model should be first exported to inference model using `tools/export_model.py`. -For deployment on GPU or benchmarked, model should be first exported to inference model using `tools/export_model.py`. - -Exporting PP-YOLOE for Paddle Inference **without TensorRT**, use following command. 
+**Exporting PP-YOLOE for Paddle Inference without TensorRT**, use following command ```bash -python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams ``` -Exporting PP-YOLOE for Paddle Inference **with TensorRT** for better performance, use following command with extra `-o trt=True` setting. +**Exporting PP-YOLOE for Paddle Inference with TensorRT** for better performance, use following command with extra `-o trt=True` setting. ```bash -python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True ``` -`deploy/python/infer.py` is used to load exported paddle inference model above for inference and benchmark through PaddleInference. +If you want to export PP-YOLOE model to **ONNX format**, use following command refer to [PaddleDetection Model Export as ONNX Format Tutorial](../../deploy/EXPORT_ONNX_MODEL_en.md). 
```bash -# inference single image -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --device=gpu +# export inference model +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams -# inference all images in the directory -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_dir=demo/ --device=gpu +# install paddle2onnx +pip install paddle2onnx + +# convert to onnx +paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx -# benchmark -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True ``` -If you want to export PP-YOLOE model to **ONNX format**, use following command refer to [PaddleDetection Model Export as ONNX Format Tutorial](../../deploy/EXPORT_ONNX_MODEL_en.md). +**Notes:** The ONNX model currently supports only batch_size=1. + +### Speed testing + +For fair comparison, the speeds in [Model Zoo](#Model-Zoo) do not include the time cost of data reading and post-processing (NMS), which follows the same testing method as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet). Thus, you should export the model with the extra `-o exclude_nms=True` setting. 
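When reading benchmark output, latency per batch and throughput are related by FPS = batch_size * 1000 / latency_ms. The following is a minimal Python sketch of that conversion; the helper name `latency_to_fps` is hypothetical and is not part of PaddleDetection or `trtexec`. It is applied here to the T4 / TensorRT 7.2 latencies quoted in the `trtexec` comments of this section (2.80ms at batch_size=1, 67.69ms at batch_size=32).

```python
def latency_to_fps(batch_size: int, latency_ms: float) -> float:
    """Return images per second for one batch processed in latency_ms milliseconds.

    Hypothetical helper for illustration only; not a PaddleDetection API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # 1000 ms per second, batch_size images per batch
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    # Latencies taken from the trtexec comments quoted in this README
    for bs, ms in [(1, 2.80), (32, 67.69)]:
        print(f"batch_size={bs}, {ms:.2f}ms, {latency_to_fps(bs, ms):.1f}fps")
```

Truncating the results to integers gives the 357fps and 472fps figures reported for PP-YOLOE-s.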
+ +**Using Paddle Inference without TensorRT** to test speed, run following command ```bash # export inference model -python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True -# install paddle2onnx -pip install paddle2onnx +# speed testing with run_benchmark=True +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True +``` + +**Using Paddle Inference with TensorRT** to test speed, run following command + +```bash +# export inference model with trt=True +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True trt=True + +# speed testing with run_benchmark=True,run_mode=trt_fp32/trt_fp16 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True + +``` + +**Using TensorRT Inference with ONNX** to test speed, run following command + +```bash +# export inference model with trt=True +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True # convert to onnx -paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx +paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel 
--params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx + +# trt inference using fp16 and batch_size=1 +trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16 + +# trt inference using fp16 and batch_size=32 +trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16 + +# Using the above commands on a T4 machine with TensorRT 7.2, the speed of the PP-YOLOE-s model is as follows: + +# batch_size=1, 2.80ms, 357fps +# batch_size=32, 67.69ms, 472fps + +``` + + +### Deployment +PP-YOLOE can be deployed by the following approaches: + - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp) + - [Paddle-TensorRT](../../deploy/TENSOR_RT.md) + - [PaddleServing](https://github.com/PaddlePaddle/Serving) + - [PaddleSlim](../slim) + +Next, we will introduce how to use Paddle Inference to deploy PP-YOLOE models in TensorRT FP16 mode. + +First, refer to [Paddle Inference Docs](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python) to download and install the packages matching your CUDA, CUDNN and TensorRT versions. + +Then, export PP-YOLOE for Paddle Inference **with TensorRT** using the following command. + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True ``` -### 5. Other Datasets +Finally, run inference in TensorRT FP16 mode. 
+ +```bash +# inference single image +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16 + +# inference all images in the directory +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16 + +``` + +**Notes:** +- TensorRT will perform optimization for the current hardware platform according to the definition of the network, generate an inference engine and serialize it into a file. This inference engine is only applicable to the current hardware and software platform. If your hardware and software platform has not changed, you can set `use_static=True` in [enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L660). In this way, the generated serialized file will be saved in the `output_inference` folder and loaded the next time TensorRT is executed. +- PaddleDetection release/2.4 and later versions will support NMS calling TensorRT, which requires PaddlePaddle release/2.3 and later versions. + +### Other Datasets Model | AP | AP50 ---|---|--- @@ -130,7 +222,7 @@ Model | AP | AP50 [YOLOv5](https://github.com/ultralytics/yolov5) | 26.0 | 42.7 **PP-YOLOE** | **30.5** | **46.4** -**Note** +**Notes** - Here, we use the [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) dataset to detect 9 object classes including `person, bicycles, car, van, truck, tricyle, awning-tricyle, bus, motor`. - The above models are trained using the official default config and load parameters pretrained on the COCO dataset. - *Due to the limited time, more verification results will be supplemented in the future. 
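You are also welcome to contribute to PP-YOLOE* diff --git a/configs/ppyoloe/README_cn.md b/configs/ppyoloe/README_cn.md index 32c259667ada4ffb9e3b46e94a6a620012d1cada..71ef0198feac0b2f913e05a2c92cc58c8d079d67 100644 --- a/configs/ppyoloe/README_cn.md +++ b/configs/ppyoloe/README_cn.md @@ -9,49 +9,75 @@ - [Appendix](#附录) ## Introduction -PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular yolo models. PP-YOLOE comes in a series of models, namely s/m/l/x, which are configured through the width multiplier and depth multiplier. PP-YOLOE avoids special operators such as deformable convolution or matrix nms, so that it can be deployed easily on a wide variety of hardware. For more details, please refer to our report. +PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular YOLO models. PP-YOLOE comes in a series of models, namely s/m/l/x, which are configured through the width multiplier and depth multiplier. PP-YOLOE avoids special operators such as Deformable Convolution or Matrix NMS, so that it can be deployed easily on a wide variety of hardware. For more details, please refer to our [report](https://arxiv.org/abs/2203.16250).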
-PP-YOLOE-l achieves 51.4 mAP on COCO test-dev2017, while its speed reaches 78.1 FPS on Tesla V100. PP-YOLOE-s/m/x also offer an excellent accuracy-speed trade-off; their accuracy and speed can be found in the [Model Zoo](#模型库). +PP-YOLOE-l achieves 51.6 mAP on COCO test-dev2017, while its speed reaches 78.1 FPS on Tesla V100. PP-YOLOE-s/m/x also offer an excellent accuracy-speed trade-off; their accuracy and speed can be found in the [Model Zoo](#模型库). PP-YOLOE is composed of the following methods: - Scalable backbone and neck - [Task Alignment Learning](https://arxiv.org/abs/2108.07755) - Efficient Task-aligned head with [DFL](https://arxiv.org/abs/2006.04388) and [VFL](https://arxiv.org/abs/2008.13367) -- [SiLU activation function](https://arxiv.org/abs/1710.05941) +- [SiLU(Swish) activation function](https://arxiv.org/abs/1710.05941) ## Model Zoo -| Model | GPU number | images/GPU | backbone | input shape | Box AP<sup>val</sup> | Box AP<sup>test</sup> | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | -|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: | -| PP-YOLOE-s | 8 | 32 | cspresnet-s | 640 | 42.7 | 43.1 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | -| PP-YOLOE-m | 8 | 32 | cspresnet-m | 640 | 48.6 | 48.9 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | -| PP-YOLOE-l | 8 | 24 | cspresnet-l | 640 | 50.9 | 51.4 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | -| PP-YOLOE-x | 8 | 16 | cspresnet-x | 640 | 51.9 | 52.2 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | +| Model | Epoch | GPU number | images/GPU | backbone | input shape | Box AP<sup>val</sup><br>0.5:0.95 | Box AP<sup>test</sup><br>0.5:0.95 | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | +|:------------------------:|:-------:|:-------:|:--------:|:----------:| :-------:| :------------------: | :-------------------: |:---------:|:--------:|:---------------:| :---------------------: | :------: | :------: | +| PP-YOLOE-s | 400 | 8 | 32 | cspresnet-s | 640 | 43.4 | 43.6 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](./ppyoloe_crn_s_400e_coco.yml) | +| PP-YOLOE-s | 300 | 8 | 32 | cspresnet-s | 640 | 43.0 | 43.2 | 7.93 | 17.36 | 208.3 | 333.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](./ppyoloe_crn_s_300e_coco.yml) | +| PP-YOLOE-m | 300 | 8 | 28 | cspresnet-m | 640 | 49.0 | 49.1 | 23.43 | 49.91 | 123.4 | 208.3 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](./ppyoloe_crn_m_300e_coco.yml) | +| PP-YOLOE-l | 300 | 8 | 20 | cspresnet-l | 640 | 51.4 | 51.6 | 52.20 | 110.07 | 78.1 | 149.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](./ppyoloe_crn_l_300e_coco.yml) | +| PP-YOLOE-x | 300 | 8 | 16 | cspresnet-x | 640 | 52.3 | 52.4 | 98.42 | 206.59 | 45.0 | 95.2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](./ppyoloe_crn_x_300e_coco.yml) | + + +### Comprehensive Metrics +| Model | Epoch | AP<sup>0.5:0.95</sup> | AP<sup>0.5</sup> | AP<sup>0.75</sup> | AP<sup>small</sup> | AP<sup>medium</sup> | AP<sup>large</sup> | AR<sup>small</sup> | AR<sup>medium</sup> | AR<sup>large</sup> | download | config | +|:----------------------:|:-----:|:---------------:|:----------:|:-------------:| :------------:| :-----------: | :----------: |:------------:|:-------------:|:------------:| :-----: | :-----: | +| PP-YOLOE-s | 400 | 43.4 | 60.0 | 47.5 | 25.7 | 47.8 | 59.2 | 43.9 | 70.8 | 81.9 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](./ppyoloe_crn_s_400e_coco.yml)| +| PP-YOLOE-s | 300 | 43.0 | 59.6 | 
47.2 | 26.0 | 47.4 | 58.7 | 45.1 | 70.6 | 81.4 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](./ppyoloe_crn_s_300e_coco.yml)| +| PP-YOLOE-m | 300 | 49.0 | 65.9 | 53.8 | 30.9 | 53.5 | 65.3 | 50.9 | 74.4 | 84.7 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](./ppyoloe_crn_m_300e_coco.yml)| +| PP-YOLOE-l | 300 | 51.4 | 68.6 | 56.2 | 34.8 | 56.1 | 68.0 | 53.1 | 76.8 | 85.6 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](./ppyoloe_crn_l_300e_coco.yml)| +| PP-YOLOE-x | 300 | 52.3 | 69.5 | 56.8 | 35.1 | 57.0 | 68.6 | 55.5 | 76.9 | 85.7 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](./ppyoloe_crn_x_300e_coco.yml)| + **注意:** -- PP-YOLOE模型使用COCO数据集中train2017作为训练集,使用val2017和test-dev2017作为测试集,Box APtest为`mAP(IoU=0.5:0.95)`评估结果。 -- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果训练GPU数和batch size不使用上述配置,须参考[FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/FAQ)调整学习率和迭代次数。 -- PP-YOLOE模型推理速度测试采用单卡V100,batch size=1进行测试,使用CUDA 10.2, CUDNN 7.6.5,TensorRT推理速度测试使用TensorRT 6.0.1.8。 -- PP-YOLOE推理速度测试使用`tools/export_model.py`并设置`-o exclude_nms=True`脚本导出的模型,并用`deploy/python/infer.py`设置`--run_benchnark`参数得到。测试结果均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致)。 -- 如果你设置了`--run_benchnark=True`, 你首先需要安装以下依赖`pip install pynvml psutil GPUtil`。 +- PP-YOLOE模型使用COCO数据集中train2017作为训练集,使用val2017和test-dev2017作为测试集。 +- 综合指标的表格与模型库的表格里的模型权重是**同一个权重**,综合指标是使用**val2017**作为验证精度的。 +- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。 +- PP-YOLOE模型推理速度测试采用单卡V100,batch size=1进行测试,使用**CUDA 10.2**, **CUDNN 7.6.5**,TensorRT推理速度测试使用**TensorRT 6.0.1.8**。 +- 参考[速度测试](#速度测试)以复现PP-YOLOE推理速度测试结果。 +- 如果你设置了`--run_benchmark=True`, 你首先需要安装以下依赖`pip install 
pynvml psutil GPUtil`。 + + +### 垂类应用模型 + +PaddleDetection团队提供了基于PP-YOLOE的各种垂类检测模型的配置文件和权重,用户可以下载进行使用: + +| 场景 | 相关数据集 | 链接 | +| :--------: | :---------: | :------: | +| 行人检测 | CrowdHuman | [pphuman](../pphuman) | +| 车辆检测 | BDD100K、UA-DETRAC | [ppvehicle](../ppvehicle) | +| 小目标检测 | VisDrone | [visdrone](../visdrone) | + -## 使用教程 +## 使用说明 -### 1. 训练 +### 训练 执行以下指令使用混合精度训练PP-YOLOE ```bash python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --amp ``` +**注意:** +- 使用默认配置训练需要设置`--amp`以避免显存溢出. +- PaddleDetection支持多机训练,可以参考[多机训练教程](../../docs/DistributedTraining_cn.md). -** 注意: ** 使用默认配置训练需要设置`--amp`以避免显存溢出. - -### 2. 评估 +### 评估 执行以下命令在单个GPU上评估COCO val2017数据集 @@ -61,7 +87,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_crn_l_300 在coco test-dev2017上评估,请先从[COCO数据集下载](https://cocodataset.org/#download)下载COCO test-dev2017数据集,然后解压到COCO数据集文件夹并像`configs/ppyolo/ppyolo_test.yml`一样配置`EvalDataset`。 -### 3. 推理 +### 推理 使用以下命令在单张GPU上预测图片,使用`--infer_img`推理单张图片以及使用`--infer_dir`推理文件中的所有图片。 @@ -74,59 +100,121 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_30 CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams --infer_dir=demo ``` -### 4. 
部署 +### 模型导出 -- PaddleInference [Python](../../deploy/python) & [C++](../../deploy/cpp) -- [Paddle-TensorRT](../../deploy/TENSOR_RT.md) -- [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) -- [PaddleServing](https://github.com/PaddlePaddle/Serving) +PP-YOLOE在GPU上部署或者速度测试需要通过`tools/export_model.py`导出模型。 +当你**使用Paddle Inference但不使用TensorRT**时,运行以下的命令导出模型 -PP-YOLOE在GPU上部署或者推理benchmark需要通过`tools/export_model.py`导出模型。 +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +``` -当你使用PaddleInferenced但不使用TensorRT时,运行以下的命令进行导出 +当你**使用Paddle Inference且使用TensorRT**时,需要指定`-o trt=True`来导出模型。 ```bash -python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True ``` -当你使用PaddleInference的TensorRT时,需要指定`-o trt=True`进行导出 +如果你想将PP-YOLOE模型导出为**ONNX格式**,参考 +[PaddleDetection模型导出为ONNX格式教程](../../deploy/EXPORT_ONNX_MODEL.md),运行以下命令: ```bash -python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True + +# 导出推理模型 +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + +# 安装paddle2onnx +pip install paddle2onnx + +# 转换成onnx格式 +paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx ``` -`deploy/python/infer.py`使用上述导出后的PaddleInference模型用于推理和benchnark. 
+**注意:** ONNX模型目前只支持batch_size=1 + +### 速度测试 + +为了公平起见,在[模型库](#模型库)中的速度测试结果均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致),需要在导出模型时指定`-o exclude_nms=True`. + +**使用Paddle Inference但不使用TensorRT**进行测速,执行以下命令: ```bash -# 推理单张图片 -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --device=gpu +# 导出模型 +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True -# 推理文件夹下的所有图片 -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_dir=demo/ --device=gpu +# 速度测试,使用run_benchmark=True +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True +``` + +**使用Paddle Inference且使用TensorRT**进行测速,执行以下命令: + +```bash +# 导出模型,使用trt=True +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True trt=True + +# 速度测试,使用run_benchmark=True, run_mode=trt_fp32/trt_fp16 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True -# benchmark -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True ``` -如果你想将PP-YOLOE模型导出为**ONNX格式**,参考 -[PaddleDetection模型导出为ONNX格式教程](../../deploy/EXPORT_ONNX_MODEL.md) +**使用 ONNX 和 TensorRT** 进行测速,执行以下命令: ```bash -# 导出推理模型 -python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o 
weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +# 导出模型 +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True -# 安装paddle2onnx -pip install paddle2onnx +# 转化成ONNX格式 +paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx -# 转换成onnx格式 -paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx +# 测试速度,半精度,batch_size=1 +trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16 + +# 测试速度,半精度,batch_size=32 +trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16 + +# 使用上边的脚本, 在T4 和 TensorRT 7.2的环境下,PPYOLOE-s模型速度如下 +# batch_size=1, 2.80ms, 357fps +# batch_size=32, 67.69ms, 472fps +``` + + + +### 部署 + +PP-YOLOE可以使用以下方式进行部署: + - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp) + - [Paddle-TensorRT](../../deploy/TENSOR_RT.md) + - [PaddleServing](https://github.com/PaddlePaddle/Serving) + - [PaddleSlim模型量化](../slim) + +接下来,我们将介绍PP-YOLOE如何使用Paddle Inference在TensorRT FP16模式下部署 + +首先,参考[Paddle Inference文档](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python),下载并安装与你的CUDA, CUDNN和TensorRT相应的wheel包。 + +然后,运行以下命令导出模型 + +```bash +python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True +``` + +最后,使用TensorRT FP16进行推理 + +```bash +# 推理单张图片 +CUDA_VISIBLE_DEVICES=0 python 
deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16 + +# 推理文件夹下的所有图片 +CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16 ``` +**注意:** +- TensorRT会根据网络的定义,执行针对当前硬件平台的优化,生成推理引擎并序列化为文件。该推理引擎只适用于当前软硬件平台。如果你的软硬件平台没有发生变化,你可以设置[enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L660)的参数`use_static=True`,这样生成的序列化文件将会保存在`output_inference`文件夹下,下次执行TensorRT时将加载保存的序列化文件。 +- PaddleDetection release/2.4及其之后的版本将支持NMS调用TensorRT,需要依赖PaddlePaddle release/2.3及其之后的版本 -### 5. 泛化性验证 +### 泛化性验证 模型 | AP | AP50 ---|---|--- @@ -139,6 +227,7 @@ paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filenam - 以上模型训练均采用官方提供的默认参数,并且加载COCO预训练参数 - *由于人力/时间有限,后续将会持续补充更多验证结果,也欢迎各位开源用户贡献,共同优化PP-YOLOE* + ## 附录 PP-YOLOE消融实验 diff --git a/configs/ppyoloe/_base_/optimizer_36e_xpu.yml b/configs/ppyoloe/_base_/optimizer_36e_xpu.yml new file mode 100644 index 0000000000000000000000000000000000000000..59d76f4bae98e4774c2cee9cbc8c77ac341af35d --- /dev/null +++ b/configs/ppyoloe/_base_/optimizer_36e_xpu.yml @@ -0,0 +1,18 @@ +epoch: 36 + +LearningRate: + base_lr: 0.00125 + schedulers: + - !CosineDecay + max_epochs: 43 + - !LinearWarmup + start_factor: 0.001 + steps: 2000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 diff --git a/configs/ppyoloe/_base_/ppyoloe_crn.yml b/configs/ppyoloe/_base_/ppyoloe_crn.yml index 2ad9a11a8e98fe0f415606fb0b1119263d3b5aa4..7abee87a2433bd6b83a5257f254809f5acda3908 100644 --- a/configs/ppyoloe/_base_/ppyoloe_crn.yml +++ b/configs/ppyoloe/_base_/ppyoloe_crn.yml @@ -28,7 +28,6 @@ PPYOLOEHead: grid_cell_offset: 0.5 static_assigner_epoch: 100 use_varifocal_loss: True - eval_input_size: [640, 640] loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} 
static_assigner: name: ATSSAssigner @@ -41,6 +40,6 @@ PPYOLOEHead: nms: name: MultiClassNMS nms_top_k: 1000 - keep_top_k: 100 + keep_top_k: 300 score_threshold: 0.01 - nms_threshold: 0.6 + nms_threshold: 0.7 diff --git a/configs/ppyoloe/_base_/ppyoloe_reader.yml b/configs/ppyoloe/_base_/ppyoloe_reader.yml index a7574de1fcd4db98a0aaccc2bfd8a126db676932..058b4ee478d05fea850d7970e1e6a61d81352e78 100644 --- a/configs/ppyoloe/_base_/ppyoloe_reader.yml +++ b/configs/ppyoloe/_base_/ppyoloe_reader.yml @@ -1,4 +1,8 @@ worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + TrainReader: sample_transforms: - Decode: {} @@ -20,17 +24,17 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} - - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} - Permute: {} batch_size: 2 TestReader: inputs_def: - image_shape: [3, 640, 640] + image_shape: [3, *eval_height, *eval_width] sample_transforms: - Decode: {} - - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} - Permute: {} batch_size: 1 diff --git a/configs/ppyoloe/ppyoloe_crn_l_36e_coco_xpu.yml b/configs/ppyoloe/ppyoloe_crn_l_36e_coco_xpu.yml new file mode 100644 index 0000000000000000000000000000000000000000..3797288864c978f7872c9b994af2477bb6c31acd --- /dev/null +++ b/configs/ppyoloe/ppyoloe_crn_l_36e_coco_xpu.yml @@ -0,0 +1,69 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_36e_xpu.yml', + './_base_/ppyoloe_reader.yml', +] + +# note: these are default values (use_gpu = true and use_xpu = false) for CI. +# set use_gpu = false and use_xpu = true for training. 
+use_gpu: true +use_xpu: false + +log_iter: 100 +snapshot_epoch: 1 +weights: output/ppyoloe_crn_l_36e_coco/model_final +find_unused_parameters: True + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +TrainReader: + batch_size: 8 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 4 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml b/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c3cddf48bb75d733aaf15bfbd148b8e480d48493 --- /dev/null +++ b/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml @@ -0,0 +1,46 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/ppyoloe_crn.yml', + './_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_400e_coco/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +TrainReader: + batch_size: 32 + +epoch: 400 +LearningRate: + base_lr: 0.04 + schedulers: + - !CosineDecay + max_epochs: 480 + - !LinearWarmup + 
start_factor: 0. + epochs: 5 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +PPYOLOEHead: + static_assigner_epoch: 133 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 300 + score_threshold: 0.01 + nms_threshold: 0.7 diff --git a/configs/rcnn_enhance/README.md b/configs/rcnn_enhance/README.md index 6e2c0917b53ecc2f07836f54bb6400d40d04548c..974935b379e5569abfc001e26d3b3b61cb1ecd3b 100644 --- a/configs/rcnn_enhance/README.md +++ b/configs/rcnn_enhance/README.md @@ -2,7 +2,7 @@ ### 简介 -* 近年来,学术界和工业界广泛关注图像中目标检测任务。基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)中SSLD蒸馏方案训练得到的ResNet50_vd预训练模型(ImageNet1k验证集上Top1 Acc为82.39%),结合PaddleDetection中的丰富算子,飞桨提供了一种面向服务器端实用的目标检测方案PSS-DET(Practical Server Side Detection)。基于COCO2017目标检测数据集,V100单卡预测速度为为61FPS时,COCO mAP可达41.2%。 +* 近年来,学术界和工业界广泛关注图像中目标检测任务。基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)中SSLD蒸馏方案训练得到的ResNet50_vd预训练模型(ImageNet1k验证集上Top1 Acc为82.39%),结合PaddleDetection中的丰富算子,飞桨提供了一种面向服务器端实用的目标检测方案PSS-DET(Practical Server Side Detection)。基于COCO2017目标检测数据集,V100单卡预测速度为61FPS时,COCO mAP可达41.2%。 ### 模型库 diff --git a/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml b/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml index 33ec222e2715ca4f4819ffc424a0c366f42dc7af..f1a7c998d4e332661491024ca17a1a0d996b589d 100644 --- a/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml +++ b/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml @@ -2,9 +2,9 @@ worker_num: 2 TrainReader: sample_transforms: - Decode: {} + - AutoAugment: {autoaug_type: v1} - RandomResize: {target_size: [[384,1000], [416,1000], [448,1000], [480,1000], [512,1000], [544,1000], [576,1000], [608,1000], [640,1000], [672,1000]], interp: 2, keep_ratio: True} - RandomFlip: {prob: 0.5} - - AutoAugment: {autoaug_type: v1} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} - Permute: {} batch_transforms: diff --git 
a/configs/retinanet/README.md b/configs/retinanet/README.md index bfa281321ed0b2509a423390b26db15fabc5caf4..ac3d455e03bcbc8cdd63becb12098b734d1d42c8 100644 --- a/configs/retinanet/README.md +++ b/configs/retinanet/README.md @@ -1,20 +1,19 @@ -# Focal Loss for Dense Object Detection - -## Introduction - -We reproduce RetinaNet proposed in paper Focal Loss for Dense Object Detection. +# RetinaNet (Focal Loss for Dense Object Detection) ## Model Zoo -| Backbone | Model | mstrain | imgs/GPU | lr schedule | FPS | Box AP | download | config | -| ------------ | --------- | ------- | -------- | ----------- | --- | ------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------- | -| ResNet50-FPN | RetinaNet | Yes | 4 | 1x | --- | 37.5 | [model](https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_mstrain_1x_coco.pdparams)\|[log](https://bj.bcebos.com/v1/paddledet/logs/retinanet_r50_fpn_mstrain_1x_coco.log) | retinanet_r50_fpn_mstrain_1x_coco.yml | +| Backbone | Model | imgs/GPU | lr schedule | FPS | Box AP | download | config | +| ------------ | --------- | -------- | ----------- | --- | ------ | ---------- | ----------- | +| ResNet50-FPN | RetinaNet | 2 | 1x | --- | 37.5 | [model](https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_1x_coco.pdparams) | [config](./retinanet_r50_fpn_1x_coco.yml) | +| ResNet101-FPN| RetinaNet | 2 | 2x | --- | 40.6 | [model](https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams) | [config](./retinanet_r101_fpn_2x_coco.yml) | +| ResNet50-FPN | RetinaNet + [FGD](../slim/distill/README.md) | 2 | 2x | --- | 40.8 | [model](https://bj.bcebos.com/v1/paddledet/models/retinanet_r101_distill_r50_2x_coco.pdparams) | [config](./retinanet_r50_fpn_2x_coco.yml)/[slim_config](../slim/distill/retinanet_resnet101_coco_distill.yml) | + **Notes:** -- All above models 
are trained on COCO train2017 with 4 GPUs and evaludated on val2017. Box AP=`mAP(IoU=0.5:0.95)`. +- The ResNet50-FPN model is trained on COCO train2017 with 8 GPUs. Both ResNet101-FPN and ResNet50-FPN with [FGD](../slim/distill/README.md) are trained on COCO train2017 with 4 GPUs. +- All above models are evaluated on val2017. Box AP=`mAP(IoU=0.5:0.95)`. -- Config `configs/retinanet/retinanet_r50_fpn_1x_coco.yml` is for 8 GPUs and `configs/retinanet/retinanet_r50_fpn_mstrain_1x_coco.yml` is for 4 GPUs (mind the difference of train batch size). ## Citation diff --git a/configs/retinanet/_base_/optimizer_2x.yml b/configs/retinanet/_base_/optimizer_2x.yml new file mode 100644 index 0000000000000000000000000000000000000000..61841433417b9fcc6f29a6c71a72ba23406b55ad --- /dev/null +++ b/configs/retinanet/_base_/optimizer_2x.yml @@ -0,0 +1,19 @@ +epoch: 24 + +LearningRate: + base_lr: 0.01 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.001 + steps: 500 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 diff --git a/configs/retinanet/_base_/retinanet_r101_fpn.yml b/configs/retinanet/_base_/retinanet_r101_fpn.yml new file mode 100644 index 0000000000000000000000000000000000000000..ae5595769d940c2ecb5b857fdc8970da76d572ab --- /dev/null +++ b/configs/retinanet/_base_/retinanet_r101_fpn.yml @@ -0,0 +1,57 @@ +architecture: RetinaNet +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_pretrained.pdparams + +RetinaNet: + backbone: ResNet + neck: FPN + head: RetinaHead + +ResNet: + depth: 101 + variant: b + norm_type: bn + freeze_at: 0 + return_idx: [1,2,3] + num_stages: 4 + +FPN: + out_channel: 256 + spatial_scales: [0.125, 0.0625, 0.03125] + extra_stage: 2 + has_extra_convs: true + use_c5: false + +RetinaHead: + conv_feat: + name: RetinaFeat + feat_in: 256 + feat_out: 256 + num_convs: 4 + norm_type: null + use_dcn: false + anchor_generator: + 
name: RetinaAnchorGenerator + octave_base_scale: 4 + scales_per_octave: 3 + aspect_ratios: [0.5, 1.0, 2.0] + strides: [8.0, 16.0, 32.0, 64.0, 128.0] + bbox_assigner: + name: MaxIoUAssigner + positive_overlap: 0.5 + negative_overlap: 0.4 + allow_low_quality: true + loss_class: + name: FocalLoss + gamma: 2.0 + alpha: 0.25 + loss_weight: 1.0 + loss_bbox: + name: SmoothL1Loss + beta: 0.0 + loss_weight: 1.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/retinanet/_base_/retinanet_r50_fpn.yml b/configs/retinanet/_base_/retinanet_r50_fpn.yml index 156a17fea84119322c4e34b5e58b37e47cadcb63..fb2d767aed5bd383f312ce79e4e39e3710c3cb9c 100644 --- a/configs/retinanet/_base_/retinanet_r50_fpn.yml +++ b/configs/retinanet/_base_/retinanet_r50_fpn.yml @@ -22,10 +22,6 @@ FPN: use_c5: false RetinaHead: - num_classes: 80 - prior_prob: 0.01 - nms_pre: 1000 - decode_reg_out: false conv_feat: name: RetinaFeat feat_in: 256 @@ -44,10 +40,6 @@ RetinaHead: positive_overlap: 0.5 negative_overlap: 0.4 allow_low_quality: true - bbox_coder: - name: DeltaBBoxCoder - norm_mean: [0.0, 0.0, 0.0, 0.0] - norm_std: [1.0, 1.0, 1.0, 1.0] loss_class: name: FocalLoss gamma: 2.0 diff --git a/configs/retinanet/_base_/retinanet_reader.yml b/configs/retinanet/_base_/retinanet_reader.yml index 8cf31aa5ecdb903ce50e6c48ca7fb8429f3d776b..1f686b4d7f06f143106491e9b8fe3957a40927c2 100644 --- a/configs/retinanet/_base_/retinanet_reader.yml +++ b/configs/retinanet/_base_/retinanet_reader.yml @@ -1,39 +1,36 @@ worker_num: 2 TrainReader: sample_transforms: - - Decode: {} - - RandomFlip: {prob: 0.5} - - Resize: {target_size: [800, 1333], keep_ratio: true, interp: 1} - - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} - - Permute: {} + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 1} + - 
RandomFlip: {} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} batch_transforms: - - PadBatch: {pad_to_stride: 32} + - PadBatch: {pad_to_stride: 32} batch_size: 2 - shuffle: true - drop_last: true - use_process: true - collate_batch: false + shuffle: True + drop_last: True + collate_batch: False EvalReader: sample_transforms: - - Decode: {} - - Resize: {target_size: [800, 1333], keep_ratio: true, interp: 1} - - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} - - Permute: {} + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} batch_transforms: - - PadBatch: {pad_to_stride: 32} - batch_size: 2 - shuffle: false + - PadBatch: {pad_to_stride: 32} + batch_size: 8 TestReader: sample_transforms: - - Decode: {} - - Resize: {target_size: [800, 1333], keep_ratio: true, interp: 1} - - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} - - Permute: {} + - Decode: {} + - Resize: {target_size: [800, 1333], keep_ratio: True, interp: 1} + - NormalizeImage: {is_scale: True, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} + - Permute: {} batch_transforms: - - PadBatch: {pad_to_stride: 32} + - PadBatch: {pad_to_stride: 32} batch_size: 1 - shuffle: false diff --git a/configs/retinanet/retinanet_r101_distill_r50_2x_coco.yml b/configs/retinanet/retinanet_r101_distill_r50_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..bb72cda8e99ac6a597ea5fc9b113378f7954bac3 --- /dev/null +++ b/configs/retinanet/retinanet_r101_distill_r50_2x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/retinanet_r50_fpn.yml', + '_base_/optimizer_2x.yml', + '_base_/retinanet_reader.yml' +] + +weights: 
https://paddledet.bj.bcebos.com/models/retinanet_r101_distill_r50_2x_coco.pdparams diff --git a/configs/retinanet/retinanet_r101_fpn_2x_coco.yml b/configs/retinanet/retinanet_r101_fpn_2x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..854def4ad82ebcc48d904e665b856ec47655d167 --- /dev/null +++ b/configs/retinanet/retinanet_r101_fpn_2x_coco.yml @@ -0,0 +1,9 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + '_base_/retinanet_r101_fpn.yml', + '_base_/optimizer_2x.yml', + '_base_/retinanet_reader.yml' +] + +weights: https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams diff --git a/configs/retinanet/retinanet_r50_fpn_1x_coco.yml b/configs/retinanet/retinanet_r50_fpn_1x_coco.yml index bb2c5a404033691650b99430649cd512a81a91be..cb6d342baeb428547d42f417acda02e8c90e39da 100644 --- a/configs/retinanet/retinanet_r50_fpn_1x_coco.yml +++ b/configs/retinanet/retinanet_r50_fpn_1x_coco.yml @@ -7,4 +7,3 @@ _BASE_: [ ] weights: output/retinanet_r50_fpn_1x_coco/model_final -find_unused_parameters: true \ No newline at end of file diff --git a/configs/retinanet/retinanet_r50_fpn_mstrain_1x_coco.yml b/configs/retinanet/retinanet_r50_fpn_mstrain_1x_coco.yml deleted file mode 100644 index ef4023d2284941e6df255dd4e403f88e0d2d1513..0000000000000000000000000000000000000000 --- a/configs/retinanet/retinanet_r50_fpn_mstrain_1x_coco.yml +++ /dev/null @@ -1,20 +0,0 @@ -_BASE_: [ - '../datasets/coco_detection.yml', - '../runtime.yml', - '_base_/retinanet_r50_fpn.yml', - '_base_/optimizer_1x.yml', - '_base_/retinanet_reader.yml' -] - -worker_num: 4 -TrainReader: - batch_size: 4 - sample_transforms: - - Decode: {} - - RandomFlip: {prob: 0.5} - - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: true, interp: 1} - - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]} - - Permute: {} - -weights: 
output/retinanet_r50_fpn_mstrain_1x_coco/model_final -find_unused_parameters: true \ No newline at end of file diff --git a/configs/runtime.yml b/configs/runtime.yml index c67c6c94f836998cb436fc4933370a5d396d5d1f..f601433afc13619183da153cbb09afe9f1332fbe 100644 --- a/configs/runtime.yml +++ b/configs/runtime.yml @@ -10,3 +10,4 @@ export: post_process: True # Whether post-processing is included in the network when export model. nms: True # Whether NMS is included in the network when export model. benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. + fuse_conv_bn: False diff --git a/configs/slim/README.md b/configs/slim/README.md index ac025688b18a502e974c438a3cbac47ede3a753d..4eabd73b570dd3d971d0625c3e138895778719b5 100755 --- a/configs/slim/README.md +++ b/configs/slim/README.md @@ -126,6 +126,8 @@ python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{ | 模型 | 压缩策略 | 输入尺寸 | 模型体积(MB) | 预测时延(V100) | 预测时延(SD855) | Box AP | 下载 | Inference模型下载 | 模型配置文件 | 压缩算法配置文件 | | ------------------ | ------------ | -------- | :---------: | :---------: |:---------: | :---------: | :----------------------------------------------: | :----------------------------------------------: |:------------------------------------------: | :------------------------------------: | +| PP-YOLOE-l | baseline | 640 | - | 11.2ms(trt_fp32) | 7.7ms(trt_fp16) | -- | 50.9 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | - | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | - | +| PP-YOLOE-l | 普通在线量化 | 640 | - | 6.7ms(trt_int8) | -- | 48.8 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyoloe_l_coco_qat.pdparams) | - | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | 
[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyoloe_l_qat.yml) | | PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_365e_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | | PP-YOLOv2_R50vd | PACT在线量化 | 640 | -- | 17.3ms | -- | 48.1 | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml) | | PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_dcn_1x_coco.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | @@ -146,6 +148,7 @@ python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{ 说明: - 上述V100预测时延非量化模型均是使用TensorRT-FP32测试,量化模型均使用TensorRT-INT8测试,并且都包含NMS耗时。 - SD855预测时延为使用PaddleLite部署,使用arm8架构并使用4线程(4 Threads)推理时延。 +- 上述PP-YOLOE模型均在V100,开启TensorRT环境中测速,不包含NMS。(导出模型时指定:-o trt=True exclude_nms=True) ### 离线量化 需要准备val集,用来对离线量化模型进行校准,运行方式: @@ -176,3 +179,4 @@ python3.7 tools/post_quant.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml --sli | ------------------ | ------------ | -------- | :---------: |:---------: |:---------: | :---------: |:----------------------------------------------------------: | :----------------------------------------------------------: | 
:----------------------------------------------------------: | | YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | | YOLOv3-MobileNetV1 | 蒸馏+剪裁 | 608 | 7.54(-69.4%) | 30.9(-67.2%) | 166.1ms | 28.4(-1.0) | [下载链接](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill_prune.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml) | +| YOLOv3-MobileNetV1 | 剪裁+量化 | 608 | - | - | - | - | - | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml) | diff --git a/configs/slim/README_en.md b/configs/slim/README_en.md index 8d2b39c914c281126f6d70b2f4150f49add6e087..924757e3fcdd465c7eb51c0cb7ce8b71c8e2fcb5 100755 --- a/configs/slim/README_en.md +++ b/configs/slim/README_en.md @@ -124,7 +124,9 @@ Description: | Model | Compression Strategy | Input Size | Model Volume(MB) | Prediction Delay(V100) | Prediction Delay(SD855) | Box AP | Download | Download of Inference Model | Model Configuration File | Compression Algorithm Configuration File | | ------------------------- | -------------------------- | ----------- | :--------------: | :--------------------: | :---------------------: | :-------------------: | :-----------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------: | 
:----------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | -| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | [link](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_365e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLOE-l | baseline | 640 | - | 11.2ms(trt_fp32) | 7.7ms(trt_fp16) | -- | 50.9 | [link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | - | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | - | +| PP-YOLOE-l | Common Online quantitative | 640 | - | 6.7ms(trt_int8) | -- | 48.8 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyoloe_l_coco_qat.pdparams) | - | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyoloe_l_qat.yml) | +| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | [link](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_365e_coco.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | | PP-YOLOv2_R50vd | PACT Online quantitative | 640 | -- | 17.3ms | -- | 48.1 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.tar) 
| [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml) | | PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_dcn_1x_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | | PP-YOLO_R50vd | PACT Online quantitative | 608 | 67.3 | 13.8ms | -- | 44.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyolo_r50vd_qat_pact.yml) | diff --git a/configs/slim/distill/README.md b/configs/slim/distill/README.md index da5795764cec02ea384f8e063f918b56b4f2b9bb..a08bf35fc350cd7c4284bb00ffdf641d80de9798 100644 --- a/configs/slim/distill/README.md +++ b/configs/slim/distill/README.md @@ -5,6 +5,19 @@ COCO数据集作为目标检测任务的训练目标难度更大,意味着teacher网络会预测出更多的背景bbox,如果直接用teacher的预测输出作为student学习的`soft label`会有严重的类别不均衡问题。解决这个问题需要引入新的方法,详细背景请参考论文:[Object detection at 200 Frames Per Second](https://arxiv.org/abs/1805.06361)。 为了确定蒸馏的对象,我们首先需要找到student和teacher网络得到的`x,y,w,h,cls,objness`等Tensor,用teacher得到的结果指导student训练。具体实现可参考[代码](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/ppdet/slim/distill.py) + +## FGD模型蒸馏 + +FGD全称为[Focal and Global Knowledge Distillation for 
Detectors](https://arxiv.org/abs/2111.11837v1),是目标检测任务的一种蒸馏方法,FGD蒸馏分为两个部分`Focal`和`Global`。`Focal`蒸馏分离图像的前景和背景,让学生模型分别关注教师模型的前景和背景部分特征的关键像素;`Global`蒸馏部分重建不同像素之间的关系并将其从教师转移到学生,以补偿`Focal`蒸馏中丢失的全局信息。试验结果表明,FGD蒸馏算法在基于anchor和anchor free的方法上能有效提升模型精度。 +在PaddleDetection中,我们实现了FGD算法,并基于retinaNet算法进行验证,实验结果如下: +| algorithm | model | AP | download| +|:-:| :-: | :-: | :-:| +|retinaNet_r101_fpn_2x | teacher | 40.6 | [download](https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams) | +|retinaNet_r50_fpn_1x| student | 37.5 |[download](https://paddledet.bj.bcebos.com/models/retinanet_r50_fpn_1x_coco.pdparams) | +|retinaNet_r50_fpn_2x + FGD| student | 40.8 |[download](https://paddledet.bj.bcebos.com/models/retinanet_r101_distill_r50_2x_coco.pdparams) | + + + ## Citations ``` @article{mehta2018object, @@ -15,4 +28,12 @@ COCO数据集作为目标检测任务的训练目标难度更大,意味着teac archivePrefix={arXiv}, primaryClass={cs.CV} } + +@inproceedings{yang2022focal, + title={Focal and global knowledge distillation for detectors}, + author={Yang, Zhendong and Li, Zhe and Jiang, Xiaohu and Gong, Yuan and Yuan, Zehuan and Zhao, Danpei and Yuan, Chun}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={4643--4652}, + year={2022} +} ``` diff --git a/configs/slim/distill/retinanet_resnet101_coco_distill.yml b/configs/slim/distill/retinanet_resnet101_coco_distill.yml new file mode 100644 index 0000000000000000000000000000000000000000..d4793c02063d8159e5277e705a58cc0b423d94ea --- /dev/null +++ b/configs/slim/distill/retinanet_resnet101_coco_distill.yml @@ -0,0 +1,19 @@ +_BASE_: [ + '../../retinanet/retinanet_r101_fpn_2x_coco.yml', +] + +pretrain_weights: https://paddledet.bj.bcebos.com/models/retinanet_r101_fpn_2x_coco.pdparams + +slim: Distill +slim_method: FGD +distill_loss: FGDFeatureLoss +distill_loss_name: ['neck_f_4', 'neck_f_3', 'neck_f_2', 'neck_f_1', 'neck_f_0'] + +FGDFeatureLoss: + student_channels: 256 + teacher_channels: 256 + temp: 0.5 + 
alpha_fgd: 0.001 + beta_fgd: 0.0005 + gamma_fgd: 0.0005 + lambda_fgd: 0.000005 diff --git a/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml b/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..ff17ea0b4126d934b851df60cda2db2e17fbbae2 --- /dev/null +++ b/configs/slim/extensions/yolov3_mobilenetv1_prune_qat.yml @@ -0,0 +1,19 @@ +# Weights of yolov3_mobilenet_v1_voc +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams +slim: PrunerQAT + +PrunerQAT: + criterion: fpgm + pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0', + 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0', + 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0', + 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0', + 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0', + 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0'] + pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3] + print_prune_params: False + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_qat_model: True diff --git a/configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml b/configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..d715aedffe2dd5e15bdb222a74aa35bc273d2240 --- /dev/null +++ b/configs/slim/post_quant/mask_rcnn_r50_fpn_1x_coco_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml 
b/configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..dfa793d528a63255fce62c6d1c94a594fee58853 --- /dev/null +++ b/configs/slim/post_quant/ppyoloe_crn_s_300e_coco_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/configs/slim/post_quant/tinypose_128x96_ptq.yml b/configs/slim/post_quant/tinypose_128x96_ptq.yml new file mode 100644 index 0000000000000000000000000000000000000000..a3bd64761fac679d83bbbdb4011ea3ab327ad3f9 --- /dev/null +++ b/configs/slim/post_quant/tinypose_128x96_ptq.yml @@ -0,0 +1,10 @@ +weights: https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams +slim: PTQ + +PTQ: + ptq_config: { + 'activation_quantizer': 'HistQuantizer', + 'upsample_bins': 127, + 'hist_percent': 0.999} + quant_batch_num: 10 + fuse: True diff --git a/configs/slim/quant/picodet_s_416_lcnet_quant.yml b/configs/slim/quant/picodet_s_416_lcnet_quant.yml new file mode 100644 index 0000000000000000000000000000000000000000..000807ab6b138ca8f28440f97b44809e75a9ac3d --- /dev/null +++ b/configs/slim/quant/picodet_s_416_lcnet_quant.yml @@ -0,0 +1,22 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: False + +TrainReader: + batch_size: 48 + +LearningRate: + base_lr: 0.024 + schedulers: + - !CosineDecay + max_epochs: 300 + - !LinearWarmup + start_factor: 0.1 + steps: 300 diff --git 
a/configs/slim/quant/ppyoloe_l_qat.yml b/configs/slim/quant/ppyoloe_l_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..4c0e94003a6ed0b7dde95ecd1f2361b87c61b4c8 --- /dev/null +++ b/configs/slim/quant/ppyoloe_l_qat.yml @@ -0,0 +1,26 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +slim: QAT + +QAT: + quant_config: { + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: True + +epoch: 30 +snapshot_epoch: 5 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 10 + - 20 + - !LinearWarmup + start_factor: 0. + steps: 100 + +TrainReader: + batch_size: 8 diff --git a/configs/slim/quant/tinypose_qat.yml b/configs/slim/quant/tinypose_qat.yml new file mode 100644 index 0000000000000000000000000000000000000000..3b85dfe55d226d2514bf11c530abb8df1abf8664 --- /dev/null +++ b/configs/slim/quant/tinypose_qat.yml @@ -0,0 +1,26 @@ +pretrain_weights: https://paddledet.bj.bcebos.com/models/keypoint/tinypose_128x96.pdparams +slim: QAT + +QAT: + quant_config: { + 'activation_preprocess_type': 'PACT', + 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', + 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, + 'quantizable_layer_type': ['Conv2D', 'Linear']} + print_model: False + +epoch: 50 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 30 + - 40 + - !LinearWarmup + start_factor: 0. 
+    steps: 100
+
+TrainReader:
+  batch_size: 256
diff --git a/configs/smalldet/README.md b/configs/smalldet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..042e2b25a4f305f04bca7b58afb0cbed01cd96f3
--- /dev/null
+++ b/configs/smalldet/README.md
@@ -0,0 +1,16 @@
+# PP-YOLOE Smalldet Detection Models
+
+[Figure panels: VisDrone, VisDrone, DOTA, Xview]
+
+
+| Model | Dataset | SLICE_SIZE | OVERLAP_RATIO | Number of classes | mAP<sup>val</sup><br>0.5:0.95 | AP<sup>val</sup><br>0.5 | Download | Config |
+|:---------|:---------------:|:---------------:|:---------------:|:------:|:-----------------------:|:-------------------:|:---------:| :-----: |
+|PP-YOLOE-l| Xview | 400 | 0.25 | 60 | 14.5 | 26.8 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_xview_400_025.pdparams) | [config](./ppyoloe_crn_l_80e_sliced_xview_400_025.yml) |
+|PP-YOLOE-l| DOTA | 500 | 0.25 | 15 | 46.8 | 72.6 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_dota_500_025.pdparams) | [config](./ppyoloe_crn_l_80e_sliced_DOTA_500_025.yml) |
+|PP-YOLOE-l| VisDrone | 500 | 0.25 | 10 | 29.7 | 48.5 | [download](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [config](./ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml) |
+
+
+**Note:**
+- **SLICE_SIZE** is the size of each sub-image (SLICE_SIZE*SLICE_SIZE) after slicing with the SAHI tool; **OVERLAP_RATIO** is the overlap ratio between slices.
+- PP-YOLOE models are trained with mixed precision on 8 GPUs. If the **number of GPUs** or the **batch size** changes, adjust the learning rate according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
+- For detailed usage, see [ppyoloe](../ppyoloe#getting-start).
diff --git a/configs/smalldet/_base_/DOTA_sliced_500_025_detection.yml b/configs/smalldet/_base_/DOTA_sliced_500_025_detection.yml
new file mode 100644
index 0000000000000000000000000000000000000000..100d8cbf17ef78e6cc182e14672147c235896be1
--- /dev/null
+++ b/configs/smalldet/_base_/DOTA_sliced_500_025_detection.yml
@@ -0,0 +1,20 @@
+metric: COCO
+num_classes: 15
+
+TrainDataset:
+  !COCODataSet
+    image_dir: DOTA_slice_train/train_images_500_025
+    anno_path: DOTA_slice_train/train_500_025.json
+    dataset_dir: dataset/DOTA
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: DOTA_slice_val/val_images_500_025
+    anno_path: DOTA_slice_val/val_500_025.json
+    dataset_dir: dataset/DOTA
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/DOTA/DOTA_slice_val/val_500_025.json
+    dataset_dir: dataset/DOTA/DOTA_slice_val/val_images_500_025
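The learning-rate adjustment rule stated in the smalldet README note can be sketched in a few lines of Python. This is only an illustration: the function name `scale_learning_rate` is hypothetical, and the defaults in the example (base_lr 0.01, batch size 8 per GPU, 8 GPUs) are read from the configs and the note in this diff rather than from any PaddleDetection API.

```python
def scale_learning_rate(lr_default, batch_size_default, gpus_default,
                        batch_size_new, gpus_new):
    """Linear scaling: lr_new = lr_default * (bs_new * gpus_new) / (bs_default * gpus_default)."""
    total_default = batch_size_default * gpus_default
    total_new = batch_size_new * gpus_new
    if total_default <= 0 or total_new <= 0:
        raise ValueError("batch sizes and GPU counts must be positive")
    return lr_default * total_new / total_default


# Assumed defaults: base_lr 0.01 with batch_size 8 on each of 8 GPUs.
# Moving to a single GPU with the same per-GPU batch size:
lr_single_gpu = scale_learning_rate(0.01, 8, 8, batch_size_new=8, gpus_new=1)
print(f"base_lr for 1 GPU: {lr_single_gpu}")
```

The result would then be written to `LearningRate.base_lr` in the config, or passed as an override on the command line.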
diff --git a/configs/smalldet/_base_/visdrone_sliced_640_025_detection.yml b/configs/smalldet/_base_/visdrone_sliced_640_025_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..2d88b2c00ff5e691cbcda56036a704cbf7cf0a0c --- /dev/null +++ b/configs/smalldet/_base_/visdrone_sliced_640_025_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 10 + +TrainDataset: + !COCODataSet + image_dir: train_images_640_025 + anno_path: train_640_025.json + dataset_dir: dataset/visdrone_sliced + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val_images_640_025 + anno_path: val_640_025.json + dataset_dir: dataset/visdrone_sliced + +TestDataset: + !ImageFolder + anno_path: dataset/visdrone_sliced/val_640_025.json + dataset_dir: dataset/visdrone_sliced/val_images_640_025 diff --git a/configs/smalldet/_base_/xview_sliced_400_025_detection.yml b/configs/smalldet/_base_/xview_sliced_400_025_detection.yml new file mode 100644 index 0000000000000000000000000000000000000000..b932359db56957b3daca58bbbebc66ff156582b5 --- /dev/null +++ b/configs/smalldet/_base_/xview_sliced_400_025_detection.yml @@ -0,0 +1,20 @@ +metric: COCO +num_classes: 60 + +TrainDataset: + !COCODataSet + image_dir: train_images_400_025 + anno_path: train_400_025.json + dataset_dir: dataset/xview/xview_slic + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val_images_400_025 + anno_path: val_400_025.json + dataset_dir: dataset/xview/xview_slic + +TestDataset: + !ImageFolder + anno_path: dataset/xview/xview_slic/val_400_025.json + dataset_dir: dataset/xview/xview_slic/val_images_400_025 diff --git a/configs/smalldet/ppyoloe_crn_l_80e_sliced_DOTA_500_025.yml b/configs/smalldet/ppyoloe_crn_l_80e_sliced_DOTA_500_025.yml new file mode 100644 index 0000000000000000000000000000000000000000..6a1429d56b1d8118d0763d3c68931f77267d4704 --- /dev/null +++ 
b/configs/smalldet/ppyoloe_crn_l_80e_sliced_DOTA_500_025.yml @@ -0,0 +1,36 @@ +_BASE_: [ + './_base_/DOTA_sliced_500_025_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_80e_sliced_DOTA_500_025/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +TrainReader: + batch_size: 8 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml b/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml new file mode 100644 index 0000000000000000000000000000000000000000..8d133bb722477c5b56ea2e9e48a1a3f81d155dae --- /dev/null +++ b/configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml @@ -0,0 +1,36 @@ +_BASE_: [ + './_base_/visdrone_sliced_640_025_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_80e_sliced_visdrone_640_025/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +TrainReader: + batch_size: 8 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+      epochs: 1
+
+PPYOLOEHead:
+  static_assigner_epoch: -1
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 10000
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.6
diff --git a/configs/smalldet/ppyoloe_crn_l_80e_sliced_xview_400_025.yml b/configs/smalldet/ppyoloe_crn_l_80e_sliced_xview_400_025.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7c9d80ea5ba869e13e5eb19450c041eb5350b65d
--- /dev/null
+++ b/configs/smalldet/ppyoloe_crn_l_80e_sliced_xview_400_025.yml
@@ -0,0 +1,36 @@
+_BASE_: [
+  './_base_/xview_sliced_400_025_detection.yml',
+  '../runtime.yml',
+  '../ppyoloe/_base_/optimizer_300e.yml',
+  '../ppyoloe/_base_/ppyoloe_crn.yml',
+  '../ppyoloe/_base_/ppyoloe_reader.yml',
+]
+log_iter: 100
+snapshot_epoch: 10
+weights: output/ppyoloe_crn_l_80e_sliced_xview_400_025/model_final
+
+pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+depth_mult: 1.0
+width_mult: 1.0
+
+TrainReader:
+  batch_size: 8
+
+epoch: 80
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+    - !CosineDecay
+      max_epochs: 96
+    - !LinearWarmup
+      start_factor: 0.
+      epochs: 1
+
+PPYOLOEHead:
+  static_assigner_epoch: -1
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 10000
+    keep_top_k: 500
+    score_threshold: 0.01
+    nms_threshold: 0.6
diff --git a/configs/smrt/DataAnalysis.md b/configs/smrt/DataAnalysis.md
new file mode 100644
index 0000000000000000000000000000000000000000..66da22f43d9ba9494c35ee0fa0285aa45099399f
--- /dev/null
+++ b/configs/smrt/DataAnalysis.md
@@ -0,0 +1,68 @@
+# Data Analysis Feature Guide
+
+To help users analyze their data and thereby recommend more suitable models, we provide a **data analysis** feature. Users do not need to upload the original images; uploading annotation files in a supported format is enough to analyze the characteristics of the data.
+
+Currently supported formats:
+* LabelMe annotation format
+* 精灵标注 (Colabeler) annotation format
+* LabelImg annotation format
+* VOC data format
+* COCO data format
+* Seg data format
+
+## LabelMe annotation format
+
+1. Select a zip archive that contains the annotation files. The archive must contain an annotations folder holding one json file per annotated image; each json file has the same name as its image except for the extension.
+2. Detection and segmentation tasks are supported. If the provided annotations do not match the selected task type, an error is reported.
+3. For detection tasks, rectangle annotations are required; for segmentation tasks, polygon annotations are required.
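The LabelMe archive requirements above can be checked locally before uploading. The sketch below is an assumption-laden illustration, not the tool's own validator: the function name `check_labelme_zip` is hypothetical, it assumes the annotations folder sits at the root of the archive, and it assumes each json file needs at least one shape of the required type. The `shapes` and `shape_type` keys are the ones LabelMe writes into its json files.

```python
import io
import json
import zipfile

# Shape type each task needs, as listed in the LabelMe section.
REQUIRED_SHAPE = {"detection": "rectangle", "segmentation": "polygon"}


def check_labelme_zip(zip_file, task):
    """Return a list of problems; an empty list means the archive looks usable."""
    required = REQUIRED_SHAPE[task]
    problems = []
    with zipfile.ZipFile(zip_file) as archive:
        names = [name for name in archive.namelist()
                 if name.startswith("annotations/") and name.lower().endswith(".json")]
        if not names:
            return ["no json files found under annotations/"]
        for name in names:
            data = json.loads(archive.read(name).decode("utf-8"))
            shape_types = {shape.get("shape_type") for shape in data.get("shapes", [])}
            if required not in shape_types:
                problems.append(f"{name}: no {required} annotation for {task}")
    return problems


# Build a tiny in-memory archive with one rectangle annotation to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    annotation = {"shapes": [{"label": "crack", "shape_type": "rectangle",
                              "points": [[1, 2], [30, 40]]}]}
    archive.writestr("annotations/img_001.json", json.dumps(annotation))

buffer.seek(0)
print(check_labelme_zip(buffer, "detection"))
buffer.seek(0)
print(check_labelme_zip(buffer, "segmentation"))
```

A real archive can be checked the same way by passing its file path instead of the in-memory buffer.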
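+
+## 精灵标注 (Colabeler) annotation format
+
+1. Select a zip archive that contains the annotation files. The archive must contain an annotations folder holding one json file per annotated image; each json file has the same name as its image except for the extension.
+2. Detection and segmentation tasks are supported. If the provided annotations do not match the selected task type, an error is reported.
+3. For detection tasks, bndbox or polygon annotations are required; for segmentation tasks, polygon annotations are required.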
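+
+## LabelImg annotation format
+
+1. Select a zip archive that contains the annotation files. The archive must contain an annotations folder holding one xml file per annotated image; each xml file has the same name as its image except for the extension.
+2. Only detection tasks are supported.
+3. The annotation files must provide the bndbox field; the segmentation field is optional.
+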
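+
+## VOC data format
+
+1. Select a zip archive that contains the annotation files. The archive must contain an annotations folder holding one xml file per annotated image; each xml file has the same name as its image except for the extension.
+2. Only detection tasks are supported.
+3. The annotation files must provide the bndbox field; the segmentation field is optional.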
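+
+## COCO data format
+
+1. Select a zip archive that contains the annotation files. The archive must contain an annotations folder holding exactly one file, named annotation.json.
+2. Detection and segmentation tasks are supported. If the provided annotations do not match the selected task type, an error is reported.
+3. For detection tasks, the annotation file must contain the bbox field, and the segmentation field is optional; for segmentation tasks, the annotation file must contain the segmentation field.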
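+
+
+## Seg data format
+
+1. Select a zip archive that contains the annotation files. The archive must contain an annotations folder holding one png file per annotated image; each png file has the same name as its image except for the extension.
+2. Only segmentation tasks are supported.
+3. Each annotation file must correspond pixel-for-pixel to its original image and must be in png format (extension .png or .PNG). Each pixel value is an integer ID in the range [0,255], increasing consecutively from 0; apart from 255, ID values must not skip. In annotation files, 255 marks pixels to be ignored and 0 marks the background class.
+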
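The ID rule for Seg masks (integer IDs counted up from 0 without gaps, 255 reserved for ignored pixels) can be illustrated with a small check. This is a sketch under assumptions: the helper names are hypothetical, masks are plain nested lists of pixel values rather than decoded png files, and the no-gap rule is applied to the union of IDs across all masks, since a single image need not contain every class.

```python
IGNORE_ID = 255  # pixels to be ignored, per the Seg format rules


def collect_label_ids(masks):
    """Gather every distinct pixel value that appears in any mask."""
    ids = set()
    for mask in masks:
        for row in mask:
            ids.update(row)
    return ids


def find_id_problems(masks):
    """Return a list of rule violations; an empty list means the IDs look valid."""
    ids = collect_label_ids(masks)
    problems = []
    out_of_range = sorted(value for value in ids if not 0 <= value <= 255)
    if out_of_range:
        problems.append(f"values outside [0, 255]: {out_of_range}")
    class_ids = sorted(value for value in ids if 0 <= value < IGNORE_ID)
    if class_ids:
        missing = sorted(set(range(class_ids[-1] + 1)) - set(class_ids))
        if missing:
            problems.append(f"missing label IDs: {missing}")
    return problems


# Two toy masks: IDs 0, 1 and 3 are used, so ID 2 is skipped.
toy_masks = [
    [[0, 0, 1], [255, 1, 0]],
    [[0, 3, 3], [3, 255, 0]],
]
print(find_id_problems(toy_masks))
```

With real data, each mask would first be decoded from its png file into rows of integer pixel values before being passed in.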
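diff --git a/configs/smrt/README.md b/configs/smrt/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d5a59ae1799bc53096cdcf53ece2ab53758f1a48
--- /dev/null
+++ b/configs/smrt/README.md
@@ -0,0 +1,216 @@
+# PaddleSMRT: PaddlePaddle Industrial Model Selection Tool
+
+## I. Project Introduction
+
+PaddleSMRT (Paddle Sense Model Recommend Tool) is an industrial model selection tool that PaddlePaddle built from its experience with industrial deployments. During a project, users enter their requirements according to their actual situation and receive matching information on algorithm models, deployment hardware, and tutorial documents. For more precise recommendations, it also includes a data analysis feature: users upload their annotation files and the system automatically analyzes characteristics of the data, such as imbalanced distribution, small objects, and dense objects, so that it can suggest better-fitting models and optimization strategies for the scenario.
+
+Try it on the PaddlePaddle website: [link](https://www.paddlepaddle.org.cn/smrt)
+
+This document describes how PaddleSMRT recommends models for detection and how to use the recommended models. For segmentation models, see the [documentation](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.5/configs/smrt)
+
+## II. Datasets
+
+PaddleSMRT draws on real industrial scenarios and recommends the most suitable model by comparing the results of detection algorithms. It currently covers two scenarios, industrial quality inspection and urban security. The datasets used for the algorithm comparison are described below.
+
+### 1. New-energy battery quality inspection dataset
+
+This is a quality inspection dataset of new-energy battery components. It contains 15021 images with 22045 annotated boxes and covers 45 defect types, such as glue loss, cracks, and scratches.
+
+Sample images from the new-energy battery dataset:
+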
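+
+Dataset characteristics:
+
+1. Balanced class distribution
+2. Small-object data
+3. Non-dense data
+
+### 2. Aluminum part quality inspection dataset
+
+This is a quality inspection dataset from aluminum part production. It contains 11293 images with 43157 annotated boxes and covers 5 defect types, such as scratches, pressure damage, and peeling.
+
+Sample images from the aluminum part dataset:
+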
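+
+
+Dataset characteristics:
+
+1. Imbalanced class distribution
+2. Small-object data
+3. Non-dense data
+
+
+### 3. People and vehicles dataset
+
+The dataset contains 2600 images with manually annotated two-point anchor box labels. The labels cover 22 categories of people and vehicles:
+People include ordinary pedestrians, 3D dummies, seated people, and cyclists; vehicles include hatchbacks, sedans, minibuses, small trucks, pickup trucks, light trucks, box trucks, tractor units, cement mixers, construction vehicles, school buses, small and medium buses, large single-deck buses, small electric vehicles, motorcycles, bicycles, tricycles, and other special vehicles.
+
+Sample images from the people and vehicles dataset:
+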
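+
+
+Dataset characteristics:
+
+1. Imbalanced class distribution
+2. Small-object data
+3. Non-dense data
+
+**Note:**
+
+The dataset characteristics are determined as follows:
+
+- Imbalanced data distribution: in a sample of 1000 images, the standard deviation of the per-class sample counts is greater than 400
+- Small-object dataset: more than 30% of the samples have a relative size below 0.1 or an absolute size below 32 pixels
+- Dense dataset:
+
+```
+    Dense object: more than 2 surrounding objects are at a distance of less than twice the object's own size;
+
+    Dense image: dense objects make up 50% or more of all objects in the image;
+
+    Dense dataset: dense images make up 30% or more of all images
+
+```
+
+To help users choose a model, we also provide rich data analysis features. Users only need to upload annotation files (no original images) to see how the data characteristics are distributed and to get model optimization suggestions.
+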
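The three criteria in the note above (imbalance, small objects, density) can be sketched as plain Python. Several details are assumptions made for illustration, since the text does not pin them down: boxes are COCO-style `(x, y, w, h)`, an object's size is the square root of its box area, relative size is that value divided by the square root of the image area, distance is measured between box centers, and the population standard deviation is used. All function names are hypothetical, and the 1000-image sampling step is left out.

```python
import math
import statistics
from collections import Counter


def is_imbalanced(class_labels, std_threshold=400):
    """Imbalanced if the std. dev. of per-class sample counts exceeds the threshold."""
    counts = list(Counter(class_labels).values())
    return bool(counts) and statistics.pstdev(counts) > std_threshold


def is_small_object(box, image_size, rel_threshold=0.1, abs_threshold=32):
    """Small if relative size < 0.1 or absolute size < 32 pixels (assumed definitions)."""
    _, _, w, h = box
    img_w, img_h = image_size
    abs_size = math.sqrt(w * h)
    rel_size = abs_size / math.sqrt(img_w * img_h)
    return rel_size < rel_threshold or abs_size < abs_threshold


def is_small_object_dataset(objects, ratio_threshold=0.3):
    """objects: list of (box, image_size); small objects must exceed 30% of samples."""
    small = sum(is_small_object(box, size) for box, size in objects)
    return bool(objects) and small / len(objects) > ratio_threshold


def count_dense_objects(boxes):
    """An object is dense if more than 2 neighbors lie within twice its own size."""
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    sizes = [math.sqrt(w * h) for _, _, w, h in boxes]
    dense = 0
    for i, (cx, cy) in enumerate(centers):
        near = sum(1 for j, (ox, oy) in enumerate(centers)
                   if j != i and math.hypot(cx - ox, cy - oy) < 2 * sizes[i])
        if near > 2:
            dense += 1
    return dense


def is_dense_image(boxes):
    """Dense image: dense objects are 50% or more of the objects in the image."""
    return bool(boxes) and count_dense_objects(boxes) / len(boxes) >= 0.5


def is_dense_dataset(boxes_per_image):
    """Dense dataset: dense images are 30% or more of all images."""
    dense_images = sum(is_dense_image(boxes) for boxes in boxes_per_image)
    return bool(boxes_per_image) and dense_images / len(boxes_per_image) >= 0.3


# Four tightly packed boxes versus two boxes far apart.
clustered = [(0, 0, 10, 10), (5, 0, 10, 10), (0, 5, 10, 10), (5, 5, 10, 10)]
sparse = [(0, 0, 10, 10), (500, 500, 10, 10)]
print(is_dense_image(clustered), is_dense_image(sparse))
```

In practice the boxes, labels, and image sizes would be read from the uploaded annotation file, for example from the `annotations` and `images` entries of a COCO json.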
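+
+## III. End-to-end workflow for recommended models
+
+The model selection tool returns a detection model configuration that matches the scenario and data characteristics, for example [PP-YOLOE](./ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml)
+
+The configuration file is used as follows
+
+### 1. Environment setup
+
+First install PaddlePaddle
+
+```bash
+# CUDA10.2
+pip install paddlepaddle-gpu==2.2.2 -i https://mirror.baidu.com/pypi/simple
+
+# CPU
+pip install paddlepaddle==2.2.2 -i https://mirror.baidu.com/pypi/simple
+```
+
+Then install PaddleDetection and its dependencies
+
+```bash
+# Clone the PaddleDetection repository
+cd
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+# Install the other dependencies
+cd PaddleDetection
+pip install -r requirements.txt
+```
+
+For detailed installation instructions, see the [documentation](../../docs/tutorials/INSTALL_cn.md)
+
+### 2. Data preparation
+
+Users need to prepare a training dataset; the COCO data format is recommended for annotation files. If the data is in labelme or VOC format, first convert the annotations to COCO with the [format conversion script](../../tools/x2coco.py). For detailed data preparation instructions, see the [documentation](../../docs/tutorials/PrepareDataSet.md)
+
+This document uses a subset of the new-energy battery industrial quality inspection dataset as its example; download [link](https://bj.bcebos.com/v1/paddle-smrt/data/battery_mini.zip)
+
+The data is stored in the following layout:
+
+```
+battery_mini
+├── annotations
+│   ├── test.json
+│   └── train.json
+└── images
+    ├── Board_daowen_101.png
+    ├── Board_daowen_109.png
+    ├── Board_daowen_117.png
+    ...
+```
+
+
+
+### 3. Model training/evaluation/prediction
+
+Train with the model recommended by the model selection tool. All currently recommended models use **single-GPU training**. Evaluation can run during training, and models are saved under `./output` by default
+
+```bash
+python tools/train.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml --eval
+```
+
+If training is interrupted, resume it with the -r option
+
+```bash
+python tools/train.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml --eval -r output/ppyoloe_crn_m_300e_battery_1024/9.pdparams
+```
+
+To evaluate the accuracy of the trained model separately, use `tools/eval.py`
+
+```bash
+python tools/eval.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams
+```
+
+After training, use `tools/infer.py` to visualize the results
+
+```bash
+python tools/infer.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams --infer_img=images/Board_diaojiao_1591.png
+```
+
+For more training parameters, see the [documentation](../../docs/tutorials/GETTING_STARTED_cn.md)
+
+### 4. Model export and deployment
+
+After training, the model needs to be deployed to a 1080Ti, 2080Ti, or other server device, using Paddle Inference for C++ deployment
+
+First export the model to the model and configuration files used for deployment
+
+```bash
+python tools/export_model.py -c configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml -o weights=output/ppyoloe_crn_m_300e_battery_1024/model_final.pdparams
+```
+
+Next, use the deployment code in PaddleDetection for C++ deployment; for detailed steps, see the [documentation](../../deploy/cpp/README.md)
+
+To deploy through a graphical interface instead, see the section below.
+
+## IV. Deployment demo
+
+To make deployment easier, we also provide complete graphical deployment demos. Feel free to try them
+
+* [Windows Demo download](https://github.com/PaddlePaddle/PaddleX/tree/develop/deploy/cpp/docs/csharp_deploy)
+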
+
+* [Linux Demo download](https://github.com/cjh3020889729/The-PaddleX-QT-Visualize-GUI)
+
+
+## V. Application examples
+
+To help with industrial deployment, PaddleSMRT also provides detailed application examples.
+
+* Industrial vision
+    * [Industrial defect detection](https://aistudio.baidu.com/aistudio/projectdetail/2598319)
+    * [Meter reading](https://aistudio.baidu.com/aistudio/projectdetail/2598327)
+    * [Rebar counting](https://aistudio.baidu.com/aistudio/projectdetail/2404188)
+* City
+    * [Pedestrian counting](https://aistudio.baidu.com/aistudio/projectdetail/2421822)
+    * [Vehicle counting](https://aistudio.baidu.com/aistudio/projectdetail/3391734?contributionType=1)
+    * [Safety helmet detection](https://aistudio.baidu.com/aistudio/projectdetail/3944737?contributionType=1)
diff --git a/configs/smrt/images/00362.jpg b/configs/smrt/images/00362.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..da4ab37d5cb5501e3c1471b30a3f465dd9b0a88f
Binary files /dev/null and b/configs/smrt/images/00362.jpg differ
diff --git a/configs/smrt/images/Board_diaojiao_1591.png b/configs/smrt/images/Board_diaojiao_1591.png
new file mode 100644
index 0000000000000000000000000000000000000000..0ec35b9450209fba4b9579fcc325e70fc5f63ddd
Binary files /dev/null and b/configs/smrt/images/Board_diaojiao_1591.png differ
diff --git a/configs/smrt/images/UpCoa_liewen_163.png b/configs/smrt/images/UpCoa_liewen_163.png
new file mode 100644
index 0000000000000000000000000000000000000000..294c29b4ed04c81d672cbe72ddaa4ccb3e301f67
Binary files /dev/null and b/configs/smrt/images/UpCoa_liewen_163.png differ
diff --git a/configs/smrt/images/lvjian1_0.jpg b/configs/smrt/images/lvjian1_0.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..dbf0dfaa769f9a72bd7825f8589fffa5aca3ac6e
Binary files /dev/null and b/configs/smrt/images/lvjian1_0.jpg differ
diff --git a/configs/smrt/images/lvjian1_10.jpg b/configs/smrt/images/lvjian1_10.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..25467e8174b27df7bf43b33795eb7ea1af605813
Binary files /dev/null and b/configs/smrt/images/lvjian1_10.jpg differ
diff --git a/configs/smrt/images/renche_00002.jpg b/configs/smrt/images/renche_00002.jpg
new file
mode 100644 index 0000000000000000000000000000000000000000..9446db44df96cf18ef7871c345a8010cdfec49df Binary files /dev/null and b/configs/smrt/images/renche_00002.jpg differ diff --git a/configs/smrt/images/renche_00204.jpg b/configs/smrt/images/renche_00204.jpg new file mode 100644 index 0000000000000000000000000000000000000000..2c46e933b970411eca850195b59f1c477d5d2a5e Binary files /dev/null and b/configs/smrt/images/renche_00204.jpg differ diff --git a/configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml b/configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..601dd1915ee65fb452340c783be8ca1cab905ce1 --- /dev/null +++ b/configs/smrt/picodet/picodet_l_1024_coco_lcnet_battery.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_1024_coco_lcnet_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: 
[0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/configs/smrt/picodet/picodet_l_1024_coco_lcnet_lvjian1.yml b/configs/smrt/picodet/picodet_l_1024_coco_lcnet_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..734f1bee70ce4c2708f846af4d10e350fa6a329f --- /dev/null +++ b/configs/smrt/picodet/picodet_l_1024_coco_lcnet_lvjian1.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_1024_coco_lcnet_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + 
!COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/configs/smrt/picodet/picodet_l_1024_coco_lcnet_renche.yml b/configs/smrt/picodet/picodet_l_1024_coco_lcnet_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..cdebd4ba4ae55e40c940b230bd61528a39fe0fcf --- /dev/null +++ b/configs/smrt/picodet/picodet_l_1024_coco_lcnet_renche.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_1024_coco_lcnet_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: 
train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/configs/smrt/picodet/picodet_l_640_coco_lcnet_battery.yml b/configs/smrt/picodet/picodet_l_640_coco_lcnet_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..8200439dc928fea3d0c091d98acee30117a33be1 --- /dev/null +++ b/configs/smrt/picodet/picodet_l_640_coco_lcnet_battery.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_640_coco_lcnet_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + 
image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/configs/smrt/picodet/picodet_l_640_coco_lcnet_lvjian1.yml b/configs/smrt/picodet/picodet_l_640_coco_lcnet_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..6000902f03363a5763e7c26fd38565f85dcb2388 --- /dev/null +++ b/configs/smrt/picodet/picodet_l_640_coco_lcnet_lvjian1.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_640_coco_lcnet_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: /paddle/dataset/model-select/gongye/lvjian1/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 
'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: /paddle/dataset/model-select/gongye/lvjian1/slice_lvjian1_data/eval/ + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml b/configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..fc1ce195ea4c5dce2bb60358f0509eb5e01a50c4 --- /dev/null +++ b/configs/smrt/picodet/picodet_l_640_coco_lcnet_renche.yml @@ -0,0 +1,162 @@ +weights: output/picodet_l_640_coco_lcnet_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/picodet_l_640_coco_lcnet.pdparams + +worker_num: 2 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: 
train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 50 +LearningRate: + base_lr: 0.006 + schedulers: + - !CosineDecay + max_epochs: 50 + - !LinearWarmup + start_factor: 0.001 + steps: 300 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomCrop: {} + - RandomFlip: {prob: 0.5} + - RandomDistort: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 8 + shuffle: false + + +TestReader: + inputs_def: + image_shape: [1, 3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 10 +print_flops: false +find_unused_parameters: True +use_ema: true + + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.00004 + type: L2 + +architecture: PicoDet + +PicoDet: + backbone: LCNet + neck: LCPAN + head: PicoHeadV2 + +LCNet: + scale: 2.0 + feature_maps: [3, 4, 5] + +LCPAN: + out_channels: 160 + use_depthwise: True + num_features: 4 + +PicoHeadV2: + conv_feat: + name: PicoFeat + feat_in: 160 + feat_out: 160 + num_convs: 4 + num_fpn_stride: 4 + norm_type: bn + share_cls_reg: True + use_se: True + fpn_stride: [8, 16, 32, 64] + feat_in_chan: 160 + prior_prob: 0.01 + reg_max: 7 + cell_offset: 0.5 + grid_cell_scale: 5.0 + static_assigner_epoch: 100 + use_align_head: True + static_assigner: + name: ATSSAssigner + topk: 9 + force_gt_matching: False + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + loss_class: + name: VarifocalLoss + use_sigmoid: False + iou_weighted: True + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.5 + loss_bbox: + name: GIoULoss + loss_weight: 2.5 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.025 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery.yml b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..507d0088cbc343e7ca281f8c8f54aa169e135a43 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: 
annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery_1024.yml b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..cd8b6d49f2d5b3a2de4a72e8bfe2064bcfcfc4a7 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_battery_1024.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + 
anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_1024.yml b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..0c09049a60423464a8c14666f7c86098564482d8 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_1024.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + 
anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_640.yml b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..f7dc75f975efdbd08ae4f96ce2f54cc55a4569f4 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_lvjian1_640.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: 
val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_1024.yml b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..96ab192171798f7947ee857b8291152e5933c57a --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_1024.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: 
test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: dataset/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_640.yml b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..ccc2162de1c995fdede25ccfa337d6136d14b3df --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r101vd_dcn_365e_renche_640.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: 
test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: dataset/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. 
+ optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 101 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/static/configs/ppyolo/ppyolo_test.yml b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery.yml similarity index 31% rename from static/configs/ppyolo/ppyolo_test.yml rename to configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery.yml index 250beb0551c95dfbe731e1ba3da52e0b695cc6e9..e886dd6c10bd03bcb2ef64f1a5f54ba0e923efcc 100644 --- a/static/configs/ppyolo/ppyolo_test.yml +++ b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery.yml @@ -1,138 +1,154 @@ -# NOTE: this config file is only used for evaluation on COCO test2019 set, -# for training or evaluationg on COCO val2017, please use ppyolo.yml architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 use_gpu: true -max_iters: 500000 +use_xpu: false log_iter: 100 save_dir: 
output -snapshot_iter: 10000 + metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/ppyolo/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 -save_prediction_only: True +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. + steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 
+ +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + YOLOv3: backbone: ResNet + neck: PPYOLOPAN yolo_head: YOLOv3Head - use_fine_grained_loss: true + post_process: BBoxPostProcess ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. depth: 50 - feature_maps: [3, 4, 5] variant: d - dcn_v2_stages: [5] + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. + +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- coord_conv: true + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss iou_aware: true iou_aware_factor: 0.4 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true YOLOv3Loss: ignore_thresh: 0.7 - scale_x_y: 1.05 + downsample: [32, 16, 8] label_smooth: false - use_fine_grained_loss: true + scale_x_y: 1.05 iou_loss: IouLoss iou_aware_loss: IouAwareLoss IouLoss: loss_weight: 2.5 - max_height: 608 - max_width: 608 + loss_square: true IouAwareLoss: loss_weight: 1.0 - max_height: 608 - max_width: 608 -MatrixNMS: - background_label: -1 +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS keep_top_k: 100 - normalized: false score_threshold: 0.01 post_threshold: 0.01 - -LearningRate: - base_lr: 0.00333 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: test2017 - anno_path: annotations/image_info_test-dev2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: 
false - channel_first: True + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..d3b7ac28fc68dfb85c0bd0d67f61b77b844bd034 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_battery_1024.yml @@ -0,0 +1,154 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json # also support txt (like VOC's label_list.txt) + dataset_dir: dataset/battery_mini # if set, anno_path will be 'dataset_dir/anno_path' + + +epoch: 40 +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + +snapshot_epoch: 5 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.4 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_1024.yml b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..6138e875e83a9a91708159b7f99e692c901f4f1b --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_1024.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_640.yml b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..5da1090006a2e1ba967c0870ba7311306ec1a164 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_lvjian1_640.yml @@ -0,0 +1,155 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + + +epoch: 20 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + + +snapshot_epoch: 3 +worker_num: 2 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[8, 7], [24, 12], [14, 25], [37, 35], [30, 140], [89, 52], [93, 189], [226, 99], [264, 352]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[8, 7], [24, 12], [14, 25], + [37, 35], [30, 140], [89, 52], + [93, 189], [226, 99], [264, 352]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_1024.yml b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..611ea34a6b6bed018698f7c2fc7f1e6cf6528988 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_1024.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: dataset/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 1024, 1024] + sample_transforms: + - Decode: {} + - Resize: {target_size: [1024, 1024], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_640.yml b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_640.yml new file mode 100644 index 0000000000000000000000000000000000000000..37fb675f0acc4585f5ded137db46473b57c517c0 --- /dev/null +++ b/configs/smrt/ppyolo/ppyolov2_r50vd_dcn_365e_renche_640.yml @@ -0,0 +1,156 @@ +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/coco/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/coco/renche + +TestDataset: + !ImageFolder + anno_path: dataset/coco/renche/test.json + + +epoch: 100 +LearningRate: + base_lr: 0.0002 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + - 80 + - !LinearWarmup + start_factor: 0. 
+ steps: 1000 + + +snapshot_epoch: 3 +worker_num: 8 +TrainReader: + inputs_def: + num_max_boxes: 100 + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [576, 608, 640, 672, 704], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 100} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 8 + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + + +OptimizerBuilder: + clip_grad_by_norm: 35. + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + + +YOLOv3: + backbone: ResNet + neck: PPYOLOPAN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + return_idx: [1, 2, 3] + dcn_v2_stages: [3] + freeze_at: -1 + freeze_norm: false + norm_decay: 0. 
+ +PPYOLOPAN: + drop_block: true + block_size: 3 + keep_prob: 0.9 + spp: true + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + iou_aware: true + iou_aware_factor: 0.5 + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + scale_x_y: 1.05 + iou_loss: IouLoss + iou_aware_loss: IouAwareLoss + +IouLoss: + loss_weight: 2.5 + loss_square: true + +IouAwareLoss: + loss_weight: 1.0 + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery.yml b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..bc58d999cabfdfb8f2252ca0e34c73e118ba70e9 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + 
start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..027e38e202eaff50e69ac0d3204541d5ae7a08a6 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + 
dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1.yml b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..272caf679a296cb4375e3628aa070fd71cec9931 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..38a14259f54dd7a515aa68e5a5f7a79909f5a40b --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_lvjian1_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_lvjian1_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche.yml b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..80c7bac76453e407d743a4e677257ebd4e2505b3 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..2151ecf711c0f52560f9318085f0fee2de7b8a85 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_l_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_l_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 
30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery.yml b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..8902f32ec42da89643b85f0743799555c3abc8ec --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: 
dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..f244c1dd13381d360440a1c7705c8f5f81abf576 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json 
+ dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1.yml b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..7563756955b97f722a9c099dfb8ce57a90b6c6f7 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.002 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..d15e07f8e88cd1f9d592296e71cc587a6e6892ef --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_lvjian1_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_lvjian1_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.0015 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche.yml b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..a65cbdf540bd9e48800610516e0978d9f51b2c41 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..0427b81d4f8eeca71f6245a583f0f0a2d99f3569 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_m_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_m_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams +depth_mult: 0.67 +width_mult: 0.75 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: /paddle/dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: /paddle/dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: 
/paddle/dataset/renche + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery.yml b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..1ef01cfc633414a9e4f71bbfc656a116c76fc7bf --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: 
dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..42d30e00ff940b49b778306fa45562cf87f36396 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json 
+ dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian1.yml b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..b6155305fc4233b1c754dae4f2bb6cc368aa55f8 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.002 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..72a184127f10d32176a90bd0045d20a6d88457fa --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_lvjian_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_lvjian1_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.003 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 16 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche.yml b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..df1939153b2672222fd9f3589da89ac3aa1a5a93 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..07310a067794e789bd58172381cfecf37a1b3f03 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_s_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_s_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 
30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery.yml b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..ba94ad254319fa8fa2ca1cb3b982c7f4b5508c5f --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: 
dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..961d7823a32e8ee377274f1bf65399ab21b5a321 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_battery_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_battery_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json 
+ dataset_dir: dataset/battery_mini + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1.yml b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..7a47aded5e8cea1ded2d916509f54d53157dd7be --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.001 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 8 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 1 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..c1e70d2198f2af380c5cc9ab80704a9861f11c00 --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_lvjian1_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 2 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: 
dataset/slice_lvjian1_data/eval + +epoch: 30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche.yml b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..be3f79044af32b12bba0e5aa13059585fd65d9ab --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 640 +eval_width: &eval_width 640 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 30 +LearningRate: + 
base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche_1024.yml b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche_1024.yml new file mode 100644 index 0000000000000000000000000000000000000000..250251c32504ced54291d2b5449e1ffdafb8b3ea --- /dev/null +++ b/configs/smrt/ppyoloe/ppyoloe_crn_x_300e_renche_1024.yml @@ -0,0 +1,140 @@ +weights: output/ppyoloe_crn_x_300e_renche_1024/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams +depth_mult: 1.33 +width_mult: 1.25 + +worker_num: 4 +eval_height: &eval_height 1024 +eval_width: &eval_width 1024 +eval_size: &eval_size [*eval_height, *eval_width] + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 
30 +LearningRate: + base_lr: 0.0005 + schedulers: + - !CosineDecay + max_epochs: 36 + - !LinearWarmup + start_factor: 0. + epochs: 3 + +TrainReader: + sample_transforms: + - Decode: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [960, 992, 1024, 1056, 1088], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 4 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false + +# Exporting the model +export: + post_process: True # Whether post-processing is included in the network when export model. + nms: True # Whether NMS is included in the network when export model. + benchmark: False # It is used to testing model performance, if set `True`, post-process and NMS will not be exported. 
+ +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 + +architecture: YOLOv3 +norm_type: sync_bn +use_ema: true +ema_decay: 0.9998 + +YOLOv3: + backbone: CSPResNet + neck: CustomCSPPAN + yolo_head: PPYOLOEHead + post_process: ~ + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True + +CustomCSPPAN: + out_channels: [768, 384, 192] + stage_num: 1 + block_num: 3 + act: 'swish' + spp: true + +PPYOLOEHead: + fpn_strides: [32, 16, 8] + grid_cell_scale: 5.0 + grid_cell_offset: 0.5 + static_assigner_epoch: 100 + use_varifocal_loss: True + loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5} + static_assigner: + name: ATSSAssigner + topk: 9 + assigner: + name: TaskAlignedAssigner + topk: 13 + alpha: 1.0 + beta: 6.0 + nms: + name: MultiClassNMS + nms_top_k: 1000 + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..128328bf3853bff327b47bb1945908c338b3dcb8 --- /dev/null +++ b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - 
!PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [1350,1425,1500,1575,1650], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 
+ negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..c6b4b8ce5c6ef099f9ba3ef9e603ddc4e273e413 --- /dev/null +++ b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + 
- Decode: {} + - RandomResize: {target_size: [1350,1425,1500,1575,1650], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 
2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..ef11461339080740eb3ac2414eda709f10b00ddb --- /dev/null +++ b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_1500_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [1350,1425,1500,1575,1650], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: 
[0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + 
resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..20025b07da573fbb7cff5936c50509358b85aa99 --- /dev/null +++ b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_800_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + 
batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + 
batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..6e0352a1952c58a3d168787364f0b2b77fede322 --- /dev/null +++ b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + 
sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: 
[0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..448b65db663322476f7f0db79fcd5e6a52982720 --- /dev/null +++ b/configs/smrt/rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml @@ -0,0 +1,168 @@ +weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_800_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.00025 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [12, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: 
[0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 5 +print_flops: false +find_unused_parameters: True + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + +architecture: CascadeRCNN + +CascadeRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: CascadeHead + # post process + bbox_post_process: BBoxPostProcess + +ResNet: + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +CascadeHead: + head: CascadeTwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + +CascadeTwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 
30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery.yml b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..7e6b8871b9525a0f6775266298872178cf5b49aa --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_1500_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: 
{pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1.yml 
b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..190ed8fa183656127445602792df861b8018e938 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_1500_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - 
NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche.yml b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..947c6e43bc6ff42f150566b7ef1e9713cd749926 --- /dev/null +++ 
b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_1500_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: 
false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery.yml b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..148a0459e8e8f5aea9b74d1e943852c82f524127 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_800_battery/model_final +pretrain_weights: 
https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: 
Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1.yml b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..9362638d3f05b30c2274e199410e7fa509e0eb10 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_800_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: 
dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: 
BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche.yml b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..bf881d55a0808df85739784270373e1ada4d9f3a --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r101_vd_fpn_ssld_2x_800_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r101_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + 
!ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 101 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: 
+ aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..688ea9bfdf6160715343d18c5b9ea83a27b6bc8e --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_1500_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + 
start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 
0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..4b7d8e7d85a3cf61aadf2bfb276f1d325a712808 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_1500_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], 
interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + 
nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml new file mode 100644 index 0000000000000000000000000000000000000000..39eca1f8ee87f21026ee483dc6c69e6f30ac9bf7 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche.yml @@ -0,0 +1,166 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_1500_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[800, 800], [900, 900], [1000, 1000], [1200, 1200], [1400, 1400], [1500, 1500]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: 
true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [1500, 1500], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + 
fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml new file mode 100644 index 0000000000000000000000000000000000000000..7a982c06df9f32675c3de251f96e0b6477ea0943 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_800_battery/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 45 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: annotations/train.json + dataset_dir: dataset/battery_mini + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +TestDataset: + !ImageFolder + anno_path: annotations/test.json + dataset_dir: dataset/battery_mini + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: 
[0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + 
nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml new file mode 100644 index 0000000000000000000000000000000000000000..39020c77e8ef1d47e9b3df08417f7f4c6a765249 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_800_lvjian1/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 5 + +TrainDataset: + !COCODataSet + image_dir: images + anno_path: train.json + dataset_dir: dataset/slice_lvjian1_data/train/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: images + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +TestDataset: + !ImageFolder + anno_path: val.json + dataset_dir: dataset/slice_lvjian1_data/eval + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: 
+ - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml new file mode 100644 index 
0000000000000000000000000000000000000000..e27315c3572f3c89f1f98fc250e50a3d23661250 --- /dev/null +++ b/configs/smrt/rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche.yml @@ -0,0 +1,167 @@ +weights: output/faster_rcnn_r50_vd_fpn_ssld_2x_800_renche/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams + +metric: COCO +num_classes: 22 + +TrainDataset: + !COCODataSet + image_dir: train_images + anno_path: train.json + dataset_dir: dataset/renche + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: train_images + anno_path: test.json + dataset_dir: dataset/renche + +TestDataset: + !ImageFolder + anno_path: test.json + dataset_dir: dataset/renche + +epoch: 24 +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [16, 22] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + + +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 4 + shuffle: true + drop_last: true + collate_batch: false + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + + +TestReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: 
{pad_to_stride: -1} + batch_size: 1 + shuffle: false + drop_last: false + +use_gpu: true +use_xpu: false +log_iter: 100 +save_dir: output +snapshot_epoch: 2 +print_flops: false + + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0001 + type: L2 + + +architecture: FasterRCNN + +FasterRCNN: + backbone: ResNet + neck: FPN + rpn_head: RPNHead + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +ResNet: + # index 0 stands for res2 + depth: 50 + variant: d + norm_type: bn + freeze_at: 0 + return_idx: [0,1,2,3] + num_stages: 4 + lr_mult_list: [0.05, 0.05, 0.1, 0.15] + +FPN: + out_channel: 256 + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + + +BBoxHead: + head: TwoFCHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + use_random: True + +TwoFCHead: + out_channel: 1024 + +BBoxPostProcess: + decode: RCNNBox + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/visdrone/README.md b/configs/visdrone/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8fb78190c8fcb73163bc9674d42a4b7ab2673e85 --- /dev/null +++ b/configs/visdrone/README.md @@ -0,0 +1,50 @@ +# VisDrone-DET Detection Models + 
+The PaddleDetection team provides PP-YOLOE-based detection models for the small-object aerial scenes of VisDrone-DET, which users can download and use directly. The dataset converted to COCO format is available at this [download link](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone.zip). The models detect its 10 classes: `pedestrian(1), people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10)`. The original dataset is available at this [download link](https://github.com/VisDrone/VisDrone-Dataset).
+
+**Note:**
+- The VisDrone-DET dataset contains 6471 train images, 548 val images, 1610 test_dev images, and 1580 test-challenge images (box annotations not released). The first three splits have public box annotations.
+- All models are trained on the train set only and evaluated on the val and test_dev sets. The test_dev set has more images, so its accuracy is the more reliable reference.
+
+
+## Training on original images:
+
+| Model | COCOAPI mAP (val, 0.5:0.95) | COCOAPI mAP (val, 0.5) | COCOAPI mAP (test_dev, 0.5:0.95) | COCOAPI mAP (test_dev, 0.5) | MatlabAPI mAP (test_dev, 0.5:0.95) | MatlabAPI mAP (test_dev, 0.5) | Download | Config |
+|:---------|:------:|:------:| :----: | :------:| :------: | :------:| :----: | :------:|
+|PP-YOLOE-s| 23.5 | 39.9 | 19.4 | 33.6 | 23.68 | 40.66 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams) | [config](./ppyoloe_crn_s_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-s| 24.4 | 41.6 | 20.1 | 34.7 | 24.55 | 42.19 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_p2_alpha_80e_visdrone.pdparams) | [config](./ppyoloe_crn_s_p2_alpha_80e_visdrone.yml) |
+|PP-YOLOE-l| 29.2 | 47.3 | 23.5 | 39.1 | 28.00 | 46.20 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_visdrone.pdparams) | [config](./ppyoloe_crn_l_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-l| 30.1 | 48.9 | 24.3 | 40.8 | 28.47 | 48.16 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_p2_alpha_80e_visdrone.pdparams) | [config](./ppyoloe_crn_l_p2_alpha_80e_visdrone.yml) |
+|PP-YOLOE-Alpha-largesize-l| 41.9 | 65.0 | 32.3 | 53.0 | 37.13 | 61.15 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_alpha_largesize_80e_visdrone.pdparams) | [config](./ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml) |
+|PP-YOLOE-P2-Alpha-largesize-l| 41.3 | 64.5 | 32.4 | 53.1 | 37.49 | 51.54 | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.pdparams) | [config](./ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml) |
+
+## Training on sliced images:
+
+| Model | COCOAPI mAP (val, 0.5:0.95) | COCOAPI mAP (val, 0.5) | COCOAPI mAP (test_dev, 0.5:0.95) | COCOAPI mAP (test_dev, 0.5) | MatlabAPI mAP (test_dev, 0.5:0.95) | MatlabAPI mAP (test_dev, 0.5) | Download | Config |
+|:---------|:------:|:------:| :----: | :------:| :------: | :------:| :----: | :------:|
+|PP-YOLOE-l| 29.7 | 48.5 | 23.3 | 39.9 | - | - | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams) | [config](../smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml) |
+
+
+**Note:**
+- The PP-YOLOE models are trained with mixed precision on 8 GPUs. If the **number of GPUs** or the **batch size** changes, adjust the learning rate according to **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)**.
+- For detailed usage, see [ppyoloe](../ppyoloe#getting-start).
+- P2 means adding features from the P2 level (the 1/4 downsampling level), for a total of 4 PPYOLOEHead outputs.
+- Alpha means adding a learnable weight parameter Alpha to the CSPResNet backbone, which is trained together with the network.
+- largesize means multi-scale training based on the 1600 scale and prediction at the 1920 scale. The training batch_size is reduced accordingly, trading speed for higher accuracy.
+- MatlabAPI results are evaluated with the official evaluation toolkit [VisDrone2018-DET-toolkit](https://github.com/VisDrone/VisDrone2018-DET-toolkit).
+- For the config files and training workflow of the sliced-image models, see [smalldet](../smalldet).
+
+
+## Citation
+```
+@ARTICLE{9573394,
+ author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
+ journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+ title={Detection and Tracking Meet Drones Challenge},
+ year={2021},
+ volume={},
+ number={},
+ pages={1-1},
+ doi={10.1109/TPAMI.2021.3119563}
+}
+```
diff --git a/configs/visdrone/ppyoloe_crn_l_80e_visdrone.yml b/configs/visdrone/ppyoloe_crn_l_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..4a51e696ac6684adcaff42d5b26033d01413ca68 --- /dev/null +++ b/configs/visdrone/ppyoloe_crn_l_80e_visdrone.yml @@ -0,0 +1,36 @@ +_BASE_: [ + '../datasets/visdrone_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_l_80e_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams +depth_mult: 1.0 +width_mult: 1.0 +
+TrainReader: + batch_size: 8 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml b/configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..998f0fcb5344eb33574dd24a51f2753fb4dd1831 --- /dev/null +++ b/configs/visdrone/ppyoloe_crn_l_alpha_largesize_80e_visdrone.yml @@ -0,0 +1,55 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +weights: output/ppyoloe_crn_l_alpha_largesize_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + + +CSPResNet: + use_alpha: True + + +LearningRate: + base_lr: 0.0025 + + +worker_num: 2 +eval_height: &eval_height 1920 +eval_width: &eval_width 1920 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 2 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, 
*eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/configs/visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml b/configs/visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..718f02903bf4069366910a302e561f68aafc3a62 --- /dev/null +++ b/configs/visdrone/ppyoloe_crn_l_p2_alpha_80e_visdrone.yml @@ -0,0 +1,23 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +weights: output/ppyoloe_crn_l_p2_alpha_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + + +TrainReader: + batch_size: 4 + +LearningRate: + base_lr: 0.005 + + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] diff --git a/configs/visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml b/configs/visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..1cd8dc671dd5f112742126a434d83a9196853a0f --- /dev/null +++ b/configs/visdrone/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone.yml @@ -0,0 +1,62 @@ +_BASE_: [ + 'ppyoloe_crn_l_80e_visdrone.yml', +] +weights: output/ppyoloe_crn_l_p2_alpha_largesize_80e_visdrone/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + + +LearningRate: + base_lr: 0.005 + + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] + + +worker_num: 2 +eval_height: &eval_height 1920 +eval_width: &eval_width 1920 +eval_size: &eval_size [*eval_height, *eval_width] + +TrainReader: + sample_transforms: + - Decode: {} + - 
RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + batch_transforms: + - BatchRandomResize: {target_size: [1024, 1088, 1152, 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664, 1728, 1792, 1856, 1920, 1984, 2048], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - PadGT: {} + batch_size: 1 + shuffle: true + drop_last: true + use_shared_memory: true + collate_batch: true + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 2 + +TestReader: + inputs_def: + image_shape: [3, *eval_height, *eval_width] + sample_transforms: + - Decode: {} + - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 1 diff --git a/configs/visdrone/ppyoloe_crn_s_80e_visdrone.yml b/configs/visdrone/ppyoloe_crn_s_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..db3d93d628f8754aac3be50060f17a14c4dda04d --- /dev/null +++ b/configs/visdrone/ppyoloe_crn_s_80e_visdrone.yml @@ -0,0 +1,36 @@ +_BASE_: [ + '../datasets/visdrone_detection.yml', + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_80e_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +TrainReader: + batch_size: 8 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 96 + - !LinearWarmup + start_factor: 0. 
+ epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 500 + score_threshold: 0.01 + nms_threshold: 0.6 diff --git a/configs/visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml b/configs/visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml new file mode 100644 index 0000000000000000000000000000000000000000..17d6299bb89e6e70dd420b0ec01743ae26c2af8c --- /dev/null +++ b/configs/visdrone/ppyoloe_crn_s_p2_alpha_80e_visdrone.yml @@ -0,0 +1,22 @@ +_BASE_: [ + 'ppyoloe_crn_s_80e_visdrone.yml', +] +weights: output/ppyoloe_crn_s_p2_alpha_80e_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams + +TrainReader: + batch_size: 4 + +LearningRate: + base_lr: 0.005 + +CSPResNet: + return_idx: [0, 1, 2, 3] + use_alpha: True + +CustomCSPPAN: + out_channels: [768, 384, 192, 64] + +PPYOLOEHead: + fpn_strides: [32, 16, 8, 4] diff --git a/configs/vitdet/README.md b/configs/vitdet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9fd9ebdf5af95b4ad05a34c458d9e9760b9ed50b --- /dev/null +++ b/configs/vitdet/README.md @@ -0,0 +1,65 @@ +# Vision Transformer Detection + +## Introduction + +- [Context Autoencoder for Self-Supervised Representation Learning](https://arxiv.org/abs/2202.03026) +- [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/pdf/2111.11429.pdf) + +Object detection is a central downstream task used to +test if pre-trained network parameters confer benefits, such +as improved accuracy or training speed. The complexity +of object detection methods can make this benchmarking +non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. 
+
+## Model Zoo
+
+| Backbone | Pretrained | Model | Scheduler | Images/GPU | Box AP | Config | Download |
+|:------:|:--------:|:--------------:|:--------------:|:--------------:|:------:|:------:|:--------:|
+| ViT-base | CAE | Cascade RCNN | 1x | 1 | 52.7 | [config](./cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.pdparams) |
+| ViT-large | CAE | Cascade RCNN | 1x | 1 | 55.7 | [config](./cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams) |
+
+**Notes:**
+- Models are trained on the COCO train2017 dataset and evaluated on val2017; the reported metric is `mAP(IoU=0.5:0.95)`.
+- The base model is trained on 8x32G V100 GPUs, the large model on 8x80G A100 GPUs.
+
+
+## Citations
+```
+@article{chen2022context,
+  title={Context autoencoder for self-supervised representation learning},
+  author={Chen, Xiaokang and Ding, Mingyu and Wang, Xiaodi and Xin, Ying and Mo, Shentong and Wang, Yunhao and Han, Shumin and Luo, Ping and Zeng, Gang and Wang, Jingdong},
+  journal={arXiv preprint arXiv:2202.03026},
+  year={2022}
+}
+
+@article{DBLP:journals/corr/abs-2111-11429,
+  author = {Yanghao Li and
+            Saining Xie and
+            Xinlei Chen and
+            Piotr Doll{\'{a}}r and
+            Kaiming He and
+            Ross B.
Girshick}, + title = {Benchmarking Detection Transfer Learning with Vision Transformers}, + journal = {CoRR}, + volume = {abs/2111.11429}, + year = {2021}, + url = {https://arxiv.org/abs/2111.11429}, + eprinttype = {arXiv}, + eprint = {2111.11429}, + timestamp = {Fri, 26 Nov 2021 13:48:43 +0100}, + biburl = {https://dblp.org/rec/journals/corr/abs-2111-11429.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + +@article{Cai_2019, + title={Cascade R-CNN: High Quality Object Detection and Instance Segmentation}, + ISSN={1939-3539}, + url={http://dx.doi.org/10.1109/tpami.2019.2956516}, + DOI={10.1109/tpami.2019.2956516}, + journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, + publisher={Institute of Electrical and Electronics Engineers (IEEE)}, + author={Cai, Zhaowei and Vasconcelos, Nuno}, + year={2019}, + pages={1–1} +} +``` diff --git a/configs/vitdet/_base_/optimizer_base_1x.yml b/configs/vitdet/_base_/optimizer_base_1x.yml new file mode 100644 index 0000000000000000000000000000000000000000..b822b3bf92a6a12facafe4b569a0ebcad3cf1d3b --- /dev/null +++ b/configs/vitdet/_base_/optimizer_base_1x.yml @@ -0,0 +1,22 @@ +epoch: 12 + +LearningRate: + base_lr: 0.0001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: [9, 11] + - !LinearWarmup + start_factor: 0.001 + steps: 1000 + +OptimizerBuilder: + optimizer: + type: AdamWDL + betas: [0.9, 0.999] + layer_decay: 0.75 + weight_decay: 0.02 + num_layers: 12 + filter_bias_and_bn: True + skip_decay_names: ['pos_embed', 'cls_token'] + set_param_lr_func: 'layerwise_lr_decay' diff --git a/configs/vitdet/_base_/reader.yml b/configs/vitdet/_base_/reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..1af6175a931a571f1c6726f0f312591c07489d1d --- /dev/null +++ b/configs/vitdet/_base_/reader.yml @@ -0,0 +1,41 @@ +worker_num: 2 +TrainReader: + sample_transforms: + - Decode: {} + - RandomResizeCrop: {resizes: [400, 500, 600], cropsizes: [[384, 600], ], 
prob: 0.5} + - RandomResize: {target_size: [[480, 1333], [512, 1333], [544, 1333], [576, 1333], [608, 1333], [640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], keep_ratio: True, interp: 2} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 2 + shuffle: true + drop_last: true + collate_batch: false + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + - PadBatch: {pad_to_stride: 32} + batch_size: 1 + shuffle: false + drop_last: false + drop_empty: false + + +TestReader: + inputs_def: + image_shape: [-1, 3, 640, 640] + sample_transforms: + - Decode: {} + - LetterBoxResize: {target_size: 640} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_size: 1 + shuffle: false + drop_last: false diff --git a/configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml b/configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..23f766d75aad37f238cfebc233941d36b2d9a295 --- /dev/null +++ b/configs/vitdet/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml @@ -0,0 +1,129 @@ + +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/reader.yml', + './_base_/optimizer_base_1x.yml' +] + +weights: output/cascade_rcnn_vit_base_hrfpn_cae_1x_coco/model_final + + +# runtime +log_iter: 100 +snapshot_epoch: 1 +find_unused_parameters: True + +use_gpu: true +norm_type: sync_bn + + +# reader +worker_num: 2 +TrainReader: + batch_size: 1 + + +# model +architecture: CascadeRCNN + +CascadeRCNN: + backbone: VisionTransformer + neck: HRFPN + rpn_head: RPNHead + bbox_head: CascadeHead + # 
post process + bbox_post_process: BBoxPostProcess + + +VisionTransformer: + patch_size: 16 + embed_dim: 768 + depth: 12 + num_heads: 12 + mlp_ratio: 4 + qkv_bias: True + drop_rate: 0.0 + drop_path_rate: 0.2 + init_values: 0.1 + final_norm: False + use_rel_pos_bias: False + use_sincos_pos_emb: True + epsilon: 0.000001 # 1e-6 + out_indices: [3, 5, 7, 11] + with_fpn: True + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_base_cae_pretrained.pdparams + +HRFPN: + out_channel: 256 + use_bias: True + +RPNHead: + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 2000 + topk_after_collect: True + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + loss_rpn_bbox: SmoothL1Loss + +SmoothL1Loss: + beta: 0.1111111111111111 + + +CascadeHead: + head: CascadeXConvNormHead + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + bbox_assigner: BBoxAssigner + bbox_loss: GIoULoss + num_cascade_stages: 3 + reg_class_agnostic: False + stage_loss_weights: [1, 0.5, 0.25] + loss_normalize_pos: True + +BBoxAssigner: + batch_size_per_im: 512 + bg_thresh: 0.5 + fg_thresh: 0.5 + fg_fraction: 0.25 + cascade_iou: [0.5, 0.6, 0.7] + use_random: True + + +CascadeXConvNormHead: + norm_type: bn + + +GIoULoss: + loss_weight: 10. 
+ reduction: 'none' + eps: 0.000001 + + +BBoxPostProcess: + decode: + name: RCNNBox + prior_box_var: [30.0, 30.0, 15.0, 15.0] + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 diff --git a/configs/vitdet/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml b/configs/vitdet/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..5f5bbfb46ea1bdb748d31b4d8ea793ebd903fa33 --- /dev/null +++ b/configs/vitdet/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml @@ -0,0 +1,27 @@ +_BASE_: [ + './cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml' +] + +weights: output/cascade_rcnn_vit_large_hrfpn_cae_1x_coco/model_final + + +depth: &depth 24 +dim: &dim 1024 + +VisionTransformer: + img_size: [800, 1344] + embed_dim: *dim + depth: *depth + num_heads: 16 + drop_path_rate: 0.25 + out_indices: [7, 11, 15, 23] + pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_large_cae_pretrained.pdparams + +HRFPN: + in_channels: [*dim, *dim, *dim, *dim] + +OptimizerBuilder: + optimizer: + layer_decay: 0.9 + weight_decay: 0.02 + num_layers: *depth diff --git a/configs/yolov3/README.md b/configs/yolov3/README.md index af4d07ce13d8e2ac6bf81d40ac4d25f5ab2061b3..41cee48c916f4530e43efa49f961239f52e60cd1 100644 --- a/configs/yolov3/README.md +++ b/configs/yolov3/README.md @@ -9,12 +9,12 @@ | DarkNet53(paper) | 608 | 8 | 270e | ---- | 33.0 | - | - | | DarkNet53(paper) | 416 | 8 | 270e | ---- | 31.0 | - | - | | DarkNet53(paper) | 320 | 8 | 270e | ---- | 28.2 | - | - | -| DarkNet53 | 608 | 8 | 270e | ---- | 39.0 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | -| DarkNet53 | 416 | 8 | 270e | ---- | 37.5 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | 
[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | -| DarkNet53 | 320 | 8 | 270e | ---- | 34.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | -| ResNet50_vd | 608 | 8 | 270e | ---- | 39.1 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | -| ResNet50_vd | 416 | 8 | 270e | ---- | 36.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | -| ResNet50_vd | 320 | 8 | 270e | ---- | 33.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | +| DarkNet53 | 608 | 8 | 270e | ---- | 39.1 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | +| DarkNet53 | 416 | 8 | 270e | ---- | 37.7 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | +| DarkNet53 | 320 | 8 | 270e | ---- | 34.8 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | +| ResNet50_vd | 608 | 8 | 270e | ---- | 40.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | 
[配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | +| ResNet50_vd | 416 | 8 | 270e | ---- | 38.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | +| ResNet50_vd | 320 | 8 | 270e | ---- | 35.1 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) | | ResNet34 | 608 | 8 | 270e | ---- | 36.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r34_270e_coco.yml) | | ResNet34 | 416 | 8 | 270e | ---- | 34.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r34_270e_coco.yml) | | ResNet34 | 320 | 8 | 270e | ---- | 31.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_r34_270e_coco.yml) | @@ -45,18 +45,6 @@ | MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | | MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) | -**注意:** YOLOv3均使用8GPU训练,训练270个epoch。由于动态图框架整体升级,以下几个PaddleDetection发布的权重模型评估时需要添加--bias字段, 例如 - -```bash 
-# 使用PaddleDetection发布的权重
-CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/yolov3_darknet53_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams --bias
-```
-主要有:
-
-1.yolov3_darknet53_270e_coco
-
-2.yolov3_r50vd_dcn_270e_coco
-
 ## Citations
 ```
 @misc{redmon2018yolov3,
diff --git a/configs/yolov3/_base_/optimizer_40e.yml b/configs/yolov3/_base_/optimizer_40e.yml
index 0f858df59921e20398e34d019277e39c10abd583..7cf676d7119162d55dc0a2566c0590457344cfd3 100644
--- a/configs/yolov3/_base_/optimizer_40e.yml
+++ b/configs/yolov3/_base_/optimizer_40e.yml
@@ -3,12 +3,12 @@ epoch: 40
 LearningRate:
   base_lr: 0.0001
   schedulers:
-  - !PiecewiseDecay
+  - name: PiecewiseDecay
     gamma: 0.1
     milestones:
     - 32
     - 36
-  - !LinearWarmup
+  - name: LinearWarmup
     start_factor: 0.3333333333333333
     steps: 100
 
diff --git a/configs/yolox/README.md b/configs/yolox/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a689b2981e06d439bc212c4fbb2ba4efa6860a7c
--- /dev/null
+++ b/configs/yolox/README.md
@@ -0,0 +1,190 @@
+# YOLOX (YOLOX: Exceeding YOLO Series in 2021)
+
+## Contents
+- [Model Zoo](#model-zoo)
+- [Usage](#usage)
+- [Speed Test](#speed-test)
+- [Citations](#citations)
+
+
+## Model Zoo
+### YOLOX on COCO
+
+| Model | Input size | Images/GPU | LR schedule | Inference time (ms) | mAP (val) 0.5:0.95 | mAP (val) 0.5 | Params(M) | FLOPs(G) | Download | Config |
+| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
+| YOLOX-nano | 416 | 8 | 300e | 2.3 | 26.1 | 42.0 | 0.91 | 1.08 | [download](https://paddledet.bj.bcebos.com/models/yolox_nano_300e_coco.pdparams) | [config](./yolox_nano_300e_coco.yml) |
+| YOLOX-tiny | 416 | 8 | 300e | 2.8 | 32.9 | 50.4 | 5.06 | 6.45 | [download](https://paddledet.bj.bcebos.com/models/yolox_tiny_300e_coco.pdparams) | [config](./yolox_tiny_300e_coco.yml) |
+| YOLOX-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 9.0 | 26.8 | [download](https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams) | [config](./yolox_s_300e_coco.yml) |
+| YOLOX-m | 640 | 8 | 300e | 5.8 | 46.9 | 65.7 | 25.3 | 73.8 | [download](https://paddledet.bj.bcebos.com/models/yolox_m_300e_coco.pdparams) | [config](./yolox_m_300e_coco.yml) |
+| YOLOX-l | 640 | 8 | 300e | 9.3 | 50.1 | 68.8 | 54.2 | 155.6 | [download](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | [config](./yolox_l_300e_coco.yml) |
+| YOLOX-x | 640 | 8 | 300e | 16.6 | **51.8** | **70.6** | 99.1 | 281.9 | [download](https://paddledet.bj.bcebos.com/models/yolox_x_300e_coco.pdparams) | [config](./yolox_x_300e_coco.yml) |
+
+
+| Model | Input size | Images/GPU | LR schedule | Inference time (ms) | mAP (val) 0.5:0.95 | mAP (val) 0.5 | Params(M) | FLOPs(G) | Download | Config |
+| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
+| YOLOX-cdn-tiny | 416 | 8 | 300e | 1.9 | 32.4 | 50.2 | 5.03 | 6.33 | [download](https://paddledet.bj.bcebos.com/models/yolox_cdn_tiny_300e_coco.pdparams) | [config](./yolox_cdn_tiny_300e_coco.yml) |
+| YOLOX-crn-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 7.7 | 24.69 | [download](https://paddledet.bj.bcebos.com/models/yolox_crn_s_300e_coco.pdparams) | [config](./yolox_crn_s_300e_coco.yml) |
+| YOLOX-ConvNeXt-s| 640 | 8 | 36e | - | **44.6** | **65.3** | 36.2 | 27.52 | [download](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [config](../convnext/yolox_convnext_s_36e_coco.yml) |
+
+
+**Notes:**
+ - YOLOX models are trained on COCO train2017. YOLOX-cdn uses the same backbone as YOLOv5 releases v6.0 and later, YOLOX-crn uses CSPResNet, the same backbone as PPYOLOE, and YOLOX-ConvNeXt uses ConvNeXt as the backbone.
+ - By default, YOLOX is trained with mixed precision on 8 GPUs with a per-GPU batch_size of 8. The default lr of 0.01 corresponds to this total batch_size of 64 across 8 GPUs. If the **number of GPUs** or the per-GPU **batch size** changes, adjust the learning rate according to **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)**.
+ - To speed up inference while keeping a high mAP, set `nms_top_k` to `1000`, `keep_top_k` to `100` and `score_threshold` to `0.01` in [yolox_cspdarknet.yml](_base_/yolox_cspdarknet.yml); the mAP drops by about 0.1~0.2%.
+ - For a fast demo, set `score_threshold` to `0.25` and `nms_threshold` to `0.45` in [yolox_cspdarknet.yml](_base_/yolox_cspdarknet.yml); the mAP drops considerably.
+ - Inference speed is measured on a single V100 with batch size=1, using **CUDA 10.2** and **CUDNN 7.6.5**; the TensorRT speed test uses **TensorRT 6.0.1.8**.
+ - See [Speed Test](#speed-test) to reproduce the YOLOX inference speed results. The reported speed is the fastest one measured with **tensorRT-FP16** and **excludes data preprocessing and model output post-processing (NMS)**.
+ - If you set `--run_benchmark=True`, first install the following dependencies: `pip install pynvml psutil GPUtil`.
+
+## Usage
+
+### 1. Training
+Run the following command to train YOLOX with mixed precision:
+```bash
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolox/yolox_s_300e_coco.yml --amp --eval
+```
+**Notes:**
+- `--amp` enables mixed-precision training to avoid running out of GPU memory, and `--eval` runs evaluation during training.
+
+### 2. Evaluation
+Run the following command to evaluate on the COCO val2017 dataset with a single GPU:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams
+```
+
+### 3. Inference
+Run the following commands to predict images on a single GPU. Use `--infer_img` to infer a single image and `--infer_dir` to infer all images in a directory.
+```bash
+# infer a single image
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
+
+# infer all images in a directory
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams --infer_dir=demo
+```
+
+### 4. Exporting the model
+For inference deployment or benchmarking on GPU, YOLOX models must first be exported with `tools/export_model.py`.
+
+When you **use Paddle Inference without TensorRT**, run the following command to export the model:
+
+```bash
+python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams
+```
+
+When you **use Paddle Inference with TensorRT**, specify `-o trt=True` when exporting the model:
+
+```bash
+python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams trt=True
+```
+
+If you want to export the YOLOX model to **ONNX format**, refer to the
+[tutorial on exporting PaddleDetection models to ONNX](../../deploy/EXPORT_ONNX_MODEL.md) and run the following commands:
+
+```bash
+
+# export the inference model
+python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams
+
+# install paddle2onnx
+pip install paddle2onnx
+
+# convert to ONNX format
+paddle2onnx --model_dir output_inference/yolox_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file yolox_s_300e_coco.onnx
+```
+
+**Note:** ONNX models currently support only batch_size=1.
+
+
+### 5. Deployment
+YOLOX can be deployed in the following ways:
+ - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
+ - [Paddle-TensorRT](../../deploy/TENSOR_RT.md)
+ - [PaddleServing](https://github.com/PaddlePaddle/Serving)
+ - [PaddleSlim model quantization](../slim)
+
+Run the following command to export the model:
+
+```bash
+python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams trt=True
+```
+
+**Notes:**
+- `trt=True` means benchmarking **with Paddle Inference and TensorRT**, which is faster. When omitted it defaults to False, which means benchmarking **with Paddle Inference but without TensorRT**.
+- To deploy with Paddle Inference in TensorRT FP16 mode, refer to the [Paddle Inference documentation](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python) to download and install the wheel package that matches your CUDA, CUDNN and TensorRT versions.
+
+#### 5.1. Python deployment
+`deploy/python/infer.py` uses the exported Paddle Inference model for inference and benchmarking. If you set `--run_benchmark=True`, first install the following dependencies: `pip install pynvml psutil GPUtil`.
+
+```bash
+# Python deployment: infer a single image
+python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu
+
+# infer all images in a directory
+python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_dir=demo/ --device=gpu
+```
+
+#### 5.2. C++ deployment
+`deploy/cpp/build/main` uses the exported Paddle Inference model for C++ inference deployment. First build and install the environment following the [docs](../../deploy/cpp/docs).
+```bash
+# C++ deployment: infer a single image
+./deploy/cpp/build/main --model_dir=output_inference/yolox_s_300e_coco/ --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=GPU --threshold=0.5 --output_dir=cpp_infer_output/yolox_s_300e_coco
+```
+
+
+## Speed Test
+
+For a fair comparison, the speed results in the [Model Zoo](#model-zoo) exclude data preprocessing and model output post-processing (NMS), consistent with the test method of [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet). To reproduce them, specify `-o exclude_nms=True` when exporting the model. Benchmarking requires `--run_benchmark=True`; first install the following dependencies: `pip install pynvml psutil GPUtil`.
+
+To benchmark **with Paddle Inference but without TensorRT**, run the following commands:
+
+```bash
+# export the model
+python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams exclude_nms=True
+
+# speed test with run_benchmark=True
+python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True
+```
+
+To benchmark **with Paddle Inference and TensorRT**, run the following commands:
+
+```bash
+# export the model with trt=True
+python tools/export_model.py -c configs/yolox/yolox_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams exclude_nms=True trt=True
+
+# speed test with run_benchmark=True
+python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True
+
+# tensorRT-FP32 speed test
+python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True --run_mode=trt_fp32
+
+# tensorRT-FP16 speed test
+python deploy/python/infer.py --model_dir=output_inference/yolox_s_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True --run_mode=trt_fp16
+```
+**Notes:**
+- Specifying `-o exclude_nms=True` at export time is intended for benchmarking only; the predictions of a model exported this way are not the final detection boxes.
+- The speed results in the [Model Zoo](#model-zoo) are the fastest speeds measured with **tensorRT-FP16** and **exclude data preprocessing and model output post-processing (NMS)**.
+
+## FAQ
+
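The learning-rate adjustment rule stated in the Model Zoo notes of this README can be sketched in a few lines. This is only an illustration of the arithmetic, not code from PaddleDetection: the function name `scaled_lr` and its parameter names are hypothetical, and the defaults (lr 0.01, 8 images per GPU, 8 GPUs) are the ones the notes mention.

```python
def scaled_lr(default_lr, default_batch_size, default_gpus,
              new_batch_size, new_gpus):
    """Scale the learning rate linearly with the total batch size.

    Hypothetical helper: implements
    lr_new = lr_default * (bs_new * gpus_new) / (bs_default * gpus_default).
    """
    values = (default_batch_size, default_gpus, new_batch_size, new_gpus)
    if min(values) <= 0:
        raise ValueError("batch sizes and GPU counts must be positive")
    default_total = default_batch_size * default_gpus
    new_total = new_batch_size * new_gpus
    return default_lr * new_total / default_total


if __name__ == "__main__":
    # Defaults from the notes: lr 0.01 for 8 GPUs x 8 images per GPU.
    # Training on 4 GPUs with the same per-GPU batch size halves the total batch.
    print(scaled_lr(0.01, 8, 8, new_batch_size=8, new_gpus=4))
```

The computed value would then be written to `LearningRate.base_lr` in the config being trained.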
+How to count the model parameters
+
+Insert the following code into [trainer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/engine/trainer.py#L154) to count the parameters.
+```python
+params = sum([
+    p.numel() for n, p in self.model.named_parameters()
+    if all([x not in n for x in ['_mean', '_variance']])
+]) # exclude BatchNorm running statistics
+print('Params: ', params)
+```
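The name filter in the snippet above can be tried without Paddle installed. The following is a minimal sketch, assuming the parameters are supplied as plain `(name, element_count)` pairs; `count_params` and the toy layer names are hypothetical and only mimic what `named_parameters()` and `numel()` would provide.

```python
def count_params(named_sizes, skip_keys=("_mean", "_variance")):
    """Sum element counts, skipping entries whose name contains a skip key.

    Hypothetical stand-in for iterating over model.named_parameters():
    each item is a (name, element_count) pair instead of a real tensor.
    """
    return sum(
        numel
        for name, numel in named_sizes
        if all(key not in name for key in skip_keys)
    )


if __name__ == "__main__":
    # Toy entries for one conv + batch-norm pair (made-up names and sizes).
    toy = [
        ("conv.weight", 864),
        ("bn.weight", 32),
        ("bn.bias", 32),
        ("bn._mean", 32),      # running mean, excluded
        ("bn._variance", 32),  # running variance, excluded
    ]
    print("Params:", count_params(toy))
```

Only the learnable weights and biases are summed, which matches the intent of the comment in the original snippet.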
    + + +## Citations +``` + @article{yolox2021, + title={YOLOX: Exceeding YOLO Series in 2021}, + author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian}, + journal={arXiv preprint arXiv:2107.08430}, + year={2021} +} +``` diff --git a/configs/yolox/_base_/optimizer_300e.yml b/configs/yolox/_base_/optimizer_300e.yml new file mode 100644 index 0000000000000000000000000000000000000000..1853ad61ff3e8f222388a005db9e60640700c996 --- /dev/null +++ b/configs/yolox/_base_/optimizer_300e.yml @@ -0,0 +1,20 @@ +epoch: 300 + +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 300 + min_lr_ratio: 0.05 + last_plateau_epochs: 15 + - !ExpWarmup + epochs: 5 + +OptimizerBuilder: + optimizer: + type: Momentum + momentum: 0.9 + use_nesterov: True + regularizer: + factor: 0.0005 + type: L2 diff --git a/configs/yolox/_base_/yolox_cspdarknet.yml b/configs/yolox/_base_/yolox_cspdarknet.yml new file mode 100644 index 0000000000000000000000000000000000000000..24ef370c437e308c3a7e9da973fe3eea439faf17 --- /dev/null +++ b/configs/yolox/_base_/yolox_cspdarknet.yml @@ -0,0 +1,42 @@ +architecture: YOLOX +norm_type: sync_bn +use_ema: True +ema_decay: 0.9999 +ema_decay_type: "exponential" +act: silu +find_unused_parameters: True + +depth_mult: 1.0 +width_mult: 1.0 + +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [15, 25] # multi-scale range [480*480 ~ 800*800] + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: False + +YOLOCSPPAN: + depthwise: False + +YOLOXHead: + l1_epoch: 285 + depthwise: False + loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0} + assigner: + name: SimOTAAssigner + candidate_topk: 10 + use_vfl: False + nms: + name: MultiClassNMS + nms_top_k: 10000 + keep_top_k: 1000 + score_threshold: 0.001 + nms_threshold: 0.65 + # For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%. 
+ # For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot. diff --git a/configs/yolox/_base_/yolox_reader.yml b/configs/yolox/_base_/yolox_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..a33b847b159a515248c8556a24bb29e779f1def8 --- /dev/null +++ b/configs/yolox/_base_/yolox_reader.yml @@ -0,0 +1,44 @@ +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 1.0 + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.1, 2.0] + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: True + mixup_prob: 1.0 + mixup_scale: [0.5, 1.5] + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: 640} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 8 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 285 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: True, interp: 1} + - Pad: {size: [640, 640], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 4 + + +TestReader: + inputs_def: + image_shape: [3, 640, 640] + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: True, interp: 1} + - Pad: {size: [640, 640], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/configs/yolox/yolox_cdn_tiny_300e_coco.yml b/configs/yolox/yolox_cdn_tiny_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..81c6c075d3620caa98dce2ebcd3b45bd694cef8d --- /dev/null +++ b/configs/yolox/yolox_cdn_tiny_300e_coco.yml @@ -0,0 +1,14 @@ +_BASE_: [ + 'yolox_tiny_300e_coco.yml' +] +depth_mult: 0.33 +width_mult: 0.375 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_cdn_tiny_300e_coco/model_final + +CSPDarkNet: + arch: "P5" # using the same backbone of YOLOv5 releases v6.0 and later version + return_idx: [2, 3, 4] + depthwise: False diff --git 
a/configs/yolox/yolox_crn_s_300e_coco.yml b/configs/yolox/yolox_crn_s_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..ae463113e5909f76905b70409ae75794a66430d7 --- /dev/null +++ b/configs/yolox/yolox_crn_s_300e_coco.yml @@ -0,0 +1,28 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_crn_s_300e_coco/model_final +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams + + +YOLOX: + backbone: CSPResNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [15, 25] # multi-scale range [480*480 ~ 800*800] + +CSPResNet: + layers: [3, 6, 6, 3] + channels: [64, 128, 256, 512, 1024] + return_idx: [1, 2, 3] + use_large_stem: True diff --git a/configs/yolox/yolox_l_300e_coco.yml b/configs/yolox/yolox_l_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..79cffd5e544b0d2cf4629c6a9f37e75eda4a5a6d --- /dev/null +++ b/configs/yolox/yolox_l_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 1.0 +width_mult: 1.0 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_l_300e_coco/model_final diff --git a/configs/yolox/yolox_m_300e_coco.yml b/configs/yolox/yolox_m_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..4c25d7e2561cf120b60b712e621ad695debdb61c --- /dev/null +++ b/configs/yolox/yolox_m_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.67 +width_mult: 0.75 + +log_iter: 100 
+snapshot_epoch: 10 +weights: output/yolox_m_300e_coco/model_final diff --git a/configs/yolox/yolox_nano_300e_coco.yml b/configs/yolox/yolox_nano_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..80b8b5c51fbc200ecce2ff10013b7e9a94300999 --- /dev/null +++ b/configs/yolox/yolox_nano_300e_coco.yml @@ -0,0 +1,81 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.25 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_nano_300e_coco/model_final + + +### model config: +# Note: YOLOX-nano uses depthwise conv in backbone, neck and head. +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [10, 20] # multi-scale range [320*320 ~ 640*640] + +CSPDarkNet: + arch: "X" + return_idx: [2, 3, 4] + depthwise: True + +YOLOCSPPAN: + depthwise: True + +YOLOXHead: + depthwise: True + + +### reader config: +# Note: YOLOX-tiny/nano use 416*416 for evaluation and inference. +# The multi-scale training setting is in the model config; TrainReader's operators use 640*640 by default. 
+worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 0.5 # 1.0 in YOLOX-tiny/s/m/l/x + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.5, 1.5] # [0.1, 2.0] in YOLOX-s/m/l/x + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: False # True in YOLOX-s/m/l/x + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: 640} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 8 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 285 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 8 + + +TestReader: + inputs_def: + image_shape: [3, 416, 416] + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/configs/yolox/yolox_s_300e_coco.yml b/configs/yolox/yolox_s_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..9ba6120a93ec1d5c46cc8d8dc88351671ff44349 --- /dev/null +++ b/configs/yolox/yolox_s_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.50 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_s_300e_coco/model_final diff --git a/configs/yolox/yolox_tiny_300e_coco.yml b/configs/yolox/yolox_tiny_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..c81c172d27982c460bbead78f966158c67de7bc2 --- /dev/null +++ b/configs/yolox/yolox_tiny_300e_coco.yml @@ -0,0 +1,69 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + 
'./_base_/yolox_reader.yml' +] +depth_mult: 0.33 +width_mult: 0.375 + +log_iter: 100 +snapshot_epoch: 10 +weights: output/yolox_tiny_300e_coco/model_final + + +### model config: +YOLOX: + backbone: CSPDarkNet + neck: YOLOCSPPAN + head: YOLOXHead + size_stride: 32 + size_range: [10, 20] # multi-scale range [320*320 ~ 640*640] + + +### reader config: +# Note: YOLOX-tiny/nano use 416*416 for evaluation and inference. +# The multi-scale training setting is in the model config; TrainReader's operators use 640*640 by default. +worker_num: 4 +TrainReader: + sample_transforms: + - Decode: {} + - Mosaic: + prob: 1.0 + input_dim: [640, 640] + degrees: [-10, 10] + scale: [0.5, 1.5] # [0.1, 2.0] in YOLOX-s/m/l/x + shear: [-2, 2] + translate: [-0.1, 0.1] + enable_mixup: False # True in YOLOX-s/m/l/x + - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30} + - PadResize: {target_size: 640} + - RandomFlip: {} + batch_transforms: + - Permute: {} + batch_size: 8 + shuffle: True + drop_last: True + collate_batch: False + mosaic_epoch: 285 + + +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 8 + + +TestReader: + inputs_def: + image_shape: [3, 416, 416] + sample_transforms: + - Decode: {} + - Resize: {target_size: [416, 416], keep_ratio: True, interp: 1} + - Pad: {size: [416, 416], fill_value: [114., 114., 114.]} + - Permute: {} + batch_size: 1 diff --git a/configs/yolox/yolox_x_300e_coco.yml b/configs/yolox/yolox_x_300e_coco.yml new file mode 100644 index 0000000000000000000000000000000000000000..fd8e0d2eb0fbc2d8f052e549b71f9995aa325a05 --- /dev/null +++ b/configs/yolox/yolox_x_300e_coco.yml @@ -0,0 +1,13 @@ +_BASE_: [ + '../datasets/coco_detection.yml', + '../runtime.yml', + './_base_/optimizer_300e.yml', + './_base_/yolox_cspdarknet.yml', + './_base_/yolox_reader.yml' +] +depth_mult: 1.33 +width_mult: 1.25 + +log_iter: 100 
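+snapshot_epoch: 10 +weights: output/yolox_x_300e_coco/model_final diff --git a/deploy/EXPORT_ONNX_MODEL.md b/deploy/EXPORT_ONNX_MODEL.md index cad839c9a64f40d68af5b275b57b6eaee492f932..59d79730a1f70587aab4219a28a8111243609854 100644 --- a/deploy/EXPORT_ONNX_MODEL.md +++ b/deploy/EXPORT_ONNX_MODEL.md @@ -4,21 +4,31 @@ PaddleDetection models can be saved in ONNX format; the models tested and supported so far are as | Model | OP version | Notes | | :---- | :----- | :--- | | YOLOv3 | 11 | Only batch=1 inference is supported; model export requires a fixed shape | -| PPYOLO | 11 | Only batch=1 inference is supported; MatrixNMS will be converted to NMS, with a slight change in accuracy; model export requires a fixed shape | -| PPYOLOv2 | 11 | Only batch=1 inference is supported; MatrixNMS will be converted to NMS, with a slight change in accuracy; model export requires a fixed shape | -| PPYOLO-Tiny | 11 | Only batch=1 inference is supported; model export requires a fixed shape | +| PP-YOLO | 11 | Only batch=1 inference is supported; MatrixNMS will be converted to NMS, with a slight change in accuracy; model export requires a fixed shape | +| PP-YOLOv2 | 11 | Only batch=1 inference is supported; MatrixNMS will be converted to NMS, with a slight change in accuracy; model export requires a fixed shape | +| PP-YOLO Tiny | 11 | Only batch=1 inference is supported; model export requires a fixed shape | +| PP-YOLOE | 11 | Only batch=1 inference is supported; model export requires a fixed shape | +| PP-PicoDet | 11 | Only batch=1 inference is supported; model export requires a fixed shape | | FCOS | 11 |Only batch=1 inference is supported | | PAFNet | 11 |- | | TTFNet | 11 |-| | SSD | 11 |Only batch=1 inference is supported | -| PicoDet | 11 |Only batch=1 inference is supported | +| PP-TinyPose | 11 | - | +| Faster RCNN | 16 | Only batch=1 inference is supported; requires version 0.9.7 or later| +| Mask RCNN | 16 | Only batch=1 inference is supported; requires version 0.9.7 or later| +| Cascade RCNN | 16 | Only batch=1 inference is supported; requires version 0.9.7 or later| +| Cascade Mask RCNN | 16 | Only batch=1 inference is supported; requires version 0.9.7 or later| ONNX export is provided by [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). If you run into problems during conversion, you can reach the engineers through [ISSUE](https://github.com/PaddlePaddle/Paddle2ONNX/issues) in the Paddle2ONNX GitHub project. ## Export Tutorial ### Step 1. Export the PaddlePaddle deployment model -For the export steps, see the [PaddleDetection deployment model export tutorial](./EXPORT_MODEL.md). Taking YOLOv3 trained on the COCO dataset as an example, export as follows + + +For the export steps, see the [PaddleDetection deployment model export tutorial](./EXPORT_MODEL.md). Export examples follow + +- Non-RCNN series models, taking YOLOv3 as an example ``` cd PaddleDetection python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \ @@ -36,17 +46,45 @@ yolov3_darknet ``` > Note the `TestReader.inputs_def.image_shape` parameter at export time. For YOLO series models, be sure to specify it when exporting; otherwise the conversion will fail +- RCNN series models, taking Faster RCNN as an example + +When exporting RCNN series models to ONNX, control flow must be removed from the model, so the extra field `export_onnx=True` is required +``` +cd 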
PaddleDetection +python tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams \ + export_onnx=True \ + --output_dir inference_model +``` + +The exported model is saved in the `inference_model/faster_rcnn_r50_fpn_1x_coco/` directory, with the following structure +``` +faster_rcnn_r50_fpn_1x_coco + ├── infer_cfg.yml # Model configuration file + ├── model.pdiparams # Static graph model parameters + ├── model.pdiparams.info # Extra parameter information, can usually be ignored + └── model.pdmodel # Static graph model file +``` + ### Step 2. Convert the deployment model to ONNX format -Install Paddle2ONNX (version 0.6 or higher) +Install Paddle2ONNX (version 0.9.7 or higher) ``` pip install paddle2onnx ``` Convert with the following commands ``` +# YOLOv3 paddle2onnx --model_dir inference_model/yolov3_darknet53_270e_coco \ --model_filename model.pdmodel \ --params_filename model.pdiparams \ --opset_version 11 \ --save_file yolov3.onnx + +# Faster RCNN +paddle2onnx --model_dir inference_model/faster_rcnn_r50_fpn_1x_coco \ + --model_filename model.pdmodel \ + --params_filename model.pdiparams \ + --opset_version 16 \ + --save_file faster_rcnn.onnx ``` -The converted model is `yolov3.onnx` in the current directory +The converted models are `yolov3.onnx` and `faster_rcnn.onnx` in the current directory diff --git a/deploy/EXPORT_ONNX_MODEL_en.md b/deploy/EXPORT_ONNX_MODEL_en.md index 6f0a16664b1724da71c838a458f0545161058773..1f32a6655137a218c330a78ea8190b046d15743b 100644 --- a/deploy/EXPORT_ONNX_MODEL_en.md +++ b/deploy/EXPORT_ONNX_MODEL_en.md @@ -4,20 +4,30 @@ PaddleDetection Model support is saved in ONNX format and the list of current te | Model | OP Version | NOTE | | :---- | :----- | :--- | | YOLOv3 | 11 | Only batch=1 inferring is supported. Model export needs fixed shape | -| PPYOLO | 11 | Only batch=1 inferring is supported. A MatrixNMS will be converted to an NMS with slightly different precision; Model export needs fixed shape | -| PPYOLOv2 | 11 | Only batch=1 inferring is supported. MatrixNMS will be converted to NMS with slightly different precision; Model export needs fixed shape | -| PPYOLO-Tiny | 11 | Only batch=1 inferring is supported. 
Model export needs fixed shape | +| PP-YOLO | 11 | Only batch=1 inferring is supported. A MatrixNMS will be converted to an NMS with slightly different precision; Model export needs fixed shape | +| PP-YOLOv2 | 11 | Only batch=1 inferring is supported. MatrixNMS will be converted to NMS with slightly different precision; Model export needs fixed shape | +| PP-YOLO Tiny | 11 | Only batch=1 inferring is supported. Model export needs fixed shape | +| PP-YOLOE | 11 | Only batch=1 inferring is supported. Model export needs fixed shape | +| PP-PicoDet | 11 | Only batch=1 inferring is supported. Model export needs fixed shape | | FCOS | 11 |Only batch=1 inferring is supported | | PAFNet | 11 |- | | TTFNet | 11 |-| | SSD | 11 |Only batch=1 inferring is supported | +| PP-TinyPose | 11 | - | +| Faster RCNN | 16 | Only batch=1 inferring is supported, require paddle2onnx>=0.9.7| +| Mask RCNN | 16 | Only batch=1 inferring is supported, require paddle2onnx>=0.9.7| +| Cascade RCNN | 16 | Only batch=1 inferring is supported, require paddle2onnx>=0.9.7| +| Cascade Mask RCNN | 16 | Only batch=1 inferring is supported, require paddle2onnx>=0.9.7| + The function of saving ONNX is provided by [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). If there is feedback on related problems during conversion, Communicate with engineers in Paddle2ONNX's Github project via [ISSUE](https://github.com/PaddlePaddle/Paddle2ONNX/issues). ## Export Tutorial ### Step 1. 
Export the Paddle deployment model -Export procedure reference document[Tutorial on PaddleDetection deployment model export](./EXPORT_MODEL_en.md), take YOLOv3 of COCO dataset training as an example +For the export procedure, see the [Tutorial on PaddleDetection deployment model export](./EXPORT_MODEL_en.md). For example: + +- Models other than the RCNN series, taking YOLOv3 as an example ``` cd PaddleDetection python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \ @@ -35,17 +45,45 @@ yolov3_darknet ``` > check`TestReader.inputs_def.image_shape`, For YOLO series models, specify this parameter when exporting; otherwise, the conversion fails +- RCNN series models, taking Faster RCNN as an example + +The conditional blocks (control flow) must be removed from RCNN series models when exporting to ONNX, so add `export_onnx=True` on the command line +``` +cd PaddleDetection +python tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams \ + export_onnx=True \ + --output_dir inference_model +``` +The exported model is saved in `inference_model/faster_rcnn_r50_fpn_1x_coco/`, with the following structure +``` +faster_rcnn_r50_fpn_1x_coco + ├── infer_cfg.yml # Model configuration file + ├── model.pdiparams # Static graph model parameters + ├── model.pdiparams.info # Extra parameter information, can usually be ignored + └── model.pdmodel # Static graph model file +``` + + ### Step 2. 
Convert the deployment model to ONNX format -Install Paddle2ONNX (version 0.6 or higher) +Install Paddle2ONNX (version 0.9.7 or higher) ``` pip install paddle2onnx ``` Use the following command to convert ``` +# YOLOv3 paddle2onnx --model_dir inference_model/yolov3_darknet53_270e_coco \ --model_filename model.pdmodel \ --params_filename model.pdiparams \ --opset_version 11 \ --save_file yolov3.onnx + +# Faster RCNN +paddle2onnx --model_dir inference_model/faster_rcnn_r50_fpn_1x_coco \ + --model_filename model.pdmodel \ + --params_filename model.pdiparams \ + --opset_version 16 \ + --save_file faster_rcnn.onnx ``` -The transformed model is under the current path`yolov3.onnx` +The converted models are saved in the current directory as `yolov3.onnx` and `faster_rcnn.onnx` diff --git a/deploy/README_en.md b/deploy/README_en.md index 8ac1f1ce2a34a21153fa885d98b6956ec5dd0112..f587b56b99e7a6b7c7ed31c5ae6307ade6e18126 100644 --- a/deploy/README_en.md +++ b/deploy/README_en.md @@ -21,7 +21,7 @@ Use the `tools/export_model.py` script to export the model and the configuration # The YOLOv3 model is derived python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams ``` -The prediction model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`. For details on model export, please refer to the documentation [Tutorial on Paddle Detection MODEL EXPORT](EXPORT_MODEL_sh.md). +The prediction model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory, which contains `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`. For details on model export, please refer to the documentation [Tutorial on Paddle Detection MODEL EXPORT](./EXPORT_MODEL_en.md). 
### 1.2 Use Paddle Inference to Make Predictions * Python deployment supports `CPU`, `GPU` and `XPU` environments, Windows, Linux, and NV Jetson embedded devices. Reference Documentation [Python Deployment](python/README.md) @@ -39,7 +39,7 @@ python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml ``` The prediction model will be exported to the `output_inference/yolov3_darknet53_270e_coco` directory `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, `serving_client/` and `serving_server/` folder. -For details on model export, please refer to the documentation [Tutorial on Paddle Detection MODEL EXPORT](EXPORT_MODEL_en.md). +For details on model export, please refer to the documentation [Tutorial on Paddle Detection MODEL EXPORT](./EXPORT_MODEL_en.md). ### 2.2 Predictions are made using Paddle Serving * [Install PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation) diff --git a/deploy/TENSOR_RT.md b/deploy/TENSOR_RT.md index a9d7677ee3a2af028937cc83a6a82155e2df4716..b1dd29789540746cce5f7ea3ce0a783e2178438d 100644 --- a/deploy/TENSOR_RT.md +++ b/deploy/TENSOR_RT.md @@ -1,8 +1,8 @@ # TensorRT Inference Deployment Tutorial -TensorRT is an acceleration library from NVIDIA for unified model deployment. It can be used on hardware such as V100 and JETSON Xavier and can greatly speed up inference. For the Paddle TensorRT tutorial, see [Inference with the Paddle-TensorRT library](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html#) +TensorRT is an acceleration library from NVIDIA for unified model deployment. It can be used on hardware such as V100 and JETSON Xavier and can greatly speed up inference. For the Paddle TensorRT tutorial, see [Inference with the Paddle-TensorRT library](https://www.paddlepaddle.org.cn/inference/optimize/paddle_trt.html) ## 1. 
Install the PaddleInference library -- Python package: download and install a package built with tensorrt from [here](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release) +- Python package: download and install a package built with tensorrt from [here](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#python) - C++ inference library: download an inference library compiled with TensorRT from [here](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html) @@ -13,7 +13,7 @@ TensorRT is an acceleration library from NVIDIA for unified model deployment that can be used on - Deployment inference in PaddleDetection requires TensorRT version > 6.0. ## 2. Export the model -For details on model export, see the [PaddleDetection model export tutorial](../EXPORT_MODEL.md). +For details on model export, see the [PaddleDetection model export tutorial](./EXPORT_MODEL.md). ## 3. Enable TensorRT acceleration ### 3.1 Configure TensorRT @@ -43,7 +43,7 @@ TestReader: image_shape: [3,608,608] ... ``` -Alternatively, set `-o TestReader.inputs_def.image_shape=[3,608,608]` when exporting the model, and the model will run inference at a fixed size. For details, see the [PaddleDetection model export tutorial](../EXPORT_MODEL.md). +Alternatively, set `-o TestReader.inputs_def.image_shape=[3,608,608]` when exporting the model, and the model will run inference at a fixed size. For details, see the [PaddleDetection model export tutorial](./EXPORT_MODEL.md). You can open the `model.pdmodel` file with [visualdl](https://www.paddlepaddle.org.cn/paddle/visualdl/demo/graph) to check whether the shape of the first input Tensor is fixed. If it is not specified, the shape is shown as `?`, as in the figure below: ![img](../docs/images/input_shape.png) diff --git a/deploy/auto_compression/README.md b/deploy/auto_compression/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4578a4643a1933224e316d94a4062ce8987ccaca --- /dev/null +++ b/deploy/auto_compression/README.md @@ -0,0 +1,112 @@ +# Automatic Compression + +Contents: +- [1. Introduction](#1简介) +- [2. Benchmark](#2Benchmark) +- [3. Start auto compression](#自动压缩流程) + - [3.1 Environment setup](#31-准备环境) + - [3.2 Prepare the dataset](#32-准备数据集) + - [3.3 Prepare the inference model](#33-准备预测模型) + - [3.4 Test model accuracy](#34-测试模型精度) + - [3.5 Auto-compress and produce the model](#35-自动压缩并产出模型) +- [4. Inference deployment](#4预测部署) + +## 1. 
Introduction +This example applies automatic compression to an Inference deployment model from PaddleDetection. The automatic compression strategy used here is quantization with distillation. + + +## 2.Benchmark + +### PP-YOLOE + +| Model | Base mAP | Offline quantization mAP | ACT quantization mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model | +| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: | +| PP-YOLOE-l | 50.9 | - | 50.6 | 11.2ms | 7.7ms | **6.7ms** | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco_quant.tar) | + +- All mAP figures are evaluated on the COCO val2017 dataset, IoU=0.5:0.95. +- The PP-YOLOE-l model was tested on a Tesla V100 GPU with TensorRT enabled, batch_size=1, NMS included; the test script is the [benchmark demo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy/python). + +## 3. Auto compression workflow + +#### 3.1 Prepare the environment +- PaddlePaddle >= 2.3 (can be downloaded and installed from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)) +- PaddleSlim >= 2.3 +- PaddleDet >= 2.4 +- opencv-python + +Install paddlepaddle: +```shell +# CPU +pip install paddlepaddle +# GPU +pip install paddlepaddle-gpu +``` + +Install paddleslim: +```shell +pip install paddleslim +``` + +Install paddledet: +```shell +pip install paddledet +``` + +#### 3.2 Prepare the dataset + +This example runs the auto compression experiment on COCO data by default. For custom COCO data or data in other formats, prepare the data by following the [data preparation document](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md). + +If the dataset is not in COCO format, modify the Dataset field in the reader config file under [configs](./configs). + +Taking the PP-YOLOE model as an example, if the dataset is already prepared, simply change the `dataset_dir` field of `EvalDataset` in [./configs/yolo_reader.yml] to your own dataset path. + +#### 3.3 Prepare the inference model + +The inference model consists of two files, `model.pdmodel` and `model.pdiparams`: the one with the `pdmodel` suffix is the model file and the one with the `pdiparams` suffix is the weights file. + + +Export the Inference model by following the [PaddleDetection documentation](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#8-%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA); see the PP-YOLOE export example below: +- Download the code 
+``` +git clone https://github.com/PaddlePaddle/PaddleDetection.git +``` +- Export the inference model + +PPYOLOE-l model, including NMS: for a quick trial, you can directly download the [exported PP-YOLOE-l model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco.tar) +```shell +python tools/export_model.py \ + -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \ + trt=True +``` + +#### 3.4 Auto-compress and produce the model + +The distillation and quantization auto compression example is launched with the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model automatically. Set the model path and the distillation, quantization, and training parameters in the config file; once configured, the model can be quantized and distilled. The commands are: + +- Single-GPU training: +``` +export CUDA_VISIBLE_DEVICES=0 +python run.py --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/' +``` + +- Multi-GPU training: +``` +CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \ + --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/' +``` + +#### 3.5 Test model accuracy + +Use the eval.py script to get the model's mAP: +``` +export CUDA_VISIBLE_DEVICES=0 +python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml +``` + +**Note**: +- The path of the model to test can be changed in the `model_dir` field of the config file. + +## 4. Inference deployment + +- See the [PaddleDetection deployment tutorial](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy); on GPU, deploy the quantized model with TensorRT enabled and the trt_int8 mode set. diff --git a/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml b/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml new file mode 100644 index 0000000000000000000000000000000000000000..cd39981c1eb44cb45ab6ed1a0386dbba92eaf660 --- /dev/null +++ b/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml @@ -0,0 +1,33 @@ + +Global: + reader_config: configs/yolo_reader.yml + input_list: ['image', 'scale_factor'] + arch: YOLO + Evaluation: True + model_dir: ./ppyoloe_crn_l_300e_coco + model_filename: model.pdmodel + params_filename: model.pdiparams + +Distillation: + alpha: 1.0 + loss: soft_label + +Quantization: + use_pact: true + activation_quantize_type: 'moving_average_abs_max' + quantize_op_types: + - conv2d + - 
depthwise_conv2d + +TrainConfig: + train_iter: 5000 + eval_iter: 1000 + learning_rate: + type: CosineAnnealingDecay + learning_rate: 0.00003 + T_max: 6000 + optimizer_builder: + optimizer: + type: SGD + weight_decay: 4.0e-05 + diff --git a/deploy/auto_compression/configs/yolo_reader.yml b/deploy/auto_compression/configs/yolo_reader.yml new file mode 100644 index 0000000000000000000000000000000000000000..d1061453051e8f7408f4e605078956a8b634f13c --- /dev/null +++ b/deploy/auto_compression/configs/yolo_reader.yml @@ -0,0 +1,26 @@ +metric: COCO +num_classes: 80 + +# Dataset configuration +TrainDataset: + !COCODataSet + image_dir: train2017 + anno_path: annotations/instances_train2017.json + dataset_dir: dataset/coco/ + +EvalDataset: + !COCODataSet + image_dir: val2017 + anno_path: annotations/instances_val2017.json + dataset_dir: dataset/coco/ + +worker_num: 0 + +# preprocess reader in test +EvalReader: + sample_transforms: + - Decode: {} + - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + batch_size: 4 diff --git a/deploy/auto_compression/eval.py b/deploy/auto_compression/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..6de8aff85ce5f3cffa4119a1a3c26e318101db74 --- /dev/null +++ b/deploy/auto_compression/eval.py @@ -0,0 +1,163 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import sys +import numpy as np +import argparse +import paddle +from ppdet.core.workspace import load_config, merge_config +from ppdet.core.workspace import create +from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval +from paddleslim.auto_compression.config_helpers import load_config as load_slim_config +from post_process import PPYOLOEPostProcess + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--devices', + type=str, + default='gpu', + help="which device used to compress.") + + return parser + + +def reader_wrapper(reader, input_list): + def gen(): + for data in reader: + in_dict = {} + if isinstance(input_list, list): + for input_name in input_list: + in_dict[input_name] = data[input_name] + elif isinstance(input_list, dict): + for input_name in input_list.keys(): + in_dict[input_list[input_name]] = data[input_name] + yield in_dict + + return gen + + +def convert_numpy_data(data, metric): + data_all = {} + data_all = {k: np.array(v) for k, v in data.items()} + if isinstance(metric, VOCMetric): + for k, v in data_all.items(): + if not isinstance(v[0], np.ndarray): + tmp_list = [] + for t in v: + tmp_list.append(np.array(t)) + data_all[k] = np.array(tmp_list) + else: + data_all = {k: np.array(v) for k, v in data.items()} + return data_all + + +def eval(): + + place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() + exe = paddle.static.Executor(place) + + val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model( + global_config["model_dir"].rstrip('/'), + exe, + model_filename=global_config["model_filename"], + params_filename=global_config["params_filename"]) + print('Loaded model from: 
{}'.format(global_config["model_dir"])) + + metric = global_config['metric'] + for batch_id, data in enumerate(val_loader): + data_all = convert_numpy_data(data, metric) + data_input = {} + for k, v in data.items(): + if isinstance(global_config['input_list'], list): + if k in global_config['input_list']: + data_input[k] = np.array(v) + elif isinstance(global_config['input_list'], dict): + if k in global_config['input_list'].keys(): + data_input[global_config['input_list'][k]] = np.array(v) + + outs = exe.run(val_program, + feed=data_input, + fetch_list=fetch_targets, + return_numpy=False) + res = {} + if 'arch' in global_config and global_config['arch'] == 'PPYOLOE': + postprocess = PPYOLOEPostProcess( + score_threshold=0.01, nms_threshold=0.6) + res = postprocess(np.array(outs[0]), data_all['scale_factor']) + else: + for out in outs: + v = np.array(out) + if len(v.shape) > 1: + res['bbox'] = v + else: + res['bbox_num'] = v + metric.update(data_all, res) + if batch_id % 100 == 0: + print('Eval iter:', batch_id) + metric.accumulate() + metric.log() + metric.reset() + + +def main(): + global global_config + all_config = load_slim_config(FLAGS.config_path) + assert "Global" in all_config, "Key 'Global' not found in config file." 
+ global_config = all_config["Global"] + reader_cfg = load_config(global_config['reader_config']) + + dataset = reader_cfg['EvalDataset'] + global val_loader + val_loader = create('EvalReader')(reader_cfg['EvalDataset'], + reader_cfg['worker_num'], + return_list=True) + metric = None + if reader_cfg['metric'] == 'COCO': + clsid2catid = {v: k for k, v in dataset.catid2clsid.items()} + anno_file = dataset.get_anno() + metric = COCOMetric( + anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox') + elif reader_cfg['metric'] == 'VOC': + metric = VOCMetric( + label_list=dataset.get_label_list(), + class_num=reader_cfg['num_classes'], + map_type=reader_cfg['map_type']) + elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval': + anno_file = dataset.get_anno() + metric = KeyPointTopDownCOCOEval(anno_file, + len(dataset), 17, 'output_eval') + else: + raise ValueError("metric currently only supports COCO, VOC and KeyPointTopDownCOCOEval.") + global_config['metric'] = metric + + eval() + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu'] + paddle.set_device(FLAGS.devices) + + main() diff --git a/deploy/auto_compression/post_process.py b/deploy/auto_compression/post_process.py new file mode 100644 index 0000000000000000000000000000000000000000..eea2f019548ec288a23e37b3bd2faf24f9a98935 --- /dev/null +++ b/deploy/auto_compression/post_process.py @@ -0,0 +1,157 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import cv2 + + +def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): + """ + Args: + box_scores (N, 5): boxes in corner-form and probabilities. + iou_threshold: intersection over union threshold. + top_k: keep top_k results. If top_k <= 0, keep all the results. + candidate_size: only consider the candidates with the highest scores. + Returns: + picked: the kept rows of box_scores, shape (K, 5) + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. 
+ """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PPYOLOEPostProcess(object): + """ + Args: + score_threshold (float), nms_threshold (float): score filter and NMS IoU thresholds + nms_top_k (int), keep_top_k (int): max boxes kept per class by NMS and per image overall + """ + + def __init__(self, + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=10000, + keep_top_k=300): + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def _non_max_suppression(self, prediction, scale_factor): + batch_size = prediction.shape[0] + out_boxes_list = [] + box_num_list = [] + for batch_id in range(batch_size): + bboxes, confidences = prediction[batch_id][..., :4], prediction[ + batch_id][..., 4:] + # nms + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.nms_top_k) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * box_probs.shape[0]) + + if len(picked_box_probs) == 0: + out_boxes_list.append(np.empty((0, 6))) + box_num_list.append(0) + else: + picked_box_probs = np.concatenate(picked_box_probs) + # resize output boxes + picked_box_probs[:, 0] /= scale_factor[batch_id][1] + picked_box_probs[:, 2] /= scale_factor[batch_id][1] + picked_box_probs[:, 1] /= scale_factor[batch_id][0] + picked_box_probs[:, 3] /= scale_factor[batch_id][0] + + # output columns: class, score, box + out_box = np.concatenate( + [ + np.expand_dims( + np.array(picked_labels), axis=-1), np.expand_dims( + picked_box_probs[:, 4], axis=-1), + picked_box_probs[:, :4] + ], + axis=1) + if out_box.shape[0] > self.keep_top_k: + out_box = out_box[out_box[:, 1].argsort()[::-1] + [:self.keep_top_k]] + 
out_boxes_list.append(out_box) + box_num_list.append(out_box.shape[0]) + + out_boxes_list = np.concatenate(out_boxes_list, axis=0) + box_num_list = np.array(box_num_list) + return out_boxes_list, box_num_list + + def __call__(self, outs, scale_factor): + out_boxes_list, box_num_list = self._non_max_suppression(outs, + scale_factor) + return {'bbox': out_boxes_list, 'bbox_num': box_num_list} diff --git a/deploy/auto_compression/run.py b/deploy/auto_compression/run.py new file mode 100644 index 0000000000000000000000000000000000000000..fe485fb8f769726f9aa32a5f2dad65a9f089816c --- /dev/null +++ b/deploy/auto_compression/run.py @@ -0,0 +1,183 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import sys +import numpy as np +import argparse +import paddle +from ppdet.core.workspace import load_config, merge_config +from ppdet.core.workspace import create +from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval +from paddleslim.auto_compression.config_helpers import load_config as load_slim_config +from paddleslim.auto_compression import AutoCompression +from post_process import PPYOLOEPostProcess + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + '--config_path', + type=str, + default=None, + help="path of compression strategy config.", + required=True) + parser.add_argument( + '--save_dir', + type=str, + default='output', + help="directory to save compressed model.") + parser.add_argument( + '--devices', + type=str, + default='gpu', + help="which device used to compress.") + + return parser + + +def reader_wrapper(reader, input_list): + def gen(): + for data in reader: + in_dict = {} + if isinstance(input_list, list): + for input_name in input_list: + in_dict[input_name] = data[input_name] + elif isinstance(input_list, dict): + for input_name in input_list.keys(): + in_dict[input_list[input_name]] = data[input_name] + yield in_dict + + return gen + + +def convert_numpy_data(data, metric): + data_all = {} + data_all = {k: np.array(v) for k, v in data.items()} + if isinstance(metric, VOCMetric): + for k, v in data_all.items(): + if not isinstance(v[0], np.ndarray): + tmp_list = [] + for t in v: + tmp_list.append(np.array(t)) + data_all[k] = np.array(tmp_list) + else: + data_all = {k: np.array(v) for k, v in data.items()} + return data_all + + +def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): + metric = global_config['metric'] + for batch_id, data in enumerate(val_loader): + data_all = convert_numpy_data(data, metric) + data_input = {} + for k, v in data.items(): + if isinstance(global_config['input_list'], list): + if k in test_feed_names: + 
data_input[k] = np.array(v) + elif isinstance(global_config['input_list'], dict): + if k in global_config['input_list'].keys(): + data_input[global_config['input_list'][k]] = np.array(v) + outs = exe.run(compiled_test_program, + feed=data_input, + fetch_list=test_fetch_list, + return_numpy=False) + res = {} + if 'arch' in global_config and global_config['arch'] == 'PPYOLOE': + postprocess = PPYOLOEPostProcess( + score_threshold=0.01, nms_threshold=0.6) + res = postprocess(np.array(outs[0]), data_all['scale_factor']) + else: + for out in outs: + v = np.array(out) + if len(v.shape) > 1: + res['bbox'] = v + else: + res['bbox_num'] = v + + metric.update(data_all, res) + if batch_id % 100 == 0: + print('Eval iter:', batch_id) + metric.accumulate() + metric.log() + map_res = metric.get_results() + metric.reset() + map_key = 'keypoint' if 'arch' in global_config and global_config[ + 'arch'] == 'keypoint' else 'bbox' + return map_res[map_key][0] + + +def main(): + global global_config + all_config = load_slim_config(FLAGS.config_path) + assert "Global" in all_config, "Key 'Global' not found in config file." 
+ global_config = all_config["Global"] + reader_cfg = load_config(global_config['reader_config']) + + train_loader = create('EvalReader')(reader_cfg['TrainDataset'], + reader_cfg['worker_num'], + return_list=True) + train_loader = reader_wrapper(train_loader, global_config['input_list']) + + if 'Evaluation' in global_config.keys() and global_config[ + 'Evaluation'] and paddle.distributed.get_rank() == 0: + eval_func = eval_function + dataset = reader_cfg['EvalDataset'] + global val_loader + _eval_batch_sampler = paddle.io.BatchSampler( + dataset, batch_size=reader_cfg['EvalReader']['batch_size']) + val_loader = create('EvalReader')(dataset, + reader_cfg['worker_num'], + batch_sampler=_eval_batch_sampler, + return_list=True) + metric = None + if reader_cfg['metric'] == 'COCO': + clsid2catid = {v: k for k, v in dataset.catid2clsid.items()} + anno_file = dataset.get_anno() + metric = COCOMetric( + anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox') + elif reader_cfg['metric'] == 'VOC': + metric = VOCMetric( + label_list=dataset.get_label_list(), + class_num=reader_cfg['num_classes'], + map_type=reader_cfg['map_type']) + elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval': + anno_file = dataset.get_anno() + metric = KeyPointTopDownCOCOEval(anno_file, + len(dataset), 17, 'output_eval') + else: + raise ValueError("metric currently only supports COCO, VOC and KeyPointTopDownCOCOEval.") + global_config['metric'] = metric + else: + eval_func = None + + ac = AutoCompression( + model_dir=global_config["model_dir"], + model_filename=global_config["model_filename"], + params_filename=global_config["params_filename"], + save_dir=FLAGS.save_dir, + config=all_config, + train_dataloader=train_loader, + eval_callback=eval_func) + ac.compress() + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu'] + paddle.set_device(FLAGS.devices) + + main() diff --git 
a/deploy/cpp/docs/linux_build.md 
b/deploy/cpp/docs/linux_build.md index 32348991b1be7a31c95e26def04917a100094bb5..ee28e73ee56db3ec46a1674a6af0cb3af1012b3e 100755 --- a/deploy/cpp/docs/linux_build.md +++ b/deploy/cpp/docs/linux_build.md @@ -1,7 +1,7 @@ # Linux Build Guide ## Notes -This document has been tested on `Linux` with `GCC 8.2`. To build with a different G++ version, you need to recompile the Paddle inference library; see [Compiling the Paddle inference library from source](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html). The prebuilt opencv library used in this document was compiled with gcc4.8 on ubuntu 16.04; to build in a system environment other than ubuntu 16.04, you need to compile the opencv library yourself. +This document has been tested on `Linux` with `GCC 8.2`. To build with a different G++ version, you need to recompile the Paddle inference library; see [Compiling the Paddle inference library from source](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html). The prebuilt opencv library used in this document was compiled with gcc8.2 on ubuntu 16.04; to build in an environment other than gcc8.2, you need to compile the opencv library yourself. 
## Prerequisites * G++ 8.2 diff --git a/deploy/cpp/include/config_parser.h b/deploy/cpp/include/config_parser.h index 82d103723aa134ff72449a2d0ca3735b68c86fee..1f2e381c5284bb7ce16a6b06f858a32e83290f98 100644 --- a/deploy/cpp/include/config_parser.h +++ b/deploy/cpp/include/config_parser.h @@ -120,6 +120,10 @@ class ConfigPaser { } } + if (config["mask"].IsDefined()) { + mask_ = config["mask"].as(); + } + return true; } std::string mode_; @@ -132,6 +136,7 @@ class ConfigPaser { std::vector fpn_stride_; bool use_dynamic_shape_; float conf_thresh_; + bool mask_ = false; }; } // namespace PaddleDetection diff --git a/deploy/cpp/include/keypoint_detector.h b/deploy/cpp/include/keypoint_detector.h index 55eed8f9124176cb1d1551538ff14769acd4249f..ce6aa0e0692d215fc1a704afd37c3787fe8e42ef 100644 --- a/deploy/cpp/include/keypoint_detector.h +++ b/deploy/cpp/include/keypoint_detector.h @@ -33,12 +33,6 @@ using namespace paddle_infer; namespace PaddleDetection { -// Object KeyPoint Result -struct KeyPointResult { - // Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf - std::vector keypoints; - int num_joints = -1; -}; // Visualiztion KeyPoint Result cv::Mat VisualizeKptsResult(const cv::Mat& img, diff --git a/deploy/cpp/include/keypoint_postprocess.h 
b/deploy/cpp/include/keypoint_postprocess.h index 4239cdf7369bdb76fa3f715c755d601b40285c5b..fa0c7d55f06db986404eb23a7df1144a22e7f33f 100644 --- a/deploy/cpp/include/keypoint_postprocess.h +++ b/deploy/cpp/include/keypoint_postprocess.h @@ -14,11 +14,14 @@ #pragma once +#include #include #include #include #include +namespace PaddleDetection { + std::vector get_3rd_point(std::vector& a, std::vector& b); std::vector get_dir(float src_point_x, float src_point_y, float rot_rad); @@ -37,7 +40,8 @@ void transform_preds(std::vector& coords, std::vector& scale, std::vector& output_size, std::vector& dim, - std::vector& target_coords); + std::vector& target_coords, + bool affine = false); void box_to_center_scale(std::vector& box, int width, @@ -51,7 +55,7 @@ void get_max_preds(float* heatmap, float* maxvals, int batchid, int joint_idx); - + void get_final_preds(std::vector& heatmap, std::vector& dim, std::vector& idxout, @@ -61,3 +65,70 @@ void get_final_preds(std::vector& heatmap, std::vector& preds, int batchid, bool DARK = true); + +// Object KeyPoint Result +struct KeyPointResult { + // Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf + std::vector keypoints; + int num_joints = -1; +}; + +class PoseSmooth { + public: + explicit PoseSmooth(const int width, + const int height, + std::string filter_type = "OneEuro", + float alpha = 0.5, + float fc_d = 0.1, + float fc_min = 0.1, + float beta = 0.1, + float thres_mult = 0.3) + : width(width), + height(height), + alpha(alpha), + fc_d(fc_d), + fc_min(fc_min), + beta(beta), + filter_type(filter_type), + thres_mult(thres_mult){}; + + // Run predictor + KeyPointResult smooth_process(KeyPointResult* result); + void PointSmooth(KeyPointResult* result, + KeyPointResult* keypoint_smoothed, + std::vector thresholds, + int index); + float OneEuroFilter(float x_cur, float x_pre, int loc); + float smoothing_factor(float te, float fc); + float ExpSmoothing(float x_cur, float x_pre, int loc = 0); + + private: + int width = 0; + 
int height = 0; + float alpha = 0.; + float fc_d = 1.; + float fc_min = 0.; + float beta = 1.; + float thres_mult = 1.; + std::string filter_type = "OneEuro"; + std::vector thresholds = {0.005, + 0.005, + 0.005, + 0.005, + 0.005, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01, + 0.01}; + KeyPointResult x_prev_hat; + KeyPointResult dx_prev_hat; +}; +} // namespace PaddleDetection diff --git a/deploy/cpp/include/object_detector.h b/deploy/cpp/include/object_detector.h index 0a336c33401d5d7d3e4e27a22862fd666da1a36e..47bd29362c85eafc3825d25af73694803e2a1504 100644 --- a/deploy/cpp/include/object_detector.h +++ b/deploy/cpp/include/object_detector.h @@ -25,7 +25,7 @@ #include #include -#include "paddle_inference_api.h" // NOLINT +#include "paddle_inference_api.h" // NOLINT #include "include/config_parser.h" #include "include/picodet_postprocess.h" @@ -33,29 +33,25 @@ #include "include/utils.h" using namespace paddle_infer; - namespace PaddleDetection { // Generate visualization colormap for each class std::vector GenerateColorMap(int num_class); // Visualiztion Detection Result -cv::Mat VisualizeResult( - const cv::Mat& img, - const std::vector& results, - const std::vector& lables, - const std::vector& colormap, - const bool is_rbox); +cv::Mat +VisualizeResult(const cv::Mat &img, + const std::vector &results, + const std::vector &lables, + const std::vector &colormap, const bool is_rbox); class ObjectDetector { - public: - explicit ObjectDetector(const std::string& model_dir, - const std::string& device = "CPU", - bool use_mkldnn = false, - int cpu_threads = 1, - const std::string& run_mode = "paddle", - const int batch_size = 1, - const int gpu_id = 0, +public: + explicit ObjectDetector(const std::string &model_dir, + const std::string &device = "CPU", + bool use_mkldnn = false, int cpu_threads = 1, + const std::string &run_mode = "paddle", + const int batch_size = 1, const int gpu_id = 0, const int trt_min_shape = 1, const int 
trt_max_shape = 1280, const int trt_opt_shape = 640, @@ -78,25 +74,22 @@ class ObjectDetector { } // Load Paddle inference model - void LoadModel(const std::string& model_dir, - const int batch_size = 1, - const std::string& run_mode = "paddle"); + void LoadModel(const std::string &model_dir, const int batch_size = 1, + const std::string &run_mode = "paddle"); // Run predictor - void Predict(const std::vector imgs, - const double threshold = 0.5, - const int warmup = 0, - const int repeats = 1, - std::vector* result = nullptr, - std::vector* bbox_num = nullptr, - std::vector* times = nullptr); + void Predict(const std::vector imgs, const double threshold = 0.5, + const int warmup = 0, const int repeats = 1, + std::vector *result = nullptr, + std::vector *bbox_num = nullptr, + std::vector *times = nullptr); // Get Model Label list - const std::vector& GetLabelList() const { + const std::vector &GetLabelList() const { return config_.label_list_; } - private: +private: std::string device_ = "CPU"; int gpu_id_ = 0; int cpu_math_library_num_threads_ = 1; @@ -108,13 +101,18 @@ class ObjectDetector { int trt_opt_shape_ = 640; bool trt_calib_mode_ = false; // Preprocess image and copy data to input buffer - void Preprocess(const cv::Mat& image_mat); + void Preprocess(const cv::Mat &image_mat); // Postprocess result void Postprocess(const std::vector mats, - std::vector* result, - std::vector bbox_num, - std::vector output_data_, - bool is_rbox); + std::vector *result, + std::vector bbox_num, std::vector output_data_, + std::vector output_mask_data_, bool is_rbox); + + void SOLOv2Postprocess( + const std::vector mats, std::vector *result, + std::vector *bbox_num, std::vector out_bbox_num_data_, + std::vector out_label_data_, std::vector out_score_data_, + std::vector out_global_mask_data_, float threshold = 0.5); std::shared_ptr predictor_; Preprocessor preprocessor_; @@ -123,4 +121,4 @@ class ObjectDetector { ConfigPaser config_; }; -} // namespace PaddleDetection +} // 
namespace PaddleDetection diff --git a/deploy/cpp/include/preprocess_op.h b/deploy/cpp/include/preprocess_op.h index 33d7300b8fd84287cca91e86214084acec781030..a54bc2afb8aacbc55241b866ba41acc00491e4f3 100644 --- a/deploy/cpp/include/preprocess_op.h +++ b/deploy/cpp/include/preprocess_op.h @@ -74,7 +74,7 @@ class NormalizeImage : public PreprocessOp { // CHW or HWC std::vector mean_; std::vector scale_; - bool is_scale_; + bool is_scale_ = true; }; class Permute : public PreprocessOp { @@ -143,6 +143,38 @@ class TopDownEvalAffine : public PreprocessOp { std::vector trainsize_; }; +class WarpAffine : public PreprocessOp { + public: + virtual void Init(const YAML::Node& item) { + input_h_ = item["input_h"].as(); + input_w_ = item["input_w"].as(); + keep_res_ = item["keep_res"].as(); + } + + virtual void Run(cv::Mat* im, ImageBlob* data); + + private: + int input_h_; + int input_w_; + int interp_ = 1; + bool keep_res_ = true; + int pad_ = 31; +}; + +class Pad : public PreprocessOp { + public: + virtual void Init(const YAML::Node& item) { + size_ = item["size"].as>(); + fill_value_ = item["fill_value"].as>(); + } + + virtual void Run(cv::Mat* im, ImageBlob* data); + + private: + std::vector size_; + std::vector fill_value_; +}; + void CropImg(cv::Mat& img, cv::Mat& crop_img, std::vector& area, @@ -183,6 +215,10 @@ class Preprocessor { return std::make_shared(); } else if (name == "TopDownEvalAffine") { return std::make_shared(); + } else if (name == "WarpAffine") { + return std::make_shared(); + }else if (name == "Pad") { + return std::make_shared(); } std::cerr << "can not find function of OP: " << name << " and return: nullptr" << std::endl; diff --git a/deploy/cpp/include/utils.h b/deploy/cpp/include/utils.h index 3802e1267176a050402d1fdf742e54a79f33ffb9..b41db0dacff17339ffcac591b7825cec09d3663d 100644 --- a/deploy/cpp/include/utils.h +++ b/deploy/cpp/include/utils.h @@ -14,13 +14,13 @@ #pragma once -#include -#include -#include -#include +#include #include +#include 
#include -#include +#include +#include +#include namespace PaddleDetection { @@ -32,8 +32,10 @@ struct ObjectResult { int class_id; // Confidence of detected object float confidence; + // Mask of detected object + std::vector mask; }; void nms(std::vector &input_boxes, float nms_threshold); -} // namespace PaddleDetection \ No newline at end of file +} // namespace PaddleDetection diff --git a/deploy/cpp/src/keypoint_postprocess.cc b/deploy/cpp/src/keypoint_postprocess.cc index 52ac8d3d36ec384bf3eb81c56356c6639a61433f..eb692b0a78bcf48ac96aa45b671300b9ff2db400 100644 --- a/deploy/cpp/src/keypoint_postprocess.cc +++ b/deploy/cpp/src/keypoint_postprocess.cc @@ -11,11 +11,13 @@ // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. -#include #include "include/keypoint_postprocess.h" +#include #define PI 3.1415926535 #define HALF_CIRCLE_DEGREE 180 +namespace PaddleDetection { + cv::Point2f get_3rd_point(cv::Point2f& a, cv::Point2f& b) { cv::Point2f direct{a.x - b.x, a.y - b.y}; return cv::Point2f(a.x - direct.y, a.y + direct.x); @@ -52,7 +54,7 @@ void get_affine_transform(std::vector& center, float dst_h = static_cast(output_size[1]); float rot_rad = rot * PI / HALF_CIRCLE_DEGREE; std::vector src_dir = get_dir(-0.5 * src_w, 0, rot_rad); - std::vector dst_dir{-0.5 * dst_w, 0.0}; + std::vector dst_dir{-0.5f * dst_w, 0.0}; cv::Point2f srcPoint2f[3], dstPoint2f[3]; srcPoint2f[0] = cv::Point2f(center[0], center[1]); srcPoint2f[1] = cv::Point2f(center[0] + src_dir[0], center[1] + src_dir[1]); @@ -74,11 +76,26 @@ void transform_preds(std::vector& coords, std::vector& scale, std::vector& output_size, std::vector& dim, - std::vector& target_coords) { - cv::Mat trans(2, 3, CV_64FC1); - get_affine_transform(center, scale, 0, output_size, trans, 1); - for (int p = 0; p < dim[1]; ++p) { - affine_tranform(coords[p * 2], coords[p * 2 + 1], trans, target_coords, 
p); + std::vector& target_coords, + bool affine) { + if (affine) { + cv::Mat trans(2, 3, CV_64FC1); + get_affine_transform(center, scale, 0, output_size, trans, 1); + for (int p = 0; p < dim[1]; ++p) { + affine_tranform( + coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + } + } else { + float heat_w = static_cast(output_size[0]); + float heat_h = static_cast(output_size[1]); + float x_scale = scale[0] / heat_w; + float y_scale = scale[1] / heat_h; + float offset_x = center[0] - scale[0] / 2.; + float offset_y = center[1] - scale[1] / 2.; + for (int i = 0; i < dim[1]; i++) { + target_coords[i * 3 + 1] = x_scale * coords[i * 2] + offset_x; + target_coords[i * 3 + 2] = y_scale * coords[i * 2 + 1] + offset_y; + } } } @@ -111,10 +128,10 @@ void get_max_preds(float* heatmap, void dark_parse(std::vector& heatmap, std::vector& dim, std::vector& coords, - int px, - int py, + int px, + int py, int index, - int ch){ + int ch) { /*DARK postpocessing, Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020). 
1) offset = - hassian.inv() * derivative @@ -124,16 +141,17 @@ void dark_parse(std::vector& heatmap, 5) hassian = Mat([[dxx, dxy], [dxy, dyy]]) */ std::vector::const_iterator first1 = heatmap.begin() + index; - std::vector::const_iterator last1 = heatmap.begin() + index + dim[2] * dim[3]; + std::vector::const_iterator last1 = + heatmap.begin() + index + dim[2] * dim[3]; std::vector heatmap_ch(first1, last1); - cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0,dim[2]); + cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0, dim[2]); heatmap_mat.convertTo(heatmap_mat, CV_32FC1); cv::GaussianBlur(heatmap_mat, heatmap_mat, cv::Size(3, 3), 0, 0); - heatmap_mat = heatmap_mat.reshape(1,1); - heatmap_ch = std::vector(heatmap_mat.reshape(1,1)); + heatmap_mat = heatmap_mat.reshape(1, 1); + heatmap_ch = std::vector(heatmap_mat.reshape(1, 1)); float epsilon = 1e-10; - //sample heatmap to get values in around target location + // sample heatmap to get values in around target location float xy = log(fmax(heatmap_ch[py * dim[3] + px], epsilon)); float xr = log(fmax(heatmap_ch[py * dim[3] + px + 1], epsilon)); float xl = log(fmax(heatmap_ch[py * dim[3] + px - 1], epsilon)); @@ -149,22 +167,23 @@ void dark_parse(std::vector& heatmap, float xlyu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px - 1], epsilon)); float xlyd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px - 1], epsilon)); - //compute dx/dy and dxx/dyy with sampled values + // compute dx/dy and dxx/dyy with sampled values float dx = 0.5 * (xr - xl); float dy = 0.5 * (yu - yd); - float dxx = 0.25 * (xr2 - 2*xy + xl2); + float dxx = 0.25 * (xr2 - 2 * xy + xl2); float dxy = 0.25 * (xryu - xryd - xlyu + xlyd); - float dyy = 0.25 * (yu2 - 2*xy + yd2); + float dyy = 0.25 * (yu2 - 2 * xy + yd2); - //finally get offset by derivative and hassian, which combined by dx/dy and dxx/dyy - if(dxx * dyy - dxy*dxy != 0){ + // finally get offset by derivative and hassian, which combined by dx/dy and + // dxx/dyy + if (dxx * dyy - dxy * dxy != 
0) { float M[2][2] = {dxx, dxy, dxy, dyy}; float D[2] = {dx, dy}; - cv::Mat hassian(2,2,CV_32F,M); - cv::Mat derivative(2,1,CV_32F,D); - cv::Mat offset = - hassian.inv() * derivative; - coords[ch * 2] += offset.at(0,0); - coords[ch * 2 + 1] += offset.at(1,0); + cv::Mat hassian(2, 2, CV_32F, M); + cv::Mat derivative(2, 1, CV_32F, D); + cv::Mat offset = -hassian.inv() * derivative; + coords[ch * 2] += offset.at(0, 0); + coords[ch * 2 + 1] += offset.at(1, 0); } } @@ -193,18 +212,18 @@ void get_final_preds(std::vector& heatmap, int px = int(coords[j * 2] + 0.5); int py = int(coords[j * 2 + 1] + 0.5); - if(DARK && px > 1 && px < heatmap_width - 2){ + if (DARK && px > 1 && px < heatmap_width - 2 && py > 1 && + py < heatmap_height - 2) { dark_parse(heatmap, dim, coords, px, py, index, j); - } - else{ + } else { if (px > 0 && px < heatmap_width - 1) { float diff_x = heatmap[index + py * dim[3] + px + 1] - - heatmap[index + py * dim[3] + px - 1]; + heatmap[index + py * dim[3] + px - 1]; coords[j * 2] += diff_x > 0 ? 1 : -1 * 0.25; } if (py > 0 && py < heatmap_height - 1) { float diff_y = heatmap[index + (py + 1) * dim[3] + px] - - heatmap[index + (py - 1) * dim[3] + px]; + heatmap[index + (py - 1) * dim[3] + px]; coords[j * 2 + 1] += diff_y > 0 ? 
1 : -1 * 0.25; } } @@ -213,3 +232,85 @@ void get_final_preds(std::vector& heatmap, std::vector img_size{heatmap_width, heatmap_height}; transform_preds(coords, center, scale, img_size, dim, preds); } + +// Run predictor +KeyPointResult PoseSmooth::smooth_process(KeyPointResult* result) { + KeyPointResult keypoint_smoothed = *result; + if (this->x_prev_hat.num_joints == -1) { + this->x_prev_hat = *result; + this->dx_prev_hat = *result; + std::fill(dx_prev_hat.keypoints.begin(), dx_prev_hat.keypoints.end(), 0.); + return keypoint_smoothed; + } else { + for (int i = 0; i < result->num_joints; i++) { + this->PointSmooth(result, &keypoint_smoothed, this->thresholds, i); + } + return keypoint_smoothed; + } +} + +void PoseSmooth::PointSmooth(KeyPointResult* result, + KeyPointResult* keypoint_smoothed, + std::vector thresholds, + int index) { + float distance = sqrt(pow((result->keypoints[index * 3 + 1] - + this->x_prev_hat.keypoints[index * 3 + 1]) / + this->width, + 2) + + pow((result->keypoints[index * 3 + 2] - + this->x_prev_hat.keypoints[index * 3 + 2]) / + this->height, + 2)); + if (distance < thresholds[index] * this->thres_mult) { + keypoint_smoothed->keypoints[index * 3 + 1] = + this->x_prev_hat.keypoints[index * 3 + 1]; + keypoint_smoothed->keypoints[index * 3 + 2] = + this->x_prev_hat.keypoints[index * 3 + 2]; + } else { + if (this->filter_type == "OneEuro") { + keypoint_smoothed->keypoints[index * 3 + 1] = + this->OneEuroFilter(result->keypoints[index * 3 + 1], + this->x_prev_hat.keypoints[index * 3 + 1], + index * 3 + 1); + keypoint_smoothed->keypoints[index * 3 + 2] = + this->OneEuroFilter(result->keypoints[index * 3 + 2], + this->x_prev_hat.keypoints[index * 3 + 2], + index * 3 + 2); + } else { + keypoint_smoothed->keypoints[index * 3 + 1] = + this->ExpSmoothing(result->keypoints[index * 3 + 1], + this->x_prev_hat.keypoints[index * 3 + 1], + index * 3 + 1); + keypoint_smoothed->keypoints[index * 3 + 2] = + this->ExpSmoothing(result->keypoints[index * 3 + 2], 
+ this->x_prev_hat.keypoints[index * 3 + 2], + index * 3 + 2); + } + } + return; +} + +float PoseSmooth::OneEuroFilter(float x_cur, float x_pre, int loc) { + float te = 1.0; + this->alpha = this->smoothing_factor(te, this->fc_d); + float dx_cur = (x_cur - x_pre) / te; + float dx_cur_hat = + this->ExpSmoothing(dx_cur, this->dx_prev_hat.keypoints[loc]); + + float fc = this->fc_min + this->beta * abs(dx_cur_hat); + this->alpha = this->smoothing_factor(te, fc); + float x_cur_hat = this->ExpSmoothing(x_cur, x_pre); + this->x_prev_hat.keypoints[loc] = x_cur_hat; + this->dx_prev_hat.keypoints[loc] = dx_cur_hat; + return x_cur_hat; +} + +float PoseSmooth::smoothing_factor(float te, float fc) { + float r = 2 * PI * fc * te; + return r / (r + 1); +} + +float PoseSmooth::ExpSmoothing(float x_cur, float x_pre, int loc) { + return this->alpha * x_cur + (1 - this->alpha) * x_pre; +} +} // namespace PaddleDetection diff --git a/deploy/cpp/src/main_keypoint.cc b/deploy/cpp/src/main_keypoint.cc index 7701d5ebb5d1dcbd6431917e86ce6169e982a155..da333f6ebf1aa7feafe12c3b2d5bf87654764fe1 100644 --- a/deploy/cpp/src/main_keypoint.cc +++ b/deploy/cpp/src/main_keypoint.cc @@ -219,6 +219,8 @@ void PredictVideo(const std::string& video_path, printf("create video writer failed!\n"); return; } + PaddleDetection::PoseSmooth smoother = + PaddleDetection::PoseSmooth(video_width, video_height); std::vector result; std::vector bbox_num; @@ -307,6 +309,13 @@ void PredictVideo(const std::string& video_path, scale_bs.clear(); } } + + if (result_kpts.size() == 1) { + for (int i = 0; i < result_kpts.size(); i++) { + result_kpts[i] = smoother.smooth_process(&(result_kpts[i])); + } + } + cv::Mat out_im = VisualizeKptsResult(frame, result_kpts, colormap_kpts); video_out.write(out_im); } else { diff --git a/deploy/cpp/src/object_detector.cc b/deploy/cpp/src/object_detector.cc index a99fcd515337e72ff59a09c7eeaa12072a774cc1..d4f2ceb5d7c07142e51e2b0008148e5d90b55adc 100644 --- 
a/deploy/cpp/src/object_detector.cc +++ b/deploy/cpp/src/object_detector.cc @@ -15,16 +15,15 @@ // for setprecision #include #include -#include "include/object_detector.h" -using namespace paddle_infer; +#include "include/object_detector.h" namespace PaddleDetection { // Load Model and create model predictor -void ObjectDetector::LoadModel(const std::string& model_dir, +void ObjectDetector::LoadModel(const std::string &model_dir, const int batch_size, - const std::string& run_mode) { + const std::string &run_mode) { paddle_infer::Config config; std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel"; std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams"; @@ -42,27 +41,22 @@ void ObjectDetector::LoadModel(const std::string& model_dir, } else if (run_mode == "trt_int8") { precision = paddle_infer::Config::Precision::kInt8; } else { - printf( - "run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or " - "'trt_int8'"); + printf("run_mode should be 'paddle', 'trt_fp32', 'trt_fp16' or " + "'trt_int8'"); } // set tensorrt - config.EnableTensorRtEngine(1 << 30, - batch_size, - this->min_subgraph_size_, - precision, - false, - this->trt_calib_mode_); + config.EnableTensorRtEngine(1 << 30, batch_size, this->min_subgraph_size_, + precision, false, this->trt_calib_mode_); // set use dynamic shape if (this->use_dynamic_shape_) { - // set DynamicShsape for image tensor + // set DynamicShape for image tensor const std::vector min_input_shape = { - 1, 3, this->trt_min_shape_, this->trt_min_shape_}; + batch_size, 3, this->trt_min_shape_, this->trt_min_shape_}; const std::vector max_input_shape = { - 1, 3, this->trt_max_shape_, this->trt_max_shape_}; + batch_size, 3, this->trt_max_shape_, this->trt_max_shape_}; const std::vector opt_input_shape = { - 1, 3, this->trt_opt_shape_, this->trt_opt_shape_}; + batch_size, 3, this->trt_opt_shape_, this->trt_opt_shape_}; const std::map> map_min_input_shape = { {"image", min_input_shape}}; const std::map> 
map_max_input_shape = { @@ -70,8 +64,8 @@ void ObjectDetector::LoadModel(const std::string& model_dir, const std::map> map_opt_input_shape = { {"image", opt_input_shape}}; - config.SetTRTDynamicShapeInfo( - map_min_input_shape, map_max_input_shape, map_opt_input_shape); + config.SetTRTDynamicShapeInfo(map_min_input_shape, map_max_input_shape, + map_opt_input_shape); std::cout << "TensorRT dynamic shape enabled" << std::endl; } } @@ -96,13 +90,14 @@ void ObjectDetector::LoadModel(const std::string& model_dir, } // Visualiztion MaskDetector results -cv::Mat VisualizeResult( - const cv::Mat& img, - const std::vector& results, - const std::vector& lables, - const std::vector& colormap, - const bool is_rbox = false) { +cv::Mat +VisualizeResult(const cv::Mat &img, + const std::vector &results, + const std::vector &lables, + const std::vector &colormap, const bool is_rbox = false) { cv::Mat vis_img = img.clone(); + int img_h = vis_img.rows; + int img_w = vis_img.cols; for (int i = 0; i < results.size(); ++i) { // Configure color and text size std::ostringstream oss; @@ -136,30 +131,45 @@ cv::Mat VisualizeResult( cv::Rect roi = cv::Rect(results[i].rect[0], results[i].rect[1], w, h); // Draw roi object, text, and background cv::rectangle(vis_img, roi, roi_color, 2); + + // Draw mask + std::vector mask_v = results[i].mask; + if (mask_v.size() > 0) { + cv::Mat mask = cv::Mat(img_h, img_w, CV_32S); + std::memcpy(mask.data, mask_v.data(), mask_v.size() * sizeof(int)); + + cv::Mat colored_img = vis_img.clone(); + + std::vector contours; + cv::Mat hierarchy; + mask.convertTo(mask, CV_8U); + cv::findContours(mask, contours, hierarchy, cv::RETR_CCOMP, + cv::CHAIN_APPROX_SIMPLE); + cv::drawContours(colored_img, contours, -1, roi_color, -1, cv::LINE_8, + hierarchy, 100); + + cv::Mat debug_roi = vis_img; + colored_img = 0.4 * colored_img + 0.6 * vis_img; + colored_img.copyTo(vis_img, mask); + } } origin.x = results[i].rect[0]; origin.y = results[i].rect[1]; // Configure text 
background - cv::Rect text_back = cv::Rect(results[i].rect[0], - results[i].rect[1] - text_size.height, - text_size.width, - text_size.height); + cv::Rect text_back = + cv::Rect(results[i].rect[0], results[i].rect[1] - text_size.height, + text_size.width, text_size.height); // Draw text, and background cv::rectangle(vis_img, text_back, roi_color, -1); - cv::putText(vis_img, - text, - origin, - font_face, - font_scale, - cv::Scalar(255, 255, 255), - thickness); + cv::putText(vis_img, text, origin, font_face, font_scale, + cv::Scalar(255, 255, 255), thickness); } return vis_img; } -void ObjectDetector::Preprocess(const cv::Mat& ori_im) { +void ObjectDetector::Preprocess(const cv::Mat &ori_im) { // Clone the image : keep the original mat for postprocess cv::Mat im = ori_im.clone(); cv::cvtColor(im, im, cv::COLOR_BGR2RGB); @@ -168,20 +178,21 @@ void ObjectDetector::Preprocess(const cv::Mat& ori_im) { void ObjectDetector::Postprocess( const std::vector mats, - std::vector* result, - std::vector bbox_num, - std::vector output_data_, - bool is_rbox = false) { + std::vector *result, + std::vector bbox_num, std::vector output_data_, + std::vector output_mask_data_, bool is_rbox = false) { result->clear(); int start_idx = 0; + int total_num = std::accumulate(bbox_num.begin(), bbox_num.end(), 0); + int out_mask_dim = -1; + if (config_.mask_) { + out_mask_dim = output_mask_data_.size() / total_num; + } + for (int im_id = 0; im_id < mats.size(); im_id++) { cv::Mat raw_mat = mats[im_id]; int rh = 1; int rw = 1; - if (config_.arch_ == "Face") { - rh = raw_mat.rows; - rw = raw_mat.cols; - } for (int j = start_idx; j < start_idx + bbox_num[im_id]; j++) { if (is_rbox) { // Class id @@ -218,6 +229,17 @@ void ObjectDetector::Postprocess( result_item.rect = {xmin, ymin, xmax, ymax}; result_item.class_id = class_id; result_item.confidence = score; + + if (config_.mask_) { + std::vector mask; + for (int k = 0; k < out_mask_dim; ++k) { + if (output_mask_data_[k + j * out_mask_dim] > -1) { 
+ mask.push_back(output_mask_data_[k + j * out_mask_dim]); + } + } + result_item.mask = mask; + } + result->push_back(result_item); } } @@ -225,13 +247,85 @@ void ObjectDetector::Postprocess( } } +// This function is to convert output result from SOLOv2 to class ObjectResult +void ObjectDetector::SOLOv2Postprocess( + const std::vector mats, std::vector *result, + std::vector *bbox_num, std::vector out_bbox_num_data_, + std::vector out_label_data_, std::vector out_score_data_, + std::vector out_global_mask_data_, float threshold) { + + for (int im_id = 0; im_id < mats.size(); im_id++) { + cv::Mat mat = mats[im_id]; + + int valid_bbox_count = 0; + for (int bbox_id = 0; bbox_id < out_bbox_num_data_[im_id]; ++bbox_id) { + if (out_score_data_[bbox_id] >= threshold) { + ObjectResult result_item; + result_item.class_id = out_label_data_[bbox_id]; + result_item.confidence = out_score_data_[bbox_id]; + std::vector global_mask; + + for (int k = 0; k < mat.rows * mat.cols; ++k) { + global_mask.push_back(static_cast( + out_global_mask_data_[k + bbox_id * mat.rows * mat.cols])); + } + + // find minimize bounding box from mask + cv::Mat mask(mat.rows, mat.cols, CV_32SC1); + std::memcpy(mask.data, global_mask.data(), + global_mask.size() * sizeof(int)); + + cv::Mat mask_fp; + cv::Mat rowSum; + cv::Mat colSum; + std::vector sum_of_row(mat.rows); + std::vector sum_of_col(mat.cols); + + mask.convertTo(mask_fp, CV_32FC1); + cv::reduce(mask_fp, colSum, 0, CV_REDUCE_SUM, CV_32FC1); + cv::reduce(mask_fp, rowSum, 1, CV_REDUCE_SUM, CV_32FC1); + + for (int row_id = 0; row_id < mat.rows; ++row_id) { + sum_of_row[row_id] = rowSum.at(row_id, 0); + } + + for (int col_id = 0; col_id < mat.cols; ++col_id) { + sum_of_col[col_id] = colSum.at(0, col_id); + } + + auto it = std::find_if(sum_of_row.begin(), sum_of_row.end(), + [](int x) { return x > 0.5; }); + int y1 = std::distance(sum_of_row.begin(), it); + + auto it2 = std::find_if(sum_of_col.begin(), sum_of_col.end(), + [](int x) { return x > 0.5; 
}); + int x1 = std::distance(sum_of_col.begin(), it2); + + auto rit = std::find_if(sum_of_row.rbegin(), sum_of_row.rend(), + [](int x) { return x > 0.5; }); + int y2 = std::distance(rit, sum_of_row.rend()); + + auto rit2 = std::find_if(sum_of_col.rbegin(), sum_of_col.rend(), + [](int x) { return x > 0.5; }); + int x2 = std::distance(rit2, sum_of_col.rend()); + + result_item.rect = {x1, y1, x2, y2}; + result_item.mask = global_mask; + + result->push_back(result_item); + valid_bbox_count++; + } + } + bbox_num->push_back(valid_bbox_count); + } +} + void ObjectDetector::Predict(const std::vector imgs, - const double threshold, - const int warmup, + const double threshold, const int warmup, const int repeats, - std::vector* result, - std::vector* bbox_num, - std::vector* times) { + std::vector *result, + std::vector *bbox_num, + std::vector *times) { auto preprocess_start = std::chrono::steady_clock::now(); int batch_size = imgs.size(); @@ -239,8 +333,14 @@ void ObjectDetector::Predict(const std::vector imgs, std::vector in_data_all; std::vector im_shape_all(batch_size * 2); std::vector scale_factor_all(batch_size * 2); - std::vector output_data_list_; + std::vector output_data_list_; std::vector out_bbox_num_data_; + std::vector out_mask_data_; + + // these parameters are for SOLOv2 output + std::vector out_score_data_; + std::vector out_global_mask_data_; + std::vector out_label_data_; // in_net img for each batch std::vector in_net_img_all(batch_size); @@ -255,9 +355,8 @@ void ObjectDetector::Predict(const std::vector imgs, scale_factor_all[bs_idx * 2] = inputs_.scale_factor_[0]; scale_factor_all[bs_idx * 2 + 1] = inputs_.scale_factor_[1]; - // TODO: reduce cost time - in_data_all.insert( - in_data_all.end(), inputs_.im_data_.begin(), inputs_.im_data_.end()); + in_data_all.insert(in_data_all.end(), inputs_.im_data_.begin(), + inputs_.im_data_.end()); // collect in_net img in_net_img_all[bs_idx] = inputs_.in_net_im_; @@ -276,10 +375,10 @@ void 
ObjectDetector::Predict(const std::vector imgs, pad_img.convertTo(pad_img, CV_32FC3); std::vector pad_data; pad_data.resize(rc * rh * rw); - float* base = pad_data.data(); + float *base = pad_data.data(); for (int i = 0; i < rc; ++i) { - cv::extractChannel( - pad_img, cv::Mat(rh, rw, CV_32FC1, base + i * rh * rw), i); + cv::extractChannel(pad_img, + cv::Mat(rh, rw, CV_32FC1, base + i * rh * rw), i); } in_data_all.insert(in_data_all.end(), pad_data.begin(), pad_data.end()); } @@ -290,7 +389,7 @@ void ObjectDetector::Predict(const std::vector imgs, auto preprocess_end = std::chrono::steady_clock::now(); // Prepare input tensor auto input_names = predictor_->GetInputNames(); - for (const auto& tensor_name : input_names) { + for (const auto &tensor_name : input_names) { auto in_tensor = predictor_->GetInputHandle(tensor_name); if (tensor_name == "image") { int rh = inputs_.in_net_shape_[0]; @@ -312,52 +411,118 @@ void ObjectDetector::Predict(const std::vector imgs, bool is_rbox = false; int reg_max = 7; int num_class = 80; - // warmup - for (int i = 0; i < warmup; i++) { - predictor_->Run(); - // Get output tensor - auto output_names = predictor_->GetOutputNames(); - for (int j = 0; j < output_names.size(); j++) { - auto output_tensor = predictor_->GetOutputHandle(output_names[j]); - std::vector output_shape = output_tensor->shape(); - int out_num = std::accumulate( - output_shape.begin(), output_shape.end(), 1, std::multiplies()); - if (output_tensor->type() == paddle_infer::DataType::INT32) { - out_bbox_num_data_.resize(out_num); - output_tensor->CopyToCpu(out_bbox_num_data_.data()); - } else { - std::vector out_data; - out_data.resize(out_num); - output_tensor->CopyToCpu(out_data.data()); - out_tensor_list.push_back(out_data); + + auto inference_start = std::chrono::steady_clock::now(); + if (config_.arch_ == "SOLOv2") { + // warmup + for (int i = 0; i < warmup; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + 
for (int j = 0; j < output_names.size(); j++) { + auto output_tensor = predictor_->GetOutputHandle(output_names[j]); + std::vector output_shape = output_tensor->shape(); + int out_num = std::accumulate(output_shape.begin(), output_shape.end(), + 1, std::multiplies()); + if (j == 0) { + out_bbox_num_data_.resize(out_num); + output_tensor->CopyToCpu(out_bbox_num_data_.data()); + } else if (j == 1) { + out_label_data_.resize(out_num); + output_tensor->CopyToCpu(out_label_data_.data()); + } else if (j == 2) { + out_score_data_.resize(out_num); + output_tensor->CopyToCpu(out_score_data_.data()); + } else if (config_.mask_ && (j == 3)) { + out_global_mask_data_.resize(out_num); + output_tensor->CopyToCpu(out_global_mask_data_.data()); + } } } - } - auto inference_start = std::chrono::steady_clock::now(); - for (int i = 0; i < repeats; i++) { - predictor_->Run(); - // Get output tensor - out_tensor_list.clear(); - output_shape_list.clear(); - auto output_names = predictor_->GetOutputNames(); - for (int j = 0; j < output_names.size(); j++) { - auto output_tensor = predictor_->GetOutputHandle(output_names[j]); - std::vector output_shape = output_tensor->shape(); - int out_num = std::accumulate( - output_shape.begin(), output_shape.end(), 1, std::multiplies()); - output_shape_list.push_back(output_shape); - if (output_tensor->type() == paddle_infer::DataType::INT32) { - out_bbox_num_data_.resize(out_num); - output_tensor->CopyToCpu(out_bbox_num_data_.data()); - } else { - std::vector out_data; - out_data.resize(out_num); - output_tensor->CopyToCpu(out_data.data()); - out_tensor_list.push_back(out_data); + inference_start = std::chrono::steady_clock::now(); + for (int i = 0; i < repeats; i++) { + predictor_->Run(); + // Get output tensor + out_tensor_list.clear(); + output_shape_list.clear(); + auto output_names = predictor_->GetOutputNames(); + for (int j = 0; j < output_names.size(); j++) { + auto output_tensor = predictor_->GetOutputHandle(output_names[j]); + std::vector 
output_shape = output_tensor->shape(); + int out_num = std::accumulate(output_shape.begin(), output_shape.end(), + 1, std::multiplies()); + output_shape_list.push_back(output_shape); + if (j == 0) { + out_bbox_num_data_.resize(out_num); + output_tensor->CopyToCpu(out_bbox_num_data_.data()); + } else if (j == 1) { + out_label_data_.resize(out_num); + output_tensor->CopyToCpu(out_label_data_.data()); + } else if (j == 2) { + out_score_data_.resize(out_num); + output_tensor->CopyToCpu(out_score_data_.data()); + } else if (config_.mask_ && (j == 3)) { + out_global_mask_data_.resize(out_num); + output_tensor->CopyToCpu(out_global_mask_data_.data()); + } + } + } + } else { + // warmup + for (int i = 0; i < warmup; i++) { + predictor_->Run(); + // Get output tensor + auto output_names = predictor_->GetOutputNames(); + for (int j = 0; j < output_names.size(); j++) { + auto output_tensor = predictor_->GetOutputHandle(output_names[j]); + std::vector output_shape = output_tensor->shape(); + int out_num = std::accumulate(output_shape.begin(), output_shape.end(), + 1, std::multiplies()); + if (config_.mask_ && (j == 2)) { + out_mask_data_.resize(out_num); + output_tensor->CopyToCpu(out_mask_data_.data()); + } else if (output_tensor->type() == paddle_infer::DataType::INT32) { + out_bbox_num_data_.resize(out_num); + output_tensor->CopyToCpu(out_bbox_num_data_.data()); + } else { + std::vector out_data; + out_data.resize(out_num); + output_tensor->CopyToCpu(out_data.data()); + out_tensor_list.push_back(out_data); + } + } + } + + inference_start = std::chrono::steady_clock::now(); + for (int i = 0; i < repeats; i++) { + predictor_->Run(); + // Get output tensor + out_tensor_list.clear(); + output_shape_list.clear(); + auto output_names = predictor_->GetOutputNames(); + for (int j = 0; j < output_names.size(); j++) { + auto output_tensor = predictor_->GetOutputHandle(output_names[j]); + std::vector output_shape = output_tensor->shape(); + int out_num = 
std::accumulate(output_shape.begin(), output_shape.end(), + 1, std::multiplies()); + output_shape_list.push_back(output_shape); + if (config_.mask_ && (j == 2)) { + out_mask_data_.resize(out_num); + output_tensor->CopyToCpu(out_mask_data_.data()); + } else if (output_tensor->type() == paddle_infer::DataType::INT32) { + out_bbox_num_data_.resize(out_num); + output_tensor->CopyToCpu(out_bbox_num_data_.data()); + } else { + std::vector out_data; + out_data.resize(out_num); + output_tensor->CopyToCpu(out_data.data()); + out_tensor_list.push_back(out_data); + } } } } + auto inference_end = std::chrono::steady_clock::now(); auto postprocess_start = std::chrono::steady_clock::now(); // Postprocessing result @@ -371,26 +536,24 @@ void ObjectDetector::Predict(const std::vector imgs, if (i == config_.fpn_stride_.size()) { reg_max = output_shape_list[i][2] / 4 - 1; } - float* buffer = new float[out_tensor_list[i].size()]; - memcpy(buffer, - &out_tensor_list[i][0], + float *buffer = new float[out_tensor_list[i].size()]; + memcpy(buffer, &out_tensor_list[i][0], out_tensor_list[i].size() * sizeof(float)); output_data_list_.push_back(buffer); } PaddleDetection::PicoDetPostProcess( - result, - output_data_list_, - config_.fpn_stride_, - inputs_.im_shape_, - inputs_.scale_factor_, - config_.nms_info_["score_threshold"].as(), - config_.nms_info_["nms_threshold"].as(), - num_class, - reg_max); + result, output_data_list_, config_.fpn_stride_, inputs_.im_shape_, + inputs_.scale_factor_, config_.nms_info_["score_threshold"].as(), + config_.nms_info_["nms_threshold"].as(), num_class, reg_max); bbox_num->push_back(result->size()); + } else if (config_.arch_ == "SOLOv2") { + SOLOv2Postprocess(imgs, result, bbox_num, out_bbox_num_data_, + out_label_data_, out_score_data_, out_global_mask_data_, + threshold); } else { is_rbox = output_shape_list[0][output_shape_list[0].size() - 1] % 10 == 0; - Postprocess(imgs, result, out_bbox_num_data_, out_tensor_list[0], is_rbox); + Postprocess(imgs, 
result, out_bbox_num_data_, out_tensor_list[0], + out_mask_data_, is_rbox); for (int k = 0; k < out_bbox_num_data_.size(); k++) { int tmp = out_bbox_num_data_[k]; bbox_num->push_back(tmp); @@ -426,4 +589,4 @@ std::vector GenerateColorMap(int num_class) { return colormap; } -} // namespace PaddleDetection +} // namespace PaddleDetection diff --git a/deploy/cpp/src/preprocess_op.cc b/deploy/cpp/src/preprocess_op.cc index 4ac3daa304e933e307596423442502a5bfc06da5..6147555be57a2739fcd4a773eb281aaa966763b0 100644 --- a/deploy/cpp/src/preprocess_op.cc +++ b/deploy/cpp/src/preprocess_op.cc @@ -60,12 +60,11 @@ void Permute::Run(cv::Mat* im, ImageBlob* data) { void Resize::Run(cv::Mat* im, ImageBlob* data) { auto resize_scale = GenerateScale(*im); - data->im_shape_ = {static_cast(im->cols * resize_scale.first), - static_cast(im->rows * resize_scale.second)}; - data->in_net_shape_ = {static_cast(im->cols * resize_scale.first), - static_cast(im->rows * resize_scale.second)}; cv::resize( *im, *im, cv::Size(), resize_scale.first, resize_scale.second, interp_); + + data->in_net_shape_ = {static_cast(im->rows), + static_cast(im->cols)}; data->im_shape_ = { static_cast(im->rows), static_cast(im->cols), }; @@ -154,6 +153,7 @@ float LetterBoxResize::GenerateScale(const cv::Mat& im) { void PadStride::Run(cv::Mat* im, ImageBlob* data) { if (stride_ <= 0) { + data->in_net_im_ = im->clone(); return; } int rc = im->channels(); @@ -177,13 +177,84 @@ void TopDownEvalAffine::Run(cv::Mat* im, ImageBlob* data) { }; } +void GetAffineTrans(const cv::Point2f center, + const cv::Point2f input_size, + const cv::Point2f output_size, + cv::Mat* trans) { + cv::Point2f srcTri[3]; + cv::Point2f dstTri[3]; + float src_w = input_size.x; + float dst_w = output_size.x; + float dst_h = output_size.y; + + cv::Point2f src_dir(0, -0.5 * src_w); + cv::Point2f dst_dir(0, -0.5 * dst_w); + + srcTri[0] = center; + srcTri[1] = center + src_dir; + cv::Point2f src_d = srcTri[0] - srcTri[1]; + srcTri[2] = srcTri[1] + 
cv::Point2f(-src_d.y, src_d.x); + + dstTri[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5); + dstTri[1] = cv::Point2f(dst_w * 0.5, dst_h * 0.5) + dst_dir; + cv::Point2f dst_d = dstTri[0] - dstTri[1]; + dstTri[2] = dstTri[1] + cv::Point2f(-dst_d.y, dst_d.x); + + *trans = cv::getAffineTransform(srcTri, dstTri); +} + +void WarpAffine::Run(cv::Mat* im, ImageBlob* data) { + cv::cvtColor(*im, *im, cv::COLOR_RGB2BGR); + cv::Mat trans(2, 3, CV_32FC1); + cv::Point2f center; + cv::Point2f input_size; + int h = im->rows; + int w = im->cols; + if (keep_res_) { + input_h_ = (h | pad_) + 1; + input_w_ = (w | pad_) + 1; + input_size = cv::Point2f(input_w_, input_h_); + center = cv::Point2f(w / 2, h / 2); + } else { + float s = std::max(h, w) * 1.0; + input_size = cv::Point2f(s, s); + center = cv::Point2f(w / 2., h / 2.); + } + cv::Point2f output_size(input_w_, input_h_); + + GetAffineTrans(center, input_size, output_size, &trans); + cv::warpAffine(*im, *im, trans, cv::Size(input_w_, input_h_)); + data->in_net_shape_ = { + static_cast(input_h_), static_cast(input_w_), + }; +} + +void Pad::Run(cv::Mat* im, ImageBlob* data) { + int h = size_[0]; + int w = size_[1]; + int rh = im->rows; + int rw = im->cols; + if (h == rh && w == rw){ + data->in_net_im_ = im->clone(); + return; + } + cv::copyMakeBorder( + *im, *im, 0, h - rh, 0, w - rw, cv::BORDER_CONSTANT, cv::Scalar(114)); + data->in_net_im_ = im->clone(); + data->in_net_shape_ = { + static_cast(im->rows), static_cast(im->cols), + }; +} + +// Preprocessor op running order const std::vector Preprocessor::RUN_ORDER = {"InitInfo", "TopDownEvalAffine", "Resize", "LetterBoxResize", + "WarpAffine", "NormalizeImage", "PadStride", + "Pad", "Permute"}; void Preprocessor::Run(cv::Mat* im, ImageBlob* data) { @@ -242,7 +313,9 @@ bool CheckDynamicInput(const std::vector& imgs) { int h = imgs.at(0).rows; int w = imgs.at(0).cols; for (int i = 1; i < imgs.size(); ++i) { - if (imgs.at(i).rows != h || imgs.at(i).cols != w) { + int hi = imgs.at(i).rows; + 
int wi = imgs.at(i).cols; + if (hi != h || wi != w) { return true; } } diff --git a/deploy/cpp/src/tracker.cc b/deploy/cpp/src/tracker.cc index b00e31c4ec580f3b30fe4b10970f31623f47acb3..f40cb0dd699a4687f4f77714e4bc5ae5416141f6 100644 --- a/deploy/cpp/src/tracker.cc +++ b/deploy/cpp/src/tracker.cc @@ -58,8 +58,8 @@ bool JDETracker::update(const cv::Mat &dets, const cv::Mat &emb, std::vector(i, 4); - const cv::Mat &ltrb_ = dets(cv::Rect(0, i, 4, 1)); + float score = *dets.ptr(i, 1); + const cv::Mat &ltrb_ = dets(cv::Rect(2, i, 4, 1)); cv::Vec4f ltrb = mat2vec4f(ltrb_); const cv::Mat &embedding = emb(cv::Rect(0, i, emb.cols, 1)); candidates[i] = Trajectory(ltrb, score, embedding); diff --git a/deploy/end2end_ppyoloe/README.md b/deploy/end2end_ppyoloe/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d470dccffe7c9927eac6946d3ee47ea96c346a56 --- /dev/null +++ b/deploy/end2end_ppyoloe/README.md @@ -0,0 +1,99 @@ +# Export ONNX Model +## Download pretrained Paddle models + +* [ppyoloe-s](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) +* [ppyoloe-m](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) +* [ppyoloe-l](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) +* [ppyoloe-x](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) +* [ppyoloe-s-400e](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) + + +## Export Paddle model for deployment + +```shell +python ./tools/export_model.py \ + -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ + -o weights=ppyoloe_crn_s_300e_coco.pdparams \ + trt=True \ + exclude_nms=True \ + TestReader.inputs_def.image_shape=[3,640,640] \ + --output_dir ./ + +# if you want to try ppyoloe-s-400e model +python ./tools/export_model.py \ + -c configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml \ + -o weights=ppyoloe_crn_s_400e_coco.pdparams \ + trt=True \ + exclude_nms=True \ + TestReader.inputs_def.image_shape=[3,640,640] \ 
+ --output_dir ./ +``` + +## Check requirements +```shell +pip install "onnx>=1.10.0" +pip install paddle2onnx +pip install onnx-simplifier +pip install onnx-graphsurgeon --index-url https://pypi.ngc.nvidia.com +# if you use cuda-python for inference, install it +pip install cuda-python +# if you use cupy for inference, install it +pip install cupy-cuda117 # cuda110-cuda117 are all available +``` + +## Export script +```shell +python ./deploy/end2end_ppyoloe/end2end.py \ + --model-dir ppyoloe_crn_s_300e_coco \ + --save-file ppyoloe_crn_s_300e_coco.onnx \ + --opset 11 \ + --batch-size 1 \ + --topk-all 100 \ + --iou-thres 0.6 \ + --conf-thres 0.4 +# if you want to try ppyoloe-s-400e model +python ./deploy/end2end_ppyoloe/end2end.py \ + --model-dir ppyoloe_crn_s_400e_coco \ + --save-file ppyoloe_crn_s_400e_coco.onnx \ + --opset 11 \ + --batch-size 1 \ + --topk-all 100 \ + --iou-thres 0.6 \ + --conf-thres 0.4 +``` +#### Description of all arguments + +- `--model-dir` : path to the exported ppyoloe model directory. +- `--save-file` : path of the exported ONNX file. +- `--opset` : ONNX opset version. +- `--img-size` : image size for exporting ppyoloe. +- `--batch-size` : batch size for exporting ppyoloe. +- `--topk-all` : maximum number of objects kept per image (top-k). +- `--iou-thres` : IoU threshold for the NMS algorithm. +- `--conf-thres` : confidence threshold for the NMS algorithm. 
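As a rough illustration of how these arguments are interpreted, the sketch below mirrors part of `parse_opt` from `end2end.py` in this change. The `argv` parameter is an addition for illustration only (the script itself calls `parser.parse_args()` with no arguments), and only three of the options are reproduced. Note that the parser defaults (`--iou-thres 0.45`, `--conf-thres 0.25`) differ from the values passed in the export commands (0.6 and 0.4).

```python
import argparse


def parse_opt(argv=None):
    # `argv` is a hypothetical parameter added so the sketch can be run
    # without a command line; end2end.py parses sys.argv directly.
    parser = argparse.ArgumentParser()
    parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640])
    parser.add_argument('--iou-thres', type=float, default=0.45)
    parser.add_argument('--conf-thres', type=float, default=0.25)
    opt = parser.parse_args(argv)
    # A single value is treated as a square size: [512] becomes [512, 512].
    # Two values are left unchanged (the list is multiplied by 1).
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1
    return opt


if __name__ == '__main__':
    print(parse_opt(['--img-size', '512']).img_size)
    print(parse_opt(['--img-size', '480', '640']).img_size)
```

With this logic, `--img-size 512` yields a square size, while two values are passed through unchanged into the `image` input shape `[batch, 3, *img_size]` that `end2end.py` hands to the exporter.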
+ ### TensorRT backend (TensorRT version >= 8.0.0) +#### TensorRT engine export +``` shell +/path/to/trtexec \ + --onnx=ppyoloe_crn_s_300e_coco.onnx \ + --saveEngine=ppyoloe_crn_s_300e_coco.engine \ + --fp16 # if exporting a TensorRT fp16 model +# if you want to try ppyoloe-s-400e model +/path/to/trtexec \ + --onnx=ppyoloe_crn_s_400e_coco.onnx \ + --saveEngine=ppyoloe_crn_s_400e_coco.engine \ + --fp16 # if exporting a TensorRT fp16 model +``` +#### TensorRT image inference + +``` shell +# cuda-python infer script +python ./deploy/end2end_ppyoloe/cuda-python.py ppyoloe_crn_s_300e_coco.engine +# cupy infer script +python ./deploy/end2end_ppyoloe/cupy-python.py ppyoloe_crn_s_300e_coco.engine +# if you want to try ppyoloe-s-400e model +python ./deploy/end2end_ppyoloe/cuda-python.py ppyoloe_crn_s_400e_coco.engine +# or +python ./deploy/end2end_ppyoloe/cupy-python.py ppyoloe_crn_s_400e_coco.engine +``` \ No newline at end of file diff --git a/deploy/end2end_ppyoloe/cuda-python.py b/deploy/end2end_ppyoloe/cuda-python.py new file mode 100644 index 0000000000000000000000000000000000000000..3c7bd7c84b3eeaa6bea55416d8a5eabd37ac4d33 --- /dev/null +++ b/deploy/end2end_ppyoloe/cuda-python.py @@ -0,0 +1,161 @@ +import sys +import requests +import cv2 +import random +import time +import numpy as np +import tensorrt as trt +from cuda import cudart +from pathlib import Path +from collections import OrderedDict, namedtuple + + +def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleup=True, stride=32): + # Resize and pad image while meeting stride-multiple constraints + shape = im.shape[:2] # current shape [height, width] + if isinstance(new_shape, int): + new_shape = (new_shape, new_shape) + + # Scale ratio (new / old) + r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) + if not scaleup: # only scale down, do not scale up (for better val mAP) + r = min(r, 1.0) + + # Compute padding + new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) + dw, dh = 
new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding + + if auto: # minimum rectangle + dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding + + dw /= 2 # divide padding into 2 sides + dh /= 2 + + if shape[::-1] != new_unpad: # resize + im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR) + top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) + left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) + im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border + return im, r, (dw, dh) + + +w = Path(sys.argv[1]) + +assert w.exists() and w.suffix in ('.engine', '.plan'), 'Wrong engine path' + +names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', + 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', + 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', + 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', + 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', + 'hair drier', 'toothbrush'] +colors = {name: [random.randint(0, 255) for _ in range(3)] for i, name in enumerate(names)} + +url = 'https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/image1.jpg' +file = requests.get(url) +img = cv2.imdecode(np.frombuffer(file.content, np.uint8), 1) + +_, stream = cudart.cudaStreamCreate() + +mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1) +std = np.array([0.229, 0.224, 0.225], 
dtype=np.float32).reshape(1, 3, 1, 1) + +# Infer TensorRT Engine +Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr')) +logger = trt.Logger(trt.Logger.ERROR) +trt.init_libnvinfer_plugins(logger, namespace="") +with open(w, 'rb') as f, trt.Runtime(logger) as runtime: + model = runtime.deserialize_cuda_engine(f.read()) +bindings = OrderedDict() +fp16 = False # default updated below +for index in range(model.num_bindings): + name = model.get_binding_name(index) + dtype = trt.nptype(model.get_binding_dtype(index)) + shape = tuple(model.get_binding_shape(index)) + data = np.empty(shape, dtype=np.dtype(dtype)) + _, data_ptr = cudart.cudaMallocAsync(data.nbytes, stream) + bindings[name] = Binding(name, dtype, shape, data, data_ptr) + if model.binding_is_input(index) and dtype == np.float16: + fp16 = True +binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items()) +context = model.create_execution_context() + +image = img.copy() +image, ratio, dwdh = letterbox(image, auto=False) +image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + +image_copy = image.copy() + +image = image.transpose((2, 0, 1)) +image = np.expand_dims(image, 0) +image = np.ascontiguousarray(image) + +im = image.astype(np.float32) +im /= 255 +im -= mean +im /= std + +_, image_ptr = cudart.cudaMallocAsync(im.nbytes, stream) +cudart.cudaMemcpyAsync(image_ptr, im.ctypes.data, im.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream) + +# warmup for 10 times +for _ in range(10): + tmp = np.random.randn(1, 3, 640, 640).astype(np.float32) + _, tmp_ptr = cudart.cudaMallocAsync(tmp.nbytes, stream) + binding_addrs['image'] = tmp_ptr + context.execute_v2(list(binding_addrs.values())) + +start = time.perf_counter() +binding_addrs['image'] = image_ptr +context.execute_v2(list(binding_addrs.values())) +print(f'Cost {(time.perf_counter() - start) * 1000}ms') + +nums = bindings['num_dets'].data +boxes = bindings['det_boxes'].data +scores = bindings['det_scores'].data +classes = 
bindings['det_classes'].data + +cudart.cudaMemcpyAsync(nums.ctypes.data, + bindings['num_dets'].ptr, + nums.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) +cudart.cudaMemcpyAsync(boxes.ctypes.data, + bindings['det_boxes'].ptr, + boxes.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) +cudart.cudaMemcpyAsync(scores.ctypes.data, + bindings['det_scores'].ptr, + scores.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) +cudart.cudaMemcpyAsync(classes.ctypes.data, + bindings['det_classes'].ptr, + classes.data.nbytes, + cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, + stream) + +cudart.cudaStreamSynchronize(stream) +cudart.cudaStreamDestroy(stream) + +for i in binding_addrs.values(): + cudart.cudaFree(i) + +num = int(nums[0][0]) +box_img = boxes[0, :num].round().astype(np.int32) +score_img = scores[0, :num] +clss_img = classes[0, :num] +for i, (box, score, clss) in enumerate(zip(box_img, score_img, clss_img)): + name = names[int(clss)] + color = colors[name] + cv2.rectangle(image_copy, box[:2].tolist(), box[2:].tolist(), color, 2) + cv2.putText(image_copy, name, (int(box[0]), int(box[1]) - 2), cv2.FONT_HERSHEY_SIMPLEX, + 0.75, [225, 255, 255], thickness=2) + +cv2.imshow('Result', cv2.cvtColor(image_copy, cv2.COLOR_RGB2BGR)) +cv2.waitKey(0) diff --git a/deploy/end2end_ppyoloe/cupy-python.py b/deploy/end2end_ppyoloe/cupy-python.py new file mode 100644 index 0000000000000000000000000000000000000000..a66eb77ecf3aa4c76c143050764429a2a06e8ba1 --- /dev/null +++ b/deploy/end2end_ppyoloe/cupy-python.py @@ -0,0 +1,131 @@ +import sys +import requests +import cv2 +import random +import time +import numpy as np +import cupy as cp +import tensorrt as trt +from PIL import Image +from collections import OrderedDict, namedtuple +from pathlib import Path + + +def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleup=True, stride=32): + # Resize and pad image while meeting stride-multiple constraints + shape = 
im.shape[:2] # current shape [height, width] + if isinstance(new_shape, int): + new_shape = (new_shape, new_shape) + + # Scale ratio (new / old) + r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) + if not scaleup: # only scale down, do not scale up (for better val mAP) + r = min(r, 1.0) + + # Compute padding + new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) + dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding + + if auto: # minimum rectangle + dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding + + dw /= 2 # divide padding into 2 sides + dh /= 2 + + if shape[::-1] != new_unpad: # resize + im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR) + top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) + left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) + im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border + return im, r, (dw, dh) + + +names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', + 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', + 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', + 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', + 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', + 'hair drier', 'toothbrush'] +colors = {name: [random.randint(0, 255) for _ in range(3)] for i, name in enumerate(names)} + +url = 
'https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/image1.jpg' +file = requests.get(url) +img = cv2.imdecode(np.frombuffer(file.content, np.uint8), 1) + +w = Path(sys.argv[1]) + +assert w.exists() and w.suffix in ('.engine', '.plan'), 'Wrong engine path' + +mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1) +std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1) + +mean = cp.asarray(mean) +std = cp.asarray(std) + +# Infer TensorRT Engine +Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr')) +logger = trt.Logger(trt.Logger.INFO) +trt.init_libnvinfer_plugins(logger, namespace="") +with open(w, 'rb') as f, trt.Runtime(logger) as runtime: + model = runtime.deserialize_cuda_engine(f.read()) +bindings = OrderedDict() +fp16 = False # default updated below +for index in range(model.num_bindings): + name = model.get_binding_name(index) + dtype = trt.nptype(model.get_binding_dtype(index)) + shape = tuple(model.get_binding_shape(index)) + data = cp.empty(shape, dtype=cp.dtype(dtype)) + bindings[name] = Binding(name, dtype, shape, data, int(data.data.ptr)) + if model.binding_is_input(index) and dtype == np.float16: + fp16 = True +binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items()) +context = model.create_execution_context() + +image = img.copy() +image, ratio, dwdh = letterbox(image, auto=False) +image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + +image_copy = image.copy() + +image = image.transpose((2, 0, 1)) +image = np.expand_dims(image, 0) +image = np.ascontiguousarray(image) + +im = cp.asarray(image) +im = im.astype(cp.float32) +im /= 255 +im -= mean +im /= std + +# warmup for 10 times +for _ in range(10): + tmp = cp.random.randn(1, 3, 640, 640).astype(cp.float32) + binding_addrs['image'] = int(tmp.data.ptr) + context.execute_v2(list(binding_addrs.values())) + +start = time.perf_counter() +binding_addrs['image'] = int(im.data.ptr) +context.execute_v2(list(binding_addrs.values())) 
+print(f'Cost {(time.perf_counter() - start) * 1000}ms') + +nums = bindings['num_dets'].data +boxes = bindings['det_boxes'].data +scores = bindings['det_scores'].data +classes = bindings['det_classes'].data + +num = int(nums[0][0]) +box_img = boxes[0, :num].round().astype(cp.int32) +score_img = scores[0, :num] +clss_img = classes[0, :num] +for i, (box, score, clss) in enumerate(zip(box_img, score_img, clss_img)): + name = names[int(clss)] + color = colors[name] + cv2.rectangle(image_copy, box[:2].tolist(), box[2:].tolist(), color, 2) + cv2.putText(image_copy, name, (int(box[0]), int(box[1]) - 2), cv2.FONT_HERSHEY_SIMPLEX, + 0.75, [225, 255, 255], thickness=2) + +cv2.imshow('Result', cv2.cvtColor(image_copy, cv2.COLOR_RGB2BGR)) +cv2.waitKey(0) diff --git a/deploy/end2end_ppyoloe/end2end.py b/deploy/end2end_ppyoloe/end2end.py new file mode 100644 index 0000000000000000000000000000000000000000..fcfbf019a5d5755768e7defd573203a20a020ef7 --- /dev/null +++ b/deploy/end2end_ppyoloe/end2end.py @@ -0,0 +1,97 @@ +import argparse +import onnx +import onnx_graphsurgeon as gs +import numpy as np + +from pathlib import Path +from paddle2onnx.legacy.command import program2onnx +from collections import OrderedDict + + +def main(opt): + model_dir = Path(opt.model_dir) + save_file = Path(opt.save_file) + assert model_dir.exists() and model_dir.is_dir() + if save_file.is_dir(): + save_file = (save_file / model_dir.stem).with_suffix('.onnx') + elif save_file.is_file() and save_file.suffix != '.onnx': + save_file = save_file.with_suffix('.onnx') + input_shape_dict = {'image': [opt.batch_size, 3, *opt.img_size], + 'scale_factor': [opt.batch_size, 2]} + program2onnx(str(model_dir), str(save_file), + 'model.pdmodel', 'model.pdiparams', + opt.opset, input_shape_dict=input_shape_dict) + onnx_model = onnx.load(save_file) + try: + import onnxsim + onnx_model, check = onnxsim.simplify(onnx_model) + assert check, 'assert check failed' + except Exception as e: + print(f'Simplifier failure: {e}') 
+ onnx.checker.check_model(onnx_model) + graph = gs.import_onnx(onnx_model) + graph.fold_constants() + graph.cleanup().toposort() + mul = concat = None + for node in graph.nodes: + if node.op == 'Div' and node.i(0).op == 'Mul': + mul = node.i(0) + if node.op == 'Concat' and node.o().op == 'Reshape' and node.o().o().op == 'ReduceSum': + concat = node + + assert mul.outputs[0].shape[1] == concat.outputs[0].shape[2], 'Something wrong in outputs shape' + + anchors = mul.outputs[0].shape[1] + classes = concat.outputs[0].shape[1] + + scores = gs.Variable(name='scores', shape=[opt.batch_size, anchors, classes], dtype=np.float32) + graph.layer(op='Transpose', name='lastTranspose', + inputs=[concat.outputs[0]], + outputs=[scores], + attrs=OrderedDict(perm=[0, 2, 1])) + + graph.inputs = [graph.inputs[0]] + + attrs = OrderedDict( + plugin_version="1", + background_class=-1, + max_output_boxes=opt.topk_all, + score_threshold=opt.conf_thres, + iou_threshold=opt.iou_thres, + score_activation=False, + box_coding=0, ) + outputs = [gs.Variable("num_dets", np.int32, [opt.batch_size, 1]), + gs.Variable("det_boxes", np.float32, [opt.batch_size, opt.topk_all, 4]), + gs.Variable("det_scores", np.float32, [opt.batch_size, opt.topk_all]), + gs.Variable("det_classes", np.int32, [opt.batch_size, opt.topk_all])] + graph.layer(op='EfficientNMS_TRT', name="batched_nms", + inputs=[mul.outputs[0], scores], + outputs=outputs, + attrs=attrs) + graph.outputs = outputs + graph.cleanup().toposort() + onnx.save(gs.export_onnx(graph), save_file) + + +def parse_opt(): + parser = argparse.ArgumentParser() + parser.add_argument('--model-dir', type=str, + default=None, + help='paddle static model') + parser.add_argument('--save-file', type=str, + default=None, + help='onnx model save path') + parser.add_argument('--opset', type=int, default=11, help='opset version') + parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size') + parser.add_argument('--batch-size', 
type=int, default=1, help='batch size') + parser.add_argument('--topk-all', type=int, default=100, help='topk objects for every images') + parser.add_argument('--iou-thres', type=float, default=0.45, help='iou threshold for NMS') + parser.add_argument('--conf-thres', type=float, default=0.25, help='conf threshold for NMS') + opt = parser.parse_args() + opt.img_size *= 2 if len(opt.img_size) == 1 else 1 + return opt + + +if __name__ == '__main__': + opt = parse_opt() + main(opt) diff --git a/deploy/lite/README.md b/deploy/lite/README.md index e8b58e35309a225f189ca6f05b684195b48c0b75..30447460eb6c4ccdf5c1013d1ea2d631d9073fba 100644 --- a/deploy/lite/README.md +++ b/deploy/lite/README.md @@ -1,8 +1,8 @@ # Paddle-Lite端侧部署 -本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleDetection模型的详细步骤。 +[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。 +本目录提供了PaddleDetection中主要模型在Paddle-Lite上的端到端部署代码。用户可以通过本教程了解如何使用该部分代码,基于Paddle-Lite实现在移动端部署PaddleDetection模型。 -Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。 ## 1. 准备环境 @@ -26,14 +26,10 @@ export NDK_ROOT=[YOUR_NDK_PATH]/android-ndk-r17c ### 1.2 准备预测库 预测库有两种获取方式: -1. [**建议**]直接下载,预测库下载链接如下:(请注意使用模型FP32/16版本需要与库相对应) - |平台| 架构 | 预测库下载链接| - |-|-|-| - |Android| arm7 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv7.clang.c++_static.with_extra.with_cv.tar.gz) | - | Android | arm8 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv.tar.gz) | - | Android | arm8(FP16) | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip) | +1. 
[**建议**]直接从[Paddle-Lite Release](https://github.com/PaddlePaddle/Paddle-Lite/releases)中, 根据设备类型与架构选择对应的预编译库,请注意使用模型FP32/16版本需要与库相对应,库文件的说明请参考[官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)。 -**注意**:1. 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo) +**注意**:(1) 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo) +(2)PP-PicoDet部署需要Paddle Lite 2.11以上版本。 2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下(Lite库在不断更新,如若下列命令无效,请以Lite官方repo为主): @@ -48,7 +44,7 @@ git checkout develop ./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON --with_arm82_fp16=ON ``` -**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_andriod.html#id2)。 +**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_options.html)。 直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。 预测库的文件目录如下: @@ -74,23 +70,23 @@ inference_lite_lib.android.armv8/ | | `-- libpaddle_lite_jni.so | `-- src |-- demo C++和Java示例代码 -| |-- cxx C++ 预测库demo +| |-- cxx C++ 预测库demo, 请将本文档目录下的PaddleDetection相关代码拷贝至该文件夹下执行交叉编译。 | `-- java Java 预测库demo ``` ## 2 开始运行 -### 2.1 
模型优化 +### 2.1 模型转换 -Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。 +Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,并转换为推理所使用的文件格式。目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。 **注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。 #### 2.1.1 安装paddle_lite_opt工具 -安装`paddle_lite_opt`工具有如下两种方法: +安装`paddle_lite_opt`工具有如下两种方法, **请注意**,无论使用哪种方法,请尽量保证`paddle_lite_opt`工具和预测库的版本一致,以避免未知的Bug。 1. [**建议**]pip安装paddlelite并进行转换 ```shell - pip install paddlelite==2.10rc + pip install paddlelite ``` 2. 源码编译Paddle-Lite生成`paddle_lite_opt`工具 @@ -122,13 +118,14 @@ Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括 |--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现,默认为naive_buffer| |--optimize_out|优化模型的输出路径| |--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm| +| --enable_fp16| true/false,是否使用fp16进行推理。如果开启,需要使用对应fp16的预测库| 更详细的`paddle_lite_opt`工具使用说明请参考[使用opt转化模型文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html) `--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。 -#### 2.1.3 转换示例 +#### 2.1.2 转换示例 下面以PaddleDetection中的 `PicoDet` 模型为例,介绍使用`paddle_lite_opt`完成预训练模型到inference模型,再到Paddle-Lite优化模型的转换。 @@ -259,16 +256,20 @@ deploy/ } ``` -* `keypoint_runtime_config.json` 包含了关键点检测的超参数,请按需进行修改: +* `keypoint_runtime_config.json` 同时包含了目标检测和关键点检测的超参数,支持Top-Down方案的推理流程,请按需进行修改: ```shell { + "model_dir_det": "./model_det/", #检测模型路径 + "batch_size_det": 1, #检测模型预测时batchsize, 存在关键点模型时只能为1 + "threshold_det": 0.5, #检测器输出阈值 "model_dir_keypoint": "./model_keypoint/", #关键点模型路径(不使用需为空字符) "batch_size_keypoint": 8, #关键点预测时batchsize "threshold_keypoint": 0.5, #关键点输出阈值 "image_file": "demo.jpg", #测试图片 "image_dir": "", #测试图片文件夹 - 
"run_benchmark": true, #性能测试开关 + "run_benchmark": true, #性能测试开关 "cpu_threads": 4 #线程数 + "use_dark_decode": true #是否使用DARK解码关键点坐标 } ``` @@ -299,7 +300,7 @@ chmod 777 main ## FAQ Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗? -A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件即可,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。 +A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件及其对应模型配置文件`infer_cfg.json`,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。 Q2:换一个图测试怎么做? A2:替换 deploy 下的测试图像为你想要测试的图像,使用 ADB 再次 push 到手机上即可。 diff --git a/deploy/lite/include/config_parser.h b/deploy/lite/include/config_parser.h index 5171885ca954f50a44d511d24b3ca23845462d45..60d94c69e3b17aa9afea5dfb90e286f44d63f0bc 100644 --- a/deploy/lite/include/config_parser.h +++ b/deploy/lite/include/config_parser.h @@ -29,7 +29,7 @@ namespace PaddleDetection { -void load_jsonf(std::string jsonfile, const Json::Value& jsondata); +void load_jsonf(std::string jsonfile, Json::Value& jsondata); // Inference model configuration parser class ConfigPaser { diff --git a/deploy/lite/include/keypoint_postprocess.h b/deploy/lite/include/keypoint_postprocess.h index 0d1e747f306e44679d0500272e80df8a5fb19ab9..4e0e54c2640104488ef85e733af1c16bdc2d86aa 100644 --- a/deploy/lite/include/keypoint_postprocess.h +++ b/deploy/lite/include/keypoint_postprocess.h @@ -33,7 +33,8 @@ void transform_preds(std::vector& coords, std::vector& scale, std::vector& output_size, std::vector& dim, - std::vector& target_coords); + std::vector& target_coords, + bool affine); void box_to_center_scale(std::vector& box, int width, int height, diff --git a/deploy/lite/src/config_parser.cc b/deploy/lite/src/config_parser.cc index ed139a17dc8b2535877f3981849fdca8ce16993c..70c43e76c2c85d2917eb1c3384304260c591b85c 100644 --- a/deploy/lite/src/config_parser.cc +++ b/deploy/lite/src/config_parser.cc @@ -16,7 +16,7 @@ namespace PaddleDetection { -void load_jsonf(std::string jsonfile, const Json::Value &jsondata) { +void load_jsonf(std::string jsonfile, Json::Value &jsondata) { std::ifstream ifs; ifs.open(jsonfile); 
diff --git a/deploy/lite/src/keypoint_postprocess.cc b/deploy/lite/src/keypoint_postprocess.cc index 5f28d2adcffaee6b2a3135eb828996c3b00488fa..6c75ece87c2c8f743f0f112ab6bd23fdcc96a270 100644 --- a/deploy/lite/src/keypoint_postprocess.cc +++ b/deploy/lite/src/keypoint_postprocess.cc @@ -74,11 +74,26 @@ void transform_preds(std::vector& coords, std::vector& scale, std::vector& output_size, std::vector& dim, - std::vector& target_coords) { - cv::Mat trans(2, 3, CV_64FC1); - get_affine_transform(center, scale, 0, output_size, trans, 1); - for (int p = 0; p < dim[1]; ++p) { - affine_tranform(coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + std::vector& target_coords, + bool affine=false) { + if (affine) { + cv::Mat trans(2, 3, CV_64FC1); + get_affine_transform(center, scale, 0, output_size, trans, 1); + for (int p = 0; p < dim[1]; ++p) { + affine_tranform( + coords[p * 2], coords[p * 2 + 1], trans, target_coords, p); + } + } else { + float heat_w = static_cast(output_size[0]); + float heat_h = static_cast(output_size[1]); + float x_scale = scale[0] / heat_w; + float y_scale = scale[1] / heat_h; + float offset_x = center[0] - scale[0] / 2.; + float offset_y = center[1] - scale[1] / 2.; + for (int i = 0; i < dim[1]; i++) { + target_coords[i * 3 + 1] = x_scale * coords[i * 2] + offset_x; + target_coords[i * 3 + 2] = y_scale * coords[i * 2 + 1] + offset_y; + } } } diff --git a/deploy/pipeline/README.md b/deploy/pipeline/README.md new file mode 100644 index 0000000000000000000000000000000000000000..29d86d125875b2afbe1b95aac004e90ee8803a56 --- /dev/null +++ b/deploy/pipeline/README.md @@ -0,0 +1,109 @@ +简体中文 | [English](README_en.md) + +# 实时行人分析工具 PP-Human + +**PP-Human是基于飞桨深度学习框架的业界首个开源产业级实时行人分析工具,具有功能丰富,应用广泛和部署高效三大优势。** + +![](https://user-images.githubusercontent.com/22989727/178965250-14be25c1-125d-4d90-8642-7a9b01fecbe2.gif) + +PP-Human支持图片/单镜头视频/多镜头视频多种输入方式,功能覆盖多目标跟踪、属性识别、行为分析及人流量计数与轨迹记录。能够广泛应用于智慧交通、智慧社区、工业巡检等领域。支持服务器端部署及TensorRT加速,T4服务器上可达到实时。 + +## 📣 近期更新 
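+ +- 🔥 **2022.7.13: PP-Human v2 released. Its four signature industrial features (behavior recognition, human attribute recognition, traffic counting, and cross-camera tracking) are fully upgraded. It covers three core algorithm capabilities (pedestrian detection, tracking, and attributes) and provides step-by-step guidance for full-pipeline development and model optimization.** +- 2022.4.18: Added a full-pipeline hands-on PP-Human tutorial covering training, deployment, and extending action types. See the AIStudio project at this [link](https://aistudio.baidu.com/aistudio/projectdetail/3842982) +- 2022.4.10: Added a PP-Human example that supports fine-grained intelligent community management. AIStudio quick-start tutorial: [link](https://aistudio.baidu.com/aistudio/projectdetail/3679564) +- 2022.4.5: First release of the real-time pedestrian analysis tool PP-Human, supporting four capabilities: pedestrian tracking, people-flow counting, human attribute recognition, and fall detection. It is specially optimized on real-scene data, accurately recognizes a wide range of falling poses, and adapts to different backgrounds, lighting, and camera angles. + +## 🔮 Features and Demonstration + +| ⭐ Feature | 💟 Advantages | 💡 Example | +| --- | --- | --- | +| **Cross-camera tracking (ReID)** | Strong performance: specially optimized for hard cases such as target occlusion, incompleteness, and blur, reaching mAP 98.8 at 1.5ms/person | | +| **Attribute analysis** | Compatible with multiple data formats: supports image and video input<br>High performance: trained on a mix of open-source datasets and real enterprise data, reaching mAP 94.86 at 2ms/person<br>Supports 26 attributes: gender, age, glasses, tops, shoes, hats, backpacks, and other high-frequency attributes | | +| **Behavior recognition** | Feature-rich: recognizes five high-frequency abnormal behaviors: falling, fighting, smoking, phoning, and intrusion<br>Robust: no restrictions on lighting, viewing angle, or background<br>High performance: far less computation than video recognition techniques; supports fast local and service-based deployment<br>Fast training: a high-accuracy behavior recognition model can be produced in only 15 minutes | | +| **People-flow counting**<br>**Trajectory recording** | Easy to use: a single parameter enables people-flow counting and trajectory recording | | + +## 🗳 Model Zoo + +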
+Single-model results (click to expand) + +| Task | Use case | Accuracy | Inference speed (ms) | Model size | Inference deployment model | +| :---------: | :---------: | :--------------- | :-------: | :------: | :------: | +| Object detection (high precision) | Image input | mAP: 57.8 | 25.1ms | 182M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Object detection (lightweight) | Image input | mAP: 53.2 | 16.2ms | 27M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | +| Object tracking (high precision) | Video input | MOTA: 82.2 | 31.8ms | 182M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Object tracking (lightweight) | Video input | MOTA: 73.9 | 21.0ms | 27M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | +| Attribute recognition (high precision) | Image/video input, attribute recognition | mA: 95.4 | Single person 4.2ms | 86M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | +| Attribute recognition (lightweight) | Image/video input, attribute recognition | mA: 94.5 | Single person 2.9ms | 7.2M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | +| Keypoint detection | Video input, behavior recognition | AP: 87.1 | Single person 5.7ms | 101M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | +| Keypoint-sequence classification | Video input, behavior recognition | Accuracy: 96.43 | Single person 0.07ms | 21.8M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | +| Image classification based on human ID | Video input, behavior recognition | Accuracy: 86.85 | Single person 1.8ms | 45M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | +| Detection based on human ID | Video input, behavior recognition | AP50: 79.5 | Single person 10.9ms | 27M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | +| Video classification | Video input, behavior recognition | Accuracy: 89.0 | 19.7ms per 1s of video | 90M | [Download](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | +| ReID | Video input, cross-camera tracking | mAP: 98.8 | Single person 0.23ms | 85M | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | + +
    + +
+End-to-end model results (click to expand) + +| Task | End-to-end speed (ms) | Model solution | Model size | +| :---------: | :-------: | :------: | :------: | +| Pedestrian detection (high precision) | 25.1ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian detection (lightweight) | 16.2ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Pedestrian tracking (high precision) | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian tracking (lightweight) | 21.0ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Attribute recognition (high precision) | Single person 8.5ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection: 182M<br>Attribute recognition: 86M | +| Attribute recognition (lightweight) | Single person 7.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection: 182M<br>Attribute recognition: 86M | +| Fall recognition | Single person 10ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)<br>[Keypoint-based behavior recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking: 182M<br>Keypoint detection: 101M<br>Keypoint-based behavior recognition: 21.8M | +| Intrusion recognition | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Fight recognition | 19.7ms | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| Smoking recognition | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Object detection based on human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object detection: 182M<br>Object detection based on human ID: 27M | +| Phone-call recognition | Single person ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Image classification based on human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object detection: 182M<br>Image classification based on human ID: 45M | + +
+ + +Click a model in the model solution column to download it, then unzip it into the `./output_inference` directory. + +## 📚 Documentation and Tutorials + +### [Quick Start](docs/tutorials/PPHuman_QUICK_STARTED.md) + +### Pedestrian attribute/feature recognition + +* [Quick start](docs/tutorials/pphuman_attribute.md) +* [Custom development tutorial](../../docs/advanced_tutorials/customization/pphuman_attribute.md) + * Data preparation + * Model optimization + * Adding new attributes + +### Behavior recognition + +* [Quick start](docs/tutorials/pphuman_action.md) + * Fall detection + * Fight recognition +* [Custom development tutorial](../../docs/advanced_tutorials/customization/action_recognotion/README.md) + * Solution selection + * Data preparation + * Model optimization + * Adding new behaviors + +### Cross-camera tracking (ReID) + +* [Quick start](docs/tutorials/pphuman_mtmct.md) +* [Custom development tutorial](../../docs/advanced_tutorials/customization/pphuman_mtmct.md) + * Data preparation + * Model optimization + +### Pedestrian tracking, people-flow counting, and trajectory recording + +* [Quick start](docs/tutorials/pphuman_mot.md) + * Pedestrian tracking + * People-flow counting and trajectory recording + * Area intrusion detection and counting +* [Custom development tutorial](../../docs/advanced_tutorials/customization/pphuman_mot.md) + * Data preparation + * Model optimization diff --git a/deploy/pipeline/README_en.md b/deploy/pipeline/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..227d08ec7b1467d48c365629373b09c196c32528 --- /dev/null +++ b/deploy/pipeline/README_en.md @@ -0,0 +1,113 @@ +[简体中文](README.md) | English + +# Real-Time Pedestrian Analysis Tool PP-Human + +**PP-Human is the industry's first open-source, industrial-grade real-time pedestrian analysis tool based on the PaddlePaddle deep learning framework. It has three major strengths: rich functionality, wide applicability, and efficient deployment.** + + + +![](https://user-images.githubusercontent.com/22989727/178965250-14be25c1-125d-4d90-8642-7a9b01fecbe2.gif) + + + +PP-Human supports several input types: images, single-camera videos, and multi-camera videos. It covers multi-object tracking, attribute recognition, behavior analysis, visitor traffic statistics, and trace records. PP-Human can be applied to fields such as smart transportation, smart communities, and industrial inspection. It supports server-side deployment and TensorRT acceleration, and reaches real-time speed on a T4 server. 
+ +## 📣 Updates + +- 🔥 **2022.7.13: PP-Human v2 launched with a full upgrade of four industrial features: behavior analysis, attribute recognition, visitor traffic statistics, and ReID. It covers three core algorithm capabilities (pedestrian detection, tracking, and attribute analysis) and provides a detailed, step-by-step development process and model optimization strategy.** +- 2022.4.18: Added PP-Human practical tutorials covering training, deployment, and action expansion. For the AIStudio project, see this [link](https://aistudio.baidu.com/aistudio/projectdetail/3842982) + +- 2022.4.10: Added PP-Human examples that support refined, intelligent community management. AIStudio quick start: [link](https://aistudio.baidu.com/aistudio/projectdetail/3679564) +- 2022.4.5: Launched the real-time pedestrian analysis tool PP-Human. It supports pedestrian tracking, visitor traffic statistics, attribute recognition, and falling detection. Thanks to targeted optimization on real-scene data, it accurately recognizes various falling postures and adapts to different backgrounds, lighting, and camera angles. 
+ +## 🔮 Features and demonstration + +| ⭐ Feature | 💟 Advantages | 💡Example | +| --- | --- | --- | +| **ReID** | Strong performance: specially optimized for hard cases such as occluded, incomplete, and blurry targets, reaching mAP 98.8 at 1.5ms/person | | +| **Attribute analysis** | Compatible with a variety of data formats: supports image and video input<br>High performance: trained on open-source datasets combined with real enterprise data, reaching mAP 94.86 at 2ms/person<br>Supports 26 attributes: gender, age, glasses, tops, shoes, hats, backpacks, and other high-frequency attributes | | +| **Behavior detection** | Feature-rich: detects five high-frequency abnormal behaviors: falling, fighting, smoking, phoning, and intrusion<br>Robust: not limited by lighting, camera angle, or background<br>High performance: requires far less computation than video recognition techniques; supports fast local and service-based deployment<br>Fast training: produces a high-precision behavior detection model in only 15 minutes | | +| **Visitor traffic statistics**<br>**Trace record** | Simple and easy to use: a single parameter enables visitor traffic statistics and trace records | | + +## 🗳 Model Zoo + +
+ Single model results (click to expand) + +| Task | Application | Accuracy | Inference speed (ms) | Model size | Inference deployment model | +|:---:|:---:|:--- |:---:|:---:|:---:| +| Object detection (high precision) | Image input | mAP: 57.8 | 25.1ms | 182M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Object detection (lightweight) | Image input | mAP: 53.2 | 16.2ms | 27M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | +| Object tracking (high precision) | Video input | MOTA: 82.2 | 31.8ms | 182M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Object tracking (lightweight) | Video input | MOTA: 73.9 | 21.0ms | 27M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | +| Attribute recognition (high precision) | Image/video input, attribute recognition | mA: 95.4 | Single person 4.2ms | 86M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | +| Attribute recognition (lightweight) | Image/video input, attribute recognition | mA: 94.5 | Single person 2.9ms | 7.2M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | +| Keypoint detection | Video input, behavior detection | AP: 87.1 | Single person 5.7ms | 101M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | +| Classification based on key point sequences | Video input, behavior detection | Accuracy: 96.43 | Single person 0.07ms | 21.8M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | +| Image classification based on Human ID | Video input, behavior detection | Accuracy: 86.85 | Single person 1.8ms | 45M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | +| Detection based on Human ID | Video input, behavior detection | AP50: 79.5 | Single person 10.9ms | 27M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | +| Video classification | Video input, behavior detection | Accuracy: 89.0 | 19.7ms per 1s of video | 90M | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | +| ReID | Video input, ReID | mAP: 98.8 | Single person 0.23ms | 85M | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | + +
    + +
+End-to-end model results (click to expand) + +| Task | End-to-End Speed (ms) | Model | Size | +|:---:|:---:|:---:|:---:| +| Pedestrian detection (high precision) | 25.1ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian detection (lightweight) | 16.2ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Pedestrian tracking (high precision) | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Pedestrian tracking (lightweight) | 21.0ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| Attribute recognition (high precision) | Single person 8.5ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection: 182M<br>Attribute recognition: 86M | +| Attribute recognition (lightweight) | Single person 7.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection: 182M<br>Attribute recognition: 86M | +| Falling detection | Single person 10ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)<br>[Behavior detection based on key points](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking: 182M<br>Keypoint detection: 101M<br>Behavior detection based on key points: 21.8M | +| Intrusion detection | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| Fighting detection | 19.7ms | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| Smoking detection | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Object detection based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object detection: 182M<br>Object detection based on Human ID: 27M | +| Phoning detection | Single person ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Image classification based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object detection: 182M<br>Image classification based on Human ID: 45M | + +
+ +Click a model in the Model column to download it, then unzip it and save it in the `./output_inference` directory. + +## 📚 Doc Tutorials + +### [Quick Start](docs/tutorials/PPHuman_QUICK_STARTED.md) + +### Pedestrian attribute/feature recognition + +* [Quick start](docs/tutorials/pphuman_attribute.md) +* [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_attribute.md) + * Data Preparation + * Model Optimization + * New Attributes + +### Behavior detection + +* [Quick start](docs/tutorials/pphuman_action.md) + * Falling detection + * Fighting detection +* [Customized development tutorials](../../docs/advanced_tutorials/customization/action_recognotion/README.md) + * Solution Selection + * Data Preparation + * Model Optimization + * New Behaviors + +### ReID + +* [Quick start](docs/tutorials/pphuman_mtmct.md) +* [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_mtmct.md) + * Data Preparation + * Model Optimization + +### Pedestrian tracking, visitor traffic statistics, trace records + +* [Quick start](docs/tutorials/pphuman_mot.md) + * Pedestrian tracking + * Visitor traffic statistics + * Area intrusion detection and counting +* [Customized development tutorials](../../docs/advanced_tutorials/customization/pphuman_mot.md) + * Data Preparation + * Model Optimization diff --git a/deploy/pphuman/__init__.py b/deploy/pipeline/__init__.py similarity index 100% rename from deploy/pphuman/__init__.py rename to deploy/pipeline/__init__.py diff --git a/deploy/pipeline/cfg_utils.py b/deploy/pipeline/cfg_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1e42b29c04068a55fe09d5c16c92fcc25b1fb7cd --- /dev/null +++ b/deploy/pipeline/cfg_utils.py @@ -0,0 +1,204 @@ +import ast +import yaml +import copy +import argparse +from argparse import ArgumentParser, RawDescriptionHelpFormatter + + +class ArgsParser(ArgumentParser): + def __init__(self): + super(ArgsParser, self).__init__( + 
formatter_class=RawDescriptionHelpFormatter) + self.add_argument( + "-o", "--opt", nargs='*', help="set configuration options") + + def parse_args(self, argv=None): + args = super(ArgsParser, self).parse_args(argv) + assert args.config is not None, \ + "Please specify --config=configure_file_path." + args.opt = self._parse_opt(args.opt) + return args + + def _parse_opt(self, opts): + config = {} + if not opts: + return config + for s in opts: + s = s.strip() + k, v = s.split('=', 1) + if '.' not in k: + config[k] = yaml.load(v, Loader=yaml.Loader) + else: + keys = k.split('.') + if keys[0] not in config: + config[keys[0]] = {} + cur = config[keys[0]] + for idx, key in enumerate(keys[1:]): + if idx == len(keys) - 2: + cur[key] = yaml.load(v, Loader=yaml.Loader) + else: + cur[key] = {} + cur = cur[key] + return config + + +def argsparser(): + parser = ArgsParser() + + parser.add_argument( + "--config", + type=str, + default=None, + help=("Path of configure"), + required=True) + parser.add_argument( + "--image_file", type=str, default=None, help="Path of image file.") + parser.add_argument( + "--image_dir", + type=str, + default=None, + help="Dir of image file, `image_file` has a higher priority.") + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or `camera_id` has a highest priority." 
+ ) + parser.add_argument( + "--video_dir", + type=str, + default=None, + help="Dir of video file, `video_file` has a higher priority.") + parser.add_argument( + "--camera_id", + type=int, + default=-1, + help="device id of camera to predict.") + parser.add_argument( + "--output_dir", + type=str, + default="output", + help="Directory of output visualization files.") + parser.add_argument( + "--run_mode", + type=str, + default='paddle', + help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") + parser.add_argument( + "--device", + type=str, + default='cpu', + help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." + ) + parser.add_argument( + "--enable_mkldnn", + type=ast.literal_eval, + default=False, + help="Whether use mkldnn with CPU.") + parser.add_argument( + "--cpu_threads", type=int, default=1, help="Num of threads with CPU.") + parser.add_argument( + "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.") + parser.add_argument( + "--trt_max_shape", + type=int, + default=1280, + help="max_shape for TensorRT.") + parser.add_argument( + "--trt_opt_shape", + type=int, + default=640, + help="opt_shape for TensorRT.") + parser.add_argument( + "--trt_calib_mode", + type=ast.literal_eval, + default=False, + help="If the model is produced by TRT offline quantitative " + "calibration, trt_calib_mode must be set to True.") + parser.add_argument( + "--do_entrance_counting", + action='store_true', + help="Whether counting the numbers of identifiers entering " + "or getting out from the entrance. Note that only support single-class MOT." + ) + parser.add_argument( + "--do_break_in_counting", + action='store_true', + help="Whether counting the numbers of identifiers break in " + "the area. 
Note that only single-class MOT is supported and " + "the video should be taken by a static camera.") + parser.add_argument( + "--region_type", + type=str, + default='horizontal', + help="Area type for entrance counting or break-in counting: 'horizontal' and " + "'vertical' are used for entrance counting, 'custom' is used for break-in counting. " + "Note that only single-class MOT is supported, and the video should be taken by a static camera." + ) + parser.add_argument( + '--region_polygon', + nargs='+', + type=int, + default=[], + help="Clockwise point coords (x0,y0,x1,y1...) of the area polygon used when " + "do_break_in_counting is set. Note that only single-class MOT is supported and " + "the video should be taken by a static camera.") + parser.add_argument( + "--secs_interval", + type=int, + default=2, + help="Interval in seconds between counts after tracking.") + parser.add_argument( + "--draw_center_traj", + action='store_true', + help="Whether to draw the trajectory of the center.") + + return parser + + +def merge_cfg(args): + # load config + with open(args.config) as f: + pred_config = yaml.safe_load(f) + + def merge(cfg, arg): + # update cfg from arg directly + merge_cfg = copy.deepcopy(cfg) + for k, v in cfg.items(): + if k in arg: + merge_cfg[k] = arg[k] + else: + if isinstance(v, dict): + merge_cfg[k] = merge(v, arg) + + return merge_cfg + + def merge_opt(cfg, arg): + merge_cfg = copy.deepcopy(cfg) + # merge opt + if 'opt' in arg.keys() and arg['opt']: + for name, value in arg['opt'].items( + ): # example: {'MOT': {'batch_size': 3}} + if name not in merge_cfg.keys(): + print("No", name, "in config file!") + continue + for sub_k, sub_v in value.items(): + if sub_k not in merge_cfg[name].keys(): + print("No", sub_k, "in config file of", name, "!") + continue + merge_cfg[name][sub_k] = sub_v + + return merge_cfg + + args_dict = vars(args) + pred_config = merge(pred_config, args_dict) + pred_config = merge_opt(pred_config, args_dict) + + return pred_config + + +def print_arguments(cfg): + 
print('----------- Running Arguments -----------') + buffer = yaml.dump(cfg) + print(buffer) + print('------------------------------------------') diff --git a/deploy/pipeline/config/examples/infer_cfg_calling.yml b/deploy/pipeline/config/examples/infer_cfg_calling.yml new file mode 100644 index 0000000000000000000000000000000000000000..8d74712aa0b8c9f16d7b89aec5307d16253438c3 --- /dev/null +++ b/deploy/pipeline/config/examples/infer_cfg_calling.yml @@ -0,0 +1,17 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ID_BASED_CLSACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip + batch_size: 8 + threshold: 0.8 + display_frames: 80 + skip_frame_num: 2 + enable: True diff --git a/deploy/pipeline/config/examples/infer_cfg_fall_down.yml b/deploy/pipeline/config/examples/infer_cfg_fall_down.yml new file mode 100644 index 0000000000000000000000000000000000000000..5dc38bb23b161cb7e84e027a3a4dd381da3d246b --- /dev/null +++ b/deploy/pipeline/config/examples/infer_cfg_fall_down.yml @@ -0,0 +1,22 @@ +crop_thresh: 0.5 +kpt_thresh: 0.2 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +KPT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip + batch_size: 8 + +SKELETON_ACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip + batch_size: 1 + max_frames: 50 + display_frames: 80 + coord_size: [384, 512] + enable: True diff --git a/deploy/pipeline/config/examples/infer_cfg_fight_recognition.yml b/deploy/pipeline/config/examples/infer_cfg_fight_recognition.yml new file mode 100644 index 
0000000000000000000000000000000000000000..76826ebaa45c0345e94d5ab8218293844cc96697 --- /dev/null +++ b/deploy/pipeline/config/examples/infer_cfg_fight_recognition.yml @@ -0,0 +1,11 @@ +visual: True +warmup_frame: 50 + +VIDEO_ACTION: + model_dir: https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip + batch_size: 1 + frame_len: 8 + sample_freq: 7 + short_size: 340 + target_size: 320 + enable: True diff --git a/deploy/pipeline/config/examples/infer_cfg_human_attr.yml b/deploy/pipeline/config/examples/infer_cfg_human_attr.yml new file mode 100644 index 0000000000000000000000000000000000000000..b4de76b3be2b4113c327c88ced7838224ced125b --- /dev/null +++ b/deploy/pipeline/config/examples/infer_cfg_human_attr.yml @@ -0,0 +1,15 @@ +crop_thresh: 0.5 +attr_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip + batch_size: 8 + enable: True diff --git a/deploy/pipeline/config/examples/infer_cfg_human_mot.yml b/deploy/pipeline/config/examples/infer_cfg_human_mot.yml new file mode 100644 index 0000000000000000000000000000000000000000..7b9e739d4aa9139055c732728370fc414f31cfee --- /dev/null +++ b/deploy/pipeline/config/examples/infer_cfg_human_mot.yml @@ -0,0 +1,9 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True diff --git a/deploy/pipeline/config/examples/infer_cfg_reid.yml b/deploy/pipeline/config/examples/infer_cfg_reid.yml new file mode 100644 index 0000000000000000000000000000000000000000..42c7f6f20d7c194aa03b8d54df81958211b72452 --- /dev/null +++ 
b/deploy/pipeline/config/examples/infer_cfg_reid.yml @@ -0,0 +1,14 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +REID: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip + batch_size: 16 + enable: True diff --git a/deploy/pipeline/config/examples/infer_cfg_smoking.yml b/deploy/pipeline/config/examples/infer_cfg_smoking.yml new file mode 100644 index 0000000000000000000000000000000000000000..41a1475303ee25fe6f35c58d39891a868d9cecab --- /dev/null +++ b/deploy/pipeline/config/examples/infer_cfg_smoking.yml @@ -0,0 +1,17 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ID_BASED_DETACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip + batch_size: 8 + threshold: 0.6 + display_frames: 80 + skip_frame_num: 2 + enable: True diff --git a/deploy/pipeline/config/infer_cfg_pphuman.yml b/deploy/pipeline/config/infer_cfg_pphuman.yml new file mode 100644 index 0000000000000000000000000000000000000000..a75c3222946691e50c8cba086e013feaaf718539 --- /dev/null +++ b/deploy/pipeline/config/infer_cfg_pphuman.yml @@ -0,0 +1,62 @@ +crop_thresh: 0.5 +attr_thresh: 0.5 +kpt_thresh: 0.2 +visual: True +warmup_frame: 50 + +DET: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + batch_size: 1 + +MOT: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: False + +KPT: + model_dir: 
https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip + batch_size: 8 + +ATTR: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip + batch_size: 8 + enable: False + +VIDEO_ACTION: + model_dir: https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip + batch_size: 1 + frame_len: 8 + sample_freq: 7 + short_size: 340 + target_size: 320 + enable: False + +SKELETON_ACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip + batch_size: 1 + max_frames: 50 + display_frames: 80 + coord_size: [384, 512] + enable: False + +ID_BASED_DETACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip + batch_size: 8 + threshold: 0.6 + display_frames: 80 + skip_frame_num: 2 + enable: False + +ID_BASED_CLSACTION: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip + batch_size: 8 + threshold: 0.8 + display_frames: 80 + skip_frame_num: 2 + enable: False + +REID: + model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip + batch_size: 16 + enable: False diff --git a/deploy/pipeline/config/infer_cfg_ppvehicle.yml b/deploy/pipeline/config/infer_cfg_ppvehicle.yml new file mode 100644 index 0000000000000000000000000000000000000000..87d84dee9a16908bfbec258a0f0531ac6a4f692c --- /dev/null +++ b/deploy/pipeline/config/infer_cfg_ppvehicle.yml @@ -0,0 +1,35 @@ +crop_thresh: 0.5 +visual: True +warmup_frame: 50 + +DET: + model_dir: output_inference/mot_ppyoloe_l_36e_ppvehicle/ + batch_size: 1 + +MOT: + model_dir: output_inference/mot_ppyoloe_l_36e_ppvehicle/ + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: False + +VEHICLE_PLATE: + det_model_dir: output_inference/ch_PP-OCRv3_det_infer/ + det_limit_side_len: 480 + det_limit_type: "max" + rec_model_dir: output_inference/ch_PP-OCRv3_rec_infer/ + rec_image_shape: [3, 48, 320] + 
rec_batch_num: 6 + word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt + enable: False + +VEHICLE_ATTR: + model_dir: output_inference/vehicle_attribute_infer/ + batch_size: 8 + color_threshold: 0.5 + type_threshold: 0.5 + enable: False + +REID: + model_dir: output_inference/vehicle_reid_model/ + batch_size: 16 + enable: False diff --git a/deploy/pphuman/config/tracker_config.yml b/deploy/pipeline/config/tracker_config.yml similarity index 58% rename from deploy/pphuman/config/tracker_config.yml rename to deploy/pipeline/config/tracker_config.yml index 5182da93e3f19bd49421e09d1ec01be0c0f11643..c4f3c894655c3a8c58bdfb1b124d98427eeef4df 100644 --- a/deploy/pphuman/config/tracker_config.yml +++ b/deploy/pipeline/config/tracker_config.yml @@ -2,7 +2,8 @@ # The tracker of MOT JDE Detector (such as FairMOT) is exported together with the model. # Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking. -type: JDETracker # 'JDETracker' or 'DeepSORTTracker' +type: OCSORTTracker # choose one tracker in ['JDETracker', 'OCSORTTracker'] + # BYTETracker JDETracker: @@ -11,16 +12,17 @@ JDETracker: conf_thres: 0.6 low_conf_thres: 0.1 match_thres: 0.9 - min_box_area: 100 - vertical_ratio: 1.6 # for pedestrian + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian + -DeepSORTTracker: - input_size: [64, 192] +OCSORTTracker: + det_thresh: 0.4 + max_age: 30 + min_hits: 3 + iou_threshold: 0.3 + delta_t: 3 + inertia: 0.2 + vertical_ratio: 0 min_box_area: 0 - vertical_ratio: -1 - budget: 100 - max_age: 70 - n_init: 3 - metric_type: cosine - matching_threshold: 0.2 - max_iou_distance: 0.9 + use_byte: False diff --git a/deploy/pphuman/datacollector.py b/deploy/pipeline/datacollector.py similarity index 64% rename from deploy/pphuman/datacollector.py rename to deploy/pipeline/datacollector.py index cd459aad0680418bcd087d00662b0c310151ffc3..794711f04868d2dd70b13e235825472f855a290b 100644 --- a/deploy/pphuman/datacollector.py +++ 
b/deploy/pipeline/datacollector.py @@ -14,6 +14,7 @@ import os import copy +from collections import Counter class Result(object): @@ -23,8 +24,13 @@ class Result(object): 'mot': dict(), 'attr': dict(), 'kpt': dict(), - 'action': dict(), - 'reid': dict() + 'video_action': dict(), + 'skeleton_action': dict(), + 'reid': dict(), + 'det_action': dict(), + 'cls_action': dict(), + 'vehicleplate': dict(), + 'vehicle_attr': dict() } def update(self, res, name): @@ -35,10 +41,13 @@ class Result(object): return self.res_dict[name] return None + def clear(self, name): + self.res_dict[name].clear() + class DataCollector(object): """ - DataCollector of pphuman Pipeline, collect results in every frames and assign it to each track ids. + DataCollector of Pipeline: collects results in every frame and assigns them to each track id. mainly used in mtmct. data struct: @@ -50,13 +59,13 @@ class DataCollector(object): - qualities(list of float): Nx[float] - attrs(list of attr): refer to attrs for details - kpts(list of kpts): refer to kpts for details - - actions(list of actions): refer to actions for details + - skeleton_action(list of skeleton_action): refer to skeleton_action for details ... 
- [idN] """ def __init__(self): - #id, frame, rect, score, label, attrs, kpts, actions + #id, frame, rect, score, label, attrs, kpts, skeleton_action self.mots = { "frames": [], "rects": [], @@ -64,7 +73,8 @@ class DataCollector(object): "kpts": [], "features": [], "qualities": [], - "actions": [] + "skeleton_action": [], + "vehicleplate": [] } self.collector = {} @@ -72,15 +82,20 @@ class DataCollector(object): mot_res = Result.get('mot') attr_res = Result.get('attr') kpt_res = Result.get('kpt') - action_res = Result.get('action') + skeleton_action_res = Result.get('skeleton_action') reid_res = Result.get('reid') + vehicleplate_res = Result.get('vehicleplate') + + rects = [] + if reid_res is not None: + rects = reid_res['rects'] + elif mot_res is not None: + rects = mot_res['boxes'] - rects = reid_res['rects'] if reid_res is not None else mot_res['boxes'] for idx, mot_item in enumerate(rects): ids = int(mot_item[0]) if ids not in self.collector: self.collector[ids] = copy.deepcopy(self.mots) - self.collector[ids]["frames"].append(frameid) self.collector[ids]["rects"].append([mot_item[2:]]) if attr_res: @@ -88,16 +103,29 @@ class DataCollector(object): if kpt_res: self.collector[ids]["kpts"].append( [kpt_res['keypoint'][0][idx], kpt_res['keypoint'][1][idx]]) - if action_res: - self.collector[ids]["actions"].append(action_res[idx + 1]) + if skeleton_action_res and (idx + 1) in skeleton_action_res: + self.collector[ids]["skeleton_action"].append( + skeleton_action_res[idx + 1]) else: # action model generate result per X frames, Not available every frames - self.collector[ids]["actions"].append(None) + self.collector[ids]["skeleton_action"].append(None) if reid_res: self.collector[ids]["features"].append(reid_res['features'][ idx]) self.collector[ids]["qualities"].append(reid_res['qualities'][ idx]) + if vehicleplate_res and vehicleplate_res['plate'][idx] != "": + self.collector[ids]["vehicleplate"].append(vehicleplate_res[ + 'plate'][idx]) def get_res(self): return 
self.collector + + def get_carlp(self, trackid): + lps = self.collector[trackid]["vehicleplate"] + counter = Counter(lps) + carlp = counter.most_common() + if len(carlp) > 0: + return carlp[0][0] + else: + return None diff --git a/deploy/pphuman/docs/images/action.gif b/deploy/pipeline/docs/images/action.gif similarity index 100% rename from deploy/pphuman/docs/images/action.gif rename to deploy/pipeline/docs/images/action.gif diff --git a/deploy/pphuman/docs/images/attribute.gif b/deploy/pipeline/docs/images/attribute.gif similarity index 100% rename from deploy/pphuman/docs/images/attribute.gif rename to deploy/pipeline/docs/images/attribute.gif diff --git a/deploy/pphuman/docs/images/c1.gif b/deploy/pipeline/docs/images/c1.gif similarity index 100% rename from deploy/pphuman/docs/images/c1.gif rename to deploy/pipeline/docs/images/c1.gif diff --git a/deploy/pphuman/docs/images/c2.gif b/deploy/pipeline/docs/images/c2.gif similarity index 100% rename from deploy/pphuman/docs/images/c2.gif rename to deploy/pipeline/docs/images/c2.gif diff --git a/deploy/pipeline/docs/images/calling.gif b/deploy/pipeline/docs/images/calling.gif new file mode 100644 index 0000000000000000000000000000000000000000..52046b249b145d2099c7360d3c56abc3b51764bd Binary files /dev/null and b/deploy/pipeline/docs/images/calling.gif differ diff --git a/deploy/pipeline/docs/images/fight_demo.gif b/deploy/pipeline/docs/images/fight_demo.gif new file mode 100644 index 0000000000000000000000000000000000000000..4add8047baf382ff24a459ff4a74de7ac91d704a Binary files /dev/null and b/deploy/pipeline/docs/images/fight_demo.gif differ diff --git a/deploy/pphuman/docs/images/mot.gif b/deploy/pipeline/docs/images/mot.gif similarity index 100% rename from deploy/pphuman/docs/images/mot.gif rename to deploy/pipeline/docs/images/mot.gif diff --git a/deploy/pipeline/docs/images/smoking.gif b/deploy/pipeline/docs/images/smoking.gif new file mode 100644 index 
0000000000000000000000000000000000000000..f354a71e478353f1ce5cb2270467e273ec74c84e Binary files /dev/null and b/deploy/pipeline/docs/images/smoking.gif differ diff --git a/deploy/pipeline/docs/images/vehicle_attribute.gif b/deploy/pipeline/docs/images/vehicle_attribute.gif new file mode 100644 index 0000000000000000000000000000000000000000..b2ed19934d5cab5229a732ee9cdb45458ab0e8a1 Binary files /dev/null and b/deploy/pipeline/docs/images/vehicle_attribute.gif differ diff --git a/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED.md b/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..fad7845e469ecadc6cfd719a6d63bc1344fe8a81 --- /dev/null +++ b/deploy/pipeline/docs/tutorials/PPHuman_QUICK_STARTED.md @@ -0,0 +1,194 @@ +# 快速开始 + +## 目录 + +- [环境准备](#环境准备) +- [模型下载](#模型下载) +- [配置文件说明](#配置文件说明) +- [预测部署](#预测部署) + - [参数说明](#参数说明) +- [方案介绍](#方案介绍) + - [行人检测](#行人检测) + - [行人跟踪](#行人跟踪) + - [跨镜行人跟踪](#跨镜行人跟踪) + - [属性识别](#属性识别) + - [行为识别](#行为识别) + +## 环境准备 + +环境要求: PaddleDetection版本 >= release/2.4 或 develop版本 + +PaddlePaddle和PaddleDetection安装 + +``` +# PaddlePaddle CUDA10.1 +python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + +# PaddlePaddle CPU +python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + +# 克隆PaddleDetection仓库 +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# 安装其他依赖 +cd PaddleDetection +pip install -r requirements.txt +``` + +1. 详细安装文档参考[文档](../../../../docs/tutorials/INSTALL_cn.md) +2. 
如果需要TensorRT推理加速(测速方式),请安装带`TensorRT版本Paddle`。您可以从[Paddle安装包](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python)下载安装,或者按照[指导文档](https://www.paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html)使用docker或自编译方式准备Paddle环境。 + +## 模型下载 + +PP-Human提供了目标检测、属性识别、行为识别、ReID预训练模型,以实现不同使用场景,用户可以直接下载使用 + +| 任务 | 端到端速度(ms)| 模型方案 | 模型体积 | +| :---------: | :-------: | :------: |:------: | +| 行人检测(高精度) | 25.1ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人检测(轻量级) | 16.2ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| 行人跟踪(高精度) | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M | +| 行人跟踪(轻量级) | 21.0ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M | +| 属性识别(高精度) | 单人8.5ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测:182M
    属性识别:86M | +| 属性识别(轻量级) | 单人7.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | 目标检测:182M
    属性识别:86M | +| 摔倒识别 | 单人10ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)
    [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪:182M
    关键点检测:101M
    基于关键点行为识别:21.8M | +| 闯入识别 | 31.8ms | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 多目标跟踪:182M | +| 打架识别 | 19.7ms | [视频分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M | +| 抽烟识别 | 单人15.1ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测:182M
    基于人体id的目标检测:27M | +| 打电话识别 | 单人ms | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)
    [基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测:182M
    基于人体id的图像分类:45M | + +下载模型后,解压至`./output_inference`文件夹。 + +在配置文件中,模型路径默认为模型的下载路径,如果用户不修改,则在推理时会自动下载对应的模型。 + +**注意:** + +- 模型精度为融合数据集结果,数据集包含开源数据集和企业数据集 +- ReID模型精度为Market1501数据集测试结果 +- 预测速度为T4下,开启TensorRT FP16的效果, 模型预测速度包含数据预处理、模型预测、后处理全流程 + +## 配置文件说明 + +PP-Human相关配置位于```deploy/pipeline/config/infer_cfg_pphuman.yml```中,存放模型路径,该配置文件中包含了目前PP-Human支持的所有功能。如果想要查看某个单一功能的配置,请参见```deploy/pipeline/config/examples/```中相关配置。此外,配置文件中的内容可以通过```-o```命令行参数修改,如修改属性的模型目录,则可通过```-o ATTR.model_dir="DIR_PATH"```进行设置。 + +功能及任务类型对应表单如下: + +| 输入类型 | 功能 | 任务类型 | 配置项 | +|-------|-------|----------|-----| +| 图片 | 属性识别 | 目标检测 属性识别 | DET ATTR | +| 单镜头视频 | 属性识别 | 多目标跟踪 属性识别 | MOT ATTR | +| 单镜头视频 | 行为识别 | 多目标跟踪 关键点检测 摔倒识别 | MOT KPT SKELETON_ACTION | + +例如基于视频输入的属性识别,任务类型包含多目标跟踪和属性识别,具体配置如下: + +``` +crop_thresh: 0.5 +attr_thresh: 0.5 +visual: True + +MOT: + model_dir: output_inference/mot_ppyoloe_l_36e_pipeline/ + tracker_config: deploy/pipeline/config/tracker_config.yml + batch_size: 1 + enable: True + +ATTR: + model_dir: output_inference/strongbaseline_r50_30e_pa100k/ + batch_size: 8 + enable: True +``` + +**注意:** + +- 如果用户需要实现不同任务,可以在配置文件对应enable选项设置为True。 + + +## 预测部署 + +``` +# 行人检测,指定配置文件路径和测试图片 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu [--run_mode trt_fp16] + +# 行人跟踪,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的MOT部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16] + +# 行人跟踪,指定配置文件路径,模型路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的MOT部分enable设置为```True``` +# 命令行中指定的模型路径优先级高于配置文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16] + +# 
行人属性识别,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的ATTR部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16] + +# 行为识别,以摔倒识别为例,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的SKELETON_ACTION部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16] + +# 行人跨境跟踪,指定配置文件路径和测试视频列表文件夹,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的REID部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=mtmct_dir/ --device=gpu [--run_mode trt_fp16] + +# 行人跨境跟踪,指定配置文件路径和测试视频列表文件夹,直接使用```deploy/pipeline/config/examples/infer_cfg_reid.yml```配置文件,并利用```-o```命令修改跟踪模型路径 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_reid.yml --video_dir=mtmct_dir/ -o MOT.model_dir="mot_model_dir" --device=gpu [--run_mode trt_fp16] + +``` + +对rtsp流的支持,video_file后面的视频地址更换为rtsp流地址,示例如下: +``` +# 行人属性识别,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_pphuman.yml```中的ATTR部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o visual=False --video_file=rtsp://[YOUR_RTSP_SITE] --device=gpu [--run_mode trt_fp16] +``` + +### 参数说明 + +| 参数 | 是否必须|含义 | +|-------|-------|----------| +| --config | Yes | 配置文件路径 | +| -o | Option | 覆盖配置文件中对应的配置 | +| --image_file | Option | 需要预测的图片 | +| --image_dir | Option | 要预测的图片文件夹路径 | +| --video_file | Option | 需要预测的视频,或者rtsp流地址 | +| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测,可设置为:0 - (摄像头数目-1) ),预测过程中在可视化界面按`q`退出输出预测结果到:output/output.mp4| +| --device | Option | 运行时的设备,可选择`CPU/GPU/XPU`,默认为`CPU`| +| --output_dir | Option|可视化结果保存的根目录,默认为output/| +| --run_mode | Option |使用GPU时,默认为paddle, 
可选(paddle/trt_fp32/trt_fp16/trt_int8)| +| --enable_mkldnn | Option | CPU预测中是否开启MKLDNN加速,默认为False | +| --cpu_threads | Option| 设置cpu线程数,默认为1 | +| --trt_calib_mode | Option| TensorRT是否使用校准功能,默认为False。使用TensorRT的int8功能时,需设置为True,使用PaddleSlim量化后的模型时需要设置为False | +| --do_entrance_counting | Option | 是否统计出入口流量,默认为False | +| --draw_center_traj | Option | 是否绘制跟踪轨迹,默认为False | + +## 方案介绍 + +PP-Human v2整体方案如下图所示: + +
    + +
    + + +### 行人检测 +- 采用PP-YOLOE L 作为目标检测模型 +- 详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/)和[检测跟踪文档](pphuman_mot.md) + +### 行人跟踪 +- 采用SDE方案完成行人跟踪 +- 检测模型使用PP-YOLOE L(高精度)和S(轻量级) +- 跟踪模块采用OC-SORT方案 +- 详细文档参考[OC-SORT](../../../../configs/mot/ocsort)和[检测跟踪文档](pphuman_mot.md) + +### 跨镜行人跟踪 +- 使用PP-YOLOE + OC-SORT得到单镜头多目标跟踪轨迹 +- 使用ReID(StrongBaseline网络)对每一帧的检测结果提取特征 +- 多镜头轨迹特征进行匹配,得到跨镜头跟踪结果 +- 详细文档参考[跨镜跟踪](pphuman_mtmct.md) + +### 属性识别 +- 使用PP-YOLOE + OC-SORT跟踪人体 +- 使用StrongBaseline(多分类模型)完成识别属性,主要属性包括年龄、性别、帽子、眼睛、上衣下衣款式、背包等 +- 详细文档参考[属性识别](pphuman_attribute.md) + +### 行为识别: +- 提供四种行为识别方案 +- 1. 基于骨骼点的行为识别,例如摔倒识别 +- 2. 基于图像分类的行为识别,例如打电话识别 +- 3. 基于检测的行为识别,例如吸烟识别 +- 4. 基于视频分类的行为识别,例如打架识别 +- 详细文档参考[行为识别](pphuman_action.md) diff --git a/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED.md b/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..6a2f3d64a04f94c38a4b72322bc7255838ee78c0 --- /dev/null +++ b/deploy/pipeline/docs/tutorials/PPVehicle_QUICK_STARTED.md @@ -0,0 +1,137 @@ +# 快速开始 + +## 目录 + +- [环境准备](#环境准备) +- [模型下载](#模型下载) +- [配置文件说明](#配置文件说明) +- [预测部署](#预测部署) + - [参数说明](#参数说明) +- [方案介绍](#方案介绍) + - [车辆检测](#车辆检测) + - [车辆跟踪](#车辆跟踪) + - [车牌识别](#车牌识别) + - [属性识别](#属性识别) + + +## 环境准备 + +环境要求: PaddleDetection版本 >= release/2.4 或 develop版本 + +PaddlePaddle和PaddleDetection安装 + +``` +# PaddlePaddle CUDA10.1 +python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + +# PaddlePaddle CPU +python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + +# 克隆PaddleDetection仓库 +cd +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +# 安装其他依赖 +cd PaddleDetection +pip install -r requirements.txt +``` + +1. 详细安装文档参考[文档](../../../../docs/tutorials/INSTALL_cn.md) +2. 
如果需要TensorRT推理加速(测速方式),请安装带`TensorRT版本Paddle`。您可以从[Paddle安装包](https://paddleinference.paddlepaddle.org.cn/v2.2/user_guides/download_lib.html#python)下载安装,或者按照[指导文档](https://www.paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html)使用docker或自编译方式准备Paddle环境。 + +## 模型下载 + + +## 配置文件说明 + +PP-Vehicle相关配置位于```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中,存放模型路径,完成不同功能需要设置不同的任务类型 + +功能及任务类型对应表单如下: + +| 输入类型 | 功能 | 任务类型 | 配置项 | +|-------|-------|----------|-----| +| 图片 | 属性识别 | 目标检测 属性识别 | DET ATTR | +| 单镜头视频 | 属性识别 | 多目标跟踪 属性识别 | MOT ATTR | +| 单镜头视频 | 属性识别 | 多目标跟踪 属性识别 | MOT VEHICLEPLATE | + +例如基于视频输入的属性识别,任务类型包含多目标跟踪和属性识别,具体配置如下: + +``` + +``` + +**注意:** + +- 如果用户需要实现不同任务,可以在配置文件对应enable选项设置为True。 +- 如果用户仅需要修改模型文件路径,可以在命令行中加入 `--model_dir det=ppyoloe/` 即可,也可以手动修改配置文件中的相应模型路径,详细说明参考下方参数说明文档。 + + +## 预测部署 + +``` +# 车辆检测,指定配置文件路径和测试图片 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml --image_file=test_image.jpg --device=gpu [--run_mode trt_fp16] + +# 车辆跟踪,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中的MOT部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16] + +# 车辆跟踪,指定配置文件路径,模型路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中的MOT部分enable设置为```True``` +# 命令行中指定的模型路径优先级高于配置文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml --video_file=test_video.mp4 --device=gpu --model_dir det=ppyoloe/ [--run_mode trt_fp16] + +# 车辆属性识别,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中的ATTR部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml --video_file=test_video.mp4 --device=gpu [--run_mode trt_fp16] + +``` + +对rtsp流的支持,video_file后面的视频地址更换为rtsp流地址,示例如下: +``` +# 
车辆属性识别,指定配置文件路径和测试视频,在配置文件```deploy/pipeline/config/infer_cfg_ppvehicle.yml```中的ATTR部分enable设置为```True``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml -o visual=False --video_file=rtsp://[YOUR_RTSP_SITE] --device=gpu [--run_mode trt_fp16] +``` + +### 参数说明 + +| 参数 | 是否必须|含义 | +|-------|-------|----------| +| --config | Yes | 配置文件路径 | +| --model_dir | Option | 各任务模型路径,优先级高于配置文件, 例如`--model_dir det=better_det/ attr=better_attr/`| +| --image_file | Option | 需要预测的图片 | +| --image_dir | Option | 要预测的图片文件夹路径 | +| --video_file | Option | 需要预测的视频,或者rtsp流地址 | +| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测,可设置为:0 - (摄像头数目-1) ),预测过程中在可视化界面按`q`退出输出预测结果到:output/output.mp4| +| --device | Option | 运行时的设备,可选择`CPU/GPU/XPU`,默认为`CPU`| +| --output_dir | Option|可视化结果保存的根目录,默认为output/| +| --run_mode | Option |使用GPU时,默认为paddle, 可选(paddle/trt_fp32/trt_fp16/trt_int8)| +| --enable_mkldnn | Option | CPU预测中是否开启MKLDNN加速,默认为False | +| --cpu_threads | Option| 设置cpu线程数,默认为1 | +| --trt_calib_mode | Option| TensorRT是否使用校准功能,默认为False。使用TensorRT的int8功能时,需设置为True,使用PaddleSlim量化后的模型时需要设置为False | +| --do_entrance_counting | Option | 是否统计出入口流量,默认为False | +| --draw_center_traj | Option | 是否绘制跟踪轨迹,默认为False | + +## 方案介绍 + +PP-Vehicle v2整体方案如下图所示: + +
    + +
    + + +### 车辆检测 +- 采用PP-YOLOE L 作为目标检测模型 +- 详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/)和[检测跟踪文档](ppvehicle_mot.md) + +### 车辆跟踪 +- 采用SDE方案完成车辆跟踪 +- 检测模型使用PP-YOLOE L(高精度)和S(轻量级) +- 跟踪模块采用OC-SORT方案 +- 详细文档参考[OC-SORT](../../../../configs/mot/ocsort)和[检测跟踪文档](ppvehicle_mot.md) + +### 属性识别 +- 使用PaddleClas提供的特色模型PP-LCNet,实现对车辆颜色及车型属性的识别。 +- 详细文档参考[属性识别](ppvehicle_attribute.md) + +### 车牌识别 +- 使用PaddleOCR特色模型ch_PP-OCRv3_det+ch_PP-OCRv3_rec模型,识别车牌号码 +- 详细文档参考[属性识别](ppvehicle_plate.md) diff --git a/deploy/pipeline/docs/tutorials/pphuman_action.md b/deploy/pipeline/docs/tutorials/pphuman_action.md new file mode 100644 index 0000000000000000000000000000000000000000..cf0c5af501b465de810d37813032bcb316e6fde5 --- /dev/null +++ b/deploy/pipeline/docs/tutorials/pphuman_action.md @@ -0,0 +1,270 @@ +[English](pphuman_action_en.md) | 简体中文 + +# PP-Human行为识别模块 + +## 目录 + +- [基于骨骼点的行为识别](#基于骨骼点的行为识别) +- [基于图像分类的行为识别](#基于图像分类的行为识别) +- [基于检测的行为识别](#基于检测的行为识别) +- [基于行人轨迹的行为识别](#基于行人轨迹的行为识别) +- [基于视频分类的行为识别](#基于视频分类的行为识别) + +行为识别在智慧社区,安防监控等方向具有广泛应用,根据行为的不同,PP-Human中集成了基于视频分类、基于检测、基于图像分类,基于行人轨迹以及基于骨骼点的行为识别模块,方便用户根据需求进行选择。 + +## 基于骨骼点的行为识别 + +应用行为:摔倒识别 + +
    + +
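    Data source and copyright owner: Skyinfor Technology. Thanks for providing and open-sourcing real-scenario data, which is for academic research use only.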
    +
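    + +### Model Zoo + +Skeleton-based action recognition uses three models: pedestrian detection/tracking, keypoint detection, and fall recognition. Download the following pretrained models first. + +| Task | Algorithm | Accuracy | Inference Speed (ms) | Model Weights | Inference Deployment Model | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| Pedestrian detection/tracking | PP-YOLOE | mAP: 56.3<br>MOTA: 72.0 | Detection: 16.2ms<br>Tracking: 22.3ms |[Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Keypoint detection | HRNet | AP: 87.1 | 2.9ms per person |[Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)| +| Fall recognition | ST-GCN | Accuracy: 96.43 | 2.7ms per person | - |[Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | + +Notes: +1. The detection/tracking model was trained and evaluated on a mix of [MOT17](https://motchallenge.net/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) and some business data. +2. The keypoint model was trained on a mix of [COCO](https://cocodataset.org/), [UAV-Human](https://github.com/SUTDCV/UAV-Human) and some business data; accuracy was measured on the business-data test set. +3. The fall recognition model was trained on a mix of [NTU-RGB+D](https://rose1.ntu.edu.sg/dataset/actionRecognition/), [UR Fall Detection Dataset](http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html) and some business data; accuracy was measured on the business-data test set. +4. Inference speed was measured on an NVIDIA T4 with TensorRT FP16 and covers the whole pipeline: data preprocessing, model inference, and post-processing. + +### Configuration +The action-recognition parameters in the [config file](../../config/infer_cfg_pphuman.yml) are: +``` +SKELETON_ACTION: # Configuration of the skeleton-based action recognition model + model_dir: output_inference/STGCN # Path of the model + batch_size: 1 # Inference batch size. Only 1 is currently supported + max_frames: 50 # Number of frames in an action clip. Once a pedestrian ID has accumulated this many frames of keypoint results, the action recognition model classifies the action of that sequence. Works best when consistent with the training setting. + display_frames: 80 # Number of display frames. When a fall is predicted, this is how long the status is displayed for the corresponding person ID. + coord_size: [384, 512] # Size to which coordinates are uniformly scaled. Works best when consistent with the training setting. + enable: False # Whether to enable this feature +``` + +### How to Use +1. Download the three inference deployment models (`Pedestrian detection/tracking`, `Keypoint detection`, `Fall recognition`) from the `Model Zoo` and extract them to ```./output_inference```. Models are downloaded automatically by default; if you download them manually, set the model directories in the config file to where the models are stored. +2. The action recognition module currently supports video input only. Depending on which action recognition solution you want to enable, set `enable: True` under `SKELETON_ACTION` in infer_cfg_pphuman.yml, then run: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +3. 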
若修改模型路径,有以下两种方式: + + - ```./deploy/pipeline/config/infer_cfg_pphuman.yml```下可以配置不同模型路径,关键点模型和摔倒行为识别模型分别对应`KPT`和`SKELETON_ACTION`字段,修改对应字段下的路径为实际期望的路径即可。 + - 命令行中增加`--model_dir`修改模型路径: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN +``` +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + + +### 方案说明 +1. 使用多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。 +3. 使用[关键点识别模型](../../../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml)得到对应的17个骨骼特征点。骨骼特征点的顺序及类型与COCO一致,详见[如何准备关键点数据集](../../../../docs/tutorials/data/PrepareKeypointDataSet.md)中的`COCO数据集`部分。 +4. 每个跟踪ID对应的目标行人各自累计骨骼特征点结果,组成该人物的时序关键点序列。当累计到预定帧数或跟踪丢失后,使用行为识别模型判断时序关键点序列的动作类型。当前版本模型支持摔倒行为的识别,预测得到的`class id`对应关系为: +``` +0: 摔倒, +1: 其他 +``` +- 摔倒行为识别模型使用了[ST-GCN](https://arxiv.org/abs/1801.07455),并基于[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)套件完成模型训练。 + +## 基于图像分类的行为识别 + +应用行为:打电话识别 + +
    + +
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用
    +
    + +### 模型库 + +基于图像分类的行为识别包含行人检测/跟踪,打电话识别两个模型,首先需要下载以下预训练模型 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 16.2ms
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 打电话识别 | PP-HGNet | 准确率: 86.85 | 单人 2.94ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | + + +注: +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 打电话行为识别模型使用[UAV-Human](https://github.com/SUTDCV/UAV-Human)的打电话行为部分进行训练和测试。 +3. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中相关的参数如下: +``` +ID_BASED_CLSACTION: # 基于分类的行为识别模型配置 + model_dir: output_inference/PPHGNet_tiny_calling_halfbody # 模型所在路径 + batch_size: 8 # 预测批大小 + threshold: 0.45 #识别为对应行为的阈值 + display_frames: 80 # 显示帧数。当识别到对应动作时,在对应人物ID中显示状态的持续时间。 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从`模型库`中下载`行人检测/跟踪`、`打电话行为识别`两个预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`ID_BASED_CLSACTION`下的`enable`为`True`; +3. 仅支持输入视频,启动命令如下: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + +### 方案说明 +1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。 +3. 
通过在帧级别的行人图像通过图像分类的方式实现。当图片所属类别为对应行为时,即认为在一定时间段内该人物处于该行为状态中。该任务使用[PP-HGNet](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md)实现,当前版本模型支持打电话行为的识别,预测得到的`class id`对应关系为: +``` +0: 打电话, +1: 其他 +``` +- 基于分类的行为识别基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md#3.3)完成模型训练。 + + +## 基于检测的行为识别 + +应用行为:吸烟识别 + +
    + +
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用
    +
    + +### 模型库 +在这里,我们提供了行人检测/跟踪、吸烟行为识别的预训练模型,用户可以直接下载使用。 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 16.2ms
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 吸烟行为识别 | PP-YOLOE | mAP: 39.7 | 单人 2.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | + +注: +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 抽烟行为识别模型使用业务数据进行训练和测试。 +3. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中相关的参数如下: +``` +ID_BASED_DETACTION: # 基于检测的行为识别模型配置 + model_dir: output_inference/ppyoloe_crn_s_80e_smoking_visdrone # 模型所在路径 + batch_size: 8 # 预测批大小 + threshold: 0.4 # 识别为对应行为的阈值 + display_frames: 80 # 显示帧数。当识别到对应动作时,在对应人物ID中显示状态的持续时间。 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从`模型库`中下载`行人检测/跟踪`、`抽烟行为识别`两个预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`ID_BASED_DETACTION`下的`enable`为`True`; +3. 仅支持输入视频,启动命令如下: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + +### 方案说明 +1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的坐标在输入视频的对应帧中截取每个行人。 +3. 通过在帧级别的行人图像中检测该行为的典型特定目标实现。当检测到特定目标(在这里即烟头)以后,即认为在一定时间段内该人物处于该行为状态中。该任务使用[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)实现,当前版本模型支持吸烟行为的识别,预测得到的`class id`对应关系为: +``` +0: 吸烟, +1: 其他 +``` + +## 基于行人轨迹的行为识别 + +应用行为:闯入识别 + +
    + +
    + +具体使用请参照[PP-Human检测跟踪模块](pphuman_mot.md)的`5. 区域闯入判断和计数`。 + +### 方案说明 +1. 使用多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),跟踪方案为OC-SORT,详细文档参考[OC-SORT](../../../../configs/mot/ocsort)。 +2. 通过行人检测框的下边界中点在相邻帧位于用户所选区域的内外位置,来识别是否闯入所选区域。 + + +## 基于视频分类的行为识别 + +应用行为:打架识别 + +
    + +
    数据来源及版权归属:Surveillance Camera Fight Dataset。
    +
    + +该方案关注的场景为监控摄像头下的打架行为识别。打架行为涉及多人,基于骨骼点技术的方案更适用于单人的行为识别。此外,打架行为对时序信息依赖较强,基于检测和分类的方案也不太适用。由于监控场景背景复杂,人的密集程度、光线、拍摄角度等都会对识别造成影响,本方案采用基于视频分类的方式判断视频中是否存在打架行为。针对摄像头距离人较远的情况,通过增大输入图像分辨率优化。由于训练数据有限,采用数据增强的方式提升模型的泛化性能。 + +### 模型库 +在这里,我们提供了打架识别的预训练模型,用户可以直接下载使用。 + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +|:---------------------|:---------:|:------:|:------:| :------: |:---------------------------------------------------------------------------------: | +| 打架识别 | PP-TSM | 准确率:89.06% | 2s视频 128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + +注: +1. 打架识别模型基于6个公开数据集训练得到:Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、Hockey Fight Detection Dataset、Video Fight Detection Dataset、Real Life Violence Situations Dataset、UBI Abnormal Event Detection Dataset。 +2. 预测速度为NVIDIA T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程。 + + +### 配置说明 +[配置文件](../../config/infer_cfg_pphuman.yml)中与行为识别相关的参数如下: +``` +VIDEO_ACTION: # 基于视频分类的行为识别模型配置 + model_dir: output_inference/ppTSM # 模型所在路径 + batch_size: 1 # 预测批大小。当前仅支持为1进行推理 + frame_len: 8 # 累计抽样帧数量,达到该数量后执行一次识别 + sample_freq: 7 # 抽样频率,即间隔多少帧抽样一帧 + short_size: 340 # 视频帧尺度变换最小边的长度 + target_size: 320 # 目标视频帧的大小 + enable: False # 是否开启该功能 +``` + +### 使用方法 +1. 从上表链接中下载`打架识别`任务的预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`VIDEO_ACTION`下的`enable`为`True`; +3. 仅支持输入视频,启动命令如下: +``` +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. 
启动命令中的完整参数说明,请参考[参数说明](./PPHuman_QUICK_STARTED.md)。 + + +### 方案说明 + +目前打架识别模型使用的是[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md),并在PP-TSM视频分类模型训练流程的基础上修改适配,完成模型训练。对于输入的视频或者视频流,进行等间隔抽帧,当视频帧累计到指定数目时,输入到视频分类模型中判断是否存在打架行为。 + + +## 参考文献 +``` +@inproceedings{stgcn2018aaai, + title = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition}, + author = {Sijie Yan and Yuanjun Xiong and Dahua Lin}, + booktitle = {AAAI}, + year = {2018}, +} +````` diff --git a/deploy/pipeline/docs/tutorials/pphuman_action_en.md b/deploy/pipeline/docs/tutorials/pphuman_action_en.md new file mode 100644 index 0000000000000000000000000000000000000000..943a5c5678997816f66941e785f4ac95256acf83 --- /dev/null +++ b/deploy/pipeline/docs/tutorials/pphuman_action_en.md @@ -0,0 +1,275 @@ +English | [简体中文](pphuman_action.md) + +# Action Recognition Module of PP-Human + +Action Recognition is widely used in the intelligent community/smart city, and security monitoring. PP-Human provides the module of video-classification-based, detection-based, image-classification-based and skeleton-based action recognition. + +## Model Zoo + +There are multiple available pretrained models including pedestrian detection/tracking, keypoint detection, fighting, calling, smoking and fall detection models. Users can download and use them directly. + +| Task | Algorithm | Precision | Inference Speed(ms) | Model Weights |Model Inference and Deployment | +|:----------------------------- |:---------:|:-------------------------:|:-----------------------------------:| :-----------------: |:-----------------------------------------------------------------------------------------:| +| Pedestrian Detection/Tracking | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | Detection: 28ms
    Tracking: 33.1ms |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.pdparams) |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Calling Recognition | PP-HGNet | Accuracy: 86.85 | Single Person 2.94ms | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.pdparams) | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | +| Smoking Recognition | PP-YOLOE | mAP: 39.7 | Single Person 2.0ms | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | +| Keypoint Detection | HRNet | AP: 87.1 | Single Person 2.9ms |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.pdparams) |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) | +| Falling Recognition | ST-GCN | Accuracy: 96.43 | Single Person 2.7ms | - |[Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | +| Fighting Recognition | PP-TSM | Accuracy: 89.06% | 128ms for a 2-sec video | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + +Note: + +1. The precision of the pedestrian detection/tracking model is obtained by training and testing on [MOT17](https://motchallenge.net/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) and some business data. + +2. The keypoint detection model is trained on [COCO](https://cocodataset.org/), [UAV-Human](https://github.com/SUTDCV/UAV-Human), and some business data, and the precision is obtained on a test set of business data. + +3. 
The falling action recognition model is trained on [NTU-RGB+D](https://rose1.ntu.edu.sg/dataset/actionRecognition/), [UR Fall Detection Dataset](http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html), and some business data, and the precision is obtained on a test set of business data. + +4. The calling action recognition model is trained and tested on the calling video frames of [UAV-Human](https://github.com/SUTDCV/UAV-Human). + +5. The smoking action recognition model is trained and tested on business data. + +6. The fighting action recognition model is trained and tested on 6 public datasets: Surveillance Camera Fight Dataset, A Dataset for Automatic Violence Detection in Videos, Hockey Fight Detection Dataset, Video Fight Detection Dataset, Real Life Violence Situations Dataset, and UBI Abnormal Event Detection Dataset. + +7. The inference speed is measured with TensorRT FP16 on an NVIDIA T4 and covers the whole pipeline: data preprocessing, model inference, and post-processing. + + +## Skeleton-Based Action Recognition -- Falling Detection + +
    Data source and copyright owner: Skyinfor +Technology. Thanks for providing the actual scenario data, which is +used here for academic research only.
    + +
    + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follows: + +``` +SKELETON_ACTION: # Config for skeleton-based action recognition model + model_dir: output_inference/STGCN # Path of the model + batch_size: 1 # Inference batch size. Currently only 1 is supported. + max_frames: 50 # Number of frames in an action segment. Once the time-ordered skeleton keypoints accumulated for a pedestrian ID reach this number, the action recognition model classifies the sequence. Works best when it matches the training setting. + display_frames: 80 # Number of display frames. When the predicted action is falling, the status stays displayed on the corresponding person ID for this many frames. + coord_size: [384, 512] # Size to which coordinates are uniformly scaled. Works best when it matches the training setting. + enable: False # Whether to enable this function +``` + + +### How to Use + +1. Download the `Pedestrian Detection/Tracking`, `Keypoint Detection` and `Falling Recognition` models from the links in the Model Zoo and unzip them to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the path where the models are stored. + +2. The action recognition module currently supports video input only. Set `enable: True` under `SKELETON_ACTION` in infer_cfg_pphuman.yml, then run the command: + + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` + +3. 
There are two ways to change the model path: + + - In ```./deploy/pipeline/config/infer_cfg_pphuman.yml```, you can configure different model paths. The keypoint model and the falling recognition model correspond to the `KPT` and `SKELETON_ACTION` fields respectively; set the path under each field to the path you want. + - Add `--model_dir` on the command line to override the model path: + + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN + ``` +4. For a detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md). + +### Introduction to the Solution + +1. Obtain the pedestrian detection boxes and tracking IDs from the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE; for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe). + +2. Crop each pedestrian from the corresponding frame of the input video using the coordinates of the detection box. +3. Use the [keypoint detection model](../../../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) to obtain 17 skeleton keypoints. Their order and types are identical to those of COCO. For details, please refer to the `COCO dataset` part of [how to prepare keypoint datasets](../../../../docs/tutorials/data/PrepareKeypointDataSet_en.md). + +4. Each pedestrian with a tracking ID accumulates its own skeleton keypoints, which form a time-ordered keypoint sequence. When the number of accumulated frames reaches a preset threshold or the tracking is lost, the action recognition model is used to judge the action type of the time-ordered keypoint sequence. 
The current model supports only fall recognition, and the mapping between `class id` and action type is: + +``` +0: Fall down + +1: Others +``` +- The falling action recognition model uses [ST-GCN](https://arxiv.org/abs/1801.07455) and is trained with the [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) toolkit. + +## Image-Classification-Based Action Recognition -- Calling Recognition + +
    Data source and copyright owner: Skyinfor +Technology. Thanks for providing the actual scenario data, which is +used here for academic research only.
    + +
    + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follows: + +``` +ID_BASED_CLSACTION: # Config for classification-based action recognition model + model_dir: output_inference/PPHGNet_tiny_calling_halfbody # Path of the model + batch_size: 8 # Inference batch size + threshold: 0.45 # Threshold for recognizing the corresponding behavior + display_frames: 80 # Number of display frames. When the corresponding action is recognized, the status stays displayed on the corresponding person ID for this many frames. + enable: False # Whether to enable this function +``` + +### How to Use + +1. Download the `Pedestrian Detection/Tracking` and `Calling Recognition` models from the links in `Model Zoo` and unzip them to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the path where the models are stored. + +2. The action recognition module currently supports video input only. Set `enable: True` under `ID_BASED_CLSACTION` in infer_cfg_pphuman.yml. + +3. Run this command: + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` +4. For a detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md). + +### Introduction to the Solution +1. Obtain the pedestrian detection boxes and tracking IDs from the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE; for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe). + +2. Crop each pedestrian from the corresponding frame of the input video using the coordinates of the detection box. +3. 
Action recognition is done by classifying the frame-level pedestrian images. When an image is classified as the target behavior, the person is considered to be in that behavior state for a certain period of time. This task is implemented with [PP-HGNet](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/PP-HGNet.md). The current version supports calling recognition, and the mapping between `class id` and action type is: +``` +0: Calling + +1: Others +``` + + +## Detection-Based Action Recognition -- Smoking Detection + +
    Data source and copyright owner: Skyinfor +Technology. Thanks for providing the actual scenario data, which is +used here for academic research only.
    + +
    + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follows: +``` +ID_BASED_DETACTION: # Config for detection-based action recognition model + model_dir: output_inference/ppyoloe_crn_s_80e_smoking_visdrone # Path of the model + batch_size: 8 # Inference batch size + threshold: 0.4 # Threshold for recognizing the corresponding behavior + display_frames: 80 # Number of display frames. When the corresponding action is recognized, the status stays displayed on the corresponding person ID for this many frames. + enable: False # Whether to enable this function +``` + +### How to Use + +1. Download the `Pedestrian Detection/Tracking` and `Smoking Recognition` models from the links in `Model Zoo` and unzip them to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the path where the models are stored. + +2. The action recognition module currently supports video input only. Set `enable: True` under `ID_BASED_DETACTION` in infer_cfg_pphuman.yml. + +3. Run this command: + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` +4. For a detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md). + +### Introduction to the Solution +1. Obtain the pedestrian detection boxes and tracking IDs from the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE; for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe). + +2. Crop each pedestrian from the corresponding frame of the input video using the coordinates of the detection box. + +3. The typical target object of this behavior is detected in frame-level pedestrian images. 
When the target (here, a cigarette) is detected, the person is considered to be in that behavior state for a certain period of time. This task is implemented with [PP-YOLOE](../../../../configs/ppyoloe/). The current version supports smoking recognition, and the mapping between `class id` and action type is: + +``` +0: Smoking + +1: Others +``` + +## Video-Classification-Based Action Recognition -- Fighting Detection +As surveillance cameras are deployed more and more widely, manually checking for abnormal behaviors such as fighting is time-consuming, labor-intensive, and inefficient, which makes it a natural fit for AI-assisted security monitoring. A fight recognition module is integrated into PP-Human to identify whether there is fighting in a video. We provide pretrained models that users can download and use directly. + +| Task | Model | Acc. | Speed(ms) | Weight | Deploy Model | +| ---- | ---- | ---------- | ---- | ---- | ---------- | +| Fighting Detection | PP-TSM | 89.06% | 128ms for a 2-sec video| [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [Link](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + + +The model is trained on 6 public datasets: Surveillance Camera Fight Dataset, A Dataset for Automatic Violence Detection in Videos, Hockey Fight Detection Dataset, Video Fight Detection Dataset, Real Life Violence Situations Dataset, and UBI Abnormal Event Detection Dataset. + +This solution focuses on recognizing fighting behavior under surveillance cameras. Fighting involves multiple people, whereas skeleton-based approaches are better suited to single-person behavior recognition. In addition, fighting depends strongly on temporal information, so the detection-based and classification-based approaches are not a good fit either. 
Because the background of surveillance scenes is complex, factors such as crowd density, lighting, and camera angle can all affect accuracy. This solution therefore uses a video-classification-based method to determine whether there is fighting in the video. +For cases where the camera is far from the people, the input image resolution is increased. Because training data is limited, data augmentation is used to improve the generalization performance of the model. + + +### Description of Configuration + +Parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follows: +``` +VIDEO_ACTION: # Config for video-classification-based action recognition model + model_dir: output_inference/ppTSM # Path of the model + batch_size: 1 # Inference batch size. Currently only 1 is supported. + frame_len: 8 # Number of sampled frames to accumulate. Recognition runs once when this number is reached. + sample_freq: 7 # Sampling frequency, i.e. one frame is sampled every this many frames. + short_size: 340 # Length of the shorter side after video frame rescaling + target_size: 320 # Target size of the video frame + enable: False # Whether to enable this function +``` + +### How to Use + +1. Download the `Fighting Detection` model from the links in the table above and unzip it to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the path where the models are stored. + +2. Rename the files in the `ppTSM` folder to `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`. + +3. The action recognition module currently supports video input only. Set `enable: True` under `VIDEO_ACTION` in infer_cfg_pphuman.yml. + +4. Run this command: + ```bash + python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu + ``` +5. 
For a detailed parameter description, please refer to [Parameter Description](./PPHuman_QUICK_STARTED.md). + + +The result is shown as follows: + +
    + +
    + +Data source and copyright owner: Surveillance Camera Fight Dataset. + +### Introduction to the Solution +The current fighting recognition model uses [PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md); training follows the PP-TSM video classification training pipeline with adaptations. For an input video or video stream, frames are sampled at a fixed interval. Once the number of sampled frames reaches the specified count, they are fed into the video classification model to determine whether there is fighting. + + +## Custom Training + +Pretrained models are provided and can be used directly, including pedestrian detection/tracking, keypoint detection, smoking, calling and fighting recognition. To train a custom action or optimize model performance, please refer to the links below. + +| Task | Model | Development Document | +| ---- | ---- | -------- | +| pedestrian detection/tracking | PP-YOLOE | [doc](../../../../configs/ppyoloe/README.md#getting-start) | +| keypoint detection | HRNet | [doc](../../../../configs/keypoint/README_en.md#3training-and-testing) | +| action recognition (fall down) | ST-GCN | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) | +| action recognition (smoking) | PP-YOLOE | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) | +| action recognition (calling) | PP-HGNet | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) | +| action recognition (fighting) | PP-TSM | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) | + + +## Reference + +``` +@inproceedings{stgcn2018aaai, + title = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition}, + author = {Sijie Yan and Yuanjun Xiong and Dahua Lin}, + booktitle = {AAAI}, + year = {2018}, +} +``` diff --git a/deploy/pphuman/docs/attribute.md
b/deploy/pipeline/docs/tutorials/pphuman_attribute.md similarity index 40% rename from deploy/pphuman/docs/attribute.md rename to deploy/pipeline/docs/tutorials/pphuman_attribute.md index 63c72810fd7029ddf7d90c811a70e91edc31ff94..16109606c0cbfbac9794512648bf34a7150d0cb7 100644 --- a/deploy/pphuman/docs/attribute.md +++ b/deploy/pipeline/docs/tutorials/pphuman_attribute.md @@ -1,4 +1,4 @@ -[English](attribute_en.md) | 简体中文 +[English](pphuman_attribute_en.md) | 简体中文 # PP-Human属性识别模块 @@ -6,53 +6,78 @@ | 任务 | 算法 | 精度 | 预测速度(ms) |下载链接 | |:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | -| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 28ms
    跟踪:33.1ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | -| 行人属性分析 | StrongBaseline | mA: 94.86 | 单人 2ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | +| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 16.2ms
    跟踪:22.3ms |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 行人属性高精度模型 | PP-HGNet_small | mA: 95.4 | 单人 1.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | +| 行人属性轻量级模型 | PP-LCNet_x1_0 | mA: 94.5 | 单人 0.54ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip) | +| 行人属性精度与速度均衡模型 | PP-HGNet_tiny | mA: 95.2 | 单人 1.14ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_person_attribute_952_infer.zip) | -1. 检测/跟踪模型精度为MOT17,CrowdHuman,HIEVE和部分业务数据融合训练测试得到 -2. 行人属性分析精度为PA100k,RAPv2,PETA和部分业务数据融合训练测试得到 -3. 预测速度为T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程 + +1. 检测/跟踪模型精度为[MOT17](https://motchallenge.net/),[CrowdHuman](http://www.crowdhuman.org/),[HIEVE](http://humaninevents.org/)和部分业务数据融合训练测试得到。 +2. 行人属性分析精度为[PA100k](https://github.com/xh-liu/HydraPlus-Net#pa-100k-dataset),[RAPv2](http://www.rapdataset.com/rapv2.html),[PETA](http://mmlab.ie.cuhk.edu.hk/projects/PETA.html)和部分业务数据融合训练测试得到 +3. 预测速度为V100 机器上使用TensorRT FP16时的速度, 该处测速速度为模型预测速度 +4. 属性模型应用依赖跟踪模型结果,请在[跟踪模型页面](./pphuman_mot.md)下载跟踪模型,依自身需求选择高精或轻量级下载。 +5. 模型下载后解压放置在PaddleDetection/output_inference/目录下。 ## 使用方法 -1. 从上表链接中下载模型并解压到```./output_inference```路径下 -2. 图片输入时,启动命令如下 +1. 从上表链接中下载模型并解压到```PaddleDetection/output_inference```路径下,并修改配置文件中模型路径,也可默认自动下载模型。设置```deploy/pipeline/config/infer_cfg_pphuman.yml```中`ATTR`的enable: True + +`infer_cfg_pphuman.yml`中配置项说明: +``` +ATTR: #模块名称 + model_dir: output_inference/PPLCNet_x1_0_person_attribute_945_infer/ #模型路径 + batch_size: 8 #推理最大batchsize + enable: False #功能是否开启 +``` + +2. 
图片输入时,启动命令如下(更多命令参数说明,请参考[快速开始-参数说明](./PPHuman_QUICK_STARTED.md#41-参数说明))。 ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +#单张图片 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --image_file=test_image.jpg \ --device=gpu \ - --enable_attr=True + +#图片文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_dir=images/ \ + --device=gpu \ + ``` 3. 视频输入时,启动命令如下 ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +#单个视频文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu \ - --enable_attr=True + +#视频文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_dir=test_videos/ \ + --device=gpu \ ``` + 4. 若修改模型路径,有以下两种方式: - - ```./deploy/pphuman/config/infer_cfg.yml```下可以配置不同模型路径,属性识别模型修改ATTR字段下配置 - - **(推荐)**命令行中增加`--model_dir`修改模型路径: + - 方法一:```./deploy/pipeline/config/infer_cfg_pphuman.yml```下可以配置不同模型路径,属性识别模型修改ATTR字段下配置 + - 方法二:命令行中增加`--model_dir`修改模型路径: ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu \ - --enable_attr=True \ - --model_dir det=ppyoloe/ + --model_dir attr=output_inference/PPLCNet_x1_0_person_attribute_945_infer/ ``` 测试效果如下:
    - +
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用 ## 方案说明 -1. 目标检测/多目标跟踪获取图片/视频输入中的行人检测框,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../configs/ppyoloe) +1. 目标检测/多目标跟踪获取图片/视频输入中的行人检测框,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../configs/ppyoloe/README_cn.md) 2. 通过行人检测框的坐标在输入图像中截取每个行人 3. 使用属性识别分析每个行人对应属性,属性类型与PA100k数据集相同,具体属性列表如下: ``` @@ -73,7 +98,7 @@ python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ - 穿靴:是、否 ``` -4. 属性识别模型方案为[StrongBaseline](https://arxiv.org/pdf/2107.03576.pdf),模型结构为基于ResNet50的多分类网络结构,引入Weighted BCE loss和EMA提升模型效果。 +4. 属性识别模型方案为[StrongBaseline](https://arxiv.org/pdf/2107.03576.pdf),模型结构为基于PP-HGNet、PP-LCNet的多分类网络结构,引入Weighted BCE loss提升模型效果。 ## 参考文献 ``` diff --git a/deploy/pphuman/docs/attribute_en.md b/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md similarity index 46% rename from deploy/pphuman/docs/attribute_en.md rename to deploy/pipeline/docs/tutorials/pphuman_attribute_en.md index 38cbc7a7861f1b6902c3dd1f6186b00d46d9769f..70c55de0edea1a2c3b0574efd7835c10d94f74e6 100644 --- a/deploy/pphuman/docs/attribute_en.md +++ b/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md @@ -1,4 +1,4 @@ -English | [简体中文](attribute.md) +English | [简体中文](pphuman_attribute.md) # Attribute Recognition Modules of PP-Human @@ -6,40 +6,61 @@ Pedestrian attribute recognition has been widely used in the intelligent communi | Task | Algorithm | Precision | Inference Speed(ms) | Download Link | |:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | -| Pedestrian Detection/ Tracking | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | Detection: 28ms
    Tracking:33.1ms | [Download Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | -| Pedestrian Attribute Analysis | StrongBaseline | ma: 94.86 | Per Person 2ms | [Download Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.tar) | +| High-Precision Model | PP-HGNet_small | mA: 95.4 | per person 1.54ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.tar) | +| Fast Model | PP-LCNet_x1_0 | mA: 94.5 | per person 0.54ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.tar) | +| Balanced Model | PP-HGNet_tiny | mA: 95.2 | per person 1.14ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_person_attribute_952_infer.tar) | -1. The precision of detection/ tracking models is obtained by training and testing on the dataset consist of MOT17, CrowdHuman, HIEVE, and some business data. -2. The precision of pedestiran attribute analysis is obtained by training and testing on the dataset consist of PA100k, RAPv2, PETA, and some business data. -3. The inference speed is T4, the speed of using TensorRT FP16. +1. The precision of pedestiran attribute analysis is obtained by training and testing on the dataset consist of [PA100k](https://github.com/xh-liu/HydraPlus-Net#pa-100k-dataset),[RAPv2](http://www.rapdataset.com/rapv2.html),[PETA](http://mmlab.ie.cuhk.edu.hk/projects/PETA.html) and some business data. +2. The inference speed is V100, the speed of using TensorRT FP16. +3. This model of Attribute is based on the result of tracking, please download tracking model in the [Page of Mot](./pphuman_mot_en.md). The High precision and Faster model are both available. +4. You should place the model unziped in the directory of `PaddleDetection/output_inference/`. ## Instruction -1. Download the model from the link in the above table, and unzip it to```./output_inference```. 
-2. When inputting the image, run the command as follows: +1. Download the model from the link in the above table, and unzip it to```./output_inference```, and set the "enable: True" in ATTR of infer_cfg_pphuman.yml + +The meaning of configs of `infer_cfg_pphuman.yml`: +``` +ATTR: #module name + model_dir: output_inference/PPLCNet_x1_0_person_attribute_945_infer/ #model path + batch_size: 8 #maxmum batchsize when inference + enable: False #whether to enable this model +``` + +2. When inputting the image, run the command as follows (please refer to [QUICK_STARTED-Parameters](./PPHuman_QUICK_STARTED.md#41-参数说明) for more details): ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +#single image +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --image_file=test_image.jpg \ --device=gpu \ - --enable_attr=True + +#image directory +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_dir=images/ \ + --device=gpu \ + ``` 3. When inputting the video, run the command as follows: ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +#a single video file +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu \ - --enable_attr=True + +#directory of videos +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_dir=test_videos/ \ + --device=gpu \ ``` 4. If you want to change the model path, there are two methods: - - In ```./deploy/pphuman/config/infer_cfg.yml``` you can configurate different model paths. In attribute recognition models, you can modify the configuration in the field of ATTR. - - Add `--model_dir` in the command line to change the model path: + - The first: In ```./deploy/pipeline/config/infer_cfg_pphuman.yml``` you can configurate different model paths. 
In attribute recognition models, you can modify the configuration in the field of ATTR. + - The second: Add `--model_dir` in the command line to change the model path: ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu \ - --enable_attr=True \ - --model_dir det=ppyoloe/ + --model_dir attr=output_inference/PPLCNet_x1_0_person_attribute_945_infer/ ``` The test result is: @@ -50,7 +71,7 @@ The test result is: Data Source and Copyright:Skyinfor Technology. Thanks for the provision of actual scenario data, which are only used for academic research here. -## Introduction to the Solution +## Introduction to the Solution 1. The PP-YOLOE model is used to handle detection boxs of input images/videos from object detection/ multi-object tracking. For details, please refer to the document [PP-YOLOE](../../../configs/ppyoloe). 2. Capture every pedestrian in the input images with the help of coordiantes of detection boxes. @@ -62,7 +83,7 @@ Data Source and Copyright:Skyinfor Technology. Thanks for the provision of act - Accessories: Glasses; Hat; None - HoldObjectsInFront: Yes; No - Bag: BackPack; ShoulderBag; HandBag -- TopStyle: UpperStride; UpperLogo; UpperPlaid; UpperSplice +- TopStyle: UpperStride; UpperLogo; UpperPlaid; UpperSplice - BottomStyle: LowerStripe; LowerPattern - ShortSleeve: Yes; No - LongSleeve: Yes; No @@ -73,7 +94,7 @@ Data Source and Copyright:Skyinfor Technology. Thanks for the provision of act - Boots: Yes; No ``` -4. The model adopted in the attribute recognition is [StrongBaseline](https://arxiv.org/pdf/2107.03576.pdf), where the structure is the multi-class network structure based on ResNet50, and Weighted BCE loss and EMA are introduced for effect optimization. +4. 
The model adopted in the attribute recognition is [StrongBaseline](https://arxiv.org/pdf/2107.03576.pdf), where the structure is the multi-class network structure based on PP-HGNet、PP-LCNet, and Weighted BCE loss is introduced for effect optimization. ## Reference ``` diff --git a/deploy/pphuman/docs/mot.md b/deploy/pipeline/docs/tutorials/pphuman_mot.md similarity index 33% rename from deploy/pphuman/docs/mot.md rename to deploy/pipeline/docs/tutorials/pphuman_mot.md index 72e3045cf56cd30185a64eff879d08748a05a38b..cc26da04549f1f74041d939a52b8bc36e4e399b2 100644 --- a/deploy/pphuman/docs/mot.md +++ b/deploy/pipeline/docs/tutorials/pphuman_mot.md @@ -1,58 +1,90 @@ +[English](pphuman_mot_en.md) | 简体中文 + # PP-Human检测跟踪模块 行人检测与跟踪在智慧社区,工业巡检,交通监控等方向都具有广泛应用,PP-Human中集成了检测跟踪模块,是关键点检测、属性行为识别等任务的基础。我们提供了预训练模型,用户可以直接下载使用。 | 任务 | 算法 | 精度 | 预测速度(ms) |下载链接 | |:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | -| 行人检测/跟踪 | PP-YOLOE | mAP: 56.3
    MOTA: 72.0 | 检测: 28ms
    跟踪:33.1ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 行人检测/跟踪 | PP-YOLOE-l | mAP: 57.8
    MOTA: 82.2 | 检测: 25.1ms
    跟踪:31.8ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| 行人检测/跟踪 | PP-YOLOE-s | mAP: 53.2
    MOTA: 73.9 | 检测: 16.2ms
    跟踪:21.0ms | [下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | -1. 检测/跟踪模型精度为MOT17,CrowdHuman,HIEVE和部分业务数据融合训练测试得到 +1. 检测/跟踪模型精度为[COCO-Person](http://cocodataset.org/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) 和部分业务数据融合训练测试得到,验证集为业务数据 2. 预测速度为T4 机器上使用TensorRT FP16时的速度, 速度包含数据预处理、模型预测、后处理全流程 ## 使用方法 -1. 从上表链接中下载模型并解压到```./output_inference```路径下 -2. 图片输入时,启动命令如下 +1. 从上表链接中下载模型并解压到```./output_inference```路径下,并修改配置文件中模型路径。默认为自动下载模型,无需做改动。 +2. 图片输入时,是纯检测任务,启动命令如下 ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --image_file=test_image.jpg \ --device=gpu ``` -3. 视频输入时,启动命令如下 +3. 视频输入时,是跟踪任务,注意首先设置infer_cfg_pphuman.yml中的MOT配置的enable=True,然后启动命令如下 ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu ``` 4. 
若修改模型路径,有以下两种方式: - - ```./deploy/pphuman/config/infer_cfg.yml```下可以配置不同模型路径,检测和跟踪模型分别对应`DET`和`MOT`字段,修改对应字段下的路径为实际期望的路径即可。 + - ```./deploy/pipeline/config/infer_cfg_pphuman.yml```下可以配置不同模型路径,检测和跟踪模型分别对应`DET`和`MOT`字段,修改对应字段下的路径为实际期望的路径即可。 - 命令行中增加`--model_dir`修改模型路径: ```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu \ - --model_dir det=ppyoloe/ + --region_type=horizontal \ --do_entrance_counting \ - --draw_center_traj + --draw_center_traj \ + --model_dir det=ppyoloe/ ``` **注意:** - - `--do_entrance_counting`表示是否统计出入口流量,不设置即默认为False + - `--do_entrance_counting`表示是否统计出入口流量,不设置即默认为False。 - `--draw_center_traj`表示是否绘制跟踪轨迹,不设置即默认为False。注意绘制跟踪轨迹的测试视频最好是静止摄像头拍摄的。 + - `--region_type`表示流量计数的区域,当设置`--do_entrance_counting`时可选择`horizontal`或者`vertical`,默认是`horizontal`,表示以视频图片的中心水平线为出入口,同一物体框的中心点在相邻两秒内分别在区域中心水平线的两侧,即完成计数加一。 测试效果如下:
    - +
    数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用 +5. 区域闯入判断和计数 + +注意首先设置infer_cfg_pphuman.yml中的MOT配置的enable=True,然后启动命令如下 +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --draw_center_traj \ + --do_break_in_counting \ + --region_type=custom \ + --region_polygon 200 200 400 200 300 400 100 400 +``` +**注意:** + - `--do_break_in_counting`表示是否进行区域出入后计数,不设置即默认为False。 + - `--region_type`表示流量计数的区域,当设置`--do_break_in_counting`时仅可选择`custom`,默认是`custom`,表示以用户自定义区域为出入口,同一物体框的下边界中点坐标在相邻两秒内从区域外到区域内,即完成计数加一。 + - `--region_polygon`表示用户自定义区域的多边形的点坐标序列,每两个为一对点坐标(x,y),按顺时针顺序连成一个封闭区域,至少需要3对点也即6个整数,默认值是`[]`,需要用户自行设置点坐标。用户可以运行[此段代码](../../tools/get_video_info.py)获取所测视频的分辨率帧数,以及可以自定义画出自己想要的多边形区域的可视化并自己调整。 + 自定义多边形区域的可视化代码运行如下: + ```python + python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 + ``` + +测试效果如下: + +
    + +
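Reviewer note: the `--region_type=horizontal` / `vertical` counting rule described in the notes above (the box center lies on opposite sides of the center line in two adjacent samples) can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation. The function names, the `frame_size` tuple, and the choice to count every crossing (rather than once per track) are all assumptions made for the example.

```python
def crossed_center_line(prev_center, curr_center, frame_size, region_type="horizontal"):
    """Return True if a box center moved from one side of the center line to the other.

    prev_center / curr_center are (x, y) box centers from two adjacent samples.
    frame_size is (width, height). All names here are hypothetical.
    """
    width, height = frame_size
    if region_type == "horizontal":
        line, axis = height / 2.0, 1   # horizontal line: compare y coordinates
    elif region_type == "vertical":
        line, axis = width / 2.0, 0    # vertical line: compare x coordinates
    else:
        raise ValueError("region_type must be 'horizontal' or 'vertical'")
    # Opposite signs mean the two centers are on opposite sides of the line.
    # A center exactly on the line gives a product of 0 and is not counted.
    return (prev_center[axis] - line) * (curr_center[axis] - line) < 0


def count_crossings(tracks, frame_size, region_type="horizontal"):
    """Count crossings over all tracks.

    tracks maps a track id to a list of centers sampled roughly one second apart
    (an assumption standing in for "two adjacent seconds" in the notes).
    """
    total = 0
    for centers in tracks.values():
        for prev, curr in zip(centers, centers[1:]):
            if crossed_center_line(prev, curr, frame_size, region_type):
                total += 1
    return total
```

A sign test on the two offsets keeps the check independent of the direction of travel, which matches the wording of the note: it only requires the two centers to be on different sides of the line.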
    + ## 方案说明 -1. 目标检测/多目标跟踪获取图片/视频输入中的行人检测框,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../configs/ppyoloe) -2. 多目标跟踪模型方案基于[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf),采用PP-YOLOE替换原文的YOLOX作为检测器,采用BYTETracker作为跟踪器。 +1. 使用目标检测/多目标跟踪技术来获取图片/视频输入中的行人检测框,检测模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe)。 +2. 多目标跟踪模型方案采用[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf)和[OC-SORT](https://arxiv.org/pdf/2203.14360.pdf),采用PP-YOLOE替换原文的YOLOX作为检测器,采用BYTETracker和OCSORTTracker作为跟踪器,详细文档参考[ByteTrack](../../../../configs/mot/bytetrack)和[OC-SORT](../../../../configs/mot/ocsort)。 ## 参考文献 ``` @@ -62,4 +94,11 @@ python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ journal={arXiv preprint arXiv:2110.06864}, year={2021} } + +@article{cao2022observation, + title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking}, + author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris}, + journal={arXiv preprint arXiv:2203.14360}, + year={2022} +} ``` diff --git a/deploy/pipeline/docs/tutorials/pphuman_mot_en.md b/deploy/pipeline/docs/tutorials/pphuman_mot_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9944a71833d70dd9d810a37dd5e6e5a1130c3d6b --- /dev/null +++ b/deploy/pipeline/docs/tutorials/pphuman_mot_en.md @@ -0,0 +1,102 @@ +English | [简体中文](pphuman_mot.md) + +# Detection and Tracking Module of PP-Human + +Pedestrian detection and tracking is widely used in the intelligent community, industrial inspection, transportation monitoring and so on. PP-Human has the detection and tracking module, which is fundamental to keypoint detection, attribute action recognition, etc. Users enjoy easy access to pretrained models here. 
+ +| Task | Algorithm | Precision | Inference Speed(ms) | Download Link | +|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | +| Pedestrian Detection/ Tracking | PP-YOLOE-l | mAP: 57.8
    MOTA: 82.2 | Detection: 25.1ms
    Tracking:31.8ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | +| Pedestrian Detection/ Tracking | PP-YOLOE-s | mAP: 53.2
    MOTA: 73.9 | Detection: 16.2ms
    Tracking:21.0ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | + +1. The precision of the pedestrian detection/tracking model is obtained by training and testing on [COCO-Person](http://cocodataset.org/), [CrowdHuman](http://www.crowdhuman.org/), [HIEVE](http://humaninevents.org/) and some business data. +2. The inference speed is measured with TensorRT FP16 on a T4 and covers the whole pipeline: data pre-processing, model inference, and post-processing. + +## How to Use + +1. Download models from the links in the above table and unzip them to ```./output_inference```. +2. When an image is used as input, the task is detection only. The start command is as follows: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --image_file=test_image.jpg \ + --device=gpu +``` +3. When a video is used as input, the task is tracking. First set "enable: True" under MOT in infer_cfg_pphuman.yml, then run the following command: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu +``` +4. There are two ways to modify the model path: + + - In `./deploy/pipeline/config/infer_cfg_pphuman.yml`, you can configure different model paths. The detection and tracking models correspond to the `DET` and `MOT` fields respectively; change the path under the corresponding field to the expected path. + - Add `--model_dir` in the command line to change the model path: + +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --region_type=horizontal \ + --do_entrance_counting \ + --draw_center_traj \ + --model_dir det=ppyoloe/ + +``` +**Note:** + + - `--do_entrance_counting` specifies whether to count the flow through the entrance; it defaults to False when not set. 
+ - `--draw_center_traj` specifies whether to draw the tracking trajectory; it defaults to False when not set. Note that the test video for trajectory drawing should preferably be shot by a stationary camera. + - `--region_type` specifies the region used for flow counting. When `--do_entrance_counting` is set, you can choose `horizontal` or `vertical`. The default is `horizontal`, which uses the horizontal center line of the video frame as the entrance: when the center point of the same object box lies on opposite sides of that line in two adjacent seconds, the count is increased by one. + +The test result is: + +
    + +
    + +Data source and copyright owner: Skyinfor Technology. Thanks for providing the real-scenario data, which is used here for academic research only. + +5. Break-in detection and counting + +First set "enable: True" under MOT in infer_cfg_pphuman.yml, then run the following command: +```python +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --draw_center_traj \ + --do_break_in_counting \ + --region_type=custom \ + --region_polygon 200 200 400 200 300 400 100 400 +``` + +**Note:** + - `--do_break_in_counting` specifies whether to count entries into the user-defined region; it defaults to False when not set. + - `--region_type` specifies the region used for flow counting. When `--do_break_in_counting` is set, only `custom` can be selected (it is also the default), which uses the user-defined region as the entrance: when the midpoint of the bottom edge of the same object box moves from outside the region to inside it within two adjacent seconds, the count is increased by one. + - `--region_polygon` specifies the sequence of vertex coordinates of the user-defined polygon. Every two integers form one point (x, y), and the points are connected in clockwise order into a closed region. At least 3 points, that is, 6 integers, are required. The default value is `[]`, so users need to set the coordinates themselves. Users can run this [code](../../tools/get_video_info.py) to obtain the resolution and frame count of the test video, and to visualize and adjust the polygon region they want to use. + The visualization code for a custom polygon region is run as follows: + ```python + python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 + ``` + +The test result is: + +
    + +
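Reviewer note: the break-in rule documented above (the bottom-edge midpoint of a box moves from outside the `--region_polygon` area to inside it) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the code in `pipeline.py`: the helper names, the `(x1, y1, x2, y2)` box layout, and the use of a ray-casting point-in-polygon test are all choices made for this example.

```python
def parse_region_polygon(values):
    """Turn the flat integer list from --region_polygon into (x, y) points."""
    if len(values) < 6 or len(values) % 2 != 0:
        raise ValueError("need an even number of integers, at least 6 (3 points)")
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a rightward ray crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bottom_center(box):
    """Midpoint of the bottom edge of a box given as (x1, y1, x2, y2) (assumed layout)."""
    x1, _, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


def broke_in(prev_box, curr_box, polygon):
    """True when the bottom-center point moved from outside the region to inside it."""
    was_inside = point_in_polygon(bottom_center(prev_box), polygon)
    is_inside = point_in_polygon(bottom_center(curr_box), polygon)
    return (not was_inside) and is_inside


# The polygon from the example command above.
region = parse_region_polygon([200, 200, 400, 200, 300, 400, 100, 400])
```

Using the bottom-edge midpoint rather than the box center approximates where a person stands on the ground, which is presumably why the note specifies it for region entry.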
    + + +## Introduction to the Solution + +1. Get the pedestrian detection box of the image/ video input through object detection and multi-object tracking. The detection model is PP-YOLOE, please refer to [PP-YOLOE](../../../../configs/ppyoloe) for details. + +2. The multi-object tracking solution is based on [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf) and [OC-SORT](https://arxiv.org/pdf/2203.14360.pdf), and replace the original YOLOX with PP-YOLOE as the detector,and BYTETracker or OC-SORT Tracker as the tracker, please refer to [ByteTrack](../../../../configs/mot/bytetrack) and [OC-SORT](../../../../configs/mot/ocsort). + +## Reference +``` +@article{zhang2021bytetrack, + title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box}, + author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang}, + journal={arXiv preprint arXiv:2110.06864}, + year={2021} +} +``` diff --git a/deploy/pphuman/docs/mtmct.md b/deploy/pipeline/docs/tutorials/pphuman_mtmct.md similarity index 43% rename from deploy/pphuman/docs/mtmct.md rename to deploy/pipeline/docs/tutorials/pphuman_mtmct.md index 8549baad741859d83ea8658d4be354e6e9d52820..894c18ee0d5541f082f1f70b0250549519768020 100644 --- a/deploy/pphuman/docs/mtmct.md +++ b/deploy/pipeline/docs/tutorials/pphuman_mtmct.md @@ -1,22 +1,24 @@ +[English](pphuman_mtmct_en.md) | 简体中文 + # PP-Human跨镜头跟踪模块 跨镜头跟踪任务,是在单镜头跟踪的基础上,实现不同摄像头中人员的身份匹配关联。在安放、智慧零售等方向有较多的应用。 -PP-Human跨镜头跟踪模块主要目的在于提供一套简洁、高效的跨境跟踪Pipeline,REID模型完全基于开源数据集训练。 +PP-Human跨镜头跟踪模块主要目的在于提供一套简洁、高效的跨镜跟踪Pipeline,REID模型完全基于开源数据集训练。 ## 使用方法 -1. 下载模型 [REID模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) 并解压到```./output_inference```路径下, MOT模型请参考[mot说明](./mot.md)文件下载。 +1. 
下载模型 [行人跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)和[REID模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) 并解压到```./output_inference```路径下,修改配置文件中模型路径。也可简单起见直接用默认配置,自动下载模型。 MOT模型请参考[mot说明](./pphuman_mot.md)文件下载。 -2. 跨镜头跟踪模式下,要求输入的多个视频放在同一目录下,命令如下: +2. 跨镜头跟踪模式下,要求输入的多个视频放在同一目录下,同时开启infer_cfg_pphuman.yml 中的REID选择中的enable=True, 命令如下: ```python -python3 deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_dir=[your_video_file_directory] --device=gpu +python3 deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=[your_video_file_directory] --device=gpu ``` -3. 相关配置在`./deploy/pphuman/config/infer_cfg.yml`文件中修改: +3. 相关配置在`./deploy/pipeline/config/infer_cfg_pphuman.yml`文件中修改: ```python -python3 deploy/pphuman/pipeline.py - --config deploy/pphuman/config/infer_cfg.yml +python3 deploy/pipeline/pipeline.py + --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=[your_video_file_directory] --device=gpu --model_dir reid=reid_best/ @@ -46,11 +48,11 @@ python3 deploy/pphuman/pipeline.py id聚类、重新分配id ``` -2. 模型方案为[reid-centroids](https://github.com/mikwieczorek/centroids-reid), Backbone为ResNet50, 主要特色为利用相同id的多个特征提升相似度效果。 -本跨境跟踪中所用REID模型在上述基础上,整合多个开源数据集并压缩模型特征到128维以提升泛华性能。大幅提升了在实际应用中的泛化效果。 +2. 模型方案为[reid-strong-baseline](https://github.com/michuanhaohao/reid-strong-baseline), Backbone为ResNet50, 主要特色为模型结构简单。 +本跨镜跟踪中所用REID模型在上述基础上,整合多个开源数据集并压缩模型特征到128维以提升泛化性能。大幅提升了在实际应用中的泛化效果。 ### 其他建议 -- 提供的REID模型基于开源数据集训练得到,建议加入自有数据,训练更加强有力的REID模型,将非常明显提升跨境跟踪效果。 +- 提供的REID模型基于开源数据集训练得到,建议加入自有数据,训练更加强有力的REID模型,将非常明显提升跨镜跟踪效果。 - 质量评估部分基于简单逻辑+OpenCV实现,效果有限,如果有条件建议针对性训练质量判断模型。 @@ -58,22 +60,32 @@ python3 deploy/pphuman/pipeline.py - camera 1:
    - +
    - camera 2:
    - +
    ## 参考文献 ``` -@article{Wieczorek2021OnTU, - title={On the Unreasonable Effectiveness of Centroids in Image Retrieval}, - author={Mikolaj Wieczorek and Barbara Rychalska and Jacek Dabrowski}, - journal={ArXiv}, - year={2021}, - volume={abs/2104.13643} +@InProceedings{Luo_2019_CVPR_Workshops, +author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei}, +title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification}, +booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, +month = {June}, +year = {2019} +} + +@ARTICLE{Luo_2019_Strong_TMM, +author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}}, +journal={IEEE Transactions on Multimedia}, +title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification}, +year={2019}, +pages={1-1}, +doi={10.1109/TMM.2019.2958756}, +ISSN={1941-0077}, } ``` diff --git a/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md b/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md new file mode 100644 index 0000000000000000000000000000000000000000..0321d2a52d511a00b0c095a3878c3a959e646292 --- /dev/null +++ b/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md @@ -0,0 +1,94 @@ +English | [简体中文](pphuman_mtmct.md) + +# Multi-Target Multi-Camera Tracking Module of PP-Human + +Multi-target multi-camera tracking, or MTMCT, matches the identities of people across different cameras on top of single-camera tracking. MTMCT is widely applied in security systems and smart retail. +The MTMCT module of PP-Human aims to provide a simple and efficient multi-target multi-camera tracking pipeline. + +## How to Use + +1. Download [REID model](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) and unzip it to ```./output_inference```. For the MOT model, please refer to [mot description](./pphuman_mot.md). + +2. In MTMCT mode, all input videos must be placed in the same directory. 
Set "enable: True" under REID in infer_cfg_pphuman.yml. The command line is: +```python +python3 deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --video_dir=[your_video_file_directory] --device=gpu +``` + +3. Configuration can be modified in `./deploy/pipeline/config/infer_cfg_pphuman.yml`. + +```python +python3 deploy/pipeline/pipeline.py \ + --config deploy/pipeline/config/infer_cfg_pphuman.yml \ + --video_dir=[your_video_file_directory] \ + --device=gpu \ + --model_dir reid=reid_best/ +``` + +## Introduction to the Solution + +The MTMCT module consists of the multi-target multi-camera tracking pipeline and the REID model. + +1. Multi-Target Multi-Camera Tracking Pipeline + +``` + +single-camera tracking[id+bbox] + │ +capture the target in the original image according to bbox——│ + │ │ + REID model quality assessment (covered or not, complete or not, brightness, etc.) + │ │ + [feature] [quality] + │ │ + datacollector—————│ + │ + sort out and filter features + │ + calculate the similarity of IDs in the videos + │ + make the IDs cluster together and rearrange them +``` + +2. The model solution is [reid-strong-baseline](https://github.com/michuanhaohao/reid-strong-baseline), with ResNet50 as the backbone. + +Building on this baseline, the REID model used in MTMCT is trained on several combined open-source datasets and compresses the model features to 128 dimensions to improve generalization. This greatly improves generalization in real applications. + +### Other Suggestions + +- The provided REID model is trained on open-source datasets. It is recommended to add your own data and train a stronger REID model, which will noticeably improve the MTMCT result. +- The quality assessment is based on simple logic + OpenCV, so its effect is limited. If possible, it is advisable to train a dedicated quality assessment model. + + +### Example + +- camera 1: +
    + +
    + +- camera 2: +
    + +
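Reviewer note: the last three stages of the pipeline diagram above (filter features by quality, compute the similarity of IDs across videos, then reassign IDs) can be sketched in plain Python. This is a simplified stand-in, not PP-Human's implementation. The quality cut-off, the similarity threshold, and the greedy one-to-one matching used in place of the clustering step are all assumptions introduced for illustration, as are the function names.

```python
import math


def mean_feature(features, qualities, min_quality=0.5):
    """Average the features whose quality score passes the cut-off, then L2-normalize.

    min_quality is a hypothetical threshold; returns None if nothing survives.
    """
    kept = [f for f, q in zip(features, qualities) if q >= min_quality]
    if not kept:
        return None
    dim = len(kept[0])
    mean = [sum(f[i] for f in kept) / len(kept) for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else None


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def match_ids(cam_a, cam_b, threshold=0.7):
    """Greedy one-to-one matching of track ids between two cameras.

    cam_a and cam_b map a track id to its unit-length mean feature.
    Pairs are taken in order of decreasing similarity; pairs below the
    (assumed) threshold are left unmatched.
    """
    pairs = sorted(
        ((cosine(fa, fb), ida, idb)
         for ida, fa in cam_a.items()
         for idb, fb in cam_b.items()),
        key=lambda p: p[0],
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), {}
    for sim, ida, idb in pairs:
        if sim < threshold:
            break
        if ida in used_a or idb in used_b:
            continue
        matches[ida] = idb
        used_a.add(ida)
        used_b.add(idb)
    return matches
```

Averaging several features of the same track before comparing makes the match less sensitive to any single poor crop, which is also the reason the diagram routes features through a quality check first.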
    + + +## Reference +``` +@InProceedings{Luo_2019_CVPR_Workshops, +author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei}, +title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification}, +booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, +month = {June}, +year = {2019} +} + +@ARTICLE{Luo_2019_Strong_TMM, +author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}}, +journal={IEEE Transactions on Multimedia}, +title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification}, +year={2019}, +pages={1-1}, +doi={10.1109/TMM.2019.2958756}, +ISSN={1941-0077}, +} +``` diff --git a/deploy/pipeline/docs/tutorials/ppvehicle_attribute.md b/deploy/pipeline/docs/tutorials/ppvehicle_attribute.md new file mode 100644 index 0000000000000000000000000000000000000000..46da107f2d30dec357b458da591f66371af1476a --- /dev/null +++ b/deploy/pipeline/docs/tutorials/ppvehicle_attribute.md @@ -0,0 +1,116 @@ + +# PP-Vehicle属性识别模块 + +车辆属性识别在智慧城市,智慧交通等方向具有广泛应用。在PP-Vehicle中,集成了车辆属性识别模块,可识别车辆颜色及车型属性的识别。 + +| 任务 | 算法 | 精度 | 预测速度 | 下载链接| +|-----------|------|-----------|----------|---------------| +| 车辆检测/跟踪 | PP-YOLOE | - | - | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | +| 车辆属性识别 | PPLCNet | 90.81 | 2.36 ms | [预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | + + +注意: +1. 属性模型预测速度是基于Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为10。 +2. 关于PP-LCNet的介绍可以参考[PP-LCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/models/PP-LCNet.md)介绍,相关论文可以查阅[PP-LCNet paper](https://arxiv.org/abs/2109.15099)。 +3. 
属性模型的训练和精度测试均基于[VeRi数据集](https://www.v7labs.com/open-datasets/veri-dataset)。 + + +- 当前提供的预训练模型支持识别10种车辆颜色及9种车型,同VeRi数据集,具体如下: + +```yaml +# 车辆颜色 +- "yellow" +- "orange" +- "green" +- "gray" +- "red" +- "blue" +- "white" +- "golden" +- "brown" +- "black" + +# 车型 +- "sedan" +- "suv" +- "van" +- "hatchback" +- "mpv" +- "pickup" +- "bus" +- "truck" +- "estate" +``` + +## 使用方法 + +### 配置项说明 + +配置文件中与属性相关的参数如下: +``` +VEHICLE_ATTR: + model_dir: output_inference/vehicle_attribute_infer/ # 车辆属性模型调用路径 + batch_size: 8 # 模型预测时的batch_size大小 + color_threshold: 0.5 # 颜色属性阈值,需要置信度达到此阈值才会确定具体颜色,否则为'Unknown‘ + type_threshold: 0.5 # 车型属性阈值,需要置信度达到此阈值才会确定具体属性,否则为'Unknown‘ + enable: False # 是否开启该功能 +``` + +### 使用命令 + +1. 从模型库下载`车辆检测/跟踪`, `车辆属性识别`两个预测部署模型并解压到`./output_inference`路径下;默认会自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +2. 修改配置文件中`VEHICLE_ATTR`项的`enable: True`,以启用该功能。 +3. 图片输入时,启动命令如下(更多命令参数说明,请参考[快速开始-参数说明](./PPVehicle_QUICK_STARTED.md)): + +```bash +# 预测单张图片文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --image_file=test_image.jpg \ + --device=gpu + +# 预测包含一张或多张图片的文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --image_dir=images/ \ + --device=gpu +``` + +4. 视频输入时,启动命令如下: + +```bash +#预测单个视频文件 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu + +#预测包含一个或多个视频的文件夹 +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_dir=test_videos/ \ + --device=gpu +``` + +5. 若修改模型路径,有以下两种方式: + + - 方法一:`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`下可以配置不同模型路径,属性识别模型修改`VEHICLE_ATTR`字段下配置 + - 方法二:命令行中增加--model_dir修改模型路径: + +```bash +python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_ppvehicle.yml \ + --video_file=test_video.mp4 \ + --device=gpu \ + --model_dir vehicle_attr=output_inference/vehicle_attribute_infer +``` + +测试效果如下: + +
    + +
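Reviewer note: the `color_threshold` and `type_threshold` options in the `VEHICLE_ATTR` config above fall back to `Unknown` when the best score is below the threshold. A minimal sketch of that decoding step follows. The label lists are copied from the document. The layout of the score vector (10 color scores followed by 9 type scores) and the function name are assumptions for this example, not a confirmed description of the deployed model's output.

```python
COLORS = ["yellow", "orange", "green", "gray", "red",
          "blue", "white", "golden", "brown", "black"]
TYPES = ["sedan", "suv", "van", "hatchback", "mpv",
         "pickup", "bus", "truck", "estate"]


def decode_vehicle_attr(scores, color_threshold=0.5, type_threshold=0.5):
    """Pick the best color and type, or 'Unknown' if the best score is too low.

    scores is assumed to hold the color scores first, then the type scores.
    """
    if len(scores) != len(COLORS) + len(TYPES):
        raise ValueError("expected %d scores" % (len(COLORS) + len(TYPES)))
    color_scores = scores[:len(COLORS)]
    type_scores = scores[len(COLORS):]

    # Index of the highest-scoring label in each group.
    best_color = max(range(len(COLORS)), key=color_scores.__getitem__)
    best_type = max(range(len(TYPES)), key=type_scores.__getitem__)

    color = COLORS[best_color] if color_scores[best_color] >= color_threshold else "Unknown"
    vtype = TYPES[best_type] if type_scores[best_type] >= type_threshold else "Unknown"
    return color, vtype
```

Raising either threshold trades recall for precision: fewer vehicles get a concrete label, but the labels that are reported are more reliable.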
    + +## 方案说明 +车辆属性模型使用了[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) 的超轻量图像分类方案(PULC,Practical Ultra Lightweight image Classification)。关于该模型的数据准备、训练、测试等详细内容,请见[PULC 车辆属性识别模型](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/PULC/PULC_vehicle_attribute.md). + +车辆属性识别模型选用了轻量级、高精度的PPLCNet。并在该模型的基础上,进一步使用了以下优化方案: + +- 使用SSLD预训练模型,在不改变推理速度的前提下,精度可以提升约0.5个百分点 +- 融合EDA数据增强策略,精度可以再提升0.52个百分点 +- 使用SKL-UGI知识蒸馏, 精度可以继续提升0.23个百分点 diff --git a/deploy/pipeline/docs/tutorials/ppvehicle_mot.md b/deploy/pipeline/docs/tutorials/ppvehicle_mot.md new file mode 100644 index 0000000000000000000000000000000000000000..7022481526137770c14ace7440bbea4e99511edb --- /dev/null +++ b/deploy/pipeline/docs/tutorials/ppvehicle_mot.md @@ -0,0 +1,20 @@ + +# PP-Vehicle车辆跟踪模块 + +【应用介绍】 + +【模型下载】 + +## 使用方法 + +【配置项说明】 + +【使用命令】 + +【效果展示】 + +## 方案说明 + +【实现方案及特色】 + +## 参考文献 diff --git a/deploy/pipeline/docs/tutorials/ppvehicle_plate.md b/deploy/pipeline/docs/tutorials/ppvehicle_plate.md new file mode 100644 index 0000000000000000000000000000000000000000..9f3ea6fcbc29a90fa83259aab61acaddc79f703f --- /dev/null +++ b/deploy/pipeline/docs/tutorials/ppvehicle_plate.md @@ -0,0 +1,20 @@ + +# PP-Vehicle车牌识别模块 + +【应用介绍】 + +【模型下载】 + +## 使用方法 + +【配置项说明】 + +【使用命令】 + +【效果展示】 + +## 方案说明 + +【实现方案及特色】 + +## 参考文献 diff --git a/static/ppdet/utils/download.py b/deploy/pipeline/download.py similarity index 37% rename from static/ppdet/utils/download.py rename to deploy/pipeline/download.py index d7e367a0526c3077052c3936844b9132a43e4160..f243838b74310f7cdfc5035f7c17d54985d8de85 100644 --- a/static/ppdet/utils/download.py +++ b/deploy/pipeline/download.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -12,269 +12,76 @@ # See the License for the specific language governing permissions and # limitations under the License. -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os +import os, sys import os.path as osp -import shutil +import hashlib import requests +import shutil import tqdm -import hashlib -import binascii -import base64 +import time import tarfile import zipfile +from paddle.utils.download import _get_unique_endpoints -from .voc_utils import create_list - -import logging -logger = logging.getLogger(__name__) - -__all__ = [ - 'get_weights_path', 'get_dataset_path', 'download_dataset', - 'create_voc_list' -] - -WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights/static") -DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset") - -# dict of {dataset_name: (download_info, sub_dirs)} -# download info: [(url, md5sum)] -DATASETS = { - 'coco': ([ - ( - 'http://images.cocodataset.org/zips/train2017.zip', - 'cced6f7f71b7629ddf16f17bbcfab6b2', ), - ( - 'http://images.cocodataset.org/zips/val2017.zip', - '442b8da7639aecaf257c1dceb8ba8c80', ), - ( - 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip', - 'f4bbac642086de4f52a3fdda2de5fa2c', ), - ], ["annotations", "train2017", "val2017"]), - 'voc': ([ - ( - 'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar', - '6cd6e144f989b92b3379bac3b3de84fd', ), - ( - 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar', - 'c52e279531787c972589f7e41ab4ae64', ), - ( - 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar', - 'b6e924de25625d8de591ea690078ad9f', ), - ], ["VOCdevkit/VOC2012", "VOCdevkit/VOC2007"]), - 'wider_face': ([ - ( - 'https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip', - '3fedf70df600953d25982bcd13d91ba2', ), - ( - 'https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip', - 'dfa7d7e790efa35df3788964cf0bbaea', ), - ( - 
'https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip', - 'a4a898d6193db4b9ef3260a68bad0dc7', ), - ], ["WIDER_train", "WIDER_val", "wider_face_split"]), - 'fruit': ([( - 'https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit.tar', - 'baa8806617a54ccf3685fa7153388ae6', ), ], - ['Annotations', 'JPEGImages']), - 'roadsign_voc': ([( - 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar', - '8d629c0f880dd8b48de9aeff44bf1f3e', ), ], ['annotations', 'images']), - 'roadsign_coco': ([( - 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar', - '49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']), - 'objects365': (), -} +PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX = 'https://paddledet.bj.bcebos.com/' DOWNLOAD_RETRY_LIMIT = 3 - -def get_weights_path(url): - """Get weights path from WEIGHT_HOME, if not exists, - download it from url. - """ - path, _ = get_path(url, WEIGHTS_HOME) - return path +WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/infer_weights") -def get_dataset_path(path, annotation, image_dir): - """ - If path exists, return path. - Otherwise, get dataset path from DATASET_HOME, if not exists, - download it. +def is_url(path): """ - if _dataset_exists(path, annotation, image_dir): - return path - - logger.info("Dataset {} is not valid for reason above, try searching {} or " - "downloading dataset...".format( - osp.realpath(path), DATASET_HOME)) - - data_name = os.path.split(path.strip().lower())[-1] - for name, dataset in DATASETS.items(): - if data_name == name: - logger.debug("Parse dataset_dir {} as dataset " - "{}".format(path, name)) - if name == 'objects365': - raise NotImplementedError( - "Dataset {} is not valid for download automatically. 
" - "Please apply and download the dataset from " - "https://www.objects365.org/download.html".format(name)) - data_dir = osp.join(DATASET_HOME, name) - # For VOC-style datasets, only check subdirs - if name in ['voc', 'fruit', 'roadsign_voc']: - exists = True - for sub_dir in dataset[1]: - check_dir = osp.join(data_dir, sub_dir) - if osp.exists(check_dir): - logger.info("Found {}".format(check_dir)) - else: - exists = False - if exists: - return data_dir - - # voc exist is checked above, voc is not exist here - check_exist = name != 'voc' and name != 'fruit' and name != 'roadsign_voc' - for url, md5sum in dataset[0]: - get_path(url, data_dir, md5sum, check_exist) - - # voc should create list after download - if name == 'voc': - create_voc_list(data_dir) - return data_dir - - # not match any dataset in DATASETS - raise ValueError( - "Dataset {} is not valid and cannot parse dataset type " - "'{}' for automaticly downloading, which only supports " - "'voc' , 'coco', 'wider_face', 'fruit' and 'roadsign_voc' currently". - format(path, osp.split(path)[-1])) - - -def create_voc_list(data_dir, devkit_subdir='VOCdevkit'): - logger.debug("Create voc file list...") - devkit_dir = osp.join(data_dir, devkit_subdir) - year_dirs = [osp.join(devkit_dir, x) for x in os.listdir(devkit_dir)] - - # NOTE: since using auto download VOC - # dataset, VOC default label list should be used, - # do not generate label_list.txt here. For default - # label, see ../data/source/voc.py - create_list(year_dirs, data_dir) - logger.debug("Create voc file list finished") - - -def map_path(url, root_dir): - # parse path after download to decompress under root_dir - fname = osp.split(url)[-1] - zip_formats = ['.zip', '.tar', '.gz'] - fpath = fname - for zip_format in zip_formats: - fpath = fpath.replace(zip_format, '') - return osp.join(root_dir, fpath) - - -def get_path(url, root_dir, md5sum=None, check_exist=True): - """ Download from given url to root_dir. 
- if file or directory specified by url is exists under - root_dir, return the path directly, otherwise download - from url and decompress it, return the path. - - url (str): download url - root_dir (str): root dir for downloading, it should be - WEIGHTS_HOME or DATASET_HOME - md5sum (str): md5 sum of download package + Whether path is URL. + Args: + path (string): URL string or not. """ - # parse path after download to decompress under root_dir - fullpath = map_path(url, root_dir) + return path.startswith('http://') \ + or path.startswith('https://') \ + or path.startswith('ppdet://') - # For same zip file, decompressed directory name different - # from zip file name, rename by following map - decompress_name_map = { - "VOCtrainval_11-May-2012": "VOCdevkit/VOC2012", - "VOCtrainval_06-Nov-2007": "VOCdevkit/VOC2007", - "VOCtest_06-Nov-2007": "VOCdevkit/VOC2007", - "annotations_trainval": "annotations" - } - for k, v in decompress_name_map.items(): - if fullpath.find(k) >= 0: - fullpath = osp.join(osp.split(fullpath)[0], v) - if osp.exists(fullpath) and check_exist: - # If fullpath is a directory, it has been decompressed - # checking MD5 is impossible, so we skip checking when - # fullpath is a directory here - if osp.isdir(fullpath) or \ - _md5check_from_req(fullpath, - requests.get(url, stream=True)): - logger.debug("Found {}".format(fullpath)) - return fullpath, True - else: - if osp.isdir(fullpath): - shutil.rmtree(fullpath) - else: - os.remove(fullpath) +def parse_url(url): + url = url.replace("ppdet://", PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX) + return url - fullname = _download(url, root_dir, md5sum) - # new weights format whose postfix is 'pdparams', - # which is not need to decompress - if osp.splitext(fullname)[-1] != '.pdparams': - _decompress(fullname) +def map_path(url, root_dir, path_depth=1): + # parse path after download to decompress under root_dir + assert path_depth > 0, "path_depth should be a positive integer" + dirname = url + for _ in 
range(path_depth): + dirname = osp.dirname(dirname) + fpath = osp.relpath(url, dirname) - return fullpath, False + zip_formats = ['.zip', '.tar', '.gz'] + for zip_format in zip_formats: + fpath = fpath.replace(zip_format, '') + return osp.join(root_dir, fpath) -def download_dataset(path, dataset=None): - if dataset not in DATASETS.keys(): - logger.error("Unknown dataset {}, it should be " - "{}".format(dataset, DATASETS.keys())) - return - dataset_info = DATASETS[dataset][0] - for info in dataset_info: - get_path(info[0], path, info[1], False) - logger.debug("Download dataset {} finished.".format(dataset)) +def _md5check(fullname, md5sum=None): + if md5sum is None: + return True + md5 = hashlib.md5() + with open(fullname, 'rb') as f: + for chunk in iter(lambda: f.read(4096), b""): + md5.update(chunk) + calc_md5sum = md5.hexdigest() -def _dataset_exists(path, annotation, image_dir): - """ - Check if user define dataset exists - """ - if not osp.exists(path): - logger.debug("Config dataset_dir {} is not exits, " - "dataset config is not valid".format(path)) + if calc_md5sum != md5sum: return False - - if annotation: - annotation_path = osp.join(path, annotation) - if not osp.exists(annotation_path): - logger.error("Config dataset_dir {} is not exits!".format(path)) - - if not osp.isfile(annotation_path): - logger.warning("Config annotation {} is not a " - "file, dataset config is not " - "valid".format(annotation_path)) - return False - if image_dir: - image_path = osp.join(path, image_dir) - if not osp.exists(image_path): - logger.warning("Config dataset_dir {} is not exits!".format(path)) - - if not osp.isdir(image_path): - logger.warning("Config image_dir {} is not a " - "directory, dataset config is not " - "valid".format(image_path)) - return False return True +def _check_exist_file_md5(filename, md5sum, url): + return _md5check(filename, md5sum) + + def _download(url, path, md5sum=None): """ Download from url, save to path. 
- url (str): download url path (str): download to given path """ @@ -285,14 +92,17 @@ def _download(url, path, md5sum=None): fullname = osp.join(path, fname) retry_cnt = 0 - while not (osp.exists(fullname) and _md5check(fullname, md5sum)): + while not (osp.exists(fullname) and _check_exist_file_md5(fullname, md5sum, + url)): if retry_cnt < DOWNLOAD_RETRY_LIMIT: retry_cnt += 1 else: raise RuntimeError("Download from {} failed. " "Retry limit reached".format(url)) - logger.info("Downloading {} from {}".format(fname, url)) + # NOTE: windows path join may incur \, which is invalid in url + if sys.platform == "win32": + url = url.replace('\\', '/') req = requests.get(url, stream=True) if req.status_code != 200: @@ -315,54 +125,69 @@ def _download(url, path, md5sum=None): for chunk in req.iter_content(chunk_size=1024): if chunk: f.write(chunk) + shutil.move(tmp_fullname, fullname) + return fullname - # check md5 after download in Content-MD5 in req.headers - if _md5check_from_req(tmp_fullname, req): - shutil.move(tmp_fullname, fullname) - return fullname + +def _download_dist(url, path, md5sum=None): + env = os.environ + if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: + trainer_id = int(env['PADDLE_TRAINER_ID']) + num_trainers = int(env['PADDLE_TRAINERS_NUM']) + if num_trainers <= 1: + return _download(url, path, md5sum) else: - logger.warning( - "Download from url imcomplete, try downloading again...") - os.remove(tmp_fullname) - continue - - -def _md5check_from_req(weights_path, req): - # For weights in bcebos URLs, MD5 value is contained - # in request header as 'content_md5' - content_md5 = req.headers.get('content-md5') - if not content_md5 or _md5check( - weights_path, - binascii.hexlify(base64.b64decode(content_md5.strip('"'))).decode( - )): - return True + fname = osp.split(url)[-1] + fullname = osp.join(path, fname) + lock_path = fullname + '.download.lock' + + if not osp.isdir(path): + os.makedirs(path) + + if not osp.exists(fullname): + from 
paddle.distributed import ParallelEnv + unique_endpoints = _get_unique_endpoints(ParallelEnv() + .trainer_endpoints[:]) + with open(lock_path, 'w'): # touch + os.utime(lock_path, None) + if ParallelEnv().current_endpoint in unique_endpoints: + _download(url, path, md5sum) + os.remove(lock_path) + else: + while os.path.exists(lock_path): + time.sleep(0.5) + return fullname else: - return False + return _download(url, path, md5sum) -def _md5check(fullname, md5sum=None): - if md5sum is None: - return True - - logger.debug("File {} md5 checking...".format(fullname)) - md5 = hashlib.md5() - with open(fullname, 'rb') as f: - for chunk in iter(lambda: f.read(4096), b""): - md5.update(chunk) - calc_md5sum = md5.hexdigest() - - if calc_md5sum != md5sum: - logger.warning("File {} md5 check failed, {}(calc) != " - "{}(base)".format(fullname, calc_md5sum, md5sum)) - return False - return True +def _move_and_merge_tree(src, dst): + """ + Move src directory to dst, if dst is already exists, + merge src to dst + """ + if not osp.exists(dst): + shutil.move(src, dst) + elif osp.isfile(src): + shutil.move(src, dst) + else: + for fp in os.listdir(src): + src_fp = osp.join(src, fp) + dst_fp = osp.join(dst, fp) + if osp.isdir(src_fp): + if osp.isdir(dst_fp): + _move_and_merge_tree(src_fp, dst_fp) + else: + shutil.move(src_fp, dst_fp) + elif osp.isfile(src_fp) and \ + not osp.isfile(dst_fp): + shutil.move(src_fp, dst_fp) def _decompress(fname): """ Decompress for zip and tar file """ - logger.info("Decompressing {}...".format(fname)) # For protecting decompressing interupted, # decompress to fpath_tmp directory firstly, if decompress @@ -380,6 +205,8 @@ def _decompress(fname): elif fname.find('zip') >= 0: with zipfile.ZipFile(fname) as zf: zf.extractall(path=fpath_tmp) + elif fname.find('.txt') >= 0: + return else: raise TypeError("Unsupport compress file type {}".format(fname)) @@ -392,24 +219,95 @@ def _decompress(fname): os.remove(fname) -def _move_and_merge_tree(src, dst): +def 
_decompress_dist(fname): + env = os.environ + if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: + trainer_id = int(env['PADDLE_TRAINER_ID']) + num_trainers = int(env['PADDLE_TRAINERS_NUM']) + if num_trainers <= 1: + _decompress(fname) + else: + lock_path = fname + '.decompress.lock' + from paddle.distributed import ParallelEnv + unique_endpoints = _get_unique_endpoints(ParallelEnv() + .trainer_endpoints[:]) + # NOTE(dkp): _decompress_dist is always performed after + # _download_dist, in _download_dist sub-trainers are waiting + # for download lock file release with sleeping, if decompress + # progress is very fast and finished within the sleeping gap + # time, e.g. in tiny dataset such as coco_ce, spine_coco, main + # trainer may finish decompress and release lock file, so we + # only create lock file in main trainer and all sub-trainers + # wait 1s for main trainer to create lock file, for 1s is + # twice as sleeping gap, this waiting time can keep all + # trainer pipeline in order + # **change this if you have more elegant methods** + if ParallelEnv().current_endpoint in unique_endpoints: + with open(lock_path, 'w'): # touch + os.utime(lock_path, None) + _decompress(fname) + os.remove(lock_path) + else: + time.sleep(1) + while os.path.exists(lock_path): + time.sleep(0.5) + else: + _decompress(fname) + + +def get_path(url, root_dir=WEIGHTS_HOME, md5sum=None, check_exist=True): + """ Download from given url to root_dir. + if file or directory specified by url exists under + root_dir, return the path directly, otherwise download + from url and decompress it, return the path. 
+ url (str): download url + root_dir (str): root dir for downloading + md5sum (str): md5 sum of download package """ - Move src directory to dst, if dst is already exists, - merge src to dst + # parse path after download to decompress under root_dir + fullpath = map_path(url, root_dir) + + # For some zip files, the decompressed directory name differs + # from the zip file name, rename by the following map + decompress_name_map = {"ppTSM_fight": "ppTSM", } + for k, v in decompress_name_map.items(): + if fullpath.find(k) >= 0: + fullpath = osp.join(osp.split(fullpath)[0], v) + + if osp.exists(fullpath) and check_exist: + if not osp.isfile(fullpath) or \ + _check_exist_file_md5(fullpath, md5sum, url): + return fullpath, True + else: + os.remove(fullpath) + + fullname = _download_dist(url, root_dir, md5sum) + + # files whose suffix is '.pdparams' or '.yml' do not + # need to be decompressed + if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']: + _decompress_dist(fullname) + + return fullpath, False + + +def get_weights_path(url): + """Get weights path from WEIGHTS_HOME, if it does not exist, + download it from url. 
""" - if not osp.exists(dst): - shutil.move(src, dst) - elif osp.isfile(src): - shutil.move(src, dst) - else: - for fp in os.listdir(src): - src_fp = osp.join(src, fp) - dst_fp = osp.join(dst, fp) - if osp.isdir(src_fp): - if osp.isdir(dst_fp): - _move_and_merge_tree(src_fp, dst_fp) - else: - shutil.move(src_fp, dst_fp) - elif osp.isfile(src_fp) and \ - not osp.isfile(dst_fp): - shutil.move(src_fp, dst_fp) + url = parse_url(url) + path, _ = get_path(url, WEIGHTS_HOME) + return path + + +def auto_download_model(model_path): + # auto download + if is_url(model_path): + weight = get_weights_path(model_path) + return weight + return None + + +if __name__ == "__main__": + model_path = "https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip" + auto_download_model(model_path) diff --git a/deploy/pphuman/pipe_utils.py b/deploy/pipeline/pipe_utils.py similarity index 61% rename from deploy/pphuman/pipe_utils.py rename to deploy/pipeline/pipe_utils.py index b55ac9a867cf027662d9b3d84e39f119f28d123a..d4a6307525bb903cc7f09331dbbdbaba5c707fd6 100644 --- a/deploy/pphuman/pipe_utils.py +++ b/deploy/pipeline/pipe_utils.py @@ -15,7 +15,6 @@ import time import os import ast -import argparse import glob import yaml import copy @@ -24,108 +23,6 @@ import numpy as np from python.keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop -def argsparser(): - parser = argparse.ArgumentParser(description=__doc__) - parser.add_argument( - "--config", - type=str, - default=None, - help=("Path of configure"), - required=True) - parser.add_argument( - "--image_file", type=str, default=None, help="Path of image file.") - parser.add_argument( - "--image_dir", - type=str, - default=None, - help="Dir of image file, `image_file` has a higher priority.") - parser.add_argument( - "--video_file", - type=str, - default=None, - help="Path of video file, `video_file` or `camera_id` has a highest priority." 
- ) - parser.add_argument( - "--video_dir", - type=str, - default=None, - help="Dir of video file, `video_file` has a higher priority.") - parser.add_argument( - "--model_dir", nargs='*', help="set model dir in pipeline") - parser.add_argument( - "--camera_id", - type=int, - default=-1, - help="device id of camera to predict.") - parser.add_argument( - "--enable_attr", - type=ast.literal_eval, - default=False, - help="Whether use attribute recognition.") - parser.add_argument( - "--enable_action", - type=ast.literal_eval, - default=False, - help="Whether use action recognition.") - parser.add_argument( - "--output_dir", - type=str, - default="output", - help="Directory of output visualization files.") - parser.add_argument( - "--run_mode", - type=str, - default='paddle', - help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)") - parser.add_argument( - "--device", - type=str, - default='cpu', - help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU." - ) - parser.add_argument( - "--enable_mkldnn", - type=ast.literal_eval, - default=False, - help="Whether use mkldnn with CPU.") - parser.add_argument( - "--cpu_threads", type=int, default=1, help="Num of threads with CPU.") - parser.add_argument( - "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.") - parser.add_argument( - "--trt_max_shape", - type=int, - default=1280, - help="max_shape for TensorRT.") - parser.add_argument( - "--trt_opt_shape", - type=int, - default=640, - help="opt_shape for TensorRT.") - parser.add_argument( - "--trt_calib_mode", - type=bool, - default=False, - help="If the model is produced by TRT offline quantitative " - "calibration, trt_calib_mode need to set True.") - parser.add_argument( - "--do_entrance_counting", - action='store_true', - help="Whether counting the numbers of identifiers entering " - "or getting out from the entrance. 
Note that only support one-class" - "counting, multi-class counting is coming soon.") - parser.add_argument( - "--secs_interval", - type=int, - default=2, - help="The seconds interval to count after tracking") - parser.add_argument( - "--draw_center_traj", - action='store_true', - help="Whether drawing the trajectory of center") - return parser - - class Times(object): def __init__(self): self.time = 0. @@ -162,8 +59,13 @@ class PipeTimer(Times): 'mot': Times(), 'attr': Times(), 'kpt': Times(), - 'action': Times(), - 'reid': Times() + 'video_action': Times(), + 'skeleton_action': Times(), + 'reid': Times(), + 'det_action': Times(), + 'cls_action': Times(), + 'vehicle_attr': Times(), + 'vehicleplate': Times() } self.img_num = 0 @@ -207,57 +109,15 @@ class PipeTimer(Times): dic['kpt'] = round(self.module_time['kpt'].value() / max(1, self.img_num), 4) if average else self.module_time['kpt'].value() - dic['action'] = round( - self.module_time['action'].value() / max(1, self.img_num), - 4) if average else self.module_time['action'].value() + dic['video_action'] = self.module_time['video_action'].value() + dic['skeleton_action'] = round( + self.module_time['skeleton_action'].value() / max(1, self.img_num), + 4) if average else self.module_time['skeleton_action'].value() dic['img_num'] = self.img_num return dic -def merge_model_dir(args, model_dir): - # set --model_dir DET=ppyoloe/ to overwrite the model_dir in config file - task_set = ['DET', 'ATTR', 'MOT', 'KPT', 'ACTION'] - if not model_dir: - return args - for md in model_dir: - md = md.strip() - k, v = md.split('=', 1) - k_upper = k.upper() - assert k_upper in task_set, 'Illegal type of task, expect task are: {}, but received {}'.format( - task_set, k) - args[k_upper].update({'model_dir': v}) - return args - - -def merge_cfg(args): - with open(args.config) as f: - pred_config = yaml.safe_load(f) - - def merge(cfg, arg): - merge_cfg = copy.deepcopy(cfg) - for k, v in cfg.items(): - if k in arg: - merge_cfg[k] = arg[k] 
- else: - if isinstance(v, dict): - merge_cfg[k] = merge(v, arg) - return merge_cfg - - args_dict = vars(args) - model_dir = args_dict.pop('model_dir') - pred_config = merge_model_dir(pred_config, model_dir) - pred_config = merge(pred_config, args_dict) - return pred_config - - -def print_arguments(cfg): - print('----------- Running Arguments -----------') - buffer = yaml.dump(cfg) - print(buffer) - print('------------------------------------------') - - def get_test_images(infer_dir, infer_img): """ Get image path list in TEST mode @@ -297,6 +157,8 @@ def crop_image_with_det(batch_input, det_res, thresh=0.3): crop_res = [] for b_id, input in enumerate(batch_input): boxes_num_i = boxes_num[b_id] + if boxes_num_i == 0: + continue boxes_i = boxes[start_idx:start_idx + boxes_num_i, :] score_i = score[start_idx:start_idx + boxes_num_i] res = [] diff --git a/deploy/pphuman/pipeline.py b/deploy/pipeline/pipeline.py similarity index 33% rename from deploy/pphuman/pipeline.py rename to deploy/pipeline/pipeline.py index 4d6fa014ae783b61c4464b2e292c5d745a5297d1..6705083e5681704b3c0b2c96fb0f40f69ba5d35d 100644 --- a/deploy/pphuman/pipeline.py +++ b/deploy/pipeline/pipeline.py @@ -15,39 +15,45 @@ import os import yaml import glob -from collections import defaultdict - import cv2 import numpy as np import math import paddle import sys import copy -from collections import Sequence -from reid import ReID +from collections import Sequence, defaultdict from datacollector import DataCollector, Result -from mtmct import mtmct_process # add deploy path of PadleDetection to sys.path parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) sys.path.insert(0, parent_path) +from cfg_utils import argsparser, print_arguments, merge_cfg +from pipe_utils import PipeTimer +from pipe_utils import get_test_images, crop_image_with_det, crop_image_with_mot, parse_mot_res, parse_mot_keypoint + from python.infer import Detector, DetectorPicoDet -from python.attr_infer import AttrDetector 
from python.keypoint_infer import KeyPointDetector from python.keypoint_postprocess import translate_to_ori_images -from python.action_infer import ActionRecognizer -from python.action_utils import KeyPointBuff, ActionVisualHelper - -from pipe_utils import argsparser, print_arguments, merge_cfg, PipeTimer -from pipe_utils import get_test_images, crop_image_with_det, crop_image_with_mot, parse_mot_res, parse_mot_keypoint -from python.preprocess import decode_image -from python.visualize import visualize_box_mask, visualize_attr, visualize_pose, visualize_action +from python.preprocess import decode_image, ShortSizeScale +from python.visualize import visualize_box_mask, visualize_attr, visualize_pose, visualize_action, visualize_vehicleplate from pptracking.python.mot_sde_infer import SDE_Detector from pptracking.python.mot.visualize import plot_tracking_dict from pptracking.python.mot.utils import flow_statistic +from pphuman.attr_infer import AttrDetector +from pphuman.video_action_infer import VideoActionRecognizer +from pphuman.action_infer import SkeletonActionRecognizer, DetActionRecognizer, ClsActionRecognizer +from pphuman.action_utils import KeyPointBuff, ActionVisualHelper +from pphuman.reid import ReID +from pphuman.mtmct import mtmct_process + +from ppvehicle.vehicle_plate import PlateRecognizer +from ppvehicle.vehicle_attr import VehicleAttr + +from download import auto_download_model + class Pipeline(object): """ @@ -60,8 +66,6 @@ class Pipeline(object): then all the images in directory will be predicted, default as None video_file (string|None): the path of video file, default as None camera_id (int): the device id of camera to predict, default as -1 - enable_attr (bool): whether use attribute recognition, default as false - enable_action (bool): whether use action recognition, default as false device (string): the device to predict, options are: CPU/GPU/XPU, default as CPU run_mode (string): the mode of prediction, options are: @@ -77,79 +81,44 @@ 
class Pipeline(object): draw_center_traj (bool): Whether drawing the trajectory of center, default as False secs_interval (int): The seconds interval to count after tracking, default as 10 do_entrance_counting(bool): Whether counting the numbers of identifiers entering - or getting out from the entrance, default as False,only support single class + or getting out from the entrance, default as False, only support single class counting in MOT. """ - def __init__(self, - cfg, - image_file=None, - image_dir=None, - video_file=None, - video_dir=None, - camera_id=-1, - enable_attr=False, - enable_action=True, - device='CPU', - run_mode='paddle', - trt_min_shape=1, - trt_max_shape=1280, - trt_opt_shape=640, - trt_calib_mode=False, - cpu_threads=1, - enable_mkldnn=False, - output_dir='output', - draw_center_traj=False, - secs_interval=10, - do_entrance_counting=False): + def __init__(self, args, cfg): self.multi_camera = False + reid_cfg = cfg.get('REID', False) + self.enable_mtmct = reid_cfg['enable'] if reid_cfg else False self.is_video = False - self.output_dir = output_dir + self.output_dir = args.output_dir self.vis_result = cfg['visual'] - self.input = self._parse_input(image_file, image_dir, video_file, - video_dir, camera_id) + self.input = self._parse_input(args.image_file, args.image_dir, + args.video_file, args.video_dir, + args.camera_id) if self.multi_camera: - self.predictor = [ - PipePredictor( - cfg, - is_video=True, - multi_camera=True, - enable_attr=enable_attr, - enable_action=enable_action, - device=device, - run_mode=run_mode, - trt_min_shape=trt_min_shape, - trt_max_shape=trt_max_shape, - trt_opt_shape=trt_opt_shape, - cpu_threads=cpu_threads, - enable_mkldnn=enable_mkldnn, - output_dir=output_dir) for i in self.input - ] + self.predictor = [] + for name in self.input: + predictor_item = PipePredictor( + args, cfg, is_video=True, multi_camera=True) + predictor_item.set_file_name(name) + self.predictor.append(predictor_item) + else: - self.predictor = 
PipePredictor( - cfg, - self.is_video, - enable_attr=enable_attr, - enable_action=enable_action, - device=device, - run_mode=run_mode, - trt_min_shape=trt_min_shape, - trt_max_shape=trt_max_shape, - trt_opt_shape=trt_opt_shape, - trt_calib_mode=trt_calib_mode, - cpu_threads=cpu_threads, - enable_mkldnn=enable_mkldnn, - output_dir=output_dir, - draw_center_traj=draw_center_traj, - secs_interval=secs_interval, - do_entrance_counting=do_entrance_counting) + self.predictor = PipePredictor(args, cfg, self.is_video) if self.is_video: - self.predictor.set_file_name(video_file) - - self.output_dir = output_dir - self.draw_center_traj = draw_center_traj - self.secs_interval = secs_interval - self.do_entrance_counting = do_entrance_counting + self.predictor.set_file_name(args.video_file) + + self.output_dir = args.output_dir + self.draw_center_traj = args.draw_center_traj + self.secs_interval = args.secs_interval + self.do_entrance_counting = args.do_entrance_counting + self.do_break_in_counting = args.do_break_in_counting + self.region_type = args.region_type + self.region_polygon = args.region_polygon + if self.region_type == 'custom': + assert len( + self.region_polygon + ) >= 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.' def _parse_input(self, image_file, image_dir, video_file, video_dir, camera_id): @@ -162,7 +131,9 @@ class Pipeline(object): self.multi_camera = False elif video_file is not None: - assert os.path.exists(video_file), "video_file not exists." + assert os.path.exists( + video_file + ) or 'rtsp' in video_file, "video_file does not exist and is not an rtsp address." 
self.multi_camera = False input = video_file self.is_video = True @@ -184,7 +155,7 @@ class Pipeline(object): else: raise ValueError( - "Illegal Input, please set one of ['video_file','camera_id','image_file', 'image_dir']" + "Illegal Input, please set one of ['video_file', 'camera_id', 'image_file', 'image_dir']" ) return input @@ -196,16 +167,56 @@ class Pipeline(object): predictor.run(input) collector_data = predictor.get_result() multi_res.append(collector_data) - mtmct_process( - multi_res, - self.input, - mtmct_vis=self.vis_result, - output_dir=self.output_dir) + if self.enable_mtmct: + mtmct_process( + multi_res, + self.input, + mtmct_vis=self.vis_result, + output_dir=self.output_dir) else: self.predictor.run(self.input) +def get_model_dir(cfg): + # auto download inference model + model_dir_dict = {} + for key in cfg.keys(): + if type(cfg[key]) == dict and \ + ("enable" in cfg[key].keys() and cfg[key]['enable'] + or "enable" not in cfg[key].keys()): + + if "model_dir" in cfg[key].keys(): + model_dir = cfg[key]["model_dir"] + downloaded_model_dir = auto_download_model(model_dir) + if downloaded_model_dir: + model_dir = downloaded_model_dir + model_dir_dict[key] = model_dir + print(key, " model dir:", model_dir) + elif key == "VEHICLE_PLATE": + det_model_dir = cfg[key]["det_model_dir"] + downloaded_det_model_dir = auto_download_model(det_model_dir) + if downloaded_det_model_dir: + det_model_dir = downloaded_det_model_dir + model_dir_dict["det_model_dir"] = det_model_dir + print("det_model_dir model dir:", det_model_dir) + + rec_model_dir = cfg[key]["rec_model_dir"] + downloaded_rec_model_dir = auto_download_model(rec_model_dir) + if downloaded_rec_model_dir: + rec_model_dir = downloaded_rec_model_dir + model_dir_dict["rec_model_dir"] = rec_model_dir + print("rec_model_dir model dir:", rec_model_dir) + elif key == "MOT": # for idbased and skeletonbased actions + model_dir = cfg[key]["model_dir"] + downloaded_model_dir = auto_download_model(model_dir) + if 
downloaded_model_dir: + model_dir = downloaded_model_dir + model_dir_dict[key] = model_dir + + return model_dir_dict + + class PipePredictor(object): """ Predictor in single camera @@ -219,7 +230,8 @@ class PipePredictor(object): 1. Tracking 2. Tracking -> Attribute - 3. Tracking -> KeyPoint -> Action Recognition + 3. Tracking -> KeyPoint -> SkeletonAction Recognition + 4. VideoAction Recognition Args: cfg (dict): config of models in pipeline @@ -227,8 +239,6 @@ class PipePredictor(object): multi_camera (bool): whether to use multi camera in pipeline, default as False camera_id (int): the device id of camera to predict, default as -1 - enable_attr (bool): whether use attribute recognition, default as false - enable_action (bool): whether use action recognition, default as false device (string): the device to predict, options are: CPU/GPU/XPU, default as CPU run_mode (string): the mode of prediction, options are: @@ -244,52 +254,95 @@ class PipePredictor(object): draw_center_traj (bool): Whether drawing the trajectory of center, default as False secs_interval (int): The seconds interval to count after tracking, default as 10 do_entrance_counting(bool): Whether counting the numbers of identifiers entering - or getting out from the entrance, default as False,only support single class + or getting out from the entrance, default as False, only support single class counting in MOT. 
""" - def __init__(self, - cfg, - is_video=True, - multi_camera=False, - enable_attr=False, - enable_action=False, - device='CPU', - run_mode='paddle', - trt_min_shape=1, - trt_max_shape=1280, - trt_opt_shape=640, - trt_calib_mode=False, - cpu_threads=1, - enable_mkldnn=False, - output_dir='output', - draw_center_traj=False, - secs_interval=10, - do_entrance_counting=False): - - if enable_attr and not cfg.get('ATTR', False): - ValueError( - 'enable_attr is set to True, please set ATTR in config file') - if enable_action and (not cfg.get('ACTION', False) or - not cfg.get('KPT', False)): - ValueError( - 'enable_action is set to True, please set KPT and ACTION in config file' - ) - - self.with_attr = cfg.get('ATTR', False) and enable_attr - self.with_action = cfg.get('ACTION', False) and enable_action - self.with_mtmct = cfg.get('REID', False) and multi_camera - if self.with_attr: - print('Attribute Recognition enabled') - if self.with_action: - print('Action Recognition enabled') - if multi_camera: - if not self.with_mtmct: - print( - 'Warning!!! MTMCT enabled, but cannot find REID config in [infer_cfg.yml], please check!' 
- ) - else: - print("MTMCT enabled") + def __init__(self, args, cfg, is_video=True, multi_camera=False): + device = args.device + run_mode = args.run_mode + trt_min_shape = args.trt_min_shape + trt_max_shape = args.trt_max_shape + trt_opt_shape = args.trt_opt_shape + trt_calib_mode = args.trt_calib_mode + cpu_threads = args.cpu_threads + enable_mkldnn = args.enable_mkldnn + output_dir = args.output_dir + draw_center_traj = args.draw_center_traj + secs_interval = args.secs_interval + do_entrance_counting = args.do_entrance_counting + do_break_in_counting = args.do_break_in_counting + region_type = args.region_type + region_polygon = args.region_polygon + + # general module for pphuman and ppvehicle + self.with_mot = cfg.get('MOT', False)['enable'] if cfg.get( + 'MOT', False) else False + self.with_human_attr = cfg.get('ATTR', False)['enable'] if cfg.get( + 'ATTR', False) else False + if self.with_mot: + print('Multi-Object Tracking enabled') + if self.with_human_attr: + print('Human Attribute Recognition enabled') + + # only for pphuman + self.with_skeleton_action = cfg.get( + 'SKELETON_ACTION', False)['enable'] if cfg.get('SKELETON_ACTION', + False) else False + self.with_video_action = cfg.get( + 'VIDEO_ACTION', False)['enable'] if cfg.get('VIDEO_ACTION', + False) else False + self.with_idbased_detaction = cfg.get( + 'ID_BASED_DETACTION', False)['enable'] if cfg.get( + 'ID_BASED_DETACTION', False) else False + self.with_idbased_clsaction = cfg.get( + 'ID_BASED_CLSACTION', False)['enable'] if cfg.get( + 'ID_BASED_CLSACTION', False) else False + self.with_mtmct = cfg.get('REID', False)['enable'] if cfg.get( + 'REID', False) else False + + if self.with_skeleton_action: + print('SkeletonAction Recognition enabled') + if self.with_video_action: + print('VideoAction Recognition enabled') + if self.with_idbased_detaction: + print('IDBASED Detection Action Recognition enabled') + if self.with_idbased_clsaction: + print('IDBASED Classification Action Recognition enabled') 
+ if self.with_mtmct: + print("MTMCT enabled") + + # only for ppvehicle + self.with_vehicleplate = cfg.get( + 'VEHICLE_PLATE', False)['enable'] if cfg.get('VEHICLE_PLATE', + False) else False + if self.with_vehicleplate: + print('Vehicle Plate Recognition enabled') + + self.with_vehicle_attr = cfg.get( + 'VEHICLE_ATTR', False)['enable'] if cfg.get('VEHICLE_ATTR', + False) else False + if self.with_vehicle_attr: + print('Vehicle Attribute Recognition enabled') + + self.modebase = { + "framebased": False, + "videobased": False, + "idbased": False, + "skeletonbased": False + } + + self.basemode = { + "MOT": "idbased", + "ATTR": "idbased", + "VIDEO_ACTION": "videobased", + "SKELETON_ACTION": "skeletonbased", + "ID_BASED_DETACTION": "idbased", + "ID_BASED_CLSACTION": "idbased", + "REID": "idbased", + "VEHICLE_PLATE": "idbased", + "VEHICLE_ATTR": "idbased", + } self.is_video = is_video self.multi_camera = multi_camera @@ -298,6 +351,9 @@ class PipePredictor(object): self.draw_center_traj = draw_center_traj self.secs_interval = secs_interval self.do_entrance_counting = do_entrance_counting + self.do_break_in_counting = do_break_in_counting + self.region_type = region_type + self.region_polygon = region_polygon self.warmup_frame = self.cfg['warmup_frame'] self.pipeline_res = Result() @@ -305,102 +361,236 @@ class PipePredictor(object): self.file_name = None self.collector = DataCollector() + # auto download inference model + model_dir_dict = get_model_dir(self.cfg) + if not is_video: det_cfg = self.cfg['DET'] - model_dir = det_cfg['model_dir'] + model_dir = model_dir_dict['DET'] batch_size = det_cfg['batch_size'] self.det_predictor = Detector( model_dir, device, run_mode, batch_size, trt_min_shape, trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, enable_mkldnn) - if self.with_attr: + if self.with_human_attr: attr_cfg = self.cfg['ATTR'] - model_dir = attr_cfg['model_dir'] + model_dir = model_dir_dict['ATTR'] batch_size = attr_cfg['batch_size'] + basemode = 
self.basemode['ATTR'] + self.modebase[basemode] = True self.attr_predictor = AttrDetector( model_dir, device, run_mode, batch_size, trt_min_shape, trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, enable_mkldnn) + if self.with_vehicle_attr: + vehicleattr_cfg = self.cfg['VEHICLE_ATTR'] + model_dir = model_dir_dict['VEHICLE_ATTR'] + batch_size = vehicleattr_cfg['batch_size'] + color_threshold = vehicleattr_cfg['color_threshold'] + type_threshold = vehicleattr_cfg['type_threshold'] + basemode = self.basemode['VEHICLE_ATTR'] + self.modebase[basemode] = True + self.vehicle_attr_predictor = VehicleAttr( + model_dir, device, run_mode, batch_size, trt_min_shape, + trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, + enable_mkldnn, color_threshold, type_threshold) + else: - mot_cfg = self.cfg['MOT'] - model_dir = mot_cfg['model_dir'] - tracker_config = mot_cfg['tracker_config'] - batch_size = mot_cfg['batch_size'] - self.mot_predictor = SDE_Detector( - model_dir, - tracker_config, - device, - run_mode, - batch_size, - trt_min_shape, - trt_max_shape, - trt_opt_shape, - trt_calib_mode, - cpu_threads, - enable_mkldnn, - draw_center_traj=draw_center_traj, - secs_interval=secs_interval, - do_entrance_counting=do_entrance_counting) - if self.with_attr: + if self.with_human_attr: attr_cfg = self.cfg['ATTR'] - model_dir = attr_cfg['model_dir'] + model_dir = model_dir_dict['ATTR'] batch_size = attr_cfg['batch_size'] + basemode = self.basemode['ATTR'] + self.modebase[basemode] = True self.attr_predictor = AttrDetector( model_dir, device, run_mode, batch_size, trt_min_shape, trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, enable_mkldnn) - if self.with_action: - kpt_cfg = self.cfg['KPT'] - kpt_model_dir = kpt_cfg['model_dir'] - kpt_batch_size = kpt_cfg['batch_size'] - action_cfg = self.cfg['ACTION'] - action_model_dir = action_cfg['model_dir'] - action_batch_size = action_cfg['batch_size'] - action_frames = action_cfg['max_frames'] - display_frames = 
action_cfg['display_frames'] - self.coord_size = action_cfg['coord_size'] - - self.kpt_predictor = KeyPointDetector( - kpt_model_dir, + if self.with_idbased_detaction: + idbased_detaction_cfg = self.cfg['ID_BASED_DETACTION'] + model_dir = model_dir_dict['ID_BASED_DETACTION'] + batch_size = idbased_detaction_cfg['batch_size'] + basemode = self.basemode['ID_BASED_DETACTION'] + threshold = idbased_detaction_cfg['threshold'] + display_frames = idbased_detaction_cfg['display_frames'] + skip_frame_num = idbased_detaction_cfg['skip_frame_num'] + self.modebase[basemode] = True + + self.det_action_predictor = DetActionRecognizer( + model_dir, device, run_mode, - kpt_batch_size, + batch_size, trt_min_shape, trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, enable_mkldnn, - use_dark=False) - self.kpt_buff = KeyPointBuff(action_frames) - - self.action_predictor = ActionRecognizer( - action_model_dir, + threshold=threshold, + display_frames=display_frames, + skip_frame_num=skip_frame_num) + self.det_action_visual_helper = ActionVisualHelper(1) + + if self.with_idbased_clsaction: + idbased_clsaction_cfg = self.cfg['ID_BASED_CLSACTION'] + model_dir = model_dir_dict['ID_BASED_CLSACTION'] + batch_size = idbased_clsaction_cfg['batch_size'] + basemode = self.basemode['ID_BASED_CLSACTION'] + threshold = idbased_clsaction_cfg['threshold'] + self.modebase[basemode] = True + display_frames = idbased_clsaction_cfg['display_frames'] + skip_frame_num = idbased_clsaction_cfg['skip_frame_num'] + + self.cls_action_predictor = ClsActionRecognizer( + model_dir, device, run_mode, - action_batch_size, + batch_size, trt_min_shape, trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, enable_mkldnn, - window_size=action_frames) + threshold=threshold, + display_frames=display_frames, + skip_frame_num=skip_frame_num) + self.cls_action_visual_helper = ActionVisualHelper(1) + + if self.with_skeleton_action: + skeleton_action_cfg = self.cfg['SKELETON_ACTION'] + skeleton_action_model_dir = 
model_dir_dict['SKELETON_ACTION'] + skeleton_action_batch_size = skeleton_action_cfg['batch_size'] + skeleton_action_frames = skeleton_action_cfg['max_frames'] + display_frames = skeleton_action_cfg['display_frames'] + self.coord_size = skeleton_action_cfg['coord_size'] + basemode = self.basemode['SKELETON_ACTION'] + self.modebase[basemode] = True + + self.skeleton_action_predictor = SkeletonActionRecognizer( + skeleton_action_model_dir, + device, + run_mode, + skeleton_action_batch_size, + trt_min_shape, + trt_max_shape, + trt_opt_shape, + trt_calib_mode, + cpu_threads, + enable_mkldnn, + window_size=skeleton_action_frames) + self.skeleton_action_visual_helper = ActionVisualHelper( + display_frames) + + if self.modebase["skeletonbased"]: + kpt_cfg = self.cfg['KPT'] + kpt_model_dir = model_dir_dict['KPT'] + kpt_batch_size = kpt_cfg['batch_size'] + self.kpt_predictor = KeyPointDetector( + kpt_model_dir, + device, + run_mode, + kpt_batch_size, + trt_min_shape, + trt_max_shape, + trt_opt_shape, + trt_calib_mode, + cpu_threads, + enable_mkldnn, + use_dark=False) + self.kpt_buff = KeyPointBuff(skeleton_action_frames) + + if self.with_vehicleplate: + vehicleplate_cfg = self.cfg['VEHICLE_PLATE'] + self.vehicleplate_detector = PlateRecognizer(args, + vehicleplate_cfg) + basemode = self.basemode['VEHICLE_PLATE'] + self.modebase[basemode] = True + + if self.with_vehicle_attr: + vehicleattr_cfg = self.cfg['VEHICLE_ATTR'] + model_dir = model_dir_dict['VEHICLE_ATTR'] + batch_size = vehicleattr_cfg['batch_size'] + color_threshold = vehicleattr_cfg['color_threshold'] + type_threshold = vehicleattr_cfg['type_threshold'] + basemode = self.basemode['VEHICLE_ATTR'] + self.modebase[basemode] = True + self.vehicle_attr_predictor = VehicleAttr( + model_dir, device, run_mode, batch_size, trt_min_shape, + trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, + enable_mkldnn, color_threshold, type_threshold) - self.action_visual_helper = ActionVisualHelper(display_frames) + if 
self.with_mtmct: + reid_cfg = self.cfg['REID'] + model_dir = model_dir_dict['REID'] + batch_size = reid_cfg['batch_size'] + basemode = self.basemode['REID'] + self.modebase[basemode] = True + self.reid_predictor = ReID( + model_dir, device, run_mode, batch_size, trt_min_shape, + trt_max_shape, trt_opt_shape, trt_calib_mode, cpu_threads, + enable_mkldnn) - if self.with_mtmct: - reid_cfg = self.cfg['REID'] - model_dir = reid_cfg['model_dir'] - batch_size = reid_cfg['batch_size'] - self.reid_predictor = ReID(model_dir, device, run_mode, batch_size, - trt_min_shape, trt_max_shape, - trt_opt_shape, trt_calib_mode, - cpu_threads, enable_mkldnn) + if self.with_mot or self.modebase["idbased"] or self.modebase[ + "skeletonbased"]: + mot_cfg = self.cfg['MOT'] + model_dir = model_dir_dict['MOT'] + tracker_config = mot_cfg['tracker_config'] + batch_size = mot_cfg['batch_size'] + basemode = self.basemode['MOT'] + self.modebase[basemode] = True + self.mot_predictor = SDE_Detector( + model_dir, + tracker_config, + device, + run_mode, + batch_size, + trt_min_shape, + trt_max_shape, + trt_opt_shape, + trt_calib_mode, + cpu_threads, + enable_mkldnn, + draw_center_traj=draw_center_traj, + secs_interval=secs_interval, + do_entrance_counting=do_entrance_counting, + do_break_in_counting=do_break_in_counting, + region_type=region_type, + region_polygon=region_polygon) + + if self.with_video_action: + video_action_cfg = self.cfg['VIDEO_ACTION'] + + basemode = self.basemode['VIDEO_ACTION'] + self.modebase[basemode] = True + + video_action_model_dir = model_dir_dict['VIDEO_ACTION'] + video_action_batch_size = video_action_cfg['batch_size'] + short_size = video_action_cfg["short_size"] + target_size = video_action_cfg["target_size"] + + self.video_action_predictor = VideoActionRecognizer( + model_dir=video_action_model_dir, + short_size=short_size, + target_size=target_size, + device=device, + run_mode=run_mode, + batch_size=video_action_batch_size, + trt_min_shape=trt_min_shape, + 
trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn) def set_file_name(self, path): - self.file_name = os.path.split(path)[-1] + if path is not None: + self.file_name = os.path.split(path)[-1] + else: + # use camera id + self.file_name = None def get_result(self): return self.collector.get_res() @@ -435,7 +625,7 @@ class PipePredictor(object): self.pipe_timer.module_time['det'].end() self.pipeline_res.update(det_res, 'det') - if self.with_attr: + if self.with_human_attr: crop_inputs = crop_image_with_det(batch_input, det_res) attr_res_list = [] @@ -453,6 +643,24 @@ class PipePredictor(object): attr_res = {'output': attr_res_list} self.pipeline_res.update(attr_res, 'attr') + if self.with_vehicle_attr: + crop_inputs = crop_image_with_det(batch_input, det_res) + vehicle_attr_res_list = [] + + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].start() + + for crop_input in crop_inputs: + attr_res = self.vehicle_attr_predictor.predict_image( + crop_input, visual=False) + vehicle_attr_res_list.extend(attr_res['output']) + + if i > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].end() + + attr_res = {'output': vehicle_attr_res_list} + self.pipeline_res.update(attr_res, 'vehicle_attr') + self.pipe_timer.img_num += len(batch_input) if i > self.warmup_frame: self.pipe_timer.total_time.end() @@ -466,6 +674,8 @@ class PipePredictor(object): # mot -> pose -> action capture = cv2.VideoCapture(video_file) video_out_name = 'output.mp4' if self.file_name is None else self.file_name + if "rtsp" in video_file: + video_out_name = video_out_name + "_rtsp.mp4" # Get Video info : resolution, fps, frame count width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) @@ -490,119 +700,233 @@ class PipePredictor(object): out_id_list = list() prev_center = dict() records = list() - entrance = [0, height / 2., width, height / 2.] 
+ if self.do_entrance_counting or self.do_break_in_counting: + if self.region_type == 'horizontal': + entrance = [0, height / 2., width, height / 2.] + elif self.region_type == 'vertical': + entrance = [width / 2, 0., width / 2, height] + elif self.region_type == 'custom': + entrance = [] + assert len( + self.region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting." + for i in range(0, len(self.region_polygon), 2): + entrance.append( + [self.region_polygon[i], self.region_polygon[i + 1]]) + entrance.append([width, height]) + else: + raise ValueError("region_type:{} unsupported.".format( + self.region_type)) + video_fps = fps + video_action_imgs = [] + + if self.with_video_action: + short_size = self.cfg["VIDEO_ACTION"]["short_size"] + scale = ShortSizeScale(short_size) + while (1): if frame_id % 10 == 0: print('frame id: ', frame_id) + ret, frame = capture.read() if not ret: break + frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) - if frame_id > self.warmup_frame: - self.pipe_timer.total_time.start() - self.pipe_timer.module_time['mot'].start() - res = self.mot_predictor.predict_image( - [copy.deepcopy(frame)], visual=False) + if self.modebase["idbased"] or self.modebase["skeletonbased"]: + if frame_id > self.warmup_frame: + self.pipe_timer.total_time.start() + self.pipe_timer.module_time['mot'].start() + res = self.mot_predictor.predict_image( + [copy.deepcopy(frame_rgb)], visual=False) - if frame_id > self.warmup_frame: - self.pipe_timer.module_time['mot'].end() - - # mot output format: id, class, score, xmin, ymin, xmax, ymax - mot_res = parse_mot_res(res) - - # flow_statistic only support single class MOT - boxes, scores, ids = res[0] # batch size = 1 in MOT - mot_result = (frame_id + 1, boxes[0], scores[0], - ids[0]) # single class - statistic = flow_statistic( - mot_result, self.secs_interval, self.do_entrance_counting, - video_fps, entrance, id_set, interval_id_set, in_id_list, - out_id_list, prev_center, 
records) - records = statistic['records'] - - # nothing detected - if len(mot_res['boxes']) == 0: - frame_id += 1 if frame_id > self.warmup_frame: - self.pipe_timer.img_num += 1 - self.pipe_timer.total_time.end() - if self.cfg['visual']: - _, _, fps = self.pipe_timer.get_total_time() - im = self.visualize_video(frame, mot_res, frame_id, - fps) # visualize - writer.write(im) - continue - - self.pipeline_res.update(mot_res, 'mot') - if self.with_attr or self.with_action: + self.pipe_timer.module_time['mot'].end() + + # mot output format: id, class, score, xmin, ymin, xmax, ymax + mot_res = parse_mot_res(res) + + # flow_statistic only support single class MOT + boxes, scores, ids = res[0] # batch size = 1 in MOT + mot_result = (frame_id + 1, boxes[0], scores[0], + ids[0]) # single class + statistic = flow_statistic( + mot_result, self.secs_interval, self.do_entrance_counting, + self.do_break_in_counting, self.region_type, video_fps, + entrance, id_set, interval_id_set, in_id_list, out_id_list, + prev_center, records) + records = statistic['records'] + + # nothing detected + if len(mot_res['boxes']) == 0: + frame_id += 1 + if frame_id > self.warmup_frame: + self.pipe_timer.img_num += 1 + self.pipe_timer.total_time.end() + if self.cfg['visual']: + _, _, fps = self.pipe_timer.get_total_time() + im = self.visualize_video(frame, mot_res, self.collector, + frame_id, fps, entrance, + records, center_traj) # visualize + writer.write(im) + if self.file_name is None: # use camera_id + cv2.imshow('Paddle-Pipeline', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + continue + + self.pipeline_res.update(mot_res, 'mot') crop_input, new_bboxes, ori_bboxes = crop_image_with_mot( - frame, mot_res) + frame_rgb, mot_res) - if self.with_attr: - if frame_id > self.warmup_frame: - self.pipe_timer.module_time['attr'].start() - attr_res = self.attr_predictor.predict_image( - crop_input, visual=False) - if frame_id > self.warmup_frame: - self.pipe_timer.module_time['attr'].end() - 
self.pipeline_res.update(attr_res, 'attr') + if self.with_vehicleplate and frame_id % 10 == 0: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicleplate'].start() + plate_input, _, _ = crop_image_with_mot( + frame_rgb, mot_res, expand=False) + platelicense = self.vehicleplate_detector.get_platelicense( + plate_input) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicleplate'].end() + self.pipeline_res.update(platelicense, 'vehicleplate') + else: + self.pipeline_res.clear('vehicleplate') - if self.with_action: - if frame_id > self.warmup_frame: - self.pipe_timer.module_time['kpt'].start() - kpt_pred = self.kpt_predictor.predict_image( - crop_input, visual=False) - keypoint_vector, score_vector = translate_to_ori_images( - kpt_pred, np.array(new_bboxes)) - kpt_res = {} - kpt_res['keypoint'] = [ - keypoint_vector.tolist(), score_vector.tolist() - ] if len(keypoint_vector) > 0 else [[], []] - kpt_res['bbox'] = ori_bboxes - if frame_id > self.warmup_frame: - self.pipe_timer.module_time['kpt'].end() + if self.with_human_attr: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['attr'].start() + attr_res = self.attr_predictor.predict_image( + crop_input, visual=False) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['attr'].end() + self.pipeline_res.update(attr_res, 'attr') - self.pipeline_res.update(kpt_res, 'kpt') + if self.with_vehicle_attr: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].start() + attr_res = self.vehicle_attr_predictor.predict_image( + crop_input, visual=False) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['vehicle_attr'].end() + self.pipeline_res.update(attr_res, 'vehicle_attr') - self.kpt_buff.update(kpt_res, mot_res) # collect kpt output - state = self.kpt_buff.get_state( - ) # whether frame num is enough or lost tracker + if self.with_idbased_detaction: + if frame_id > self.warmup_frame: + 
self.pipe_timer.module_time['det_action'].start() + det_action_res = self.det_action_predictor.predict( + crop_input, mot_res) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['det_action'].end() + self.pipeline_res.update(det_action_res, 'det_action') - action_res = {} - if state: + if self.cfg['visual']: + self.det_action_visual_helper.update(det_action_res) + + if self.with_idbased_clsaction: if frame_id > self.warmup_frame: - self.pipe_timer.module_time['action'].start() - collected_keypoint = self.kpt_buff.get_collected_keypoint( - ) # reoragnize kpt output with ID - action_input = parse_mot_keypoint(collected_keypoint, - self.coord_size) - action_res = self.action_predictor.predict_skeleton_with_mot( - action_input) + self.pipe_timer.module_time['cls_action'].start() + cls_action_res = self.cls_action_predictor.predict_with_mot( + crop_input, mot_res) if frame_id > self.warmup_frame: - self.pipe_timer.module_time['action'].end() - self.pipeline_res.update(action_res, 'action') + self.pipe_timer.module_time['cls_action'].end() + self.pipeline_res.update(cls_action_res, 'cls_action') - if self.cfg['visual']: - self.action_visual_helper.update(action_res) + if self.cfg['visual']: + self.cls_action_visual_helper.update(cls_action_res) - if self.with_mtmct: - crop_input, img_qualities, rects = self.reid_predictor.crop_image_with_mot( - frame, mot_res) - if frame_id > self.warmup_frame: - self.pipe_timer.module_time['reid'].start() - reid_res = self.reid_predictor.predict_batch(crop_input) + if self.with_skeleton_action: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['kpt'].start() + kpt_pred = self.kpt_predictor.predict_image( + crop_input, visual=False) + keypoint_vector, score_vector = translate_to_ori_images( + kpt_pred, np.array(new_bboxes)) + kpt_res = {} + kpt_res['keypoint'] = [ + keypoint_vector.tolist(), score_vector.tolist() + ] if len(keypoint_vector) > 0 else [[], []] + kpt_res['bbox'] = ori_bboxes + if frame_id > 
self.warmup_frame: + self.pipe_timer.module_time['kpt'].end() + + self.pipeline_res.update(kpt_res, 'kpt') + + self.kpt_buff.update(kpt_res, mot_res) # collect kpt output + state = self.kpt_buff.get_state( + ) # whether frame num is enough or lost tracker + + skeleton_action_res = {} + if state: + if frame_id > self.warmup_frame: + self.pipe_timer.module_time[ + 'skeleton_action'].start() + collected_keypoint = self.kpt_buff.get_collected_keypoint( + ) # reorganize kpt output with ID + skeleton_action_input = parse_mot_keypoint( + collected_keypoint, self.coord_size) + skeleton_action_res = self.skeleton_action_predictor.predict_skeleton_with_mot( + skeleton_action_input) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['skeleton_action'].end() + self.pipeline_res.update(skeleton_action_res, + 'skeleton_action') + + if self.cfg['visual']: + self.skeleton_action_visual_helper.update( + skeleton_action_res) + + if self.with_mtmct and frame_id % 10 == 0: + crop_input, img_qualities, rects = self.reid_predictor.crop_image_with_mot( + frame_rgb, mot_res) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['reid'].start() + reid_res = self.reid_predictor.predict_batch(crop_input) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['reid'].end() + + reid_res_dict = { + 'features': reid_res, + "qualities": img_qualities, + "rects": rects + } + self.pipeline_res.update(reid_res_dict, 'reid') + else: + self.pipeline_res.clear('reid') + + if self.with_video_action: + # get the params + frame_len = self.cfg["VIDEO_ACTION"]["frame_len"] + sample_freq = self.cfg["VIDEO_ACTION"]["sample_freq"] + + if sample_freq * frame_len > frame_count: # video is too short + sample_freq = int(frame_count / frame_len) + + # filter the warmup frames if frame_id > self.warmup_frame: - self.pipe_timer.module_time['reid'].end() + self.pipe_timer.module_time['video_action'].start() + + # collect frames + if frame_id % sample_freq == 0: + # Scale image + 
scaled_img = scale(frame_rgb) + video_action_imgs.append(scaled_img) + + # the number of collected frames is enough to predict video action + if len(video_action_imgs) == frame_len: + classes, scores = self.video_action_predictor.predict( + video_action_imgs) + if frame_id > self.warmup_frame: + self.pipe_timer.module_time['video_action'].end() + + video_action_res = {"class": classes[0], "score": scores[0]} + self.pipeline_res.update(video_action_res, 'video_action') + + print("video_action_res:", video_action_res) - reid_res_dict = { - 'features': reid_res, - "qualities": img_qualities, - "rects": rects - } - self.pipeline_res.update(reid_res_dict, 'reid') + video_action_imgs.clear() # next clip self.collector.append(frame_id, self.pipeline_res) @@ -613,10 +937,14 @@ class PipePredictor(object): if self.cfg['visual']: _, _, fps = self.pipe_timer.get_total_time() - im = self.visualize_video(frame, self.pipeline_res, frame_id, - fps, entrance, records, - center_traj) # visualize + im = self.visualize_video( + frame, self.pipeline_res, self.collector, frame_id, fps, + entrance, records, center_traj) # visualize writer.write(im) + if self.file_name is None: # use camera_id + cv2.imshow('Paddle-Pipeline', im) + if cv2.waitKey(1) & 0xFF == ord('q'): + break writer.release() print('save result to {}'.format(out_path)) @@ -624,6 +952,7 @@ class PipePredictor(object): def visualize_video(self, image, result, + collector, frame_id, fps, entrance=None, @@ -650,26 +979,51 @@ class PipePredictor(object): online_scores[0] = scores online_ids[0] = ids - image = plot_tracking_dict( - image, - num_classes, - online_tlwhs, - online_ids, - online_scores, - frame_id=frame_id, - fps=fps, - do_entrance_counting=self.do_entrance_counting, - entrance=entrance, - records=records, - center_traj=center_traj) - - attr_res = result.get('attr') - if attr_res is not None: + if mot_res is not None: + image = plot_tracking_dict( + image, + num_classes, + online_tlwhs, + online_ids, + 
online_scores, + frame_id=frame_id, + fps=fps, + ids2names=self.mot_predictor.pred_config.labels, + do_entrance_counting=self.do_entrance_counting, + do_break_in_counting=self.do_break_in_counting, + entrance=entrance, + records=records, + center_traj=center_traj) + + human_attr_res = result.get('attr') + if human_attr_res is not None: boxes = mot_res['boxes'][:, 1:] - attr_res = attr_res['output'] - image = visualize_attr(image, attr_res, boxes) + human_attr_res = human_attr_res['output'] + image = visualize_attr(image, human_attr_res, boxes) image = np.array(image) + vehicle_attr_res = result.get('vehicle_attr') + if vehicle_attr_res is not None: + boxes = mot_res['boxes'][:, 1:] + vehicle_attr_res = vehicle_attr_res['output'] + image = visualize_attr(image, vehicle_attr_res, boxes) + image = np.array(image) + + if mot_res is not None: + vehicleplate = False + plates = [] + for trackid in mot_res['boxes'][:, 0]: + plate = collector.get_carlp(trackid) + if plate != None: + vehicleplate = True + plates.append(plate) + else: + plates.append("") + if vehicleplate: + boxes = mot_res['boxes'][:, 1:] + image = visualize_vehicleplate(image, plates, boxes) + image = np.array(image) + kpt_res = result.get('kpt') if kpt_res is not None: image = visualize_pose( @@ -678,17 +1032,53 @@ class PipePredictor(object): visual_thresh=self.cfg['kpt_thresh'], returnimg=True) - action_res = result.get('action') - if action_res is not None: + video_action_res = result.get('video_action') + if video_action_res is not None: + video_action_score = None + if video_action_res and video_action_res["class"] == 1: + video_action_score = video_action_res["score"] + mot_boxes = None + if mot_res: + mot_boxes = mot_res['boxes'] + image = visualize_action( + image, + mot_boxes, + action_visual_collector=None, + action_text="SkeletonAction", + video_action_score=video_action_score, + video_action_text="Fight") + + visual_helper_for_display = [] + action_to_display = [] + + skeleton_action_res = 
result.get('skeleton_action') + if skeleton_action_res is not None: + visual_helper_for_display.append(self.skeleton_action_visual_helper) + action_to_display.append("Falling") + + det_action_res = result.get('det_action') + if det_action_res is not None: + visual_helper_for_display.append(self.det_action_visual_helper) + action_to_display.append("Smoking") + + cls_action_res = result.get('cls_action') + if cls_action_res is not None: + visual_helper_for_display.append(self.cls_action_visual_helper) + action_to_display.append("Calling") + + if len(visual_helper_for_display) > 0: image = visualize_action(image, mot_res['boxes'], - self.action_visual_helper, "Falling") + visual_helper_for_display, + action_to_display) return image def visualize_image(self, im_files, images, result): start_idx, boxes_num_i = 0, 0 det_res = result.get('det') - attr_res = result.get('attr') + human_attr_res = result.get('attr') + vehicle_attr_res = result.get('vehicle_attr') + for i, (im_file, im) in enumerate(zip(im_files, images)): if det_res is not None: det_res_i = {} @@ -702,10 +1092,15 @@ class PipePredictor(object): threshold=self.cfg['crop_thresh']) im = np.ascontiguousarray(np.copy(im)) im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) - if attr_res is not None: - attr_res_i = attr_res['output'][start_idx:start_idx + - boxes_num_i] - im = visualize_attr(im, attr_res_i, det_res_i['boxes']) + if human_attr_res is not None: + human_attr_res_i = human_attr_res['output'][start_idx:start_idx + + boxes_num_i] + im = visualize_attr(im, human_attr_res_i, det_res_i['boxes']) + if vehicle_attr_res is not None: + vehicle_attr_res_i = vehicle_attr_res['output'][ + start_idx:start_idx + boxes_num_i] + im = visualize_attr(im, vehicle_attr_res_i, det_res_i['boxes']) + img_name = os.path.split(im_file)[-1] if not os.path.exists(self.output_dir): os.makedirs(self.output_dir) @@ -716,21 +1111,17 @@ class PipePredictor(object): def main(): - cfg = merge_cfg(FLAGS) + cfg = merge_cfg(FLAGS) # use command 
params to update config print_arguments(cfg) - pipeline = Pipeline( - cfg, FLAGS.image_file, FLAGS.image_dir, FLAGS.video_file, - FLAGS.video_dir, FLAGS.camera_id, FLAGS.enable_attr, - FLAGS.enable_action, FLAGS.device, FLAGS.run_mode, FLAGS.trt_min_shape, - FLAGS.trt_max_shape, FLAGS.trt_opt_shape, FLAGS.trt_calib_mode, - FLAGS.cpu_threads, FLAGS.enable_mkldnn, FLAGS.output_dir, - FLAGS.draw_center_traj, FLAGS.secs_interval, FLAGS.do_entrance_counting) + pipeline = Pipeline(FLAGS, cfg) pipeline.run() if __name__ == '__main__': paddle.enable_static() + + # parse params from command parser = argsparser() FLAGS = parser.parse_args() FLAGS.device = FLAGS.device.upper() diff --git a/deploy/python/action_infer.py b/deploy/pipeline/pphuman/action_infer.py similarity index 47% rename from deploy/python/action_infer.py rename to deploy/pipeline/pphuman/action_infer.py index 8df189f9014909d1674e31406b557dcae567b5bd..1cf0c8baaa25c31e27072ae4371f9bb60054e689 100644 --- a/deploy/python/action_infer.py +++ b/deploy/pipeline/pphuman/action_infer.py @@ -28,12 +28,13 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) sys.path.insert(0, parent_path) from paddle.inference import Config, create_predictor -from utils import argsparser, Timer, get_current_memory_mb -from benchmark_utils import PaddleInferBenchmark -from infer import Detector, print_arguments +from python.utils import argsparser, Timer, get_current_memory_mb +from python.benchmark_utils import PaddleInferBenchmark +from python.infer import Detector, print_arguments +from attr_infer import AttrDetector -class ActionRecognizer(Detector): +class SkeletonActionRecognizer(Detector): """ Args: model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml @@ -67,8 +68,8 @@ class ActionRecognizer(Detector): threshold=0.5, window_size=100, random_pad=False): - assert batch_size == 1, "ActionRecognizer only support batch_size=1 now." 
- super(ActionRecognizer, self).__init__( + assert batch_size == 1, "SkeletonActionRecognizer only support batch_size=1 now." + super(SkeletonActionRecognizer, self).__init__( model_dir=model_dir, device=device, run_mode=run_mode, @@ -263,8 +264,328 @@ def get_test_skeletons(input_file): "Now only support input with shape: (N, C, T, K, M) or (C, T, K, M)") +class DetActionRecognizer(object): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + threshold (float): The threshold of score for action feature object detection. + display_frames (int): The duration for corresponding detected action. + skip_frame_num (int): The number of frames for interval prediction. A skipped frame will + reuse the result of its last frame. If it is set to 0, no frame will be skipped. Default + is 0. 
+ + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + display_frames=20, + skip_frame_num=0): + super(DetActionRecognizer, self).__init__() + self.detector = Detector( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold) + self.threshold = threshold + self.frame_life = display_frames + self.result_history = {} + self.skip_frame_num = skip_frame_num + self.skip_frame_cnt = 0 + self.id_in_last_frame = [] + + def predict(self, images, mot_result): + if self.skip_frame_cnt == 0 or (not self.check_id_is_same(mot_result)): + det_result = self.detector.predict_image(images, visual=False) + result = self.postprocess(det_result, mot_result) + else: + result = self.reuse_result(mot_result) + + self.skip_frame_cnt += 1 + if self.skip_frame_cnt >= self.skip_frame_num: + self.skip_frame_cnt = 0 + + return result + + def postprocess(self, det_result, mot_result): + np_boxes_num = det_result['boxes_num'] + if np_boxes_num[0] <= 0: + return [[], []] + + mot_bboxes = mot_result.get('boxes') + + cur_box_idx = 0 + mot_id = [] + act_res = [] + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + + # Current now, class 0 is positive, class 1 is negative. 
+ action_ret = {'class': 1.0, 'score': -1.0} + box_num = np_boxes_num[idx] + boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num] + cur_box_idx += box_num + isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0) + valid_boxes = boxes[isvalid, :] + + if valid_boxes.shape[0] >= 1: + action_ret['class'] = valid_boxes[0, 0] + action_ret['score'] = valid_boxes[0, 1] + self.result_history[ + tracker_id] = [0, self.frame_life, valid_boxes[0, 1]] + else: + history_det, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + action_ret['class'] = history_det + action_ret['score'] = -1.0 + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[tracker_id] = [ + history_det, life_remain, history_score + ] + + mot_id.append(tracker_id) + act_res.append(action_ret) + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + def check_id_is_same(self, mot_result): + mot_bboxes = mot_result.get('boxes') + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + if tracker_id not in self.id_in_last_frame: + return False + return True + + def reuse_result(self, mot_result): + # This function reusing previous results of the same ID directly. 
+ mot_bboxes = mot_result.get('boxes') + + mot_id = [] + act_res = [] + + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, 0, -1.0]) + + life_remain -= 1 + if tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + + action_ret = {'class': history_cls, 'score': history_score} + mot_id.append(tracker_id) + act_res.append(action_ret) + + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + +class ClsActionRecognizer(AttrDetector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + threshold (float): The threshold of score for action feature object detection. + display_frames (int): The duration for corresponding detected action. + skip_frame_num (int): The number of frames for interval prediction. A skipped frame will + reuse the result of its last frame. If it is set to 0, no frame will be skipped. Default + is 0. 
+ """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + display_frames=80, + skip_frame_num=0): + super(ClsActionRecognizer, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir, + threshold=threshold) + self.threshold = threshold + self.frame_life = display_frames + self.result_history = {} + self.skip_frame_num = skip_frame_num + self.skip_frame_cnt = 0 + self.id_in_last_frame = [] + + def predict_with_mot(self, images, mot_result): + if self.skip_frame_cnt == 0 or (not self.check_id_is_same(mot_result)): + images = self.crop_half_body(images) + cls_result = self.predict_image(images, visual=False)["output"] + result = self.match_action_with_id(cls_result, mot_result) + else: + result = self.reuse_result(mot_result) + + self.skip_frame_cnt += 1 + if self.skip_frame_cnt >= self.skip_frame_num: + self.skip_frame_cnt = 0 + + return result + + def crop_half_body(self, images): + crop_images = [] + for image in images: + h = image.shape[0] + crop_images.append(image[:h // 2 + 1, :, :]) + return crop_images + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + batch_res = [] + for res in im_results: + action_res = res.tolist() + for cid, score in enumerate(action_res): + action_res[cid] = score + batch_res.append(action_res) + result = {'output': batch_res} + return result + + def match_action_with_id(self, cls_result, mot_result): + mot_bboxes = mot_result.get('boxes') + + mot_id = [] + act_res = [] + + for idx in range(len(mot_bboxes)): + tracker_id = 
mot_bboxes[idx, 0] + + cls_id_res = 1 + cls_score_res = -1.0 + for cls_id in range(len(cls_result[idx])): + score = cls_result[idx][cls_id] + if score > cls_score_res: + cls_id_res = cls_id + cls_score_res = score + + # Current now, class 0 is positive, class 1 is negative. + if cls_id_res == 1 or (cls_id_res == 0 and + cls_score_res < self.threshold): + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + cls_id_res = history_cls + cls_score_res = 1 - cls_score_res + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[ + tracker_id] = [cls_id_res, life_remain, cls_score_res] + else: + self.result_history[ + tracker_id] = [cls_id_res, self.frame_life, cls_score_res] + + action_ret = {'class': cls_id_res, 'score': cls_score_res} + mot_id.append(tracker_id) + act_res.append(action_ret) + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + def check_id_is_same(self, mot_result): + mot_bboxes = mot_result.get('boxes') + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + if tracker_id not in self.id_in_last_frame: + return False + return True + + def reuse_result(self, mot_result): + # This function reusing previous results of the same ID directly. 
+ mot_bboxes = mot_result.get('boxes') + + mot_id = [] + act_res = [] + + for idx in range(len(mot_bboxes)): + tracker_id = mot_bboxes[idx, 0] + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, 0, -1.0]) + + life_remain -= 1 + if tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + + action_ret = {'class': history_cls, 'score': history_score} + mot_id.append(tracker_id) + act_res.append(action_ret) + + result = list(zip(mot_id, act_res)) + self.id_in_last_frame = mot_id + + return result + + def main(): - detector = ActionRecognizer( + detector = SkeletonActionRecognizer( FLAGS.model_dir, device=FLAGS.device, run_mode=FLAGS.run_mode, @@ -305,7 +626,7 @@ def main(): } det_log = PaddleInferBenchmark(detector.config, model_info, data_info, perf_info, mems) - det_log('Action') + det_log('SkeletonAction') if __name__ == '__main__': diff --git a/deploy/python/action_utils.py b/deploy/pipeline/pphuman/action_utils.py similarity index 92% rename from deploy/python/action_utils.py rename to deploy/pipeline/pphuman/action_utils.py index 0fbc92a8aa842dbe92ee61b119be9e8be2ebfac1..483116584e1e5e52aced38dd10ff170014a1b439 100644 --- a/deploy/python/action_utils.py +++ b/deploy/pipeline/pphuman/action_utils.py @@ -68,7 +68,7 @@ class KeyPointBuff(object): def get_collected_keypoint(self): """ - Output (List): List of keypoint results for Action Recognition task, where + Output (List): List of keypoint results for Skeletonbased Recognition task, where the format of each element is [tracker_id, KeyPointSequence of tracker_id] """ output = [] @@ -104,6 +104,10 @@ class ActionVisualHelper(object): def update(self, action_res_list): for mot_id, action_res in action_res_list: + if mot_id in self.action_history: + if int(action_res["class"]) != 0 and int(self.action_history[ + mot_id]["class"]) == 0: + continue action_info = self.action_history.get(mot_id, {}) action_info["class"] = action_res["class"] 
action_info["life_remain"] = self.frame_life diff --git a/deploy/python/attr_infer.py b/deploy/pipeline/pphuman/attr_infer.py similarity index 96% rename from deploy/python/attr_infer.py rename to deploy/pipeline/pphuman/attr_infer.py index ba034639a959a89df9ed49b7256b316ab541773f..dfbdd8f6898c2cb63946d2afac661648ae1ab98d 100644 --- a/deploy/python/attr_infer.py +++ b/deploy/pipeline/pphuman/attr_infer.py @@ -29,11 +29,11 @@ import sys parent_path = os.path.abspath(os.path.join(__file__, *(['..']))) sys.path.insert(0, parent_path) -from benchmark_utils import PaddleInferBenchmark -from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, WarpAffine -from visualize import visualize_attr -from utils import argsparser, Timer, get_current_memory_mb -from infer import Detector, get_test_images, print_arguments, load_predictor +from python.benchmark_utils import PaddleInferBenchmark +from python.preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, WarpAffine +from python.visualize import visualize_attr +from python.utils import argsparser, Timer, get_current_memory_mb +from python.infer import Detector, get_test_images, print_arguments, load_predictor from PIL import Image, ImageDraw, ImageFont @@ -142,13 +142,12 @@ class AttrDetector(Detector): bag_label = bag if bag_score > self.threshold else 'No bag' label_res.append(bag_label) # upper - upper_res = res[4:8] upper_label = 'Upper:' sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve' upper_label += ' {}'.format(sleeve) - for i, r in enumerate(upper_res): - if r > self.threshold: - upper_label += ' {}'.format(upper_list[i]) + upper_res = res[4:8] + if np.max(upper_res) > self.threshold: + upper_label += ' {}'.format(upper_list[np.argmax(upper_res)]) label_res.append(upper_label) # lower lower_res = res[8:14] diff --git a/deploy/pphuman/mtmct.py b/deploy/pipeline/pphuman/mtmct.py similarity index 85% rename from deploy/pphuman/mtmct.py 
rename to deploy/pipeline/pphuman/mtmct.py index 30f84724809753b577503b3bb59d50a21731ddb1..8ab72f4a4351d3872f1ae36881fc10f07653eae1 100644 --- a/deploy/pphuman/mtmct.py +++ b/deploy/pipeline/pphuman/mtmct.py @@ -12,15 +12,21 @@ # See the License for the specific language governing permissions and # limitations under the License. -import motmetrics as mm from pptracking.python.mot.visualize import plot_tracking +from python.visualize import visualize_attr import os import re import cv2 import gc import numpy as np -from sklearn import preprocessing -from sklearn.cluster import AgglomerativeClustering +try: + from sklearn import preprocessing + from sklearn.cluster import AgglomerativeClustering +except ImportError: + print( + 'Warning: Unable to use MTMCT in PP-Human, please install scikit-learn, for example: `pip install scikit-learn`' + ) + pass import pandas as pd from tqdm import tqdm from functools import reduce @@ -98,7 +104,8 @@ def get_mtmct_matching_results(pred_mtmct_file, secs_interval=0.5, return camera_results, cid_tid_fid_results -def save_mtmct_vis_results(camera_results, captures, output_dir): +def save_mtmct_vis_results(camera_results, captures, output_dir, + multi_res=None): # camera_results: 'cid, tid, fid, x1, y1, w, h' camera_ids = list(camera_results.keys()) @@ -113,15 +120,15 @@ def save_mtmct_vis_results(camera_results, captures, output_dir): cid = camera_ids[idx] basename = os.path.basename(video_file) video_out_name = "vis_" + basename - print("Start visualizing output video: {}".format(video_out_name)) out_path = os.path.join(save_dir, video_out_name) + print("Start visualizing output video: {}".format(out_path)) # Get Video info : resolution, fps, frame count width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) fps = int(capture.get(cv2.CAP_PROP_FPS)) frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) - fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + fourcc = cv2.VideoWriter_fourcc(*'mp4v') writer = 
cv2.VideoWriter(out_path, fourcc, fps, (width, height)) frame_id = 0 while (1): @@ -138,6 +145,28 @@ def save_mtmct_vis_results(camera_results, captures, output_dir): boxes = frame_results[:, -4:] ids = frame_results[:, 1] image = plot_tracking(frame, boxes, ids, frame_id=frame_id, fps=fps) + + # add attr vis + if multi_res: + tid_list = [ + 'c' + str(idx) + '_' + 't' + str(int(j)) + for j in range(1, len(ids) + 1) + ] # c0_t1, c0_t2... + all_attr_result = [multi_res[i]["attrs"] + for i in tid_list] # attrs of every cid_tid + if any( + all_attr_result + ): # visualize attrs if at least one cid_tid has a non-empty attrs result + attr_res = [] + for k in tid_list: + if (frame_id - 1) >= len(multi_res[k]['attrs']): + t_attr = None + else: + t_attr = multi_res[k]['attrs'][frame_id - 1] + attr_res.append(t_attr) + image = visualize_attr( + image, attr_res, boxes, is_mtmct=True) + writer.write(image) writer.release() @@ -297,10 +326,9 @@ def distill_idfeat(mot_res): feature_new = feature_list #if available frames number is more than 200, take one frame data per 20 frames - if len(qualities_new) > 200: - skipf = 20 - else: - skipf = max(10, len(qualities_new) // 10) + skipf = 1 + if len(qualities_new) > 20: + skipf = 2 quality_skip = np.array(qualities_new[::skipf]) feature_skip = np.array(feature_new[::skipf]) @@ -322,6 +350,8 @@ def res2dict(multi_res): for tid, res in c_res.items(): key = "c" + str(cid) + "_t" + str(tid) if key not in cid_tid_dict: + if len(res["rects"]) < 10: + continue cid_tid_dict[key] = res cid_tid_dict[key]['mean_feat'] = distill_idfeat(res) return cid_tid_dict @@ -329,6 +359,9 @@ def res2dict(multi_res): def mtmct_process(multi_res, captures, mtmct_vis=True, output_dir="output"): cid_tid_dict = res2dict(multi_res) + if len(cid_tid_dict) == 0: + print("no tracking result found, mtmct will be skipped.") + return map_tid = sub_cluster(cid_tid_dict) if not os.path.exists(output_dir): @@ -340,4 +373,8 @@ def mtmct_process(multi_res, captures, mtmct_vis=True, 
output_dir="output"): camera_results, cid_tid_fid_res = get_mtmct_matching_results( pred_mtmct_file) - save_mtmct_vis_results(camera_results, captures, output_dir=output_dir) + save_mtmct_vis_results( + camera_results, + captures, + output_dir=output_dir, + multi_res=cid_tid_dict) diff --git a/deploy/pphuman/reid.py b/deploy/pipeline/pphuman/reid.py similarity index 99% rename from deploy/pphuman/reid.py rename to deploy/pipeline/pphuman/reid.py index cef4029239f7e0f635547506282c2527bf687353..3f4d59d78e8273385353a45248a059d523eb478c 100644 --- a/deploy/pphuman/reid.py +++ b/deploy/pipeline/pphuman/reid.py @@ -73,7 +73,7 @@ class ReID(object): self.det_times = Timer() self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0 self.batch_size = batch_size - self.input_wh = [128, 256] + self.input_wh = (128, 256) def set_config(self, model_dir): return PredictConfig(model_dir) diff --git a/deploy/pipeline/pphuman/video_action_infer.py b/deploy/pipeline/pphuman/video_action_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..cb12bd71b6c421e7105e3e2ed60ce2f9a519e1bb --- /dev/null +++ b/deploy/pipeline/pphuman/video_action_infer.py @@ -0,0 +1,296 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
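A minimal standalone sketch of the softmax and top-k selection that `VideoActionRecognizer.postprocess` performs in this new file. It is an illustration under stated assumptions, not code from the patch: `stable_softmax` and `top_k_classes` are hypothetical names, the logits are made up, and the sketch subtracts the maximum logit before exponentiating so that large logits cannot overflow.

```python
import numpy as np


def stable_softmax(logits):
    """Softmax over a 1-D array; shifting by the max keeps exp() finite."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def top_k_classes(logits, top_k=1):
    """Return (class ids, scores) for the top_k entries, highest score first."""
    probs = stable_softmax(np.asarray(logits, dtype=np.float64).flatten())
    # argpartition moves the top_k largest values into the last top_k slots,
    # but leaves them in arbitrary order
    candidates = np.argpartition(probs, -top_k)[-top_k:]
    # sort only those candidates by descending probability
    classes = candidates[np.argsort(-probs[candidates])]
    return classes, probs[classes]


# Hypothetical logits for four classes
classes, scores = top_k_classes([0.2, 3.1, -1.0, 1.5], top_k=2)
print(classes, scores)
```

Partitioning first and sorting only the `top_k` candidates avoids a full sort of the class scores, which is the same pattern the patch uses with `np.argpartition` followed by `np.argsort`.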
+ +import os +import yaml +import glob + +import cv2 +import numpy as np +import math +import paddle +import sys +from collections.abc import Sequence +import paddle.nn.functional as F + +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) +sys.path.insert(0, parent_path) + +from paddle.inference import Config, create_predictor +from python.utils import argsparser, Timer, get_current_memory_mb +from python.benchmark_utils import PaddleInferBenchmark +from python.infer import Detector, print_arguments +from video_action_preprocess import VideoDecoder, Sampler, Scale, CenterCrop, Normalization, Image2Array + + +def softmax(x): + e_x = np.exp(x - np.max(x)) # subtract the max to avoid overflow in exp + return e_x / np.sum(e_x) + + +class VideoActionRecognizer(object): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of pre batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode need to set True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + num_seg=8, + seg_len=1, + short_size=256, + target_size=224, + top_k=1, + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + ir_optim=True): + + self.num_seg = num_seg + self.seg_len = seg_len + self.short_size = short_size + self.target_size = target_size + self.top_k = top_k + + assert batch_size == 1, "VideoActionRecognizer only support batch_size=1 now." 
+ + self.model_dir = model_dir + self.device = device + self.run_mode = run_mode + self.batch_size = batch_size + self.trt_min_shape = trt_min_shape + self.trt_max_shape = trt_max_shape + self.trt_opt_shape = trt_opt_shape + self.trt_calib_mode = trt_calib_mode + self.cpu_threads = cpu_threads + self.enable_mkldnn = enable_mkldnn + self.ir_optim = ir_optim + + self.recognize_times = Timer() + + model_file_path = glob.glob(os.path.join(model_dir, "*.pdmodel"))[0] + params_file_path = glob.glob(os.path.join(model_dir, "*.pdiparams"))[0] + self.config = Config(model_file_path, params_file_path) + + if device == "GPU" or device == "gpu": + self.config.enable_use_gpu(8000, 0) + else: + self.config.disable_gpu() + if self.enable_mkldnn: + # cache 10 different shapes for mkldnn to avoid memory leak + self.config.set_mkldnn_cache_capacity(10) + self.config.enable_mkldnn() + + self.config.switch_ir_optim(self.ir_optim) # default true + + precision_map = { + 'trt_int8': Config.Precision.Int8, + 'trt_fp32': Config.Precision.Float32, + 'trt_fp16': Config.Precision.Half + } + if run_mode in precision_map.keys(): + self.config.enable_tensorrt_engine( + max_batch_size=8, precision_mode=precision_map[run_mode]) + + self.config.enable_memory_optim() + # use zero copy + self.config.switch_use_feed_fetch_ops(False) + + self.predictor = create_predictor(self.config) + + def preprocess_batch(self, file_list): + batched_inputs = [] + for file in file_list: + inputs = self.preprocess(file) + batched_inputs.append(inputs) + batched_inputs = [ + np.concatenate([item[i] for item in batched_inputs]) + for i in range(len(batched_inputs[0])) + ] + self.input_file = file_list + return batched_inputs + + def get_timer(self): + return self.recognize_times + + def predict(self, input): + ''' + Args: + input (str) or (list): video file path or image data list + Returns: + results (dict): + ''' + + input_names = self.predictor.get_input_names() + input_tensor = 
self.predictor.get_input_handle(input_names[0]) + + output_names = self.predictor.get_output_names() + output_tensor = self.predictor.get_output_handle(output_names[0]) + + # preprocess + self.recognize_times.preprocess_time_s.start() + if isinstance(input, str): + inputs = self.preprocess_video(input) + else: + inputs = self.preprocess_frames(input) + self.recognize_times.preprocess_time_s.end() + + inputs = np.expand_dims( + inputs, axis=0).repeat( + self.batch_size, axis=0).copy() + + input_tensor.copy_from_cpu(inputs) + + # model prediction + self.recognize_times.inference_time_s.start() + self.predictor.run() + self.recognize_times.inference_time_s.end() + + output = output_tensor.copy_to_cpu() + + # postprocess + self.recognize_times.postprocess_time_s.start() + classes, scores = self.postprocess(output) + self.recognize_times.postprocess_time_s.end() + + return classes, scores + + def preprocess_frames(self, frame_list): + """ + frame_list: list, frame list + return: list + """ + + results = {} + results['frames_len'] = len(frame_list) + results["imgs"] = frame_list + + img_mean = [0.485, 0.456, 0.406] + img_std = [0.229, 0.224, 0.225] + ops = [ + CenterCrop(self.target_size), Image2Array(), + Normalization(img_mean, img_std) + ] + for op in ops: + results = op(results) + + res = np.expand_dims(results['imgs'], axis=0).copy() + return [res] + + def preprocess_video(self, input_file): + """ + input_file: str, file path + return: list + """ + assert os.path.isfile(input_file), "{0} does not exist".format( + input_file) + + results = {'filename': input_file} + img_mean = [0.485, 0.456, 0.406] + img_std = [0.229, 0.224, 0.225] + ops = [ + VideoDecoder(), Sampler( + self.num_seg, self.seg_len, valid_mode=True), + Scale(self.short_size), CenterCrop(self.target_size), Image2Array(), + Normalization(img_mean, img_std) + ] + for op in ops: + results = op(results) + + res = np.expand_dims(results['imgs'], axis=0).copy() + return [res] + + def postprocess(self, 
output): + output = output.flatten() # numpy.ndarray + output = softmax(output) + classes = np.argpartition(output, -self.top_k)[-self.top_k:] + classes = classes[np.argsort(-output[classes])] + scores = output[classes] + return classes, scores + + +def main(): + if not FLAGS.run_benchmark: + assert FLAGS.batch_size == 1 + assert FLAGS.use_fp16 is False + else: + assert FLAGS.use_gpu is True + + recognizer = VideoActionRecognizer( + FLAGS.model_dir, + short_size=FLAGS.short_size, + target_size=FLAGS.target_size, + device=FLAGS.device, + run_mode=FLAGS.run_mode, + batch_size=FLAGS.batch_size, + trt_min_shape=FLAGS.trt_min_shape, + trt_max_shape=FLAGS.trt_max_shape, + trt_opt_shape=FLAGS.trt_opt_shape, + trt_calib_mode=FLAGS.trt_calib_mode, + cpu_threads=FLAGS.cpu_threads, + enable_mkldnn=FLAGS.enable_mkldnn, ) + + if not FLAGS.run_benchmark: + classes, scores = recognizer.predict(FLAGS.video_file) + print("Current video file: {}".format(FLAGS.video_file)) + print("\ttop-1 class: {0}".format(classes[0])) + print("\ttop-1 score: {0}".format(scores[0])) + else: + cm, gm, gu = get_current_memory_mb() + mems = {'cpu_rss_mb': cm, 'gpu_rss_mb': gm, 'gpu_util': gu * 100} + + perf_info = recognizer.recognize_times.report() + model_dir = FLAGS.model_dir + mode = FLAGS.run_mode + model_info = { + 'model_name': model_dir.strip('/').split('/')[-1], + 'precision': mode.split('_')[-1] + } + data_info = { + 'batch_size': FLAGS.batch_size, + 'shape': "dynamic_shape", + 'data_num': perf_info['img_num'] + } + recognize_log = PaddleInferBenchmark(recognizer.config, model_info, + data_info, perf_info, mems) + recognize_log('Fight') + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/deploy/pipeline/pphuman/video_action_preprocess.py 
b/deploy/pipeline/pphuman/video_action_preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..f6f9f11f7aee643ebfc070073f18f7e28bebf9dd --- /dev/null +++ b/deploy/pipeline/pphuman/video_action_preprocess.py @@ -0,0 +1,545 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import random +import cv2 +import numpy as np +from collections.abc import Sequence +from PIL import Image +import paddle + + +class Sampler(object): + """ + Sample frame ids. + NOTE: Images are read with PIL here, so results can differ from cv2. + Args: + num_seg(int): number of segments. + seg_len(int): number of sampled frames in each segment. + valid_mode(bool): if True, use deterministic validation/test sampling instead of random training sampling. + Returns: + frames_idx: the indices of the sampled frames. 
+ """ + + def __init__(self, + num_seg, + seg_len, + frame_interval=None, + valid_mode=True, + dense_sample=False, + linspace_sample=False, + use_pil=True): + self.num_seg = num_seg + self.seg_len = seg_len + self.frame_interval = frame_interval + self.valid_mode = valid_mode + self.dense_sample = dense_sample + self.linspace_sample = linspace_sample + self.use_pil = use_pil + + def _get(self, frames_idx, results): + data_format = results['format'] + + if data_format == "frame": + frame_dir = results['frame_dir'] + imgs = [] + for idx in frames_idx: + img = Image.open( + os.path.join(frame_dir, results['suffix'].format( + idx))).convert('RGB') + imgs.append(img) + + elif data_format == "video": + if results['backend'] == 'cv2': + frames = np.array(results['frames']) + imgs = [] + for idx in frames_idx: + imgbuf = frames[idx] + img = Image.fromarray(imgbuf, mode='RGB') + imgs.append(img) + elif results['backend'] == 'decord': + container = results['frames'] + if self.use_pil: + frames_select = container.get_batch(frames_idx) + # dearray_to_img + np_frames = frames_select.asnumpy() + imgs = [] + for i in range(np_frames.shape[0]): + imgbuf = np_frames[i] + imgs.append(Image.fromarray(imgbuf, mode='RGB')) + else: + if frames_idx.ndim != 1: + frames_idx = np.squeeze(frames_idx) + frame_dict = { + idx: container[idx].asnumpy() + for idx in np.unique(frames_idx) + } + imgs = [frame_dict[idx] for idx in frames_idx] + elif results['backend'] == 'pyav': + imgs = [] + frames = np.array(results['frames']) + for idx in frames_idx: + imgbuf = frames[idx] + imgs.append(imgbuf) + imgs = np.stack(imgs) # thwc + else: + raise NotImplementedError + else: + raise NotImplementedError + results['imgs'] = imgs # all image data + return results + + def _get_train_clips(self, num_frames): + ori_seg_len = self.seg_len * self.frame_interval + avg_interval = (num_frames - ori_seg_len + 1) // self.num_seg + + if avg_interval > 0: + base_offsets = np.arange(self.num_seg) * avg_interval + 
clip_offsets = base_offsets + np.random.randint( + avg_interval, size=self.num_seg) + elif num_frames > max(self.num_seg, ori_seg_len): + clip_offsets = np.sort( + np.random.randint( + num_frames - ori_seg_len + 1, size=self.num_seg)) + elif avg_interval == 0: + ratio = (num_frames - ori_seg_len + 1.0) / self.num_seg + clip_offsets = np.around(np.arange(self.num_seg) * ratio) + else: + clip_offsets = np.zeros((self.num_seg, ), dtype=int) + return clip_offsets + + def _get_test_clips(self, num_frames): + ori_seg_len = self.seg_len * self.frame_interval + avg_interval = (num_frames - ori_seg_len + 1) / float(self.num_seg) + if num_frames > ori_seg_len - 1: + base_offsets = np.arange(self.num_seg) * avg_interval + clip_offsets = (base_offsets + avg_interval / 2.0).astype(int) + else: + clip_offsets = np.zeros((self.num_seg, ), dtype=int) + return clip_offsets + + def __call__(self, results): + """ + Args: + frames_len: length of frames. + return: + sampling id. + """ + frames_len = int(results['frames_len']) # total number of frames + + frames_idx = [] + if self.frame_interval is not None: + assert isinstance(self.frame_interval, int) + if not self.valid_mode: + offsets = self._get_train_clips(frames_len) + else: + offsets = self._get_test_clips(frames_len) + + offsets = offsets[:, None] + np.arange(self.seg_len)[ + None, :] * self.frame_interval + offsets = np.concatenate(offsets) + + offsets = offsets.reshape((-1, self.seg_len)) + offsets = np.mod(offsets, frames_len) + offsets = np.concatenate(offsets) + + if results['format'] == 'video': + frames_idx = offsets + elif results['format'] == 'frame': + frames_idx = list(offsets + 1) + else: + raise NotImplementedError + + return self._get(frames_idx, results) + + print("self.frame_interval:", self.frame_interval) + + if self.linspace_sample: # default is False + if 'start_idx' in results and 'end_idx' in results: + offsets = np.linspace(results['start_idx'], results['end_idx'], + self.num_seg) + else: + 
offsets = np.linspace(0, frames_len - 1, self.num_seg) + offsets = np.clip(offsets, 0, frames_len - 1).astype(np.int64) + if results['format'] == 'video': + frames_idx = list(offsets) + frames_idx = [x % frames_len for x in frames_idx] + elif results['format'] == 'frame': + frames_idx = list(offsets + 1) + else: + raise NotImplementedError + return self._get(frames_idx, results) + + average_dur = int(frames_len / self.num_seg) + + print("results['format']:", results['format']) + + if self.dense_sample: # For ppTSM, default is False + if not self.valid_mode: # train + sample_pos = max(1, 1 + frames_len - 64) + t_stride = 64 // self.num_seg + start_idx = 0 if sample_pos == 1 else np.random.randint( + 0, sample_pos - 1) + offsets = [(idx * t_stride + start_idx) % frames_len + 1 + for idx in range(self.num_seg)] + frames_idx = offsets + else: + sample_pos = max(1, 1 + frames_len - 64) + t_stride = 64 // self.num_seg + start_list = np.linspace(0, sample_pos - 1, num=10, dtype=int) + offsets = [] + for start_idx in start_list.tolist(): + offsets += [(idx * t_stride + start_idx) % frames_len + 1 + for idx in range(self.num_seg)] + frames_idx = offsets + else: + for i in range(self.num_seg): + idx = 0 + if not self.valid_mode: + if average_dur >= self.seg_len: + idx = random.randint(0, average_dur - self.seg_len) + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + else: + if average_dur >= self.seg_len: + idx = (average_dur - 1) // 2 + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + + for jj in range(idx, idx + self.seg_len): + if results['format'] == 'video': + frames_idx.append(int(jj % frames_len)) + elif results['format'] == 'frame': + frames_idx.append(jj + 1) + + elif results['format'] == 'MRI': + frames_idx.append(jj) + else: + raise NotImplementedError + + return self._get(frames_idx, results) + + +class Scale(object): + """ + Scale images. 
+ Args: + short_size(float | int): The shorter side of the image is scaled to short_size. + fixed_ratio(bool): Set whether to zoom according to a fixed ratio. default: True + do_round(bool): Whether to round up when calculating the zoom ratio. default: False + backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow' + """ + + def __init__(self, + short_size, + fixed_ratio=True, + keep_ratio=None, + do_round=False, + backend='pillow'): + self.short_size = short_size + assert (fixed_ratio and not keep_ratio) or ( + not fixed_ratio + ), "fixed_ratio and keep_ratio cannot be true at the same time" + self.fixed_ratio = fixed_ratio + self.keep_ratio = keep_ratio + self.do_round = do_round + + assert backend in [ + 'pillow', 'cv2' + ], f"Scale's backend must be pillow or cv2, but got {backend}" + + self.backend = backend + + def __call__(self, results): + """ + Performs resize operations. + Args: + imgs (Sequence[PIL.Image]): List where each item is a PIL.Image. + For example, [PIL.Image0, PIL.Image1, PIL.Image2, ...] + return: + resized_imgs: List where each item is a PIL.Image after scaling. 
+ """ + imgs = results['imgs'] + resized_imgs = [] + for i in range(len(imgs)): + img = imgs[i] + if isinstance(img, np.ndarray): + h, w, _ = img.shape + elif isinstance(img, Image.Image): + w, h = img.size + else: + raise NotImplementedError + + if w <= h: + ow = self.short_size + if self.fixed_ratio: # default is True + oh = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + oh = self.short_size + else: + scale_factor = self.short_size / w + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * + self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * + self.short_size / h) + else: + oh = self.short_size + if self.fixed_ratio: + ow = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + ow = self.short_size + else: + scale_factor = self.short_size / h + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * + self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * + self.short_size / h) + + if type(img) == np.ndarray: + img = Image.fromarray(img, mode='RGB') + + if self.backend == 'pillow': + resized_imgs.append(img.resize((ow, oh), Image.BILINEAR)) + elif self.backend == 'cv2' and (self.keep_ratio is not None): + resized_imgs.append( + cv2.resize( + img, (ow, oh), interpolation=cv2.INTER_LINEAR)) + else: + resized_imgs.append( + Image.fromarray( + cv2.resize( + np.asarray(img), (ow, oh), + interpolation=cv2.INTER_LINEAR))) + results['imgs'] = resized_imgs + return results + + +class CenterCrop(object): + """ + Center crop images + Args: + target_size(int): Center crop a square with the target_size from an image. + do_round(bool): Whether to round up the coordinates of the upper left corner of the cropping area. 
default: True + """ + + def __init__(self, target_size, do_round=True, backend='pillow'): + self.target_size = target_size + self.do_round = do_round + self.backend = backend + + def __call__(self, results): + """ + Performs Center crop operations. + Args: + imgs: List where each item is a PIL.Image. + For example, [PIL.Image0, PIL.Image1, PIL.Image2, ...] + return: + ccrop_imgs: List where each item is a PIL.Image after Center crop. + """ + imgs = results['imgs'] + ccrop_imgs = [] + th, tw = self.target_size, self.target_size + if isinstance(imgs, paddle.Tensor): + h, w = imgs.shape[-2:] + x1 = int(round((w - tw) / 2.0)) if self.do_round else (w - tw) // 2 + y1 = int(round((h - th) / 2.0)) if self.do_round else (h - th) // 2 + ccrop_imgs = imgs[:, :, y1:y1 + th, x1:x1 + tw] + else: + for img in imgs: + if self.backend == 'pillow': + w, h = img.size + elif self.backend == 'cv2': + h, w, _ = img.shape + else: + raise NotImplementedError + assert (w >= self.target_size) and (h >= self.target_size), \ + "image width({}) and height({}) should not be smaller than crop size({})".format( + w, h, self.target_size) + x1 = int(round((w - tw) / 2.0)) if self.do_round else ( + w - tw) // 2 + y1 = int(round((h - th) / 2.0)) if self.do_round else ( + h - th) // 2 + if self.backend == 'cv2': + ccrop_imgs.append(img[y1:y1 + th, x1:x1 + tw]) + elif self.backend == 'pillow': + ccrop_imgs.append(img.crop((x1, y1, x1 + tw, y1 + th))) + results['imgs'] = ccrop_imgs + return results + + +class Image2Array(object): + """ + Convert PIL.Image to Numpy array and transpose dimensions from 'dhwc' to 'dchw'. + Args: + transpose: whether to transpose or not, default True, False for slowfast. 
+ """ + + def __init__(self, transpose=True, data_format='tchw'): + assert data_format in [ + 'tchw', 'cthw' + ], f"Target format must be in ['tchw', 'cthw'], but got {data_format}" + self.transpose = transpose + self.data_format = data_format + + def __call__(self, results): + """ + Performs Image to NumpyArray operations. + Args: + imgs: List where each item is a PIL.Image. + For example, [PIL.Image0, PIL.Image1, PIL.Image2, ...] + return: + np_imgs: Numpy array. + """ + imgs = results['imgs'] + if 'backend' in results and results[ + 'backend'] == 'pyav': # [T,H,W,C] in [0, 1] + if self.transpose: + if self.data_format == 'tchw': + t_imgs = imgs.transpose((0, 3, 1, 2)) # tchw + else: + t_imgs = imgs.transpose((3, 0, 1, 2)) # cthw + results['imgs'] = t_imgs + else: + t_imgs = np.stack(imgs).astype('float32') + if self.transpose: + if self.data_format == 'tchw': + t_imgs = t_imgs.transpose(0, 3, 1, 2) # tchw + else: + t_imgs = t_imgs.transpose(3, 0, 1, 2) # cthw + results['imgs'] = t_imgs + return results + + +class VideoDecoder(object): + """ + Decode mp4 file to frames. + Args: + filepath: the file path of mp4 file + """ + + def __init__(self, + backend='cv2', + mode='train', + sampling_rate=32, + num_seg=8, + num_clips=1, + target_fps=30): + + self.backend = backend + # params below only for TimeSformer + self.mode = mode + self.sampling_rate = sampling_rate + self.num_seg = num_seg + self.num_clips = num_clips + self.target_fps = target_fps + + def __call__(self, results): + """ + Perform mp4 decode operations. + return: + List where each item is a numpy array after decoding. 
+ """ + file_path = results['filename'] + results['format'] = 'video' + results['backend'] = self.backend + + if self.backend == 'cv2': # here + cap = cv2.VideoCapture(file_path) + videolen = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) + + sampledFrames = [] + for i in range(videolen): + ret, frame = cap.read() + # maybe first frame is empty + if not ret: + continue + img = frame[:, :, ::-1] + sampledFrames.append(img) + results['frames'] = sampledFrames + results['frames_len'] = len(sampledFrames) + + elif self.backend == 'decord': + container = de.VideoReader(file_path) + frames_len = len(container) + results['frames'] = container + results['frames_len'] = frames_len + else: + raise NotImplementedError + return results + + +class Normalization(object): + """ + Normalization. + Args: + mean(Sequence[float]): mean values of different channels. + std(Sequence[float]): std values of different channels. + tensor_shape(list): size of mean, default [3,1,1]. For slowfast, [1,1,1,3] + """ + + def __init__(self, mean, std, tensor_shape=[3, 1, 1], inplace=False): + if not isinstance(mean, Sequence): + raise TypeError( + f'Mean must be list, tuple or np.ndarray, but got {type(mean)}') + if not isinstance(std, Sequence): + raise TypeError( + f'Std must be list, tuple or np.ndarray, but got {type(std)}') + + self.inplace = inplace + if not inplace: + self.mean = np.array(mean).reshape(tensor_shape).astype(np.float32) + self.std = np.array(std).reshape(tensor_shape).astype(np.float32) + else: + self.mean = np.array(mean, dtype=np.float32) + self.std = np.array(std, dtype=np.float32) + + def __call__(self, results): + """ + Performs normalization operations. + Args: + imgs: Numpy array. + return: + np_imgs: Numpy array after normalization. 
+ """ + + if self.inplace: # default is False + n = len(results['imgs']) + h, w, c = results['imgs'][0].shape + norm_imgs = np.empty((n, h, w, c), dtype=np.float32) + for i, img in enumerate(results['imgs']): + norm_imgs[i] = img + + for img in norm_imgs: # [n,h,w,c] + mean = np.float64(self.mean.reshape(1, -1)) # [1, 3] + stdinv = 1 / np.float64(self.std.reshape(1, -1)) # [1, 3] + cv2.subtract(img, mean, img) + cv2.multiply(img, stdinv, img) + else: + imgs = results['imgs'] + norm_imgs = imgs / 255.0 + norm_imgs -= self.mean + norm_imgs /= self.std + if 'backend' in results and results['backend'] == 'pyav': + norm_imgs = paddle.to_tensor(norm_imgs, dtype=paddle.float32) + results['imgs'] = norm_imgs + return results diff --git a/deploy/pipeline/ppvehicle/rec_word_dict.txt b/deploy/pipeline/ppvehicle/rec_word_dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..84b885d8352226e49b1d5d791b8f43a663e246aa --- /dev/null +++ b/deploy/pipeline/ppvehicle/rec_word_dict.txt @@ -0,0 +1,6623 @@ +' +疗 +绚 +诚 +娇 +溜 +题 +贿 +者 +廖 +更 +纳 +加 +奉 +公 +一 +就 +汴 +计 +与 +路 +房 +原 +妇 +2 +0 +8 +- +7 +其 +> +: +] +, +, +骑 +刈 +全 +消 +昏 +傈 +安 +久 +钟 +嗅 +不 +影 +处 +驽 +蜿 +资 +关 +椤 +地 +瘸 +专 +问 +忖 +票 +嫉 +炎 +韵 +要 +月 +田 +节 +陂 +鄙 +捌 +备 +拳 +伺 +眼 +网 +盎 +大 +傍 +心 +东 +愉 +汇 +蹿 +科 +每 +业 +里 +航 +晏 +字 +平 +录 +先 +1 +3 +彤 +鲶 +产 +稍 +督 +腴 +有 +象 +岳 +注 +绍 +在 +泺 +文 +定 +核 +名 +水 +过 +理 +让 +偷 +率 +等 +这 +发 +” +为 +含 +肥 +酉 +相 +鄱 +七 +编 +猥 +锛 +日 +镀 +蒂 +掰 +倒 +辆 +栾 +栗 +综 +涩 +州 +雌 +滑 +馀 +了 +机 +块 +司 +宰 +甙 +兴 +矽 +抚 +保 +用 +沧 +秩 +如 +收 +息 +滥 +页 +疑 +埠 +! +! 
+姥 +异 +橹 +钇 +向 +下 +跄 +的 +椴 +沫 +国 +绥 +獠 +报 +开 +民 +蜇 +何 +分 +凇 +长 +讥 +藏 +掏 +施 +羽 +中 +讲 +派 +嘟 +人 +提 +浼 +间 +世 +而 +古 +多 +倪 +唇 +饯 +控 +庚 +首 +赛 +蜓 +味 +断 +制 +觉 +技 +替 +艰 +溢 +潮 +夕 +钺 +外 +摘 +枋 +动 +双 +单 +啮 +户 +枇 +确 +锦 +曜 +杜 +或 +能 +效 +霜 +盒 +然 +侗 +电 +晁 +放 +步 +鹃 +新 +杖 +蜂 +吒 +濂 +瞬 +评 +总 +隍 +对 +独 +合 +也 +是 +府 +青 +天 +诲 +墙 +组 +滴 +级 +邀 +帘 +示 +已 +时 +骸 +仄 +泅 +和 +遨 +店 +雇 +疫 +持 +巍 +踮 +境 +只 +亨 +目 +鉴 +崤 +闲 +体 +泄 +杂 +作 +般 +轰 +化 +解 +迂 +诿 +蛭 +璀 +腾 +告 +版 +服 +省 +师 +小 +规 +程 +线 +海 +办 +引 +二 +桧 +牌 +砺 +洄 +裴 +修 +图 +痫 +胡 +许 +犊 +事 +郛 +基 +柴 +呼 +食 +研 +奶 +律 +蛋 +因 +葆 +察 +戏 +褒 +戒 +再 +李 +骁 +工 +貂 +油 +鹅 +章 +啄 +休 +场 +给 +睡 +纷 +豆 +器 +捎 +说 +敏 +学 +会 +浒 +设 +诊 +格 +廓 +查 +来 +霓 +室 +溆 +¢ +诡 +寥 +焕 +舜 +柒 +狐 +回 +戟 +砾 +厄 +实 +翩 +尿 +五 +入 +径 +惭 +喹 +股 +宇 +篝 +| +; +美 +期 +云 +九 +祺 +扮 +靠 +锝 +槌 +系 +企 +酰 +阊 +暂 +蚕 +忻 +豁 +本 +羹 +执 +条 +钦 +H +獒 +限 +进 +季 +楦 +于 +芘 +玖 +铋 +茯 +未 +答 +粘 +括 +样 +精 +欠 +矢 +甥 +帷 +嵩 +扣 +令 +仔 +风 +皈 +行 +支 +部 +蓉 +刮 +站 +蜡 +救 +钊 +汗 +松 +嫌 +成 +可 +. +鹤 +院 +从 +交 +政 +怕 +活 +调 +球 +局 +验 +髌 +第 +韫 +谗 +串 +到 +圆 +年 +米 +/ +* +友 +忿 +检 +区 +看 +自 +敢 +刃 +个 +兹 +弄 +流 +留 +同 +没 +齿 +星 +聆 +轼 +湖 +什 +三 +建 +蛔 +儿 +椋 +汕 +震 +颧 +鲤 +跟 +力 +情 +璺 +铨 +陪 +务 +指 +族 +训 +滦 +鄣 +濮 +扒 +商 +箱 +十 +召 +慷 +辗 +所 +莞 +管 +护 +臭 +横 +硒 +嗓 +接 +侦 +六 +露 +党 +馋 +驾 +剖 +高 +侬 +妪 +幂 +猗 +绺 +骐 +央 +酐 +孝 +筝 +课 +徇 +缰 +门 +男 +西 +项 +句 +谙 +瞒 +秃 +篇 +教 +碲 +罚 +声 +呐 +景 +前 +富 +嘴 +鳌 +稀 +免 +朋 +啬 +睐 +去 +赈 +鱼 +住 +肩 +愕 +速 +旁 +波 +厅 +健 +茼 +厥 +鲟 +谅 +投 +攸 +炔 +数 +方 +击 +呋 +谈 +绩 +别 +愫 +僚 +躬 +鹧 +胪 +炳 +招 +喇 +膨 +泵 +蹦 +毛 +结 +5 +4 +谱 +识 +陕 +粽 +婚 +拟 +构 +且 +搜 +任 +潘 +比 +郢 +妨 +醪 +陀 +桔 +碘 +扎 +选 +哈 +骷 +楷 +亿 +明 +缆 +脯 +监 +睫 +逻 +婵 +共 +赴 +淝 +凡 +惦 +及 +达 +揖 +谩 +澹 +减 +焰 +蛹 +番 +祁 +柏 +员 +禄 +怡 +峤 +龙 +白 +叽 +生 +闯 +起 +细 +装 +谕 +竟 +聚 +钙 +上 +导 +渊 +按 +艾 +辘 +挡 +耒 +盹 +饪 +臀 +记 +邮 +蕙 +受 +各 +医 +搂 +普 +滇 +朗 +茸 +带 +翻 +酚 +( +光 +堤 +墟 +蔷 +万 +幻 +〓 +瑙 +辈 +昧 +盏 +亘 +蛀 +吉 +铰 +请 +子 +假 +闻 +税 +井 +诩 +哨 +嫂 +好 +面 +琐 +校 +馊 +鬣 +缂 +营 +访 +炖 +占 +农 +缀 +否 +经 +钚 +棵 +趟 +张 +亟 +吏 +茶 +谨 +捻 +论 +迸 +堂 +玉 +信 +吧 +瞠 +乡 +姬 +寺 +咬 +溏 +苄 +皿 +意 +赉 +宝 +尔 +钰 +艺 +特 +唳 +踉 +都 +荣 +倚 +登 +荐 +丧 +奇 +涵 +批 +炭 +近 +符 +傩 +感 +道 +着 +菊 +虹 +仲 +众 +懈 +濯 +颞 +眺 +南 +释 +北 +缝 +标 +既 +茗 +整 +撼 +迤 +贲 +挎 +耱 +拒 +某 +妍 +卫 
+哇 +英 +矶 +藩 +治 +他 +元 +领 +膜 +遮 +穗 +蛾 +飞 +荒 +棺 +劫 +么 +市 +火 +温 +拈 +棚 +洼 +转 +果 +奕 +卸 +迪 +伸 +泳 +斗 +邡 +侄 +涨 +屯 +萋 +胭 +氡 +崮 +枞 +惧 +冒 +彩 +斜 +手 +豚 +随 +旭 +淑 +妞 +形 +菌 +吲 +沱 +争 +驯 +歹 +挟 +兆 +柱 +传 +至 +包 +内 +响 +临 +红 +功 +弩 +衡 +寂 +禁 +老 +棍 +耆 +渍 +织 +害 +氵 +渑 +布 +载 +靥 +嗬 +虽 +苹 +咨 +娄 +库 +雉 +榜 +帜 +嘲 +套 +瑚 +亲 +簸 +欧 +边 +6 +腿 +旮 +抛 +吹 +瞳 +得 +镓 +梗 +厨 +继 +漾 +愣 +憨 +士 +策 +窑 +抑 +躯 +襟 +脏 +参 +贸 +言 +干 +绸 +鳄 +穷 +藜 +音 +折 +详 +) +举 +悍 +甸 +癌 +黎 +谴 +死 +罩 +迁 +寒 +驷 +袖 +媒 +蒋 +掘 +模 +纠 +恣 +观 +祖 +蛆 +碍 +位 +稿 +主 +澧 +跌 +筏 +京 +锏 +帝 +贴 +证 +糠 +才 +黄 +鲸 +略 +炯 +饱 +四 +出 +园 +犀 +牧 +容 +汉 +杆 +浈 +汰 +瑷 +造 +虫 +瘩 +怪 +驴 +济 +应 +花 +沣 +谔 +夙 +旅 +价 +矿 +以 +考 +s +u +呦 +晒 +巡 +茅 +准 +肟 +瓴 +詹 +仟 +褂 +译 +桌 +混 +宁 +怦 +郑 +抿 +些 +余 +鄂 +饴 +攒 +珑 +群 +阖 +岔 +琨 +藓 +预 +环 +洮 +岌 +宀 +杲 +瀵 +最 +常 +囡 +周 +踊 +女 +鼓 +袭 +喉 +简 +范 +薯 +遐 +疏 +粱 +黜 +禧 +法 +箔 +斤 +遥 +汝 +奥 +直 +贞 +撑 +置 +绱 +集 +她 +馅 +逗 +钧 +橱 +魉 +[ +恙 +躁 +唤 +9 +旺 +膘 +待 +脾 +惫 +购 +吗 +依 +盲 +度 +瘿 +蠖 +俾 +之 +镗 +拇 +鲵 +厝 +簧 +续 +款 +展 +啃 +表 +剔 +品 +钻 +腭 +损 +清 +锶 +统 +涌 +寸 +滨 +贪 +链 +吠 +冈 +伎 +迥 +咏 +吁 +览 +防 +迅 +失 +汾 +阔 +逵 +绀 +蔑 +列 +川 +凭 +努 +熨 +揪 +利 +俱 +绉 +抢 +鸨 +我 +即 +责 +膦 +易 +毓 +鹊 +刹 +玷 +岿 +空 +嘞 +绊 +排 +术 +估 +锷 +违 +们 +苟 +铜 +播 +肘 +件 +烫 +审 +鲂 +广 +像 +铌 +惰 +铟 +巳 +胍 +鲍 +康 +憧 +色 +恢 +想 +拷 +尤 +疳 +知 +S +Y +F +D +A +峄 +裕 +帮 +握 +搔 +氐 +氘 +难 +墒 +沮 +雨 +叁 +缥 +悴 +藐 +湫 +娟 +苑 +稠 +颛 +簇 +后 +阕 +闭 +蕤 +缚 +怎 +佞 +码 +嘤 +蔡 +痊 +舱 +螯 +帕 +赫 +昵 +升 +烬 +岫 +、 +疵 +蜻 +髁 +蕨 +隶 +烛 +械 +丑 +盂 +梁 +强 +鲛 +由 +拘 +揉 +劭 +龟 +撤 +钩 +呕 +孛 +费 +妻 +漂 +求 +阑 +崖 +秤 +甘 +通 +深 +补 +赃 +坎 +床 +啪 +承 +吼 +量 +暇 +钼 +烨 +阂 +擎 +脱 +逮 +称 +P +神 +属 +矗 +华 +届 +狍 +葑 +汹 +育 +患 +窒 +蛰 +佼 +静 +槎 +运 +鳗 +庆 +逝 +曼 +疱 +克 +代 +官 +此 +麸 +耧 +蚌 +晟 +例 +础 +榛 +副 +测 +唰 +缢 +迹 +灬 +霁 +身 +岁 +赭 +扛 +又 +菡 +乜 +雾 +板 +读 +陷 +徉 +贯 +郁 +虑 +变 +钓 +菜 +圾 +现 +琢 +式 +乐 +维 +渔 +浜 +左 +吾 +脑 +钡 +警 +T +啵 +拴 +偌 +漱 +湿 +硕 +止 +骼 +魄 +积 +燥 +联 +踢 +玛 +则 +窿 +见 +振 +畿 +送 +班 +钽 +您 +赵 +刨 +印 +讨 +踝 +籍 +谡 +舌 +崧 +汽 +蔽 +沪 +酥 +绒 +怖 +财 +帖 +肱 +私 +莎 +勋 +羔 +霸 +励 +哼 +帐 +将 +帅 +渠 +纪 +婴 +娩 +岭 +厘 +滕 +吻 +伤 +坝 +冠 +戊 +隆 +瘁 +介 +涧 +物 +黍 +并 +姗 +奢 +蹑 +掣 +垸 +锴 +命 +箍 +捉 +病 +辖 +琰 +眭 +迩 +艘 +绌 +繁 +寅 +若 +毋 +思 +诉 +类 +诈 +燮 +轲 +酮 +狂 +重 +反 +职 +筱 +县 +委 +磕 +绣 +奖 +晋 +濉 +志 +徽 +肠 +呈 +獐 +坻 +口 +片 +碰 
+几 +村 +柿 +劳 +料 +获 +亩 +惕 +晕 +厌 +号 +罢 +池 +正 +鏖 +煨 +家 +棕 +复 +尝 +懋 +蜥 +锅 +岛 +扰 +队 +坠 +瘾 +钬 +@ +卧 +疣 +镇 +譬 +冰 +彷 +频 +黯 +据 +垄 +采 +八 +缪 +瘫 +型 +熹 +砰 +楠 +襁 +箐 +但 +嘶 +绳 +啤 +拍 +盥 +穆 +傲 +洗 +盯 +塘 +怔 +筛 +丿 +台 +恒 +喂 +葛 +永 +¥ +烟 +酒 +桦 +书 +砂 +蚝 +缉 +态 +瀚 +袄 +圳 +轻 +蛛 +超 +榧 +遛 +姒 +奘 +铮 +右 +荽 +望 +偻 +卡 +丶 +氰 +附 +做 +革 +索 +戚 +坨 +桷 +唁 +垅 +榻 +岐 +偎 +坛 +莨 +山 +殊 +微 +骇 +陈 +爨 +推 +嗝 +驹 +澡 +藁 +呤 +卤 +嘻 +糅 +逛 +侵 +郓 +酌 +德 +摇 +※ +鬃 +被 +慨 +殡 +羸 +昌 +泡 +戛 +鞋 +河 +宪 +沿 +玲 +鲨 +翅 +哽 +源 +铅 +语 +照 +邯 +址 +荃 +佬 +顺 +鸳 +町 +霭 +睾 +瓢 +夸 +椁 +晓 +酿 +痈 +咔 +侏 +券 +噎 +湍 +签 +嚷 +离 +午 +尚 +社 +锤 +背 +孟 +使 +浪 +缦 +潍 +鞅 +军 +姹 +驶 +笑 +鳟 +鲁 +》 +孽 +钜 +绿 +洱 +礴 +焯 +椰 +颖 +囔 +乌 +孔 +巴 +互 +性 +椽 +哞 +聘 +昨 +早 +暮 +胶 +炀 +隧 +低 +彗 +昝 +铁 +呓 +氽 +藉 +喔 +癖 +瑗 +姨 +权 +胱 +韦 +堑 +蜜 +酋 +楝 +砝 +毁 +靓 +歙 +锲 +究 +屋 +喳 +骨 +辨 +碑 +武 +鸠 +宫 +辜 +烊 +适 +坡 +殃 +培 +佩 +供 +走 +蜈 +迟 +翼 +况 +姣 +凛 +浔 +吃 +飘 +债 +犟 +金 +促 +苛 +崇 +坂 +莳 +畔 +绂 +兵 +蠕 +斋 +根 +砍 +亢 +欢 +恬 +崔 +剁 +餐 +榫 +快 +扶 +‖ +濒 +缠 +鳜 +当 +彭 +驭 +浦 +篮 +昀 +锆 +秸 +钳 +弋 +娣 +瞑 +夷 +龛 +苫 +拱 +致 +% +嵊 +障 +隐 +弑 +初 +娓 +抉 +汩 +累 +蓖 +" +唬 +助 +苓 +昙 +押 +毙 +破 +城 +郧 +逢 +嚏 +獭 +瞻 +溱 +婿 +赊 +跨 +恼 +璧 +萃 +姻 +貉 +灵 +炉 +密 +氛 +陶 +砸 +谬 +衔 +点 +琛 +沛 +枳 +层 +岱 +诺 +脍 +榈 +埂 +征 +冷 +裁 +打 +蹴 +素 +瘘 +逞 +蛐 +聊 +激 +腱 +萘 +踵 +飒 +蓟 +吆 +取 +咙 +簋 +涓 +矩 +曝 +挺 +揣 +座 +你 +史 +舵 +焱 +尘 +苏 +笈 +脚 +溉 +榨 +诵 +樊 +邓 +焊 +义 +庶 +儋 +蟋 +蒲 +赦 +呷 +杞 +诠 +豪 +还 +试 +颓 +茉 +太 +除 +紫 +逃 +痴 +草 +充 +鳕 +珉 +祗 +墨 +渭 +烩 +蘸 +慕 +璇 +镶 +穴 +嵘 +恶 +骂 +险 +绋 +幕 +碉 +肺 +戳 +刘 +潞 +秣 +纾 +潜 +銮 +洛 +须 +罘 +销 +瘪 +汞 +兮 +屉 +r +林 +厕 +质 +探 +划 +狸 +殚 +善 +煊 +烹 +〒 +锈 +逯 +宸 +辍 +泱 +柚 +袍 +远 +蹋 +嶙 +绝 +峥 +娥 +缍 +雀 +徵 +认 +镱 +谷 += +贩 +勉 +撩 +鄯 +斐 +洋 +非 +祚 +泾 +诒 +饿 +撬 +威 +晷 +搭 +芍 +锥 +笺 +蓦 +候 +琊 +档 +礁 +沼 +卵 +荠 +忑 +朝 +凹 +瑞 +头 +仪 +弧 +孵 +畏 +铆 +突 +衲 +车 +浩 +气 +茂 +悖 +厢 +枕 +酝 +戴 +湾 +邹 +飚 +攘 +锂 +写 +宵 +翁 +岷 +无 +喜 +丈 +挑 +嗟 +绛 +殉 +议 +槽 +具 +醇 +淞 +笃 +郴 +阅 +饼 +底 +壕 +砚 +弈 +询 +缕 +庹 +翟 +零 +筷 +暨 +舟 +闺 +甯 +撞 +麂 +茌 +蔼 +很 +珲 +捕 +棠 +角 +阉 +媛 +娲 +诽 +剿 +尉 +爵 +睬 +韩 +诰 +匣 +危 +糍 +镯 +立 +浏 +阳 +少 +盆 +舔 +擘 +匪 +申 +尬 +铣 +旯 +抖 +赘 +瓯 +居 +ˇ +哮 +游 +锭 +茏 +歌 +坏 +甚 +秒 +舞 +沙 +仗 +劲 +潺 +阿 +燧 +郭 +嗖 +霏 +忠 +材 +奂 +耐 +跺 +砀 +输 +岖 +媳 +氟 +极 +摆 +灿 +今 +扔 +腻 +枝 +奎 +药 +熄 +吨 +话 +q +额 +慑 +嘌 +协 +喀 +壳 +埭 +视 +著 
+於 +愧 +陲 +翌 +峁 +颅 +佛 +腹 +聋 +侯 +咎 +叟 +秀 +颇 +存 +较 +罪 +哄 +岗 +扫 +栏 +钾 +羌 +己 +璨 +枭 +霉 +煌 +涸 +衿 +键 +镝 +益 +岢 +奏 +连 +夯 +睿 +冥 +均 +糖 +狞 +蹊 +稻 +爸 +刿 +胥 +煜 +丽 +肿 +璃 +掸 +跚 +灾 +垂 +樾 +濑 +乎 +莲 +窄 +犹 +撮 +战 +馄 +软 +络 +显 +鸢 +胸 +宾 +妲 +恕 +埔 +蝌 +份 +遇 +巧 +瞟 +粒 +恰 +剥 +桡 +博 +讯 +凯 +堇 +阶 +滤 +卖 +斌 +骚 +彬 +兑 +磺 +樱 +舷 +两 +娱 +福 +仃 +差 +找 +桁 +÷ +净 +把 +阴 +污 +戬 +雷 +碓 +蕲 +楚 +罡 +焖 +抽 +妫 +咒 +仑 +闱 +尽 +邑 +菁 +爱 +贷 +沥 +鞑 +牡 +嗉 +崴 +骤 +塌 +嗦 +订 +拮 +滓 +捡 +锻 +次 +坪 +杩 +臃 +箬 +融 +珂 +鹗 +宗 +枚 +降 +鸬 +妯 +阄 +堰 +盐 +毅 +必 +杨 +崃 +俺 +甬 +状 +莘 +货 +耸 +菱 +腼 +铸 +唏 +痤 +孚 +澳 +懒 +溅 +翘 +疙 +杷 +淼 +缙 +骰 +喊 +悉 +砻 +坷 +艇 +赁 +界 +谤 +纣 +宴 +晃 +茹 +归 +饭 +梢 +铡 +街 +抄 +肼 +鬟 +苯 +颂 +撷 +戈 +炒 +咆 +茭 +瘙 +负 +仰 +客 +琉 +铢 +封 +卑 +珥 +椿 +镧 +窨 +鬲 +寿 +御 +袤 +铃 +萎 +砖 +餮 +脒 +裳 +肪 +孕 +嫣 +馗 +嵇 +恳 +氯 +江 +石 +褶 +冢 +祸 +阻 +狈 +羞 +银 +靳 +透 +咳 +叼 +敷 +芷 +啥 +它 +瓤 +兰 +痘 +懊 +逑 +肌 +往 +捺 +坊 +甩 +呻 +〃 +沦 +忘 +膻 +祟 +菅 +剧 +崆 +智 +坯 +臧 +霍 +墅 +攻 +眯 +倘 +拢 +骠 +铐 +庭 +岙 +瓠 +′ +缺 +泥 +迢 +捶 +? +? +郏 +喙 +掷 +沌 +纯 +秘 +种 +听 +绘 +固 +螨 +团 +香 +盗 +妒 +埚 +蓝 +拖 +旱 +荞 +铀 +血 +遏 +汲 +辰 +叩 +拽 +幅 +硬 +惶 +桀 +漠 +措 +泼 +唑 +齐 +肾 +念 +酱 +虚 +屁 +耶 +旗 +砦 +闵 +婉 +馆 +拭 +绅 +韧 +忏 +窝 +醋 +葺 +顾 +辞 +倜 +堆 +辋 +逆 +玟 +贱 +疾 +董 +惘 +倌 +锕 +淘 +嘀 +莽 +俭 +笏 +绑 +鲷 +杈 +择 +蟀 +粥 +嗯 +驰 +逾 +案 +谪 +褓 +胫 +哩 +昕 +颚 +鲢 +绠 +躺 +鹄 +崂 +儒 +俨 +丝 +尕 +泌 +啊 +萸 +彰 +幺 +吟 +骄 +苣 +弦 +脊 +瑰 +〈 +诛 +镁 +析 +闪 +剪 +侧 +哟 +框 +螃 +守 +嬗 +燕 +狭 +铈 +缮 +概 +迳 +痧 +鲲 +俯 +售 +笼 +痣 +扉 +挖 +满 +咋 +援 +邱 +扇 +歪 +便 +玑 +绦 +峡 +蛇 +叨 +〖 +泽 +胃 +斓 +喋 +怂 +坟 +猪 +该 +蚬 +炕 +弥 +赞 +棣 +晔 +娠 +挲 +狡 +创 +疖 +铕 +镭 +稷 +挫 +弭 +啾 +翔 +粉 +履 +苘 +哦 +楼 +秕 +铂 +土 +锣 +瘟 +挣 +栉 +习 +享 +桢 +袅 +磨 +桂 +谦 +延 +坚 +蔚 +噗 +署 +谟 +猬 +钎 +恐 +嬉 +雒 +倦 +衅 +亏 +璩 +睹 +刻 +殿 +王 +算 +雕 +麻 +丘 +柯 +骆 +丸 +塍 +谚 +添 +鲈 +垓 +桎 +蚯 +芥 +予 +飕 +镦 +谌 +窗 +醚 +菀 +亮 +搪 +莺 +蒿 +羁 +足 +J +真 +轶 +悬 +衷 +靛 +翊 +掩 +哒 +炅 +掐 +冼 +妮 +l +谐 +稚 +荆 +擒 +犯 +陵 +虏 +浓 +崽 +刍 +陌 +傻 +孜 +千 +靖 +演 +矜 +钕 +煽 +杰 +酗 +渗 +伞 +栋 +俗 +泫 +戍 +罕 +沾 +疽 +灏 +煦 +芬 +磴 +叱 +阱 +榉 +湃 +蜀 +叉 +醒 +彪 +租 +郡 +篷 +屎 +良 +垢 +隗 +弱 +陨 +峪 +砷 +掴 +颁 +胎 +雯 +绵 +贬 +沐 +撵 +隘 +篙 +暖 +曹 +陡 +栓 +填 +臼 +彦 +瓶 +琪 +潼 +哪 +鸡 +摩 +啦 +俟 +锋 +域 +耻 +蔫 +疯 +纹 +撇 +毒 +绶 +痛 +酯 +忍 +爪 +赳 +歆 +嘹 +辕 +烈 +册 +朴 +钱 +吮 +毯 +癜 +娃 +谀 +邵 +厮 +炽 +璞 +邃 +丐 +追 +词 +瓒 +忆 +轧 +芫 +谯 +喷 +弟 +半 +冕 
+裙 +掖 +墉 +绮 +寝 +苔 +势 +顷 +褥 +切 +衮 +君 +佳 +嫒 +蚩 +霞 +佚 +洙 +逊 +镖 +暹 +唛 +& +殒 +顶 +碗 +獗 +轭 +铺 +蛊 +废 +恹 +汨 +崩 +珍 +那 +杵 +曲 +纺 +夏 +薰 +傀 +闳 +淬 +姘 +舀 +拧 +卷 +楂 +恍 +讪 +厩 +寮 +篪 +赓 +乘 +灭 +盅 +鞣 +沟 +慎 +挂 +饺 +鼾 +杳 +树 +缨 +丛 +絮 +娌 +臻 +嗳 +篡 +侩 +述 +衰 +矛 +圈 +蚜 +匕 +筹 +匿 +濞 +晨 +叶 +骋 +郝 +挚 +蚴 +滞 +增 +侍 +描 +瓣 +吖 +嫦 +蟒 +匾 +圣 +赌 +毡 +癞 +恺 +百 +曳 +需 +篓 +肮 +庖 +帏 +卿 +驿 +遗 +蹬 +鬓 +骡 +歉 +芎 +胳 +屐 +禽 +烦 +晌 +寄 +媾 +狄 +翡 +苒 +船 +廉 +终 +痞 +殇 +々 +畦 +饶 +改 +拆 +悻 +萄 +£ +瓿 +乃 +訾 +桅 +匮 +溧 +拥 +纱 +铍 +骗 +蕃 +龋 +缬 +父 +佐 +疚 +栎 +醍 +掳 +蓄 +x +惆 +颜 +鲆 +榆 +〔 +猎 +敌 +暴 +谥 +鲫 +贾 +罗 +玻 +缄 +扦 +芪 +癣 +落 +徒 +臾 +恿 +猩 +托 +邴 +肄 +牵 +春 +陛 +耀 +刊 +拓 +蓓 +邳 +堕 +寇 +枉 +淌 +啡 +湄 +兽 +酷 +萼 +碚 +濠 +萤 +夹 +旬 +戮 +梭 +琥 +椭 +昔 +勺 +蜊 +绐 +晚 +孺 +僵 +宣 +摄 +冽 +旨 +萌 +忙 +蚤 +眉 +噼 +蟑 +付 +契 +瓜 +悼 +颡 +壁 +曾 +窕 +颢 +澎 +仿 +俑 +浑 +嵌 +浣 +乍 +碌 +褪 +乱 +蔟 +隙 +玩 +剐 +葫 +箫 +纲 +围 +伐 +决 +伙 +漩 +瑟 +刑 +肓 +镳 +缓 +蹭 +氨 +皓 +典 +畲 +坍 +铑 +檐 +塑 +洞 +倬 +储 +胴 +淳 +戾 +吐 +灼 +惺 +妙 +毕 +珐 +缈 +虱 +盖 +羰 +鸿 +磅 +谓 +髅 +娴 +苴 +唷 +蚣 +霹 +抨 +贤 +唠 +犬 +誓 +逍 +庠 +逼 +麓 +籼 +釉 +呜 +碧 +秧 +氩 +摔 +霄 +穸 +纨 +辟 +妈 +映 +完 +牛 +缴 +嗷 +炊 +恩 +荔 +茆 +掉 +紊 +慌 +莓 +羟 +阙 +萁 +磐 +另 +蕹 +辱 +鳐 +湮 +吡 +吩 +唐 +睦 +垠 +舒 +圜 +冗 +瞿 +溺 +芾 +囱 +匠 +僳 +汐 +菩 +饬 +漓 +黑 +霰 +浸 +濡 +窥 +毂 +蒡 +兢 +驻 +鹉 +芮 +诙 +迫 +雳 +厂 +忐 +臆 +猴 +鸣 +蚪 +栈 +箕 +羡 +渐 +莆 +捍 +眈 +哓 +趴 +蹼 +埕 +嚣 +骛 +宏 +淄 +斑 +噜 +严 +瑛 +垃 +椎 +诱 +压 +庾 +绞 +焘 +廿 +抡 +迄 +棘 +夫 +纬 +锹 +眨 +瞌 +侠 +脐 +竞 +瀑 +孳 +骧 +遁 +姜 +颦 +荪 +滚 +萦 +伪 +逸 +粳 +爬 +锁 +矣 +役 +趣 +洒 +颔 +诏 +逐 +奸 +甭 +惠 +攀 +蹄 +泛 +尼 +拼 +阮 +鹰 +亚 +颈 +惑 +勒 +〉 +际 +肛 +爷 +刚 +钨 +丰 +养 +冶 +鲽 +辉 +蔻 +画 +覆 +皴 +妊 +麦 +返 +醉 +皂 +擀 +〗 +酶 +凑 +粹 +悟 +诀 +硖 +港 +卜 +z +杀 +涕 +± +舍 +铠 +抵 +弛 +段 +敝 +镐 +奠 +拂 +轴 +跛 +袱 +e +t +沉 +菇 +俎 +薪 +峦 +秭 +蟹 +历 +盟 +菠 +寡 +液 +肢 +喻 +染 +裱 +悱 +抱 +氙 +赤 +捅 +猛 +跑 +氮 +谣 +仁 +尺 +辊 +窍 +烙 +衍 +架 +擦 +倏 +璐 +瑁 +币 +楞 +胖 +夔 +趸 +邛 +惴 +饕 +虔 +蝎 +§ +哉 +贝 +宽 +辫 +炮 +扩 +饲 +籽 +魏 +菟 +锰 +伍 +猝 +末 +琳 +哚 +蛎 +邂 +呀 +姿 +鄞 +却 +歧 +仙 +恸 +椐 +森 +牒 +寤 +袒 +婆 +虢 +雅 +钉 +朵 +贼 +欲 +苞 +寰 +故 +龚 +坭 +嘘 +咫 +礼 +硷 +兀 +睢 +汶 +’ +铲 +烧 +绕 +诃 +浃 +钿 +哺 +柜 +讼 +颊 +璁 +腔 +洽 +咐 +脲 +簌 +筠 +镣 +玮 +鞠 +谁 +兼 +姆 +挥 +梯 +蝴 +谘 +漕 +刷 +躏 +宦 +弼 +b +垌 +劈 +麟 +莉 +揭 +笙 +渎 +仕 +嗤 +仓 +配 +怏 +抬 +错 +泯 +镊 +孰 +猿 +邪 +仍 +秋 +鼬 +壹 +歇 +吵 +炼 +< +尧 +射 +柬 +廷 +胧 +霾 +凳 
+隋 +肚 +浮 +梦 +祥 +株 +堵 +退 +L +鹫 +跎 +凶 +毽 +荟 +炫 +栩 +玳 +甜 +沂 +鹿 +顽 +伯 +爹 +赔 +蛴 +徐 +匡 +欣 +狰 +缸 +雹 +蟆 +疤 +默 +沤 +啜 +痂 +衣 +禅 +w +i +h +辽 +葳 +黝 +钗 +停 +沽 +棒 +馨 +颌 +肉 +吴 +硫 +悯 +劾 +娈 +马 +啧 +吊 +悌 +镑 +峭 +帆 +瀣 +涉 +咸 +疸 +滋 +泣 +翦 +拙 +癸 +钥 +蜒 ++ +尾 +庄 +凝 +泉 +婢 +渴 +谊 +乞 +陆 +锉 +糊 +鸦 +淮 +I +B +N +晦 +弗 +乔 +庥 +葡 +尻 +席 +橡 +傣 +渣 +拿 +惩 +麋 +斛 +缃 +矮 +蛏 +岘 +鸽 +姐 +膏 +催 +奔 +镒 +喱 +蠡 +摧 +钯 +胤 +柠 +拐 +璋 +鸥 +卢 +荡 +倾 +^ +_ +珀 +逄 +萧 +塾 +掇 +贮 +笆 +聂 +圃 +冲 +嵬 +M +滔 +笕 +值 +炙 +偶 +蜱 +搐 +梆 +汪 +蔬 +腑 +鸯 +蹇 +敞 +绯 +仨 +祯 +谆 +梧 +糗 +鑫 +啸 +豺 +囹 +猾 +巢 +柄 +瀛 +筑 +踌 +沭 +暗 +苁 +鱿 +蹉 +脂 +蘖 +牢 +热 +木 +吸 +溃 +宠 +序 +泞 +偿 +拜 +檩 +厚 +朐 +毗 +螳 +吞 +媚 +朽 +担 +蝗 +橘 +畴 +祈 +糟 +盱 +隼 +郜 +惜 +珠 +裨 +铵 +焙 +琚 +唯 +咚 +噪 +骊 +丫 +滢 +勤 +棉 +呸 +咣 +淀 +隔 +蕾 +窈 +饨 +挨 +煅 +短 +匙 +粕 +镜 +赣 +撕 +墩 +酬 +馁 +豌 +颐 +抗 +酣 +氓 +佑 +搁 +哭 +递 +耷 +涡 +桃 +贻 +碣 +截 +瘦 +昭 +镌 +蔓 +氚 +甲 +猕 +蕴 +蓬 +散 +拾 +纛 +狼 +猷 +铎 +埋 +旖 +矾 +讳 +囊 +糜 +迈 +粟 +蚂 +紧 +鲳 +瘢 +栽 +稼 +羊 +锄 +斟 +睁 +桥 +瓮 +蹙 +祉 +醺 +鼻 +昱 +剃 +跳 +篱 +跷 +蒜 +翎 +宅 +晖 +嗑 +壑 +峻 +癫 +屏 +狠 +陋 +袜 +途 +憎 +祀 +莹 +滟 +佶 +溥 +臣 +约 +盛 +峰 +磁 +慵 +婪 +拦 +莅 +朕 +鹦 +粲 +裤 +哎 +疡 +嫖 +琵 +窟 +堪 +谛 +嘉 +儡 +鳝 +斩 +郾 +驸 +酊 +妄 +胜 +贺 +徙 +傅 +噌 +钢 +栅 +庇 +恋 +匝 +巯 +邈 +尸 +锚 +粗 +佟 +蛟 +薹 +纵 +蚊 +郅 +绢 +锐 +苗 +俞 +篆 +淆 +膀 +鲜 +煎 +诶 +秽 +寻 +涮 +刺 +怀 +噶 +巨 +褰 +魅 +灶 +灌 +桉 +藕 +谜 +舸 +薄 +搀 +恽 +借 +牯 +痉 +渥 +愿 +亓 +耘 +杠 +柩 +锔 +蚶 +钣 +珈 +喘 +蹒 +幽 +赐 +稗 +晤 +莱 +泔 +扯 +肯 +菪 +裆 +腩 +豉 +疆 +骜 +腐 +倭 +珏 +唔 +粮 +亡 +润 +慰 +伽 +橄 +玄 +誉 +醐 +胆 +龊 +粼 +塬 +陇 +彼 +削 +嗣 +绾 +芽 +妗 +垭 +瘴 +爽 +薏 +寨 +龈 +泠 +弹 +赢 +漪 +猫 +嘧 +涂 +恤 +圭 +茧 +烽 +屑 +痕 +巾 +赖 +荸 +凰 +腮 +畈 +亵 +蹲 +偃 +苇 +澜 +艮 +换 +骺 +烘 +苕 +梓 +颉 +肇 +哗 +悄 +氤 +涠 +葬 +屠 +鹭 +植 +竺 +佯 +诣 +鲇 +瘀 +鲅 +邦 +移 +滁 +冯 +耕 +癔 +戌 +茬 +沁 +巩 +悠 +湘 +洪 +痹 +锟 +循 +谋 +腕 +鳃 +钠 +捞 +焉 +迎 +碱 +伫 +急 +榷 +奈 +邝 +卯 +辄 +皲 +卟 +醛 +畹 +忧 +稳 +雄 +昼 +缩 +阈 +睑 +扌 +耗 +曦 +涅 +捏 +瞧 +邕 +淖 +漉 +铝 +耦 +禹 +湛 +喽 +莼 +琅 +诸 +苎 +纂 +硅 +始 +嗨 +傥 +燃 +臂 +赅 +嘈 +呆 +贵 +屹 +壮 +肋 +亍 +蚀 +卅 +豹 +腆 +邬 +迭 +浊 +} +童 +螂 +捐 +圩 +勐 +触 +寞 +汊 +壤 +荫 +膺 +渌 +芳 +懿 +遴 +螈 +泰 +蓼 +蛤 +茜 +舅 +枫 +朔 +膝 +眙 +避 +梅 +判 +鹜 +璜 +牍 +缅 +垫 +藻 +黔 +侥 +惚 +懂 +踩 +腰 +腈 +札 +丞 +唾 +慈 +顿 +摹 +荻 +琬 +~ +斧 +沈 +滂 +胁 +胀 +幄 +莜 +Z +匀 +鄄 +掌 +绰 +茎 +焚 +赋 +萱 +谑 +汁 +铒 +瞎 +夺 +蜗 +野 +娆 +冀 +弯 +篁 +懵 +灞 +隽 +芡 +脘 +俐 +辩 +芯 
+掺 +喏 +膈 +蝈 +觐 +悚 +踹 +蔗 +熠 +鼠 +呵 +抓 +橼 +峨 +畜 +缔 +禾 +崭 +弃 +熊 +摒 +凸 +拗 +穹 +蒙 +抒 +祛 +劝 +闫 +扳 +阵 +醌 +踪 +喵 +侣 +搬 +仅 +荧 +赎 +蝾 +琦 +买 +婧 +瞄 +寓 +皎 +冻 +赝 +箩 +莫 +瞰 +郊 +笫 +姝 +筒 +枪 +遣 +煸 +袋 +舆 +痱 +涛 +母 +〇 +启 +践 +耙 +绲 +盘 +遂 +昊 +搞 +槿 +诬 +纰 +泓 +惨 +檬 +亻 +越 +C +o +憩 +熵 +祷 +钒 +暧 +塔 +阗 +胰 +咄 +娶 +魔 +琶 +钞 +邻 +扬 +杉 +殴 +咽 +弓 +〆 +髻 +】 +吭 +揽 +霆 +拄 +殖 +脆 +彻 +岩 +芝 +勃 +辣 +剌 +钝 +嘎 +甄 +佘 +皖 +伦 +授 +徕 +憔 +挪 +皇 +庞 +稔 +芜 +踏 +溴 +兖 +卒 +擢 +饥 +鳞 +煲 +‰ +账 +颗 +叻 +斯 +捧 +鳍 +琮 +讹 +蛙 +纽 +谭 +酸 +兔 +莒 +睇 +伟 +觑 +羲 +嗜 +宜 +褐 +旎 +辛 +卦 +诘 +筋 +鎏 +溪 +挛 +熔 +阜 +晰 +鳅 +丢 +奚 +灸 +呱 +献 +陉 +黛 +鸪 +甾 +萨 +疮 +拯 +洲 +疹 +辑 +叙 +恻 +谒 +允 +柔 +烂 +氏 +逅 +漆 +拎 +惋 +扈 +湟 +纭 +啕 +掬 +擞 +哥 +忽 +涤 +鸵 +靡 +郗 +瓷 +扁 +廊 +怨 +雏 +钮 +敦 +E +懦 +憋 +汀 +拚 +啉 +腌 +岸 +f +痼 +瞅 +尊 +咀 +眩 +飙 +忌 +仝 +迦 +熬 +毫 +胯 +篑 +茄 +腺 +凄 +舛 +碴 +锵 +诧 +羯 +後 +漏 +汤 +宓 +仞 +蚁 +壶 +谰 +皑 +铄 +棰 +罔 +辅 +晶 +苦 +牟 +闽 +\ +烃 +饮 +聿 +丙 +蛳 +朱 +煤 +涔 +鳖 +犁 +罐 +荼 +砒 +淦 +妤 +黏 +戎 +孑 +婕 +瑾 +戢 +钵 +枣 +捋 +砥 +衩 +狙 +桠 +稣 +阎 +肃 +梏 +诫 +孪 +昶 +婊 +衫 +嗔 +侃 +塞 +蜃 +樵 +峒 +貌 +屿 +欺 +缫 +阐 +栖 +诟 +珞 +荭 +吝 +萍 +嗽 +恂 +啻 +蜴 +磬 +峋 +俸 +豫 +谎 +徊 +镍 +韬 +魇 +晴 +U +囟 +猜 +蛮 +坐 +囿 +伴 +亭 +肝 +佗 +蝠 +妃 +胞 +滩 +榴 +氖 +垩 +苋 +砣 +扪 +馏 +姓 +轩 +厉 +夥 +侈 +禀 +垒 +岑 +赏 +钛 +辐 +痔 +披 +纸 +碳 +“ +坞 +蠓 +挤 +荥 +沅 +悔 +铧 +帼 +蒌 +蝇 +a +p +y +n +g +哀 +浆 +瑶 +凿 +桶 +馈 +皮 +奴 +苜 +佤 +伶 +晗 +铱 +炬 +优 +弊 +氢 +恃 +甫 +攥 +端 +锌 +灰 +稹 +炝 +曙 +邋 +亥 +眶 +碾 +拉 +萝 +绔 +捷 +浍 +腋 +姑 +菖 +凌 +涞 +麽 +锢 +桨 +潢 +绎 +镰 +殆 +锑 +渝 +铬 +困 +绽 +觎 +匈 +糙 +暑 +裹 +鸟 +盔 +肽 +迷 +綦 +『 +亳 +佝 +俘 +钴 +觇 +骥 +仆 +疝 +跪 +婶 +郯 +瀹 +唉 +脖 +踞 +针 +晾 +忒 +扼 +瞩 +叛 +椒 +疟 +嗡 +邗 +肆 +跆 +玫 +忡 +捣 +咧 +唆 +艄 +蘑 +潦 +笛 +阚 +沸 +泻 +掊 +菽 +贫 +斥 +髂 +孢 +镂 +赂 +麝 +鸾 +屡 +衬 +苷 +恪 +叠 +希 +粤 +爻 +喝 +茫 +惬 +郸 +绻 +庸 +撅 +碟 +宄 +妹 +膛 +叮 +饵 +崛 +嗲 +椅 +冤 +搅 +咕 +敛 +尹 +垦 +闷 +蝉 +霎 +勰 +败 +蓑 +泸 +肤 +鹌 +幌 +焦 +浠 +鞍 +刁 +舰 +乙 +竿 +裔 +。 +茵 +函 +伊 +兄 +丨 +娜 +匍 +謇 +莪 +宥 +似 +蝽 +翳 +酪 +翠 +粑 +薇 +祢 +骏 +赠 +叫 +Q +噤 +噻 +竖 +芗 +莠 +潭 +俊 +羿 +耜 +O +郫 +趁 +嗪 +囚 +蹶 +芒 +洁 +笋 +鹑 +敲 +硝 +啶 +堡 +渲 +揩 +』 +携 +宿 +遒 +颍 +扭 +棱 +割 +萜 +蔸 +葵 +琴 +捂 +饰 +衙 +耿 +掠 +募 +岂 +窖 +涟 +蔺 +瘤 +柞 +瞪 +怜 +匹 +距 +楔 +炜 +哆 +秦 +缎 +幼 +茁 +绪 +痨 +恨 +楸 +娅 +瓦 +桩 +雪 +嬴 +伏 +榔 +妥 +铿 +拌 +眠 +雍 +缇 +‘ +卓 +搓 +哌 +觞 +噩 +屈 +哧 +髓 +咦 +巅 +娑 +侑 +淫 +膳 +祝 +勾 +姊 +莴 +胄 +疃 
+薛 +蜷 +胛 +巷 +芙 +芋 +熙 +闰 +勿 +窃 +狱 +剩 +钏 +幢 +陟 +铛 +慧 +靴 +耍 +k +浙 +浇 +飨 +惟 +绗 +祜 +澈 +啼 +咪 +磷 +摞 +诅 +郦 +抹 +跃 +壬 +吕 +肖 +琏 +颤 +尴 +剡 +抠 +凋 +赚 +泊 +津 +宕 +殷 +倔 +氲 +漫 +邺 +涎 +怠 +$ +垮 +荬 +遵 +俏 +叹 +噢 +饽 +蜘 +孙 +筵 +疼 +鞭 +羧 +牦 +箭 +潴 +c +眸 +祭 +髯 +啖 +坳 +愁 +芩 +驮 +倡 +巽 +穰 +沃 +胚 +怒 +凤 +槛 +剂 +趵 +嫁 +v +邢 +灯 +鄢 +桐 +睽 +檗 +锯 +槟 +婷 +嵋 +圻 +诗 +蕈 +颠 +遭 +痢 +芸 +怯 +馥 +竭 +锗 +徜 +恭 +遍 +籁 +剑 +嘱 +苡 +龄 +僧 +桑 +潸 +弘 +澶 +楹 +悲 +讫 +愤 +腥 +悸 +谍 +椹 +呢 +桓 +葭 +攫 +阀 +翰 +躲 +敖 +柑 +郎 +笨 +橇 +呃 +魁 +燎 +脓 +葩 +磋 +垛 +玺 +狮 +沓 +砜 +蕊 +锺 +罹 +蕉 +翱 +虐 +闾 +巫 +旦 +茱 +嬷 +枯 +鹏 +贡 +芹 +汛 +矫 +绁 +拣 +禺 +佃 +讣 +舫 +惯 +乳 +趋 +疲 +挽 +岚 +虾 +衾 +蠹 +蹂 +飓 +氦 +铖 +孩 +稞 +瑜 +壅 +掀 +勘 +妓 +畅 +髋 +W +庐 +牲 +蓿 +榕 +练 +垣 +唱 +邸 +菲 +昆 +婺 +穿 +绡 +麒 +蚱 +掂 +愚 +泷 +涪 +漳 +妩 +娉 +榄 +讷 +觅 +旧 +藤 +煮 +呛 +柳 +腓 +叭 +庵 +烷 +阡 +罂 +蜕 +擂 +猖 +咿 +媲 +脉 +【 +沏 +貅 +黠 +熏 +哲 +烁 +坦 +酵 +兜 +× +潇 +撒 +剽 +珩 +圹 +乾 +摸 +樟 +帽 +嗒 +襄 +魂 +轿 +憬 +锡 +〕 +喃 +皆 +咖 +隅 +脸 +残 +泮 +袂 +鹂 +珊 +囤 +捆 +咤 +误 +徨 +闹 +淙 +芊 +淋 +怆 +囗 +拨 +梳 +渤 +R +G +绨 +蚓 +婀 +幡 +狩 +麾 +谢 +唢 +裸 +旌 +伉 +纶 +裂 +驳 +砼 +咛 +澄 +樨 +蹈 +宙 +澍 +倍 +貔 +操 +勇 +蟠 +摈 +砧 +虬 +够 +缁 +悦 +藿 +撸 +艹 +摁 +淹 +豇 +虎 +榭 +ˉ +吱 +d +° +喧 +荀 +踱 +侮 +奋 +偕 +饷 +犍 +惮 +坑 +璎 +徘 +宛 +妆 +袈 +倩 +窦 +昂 +荏 +乖 +K +怅 +撰 +鳙 +牙 +袁 +酞 +X +痿 +琼 +闸 +雁 +趾 +荚 +虻 +涝 +《 +杏 +韭 +偈 +烤 +绫 +鞘 +卉 +症 +遢 +蓥 +诋 +杭 +荨 +匆 +竣 +簪 +辙 +敕 +虞 +丹 +缭 +咩 +黟 +m +淤 +瑕 +咂 +铉 +硼 +茨 +嶂 +痒 +畸 +敬 +涿 +粪 +窘 +熟 +叔 +嫔 +盾 +忱 +裘 +憾 +梵 +赡 +珙 +咯 +娘 +庙 +溯 +胺 +葱 +痪 +摊 +荷 +卞 +乒 +髦 +寐 +铭 +坩 +胗 +枷 +爆 +溟 +嚼 +羚 +砬 +轨 +惊 +挠 +罄 +竽 +菏 +氧 +浅 +楣 +盼 +枢 +炸 +阆 +杯 +谏 +噬 +淇 +渺 +俪 +秆 +墓 +泪 +跻 +砌 +痰 +垡 +渡 +耽 +釜 +讶 +鳎 +煞 +呗 +韶 +舶 +绷 +鹳 +缜 +旷 +铊 +皱 +龌 +檀 +霖 +奄 +槐 +艳 +蝶 +旋 +哝 +赶 +骞 +蚧 +腊 +盈 +丁 +` +蜚 +矸 +蝙 +睨 +嚓 +僻 +鬼 +醴 +夜 +彝 +磊 +笔 +拔 +栀 +糕 +厦 +邰 +纫 +逭 +纤 +眦 +膊 +馍 +躇 +烯 +蘼 +冬 +诤 +暄 +骶 +哑 +瘠 +」 +臊 +丕 +愈 +咱 +螺 +擅 +跋 +搏 +硪 +谄 +笠 +淡 +嘿 +骅 +谧 +鼎 +皋 +姚 +歼 +蠢 +驼 +耳 +胬 +挝 +涯 +狗 +蒽 +孓 +犷 +凉 +芦 +箴 +铤 +孤 +嘛 +坤 +V +茴 +朦 +挞 +尖 +橙 +诞 +搴 +碇 +洵 +浚 +帚 +蜍 +漯 +柘 +嚎 +讽 +芭 +荤 +咻 +祠 +秉 +跖 +埃 +吓 +糯 +眷 +馒 +惹 +娼 +鲑 +嫩 +讴 +轮 +瞥 +靶 +褚 +乏 +缤 +宋 +帧 +删 +驱 +碎 +扑 +俩 +俄 +偏 +涣 +竹 +噱 +皙 +佰 +渚 +唧 +斡 +# +镉 +刀 +崎 +筐 +佣 +夭 +贰 +肴 +峙 +哔 +艿 +匐 +牺 +镛 +缘 +仡 +嫡 +劣 +枸 +堀 +梨 +簿 +鸭 +蒸 +亦 +稽 +浴 +{ +衢 +束 +槲 +j +阁 +揍 
+疥 +棋 +潋 +聪 +窜 +乓 +睛 +插 +冉 +阪 +苍 +搽 +「 +蟾 +螟 +幸 +仇 +樽 +撂 +慢 +跤 +幔 +俚 +淅 +覃 +觊 +溶 +妖 +帛 +侨 +曰 +妾 +泗 +· +: +瀘 +風 +Ë +( +) +∶ +紅 +紗 +瑭 +雲 +頭 +鶏 +財 +許 +• +¥ +樂 +焗 +麗 +— +; +滙 +東 +榮 +繪 +興 +… +門 +業 +π +楊 +國 +顧 +é +盤 +寳 +Λ +龍 +鳳 +島 +誌 +緣 +結 +銭 +萬 +勝 +祎 +璟 +優 +歡 +臨 +時 +購 += +★ +藍 +昇 +鐵 +觀 +勅 +農 +聲 +畫 +兿 +術 +發 +劉 +記 +專 +耑 +園 +書 +壴 +種 +Ο +● +褀 +號 +銀 +匯 +敟 +锘 +葉 +橪 +廣 +進 +蒄 +鑽 +阝 +祙 +貢 +鍋 +豊 +夬 +喆 +團 +閣 +開 +燁 +賓 +館 +酡 +沔 +順 ++ +硚 +劵 +饸 +陽 +車 +湓 +復 +萊 +氣 +軒 +華 +堃 +迮 +纟 +戶 +馬 +學 +裡 +電 +嶽 +獨 +マ +シ +サ +ジ +燘 +袪 +環 +❤ +臺 +灣 +専 +賣 +孖 +聖 +攝 +線 +▪ +α +傢 +俬 +夢 +達 +莊 +喬 +貝 +薩 +劍 +羅 +壓 +棛 +饦 +尃 +璈 +囍 +醫 +G +I +A +# +N +鷄 +髙 +嬰 +啓 +約 +隹 +潔 +賴 +藝 +~ +寶 +籣 +麺 +  +嶺 +√ +義 +網 +峩 +長 +∧ +魚 +機 +構 +② +鳯 +偉 +L +B +㙟 +畵 +鴿 +' +詩 +溝 +嚞 +屌 +藔 +佧 +玥 +蘭 +織 +1 +3 +9 +0 +7 +點 +砭 +鴨 +鋪 +銘 +廳 +弍 +‧ +創 +湯 +坶 +℃ +卩 +骝 +& +烜 +荘 +當 +潤 +扞 +係 +懷 +碶 +钅 +蚨 +讠 +☆ +叢 +爲 +埗 +涫 +塗 +→ +楽 +現 +鯨 +愛 +瑪 +鈺 +忄 +悶 +藥 +飾 +樓 +視 +孬 +ㆍ +燚 +苪 +師 +① +丼 +锽 +│ +韓 +標 +è +兒 +閏 +匋 +張 +漢 +Ü +髪 +會 +閑 +檔 +習 +裝 +の +峯 +菘 +輝 +И +雞 +釣 +億 +浐 +K +O +R +8 +H +E +P +T +W +D +S +C +M +F +姌 +饹 +» +晞 +廰 +ä +嵯 +鷹 +負 +飲 +絲 +冚 +楗 +澤 +綫 +區 +❋ +← +質 +靑 +揚 +③ +滬 +統 +産 +協 +﹑ +乸 +畐 +經 +運 +際 +洺 +岽 +為 +粵 +諾 +崋 +豐 +碁 +ɔ +V +2 +6 +齋 +誠 +訂 +´ +勑 +雙 +陳 +無 +í +泩 +媄 +夌 +刂 +i +c +t +o +r +a +嘢 +耄 +燴 +暃 +壽 +媽 +靈 +抻 +體 +唻 +É +冮 +甹 +鎮 +錦 +ʌ +蜛 +蠄 +尓 +駕 +戀 +飬 +逹 +倫 +貴 +極 +Я +Й +寬 +磚 +嶪 +郎 +職 +| +間 +n +d +剎 +伈 +課 +飛 +橋 +瘊 +№ +譜 +骓 +圗 +滘 +縣 +粿 +咅 +養 +濤 +彳 +® +% +Ⅱ +啰 +㴪 +見 +矞 +薬 +糁 +邨 +鲮 +顔 +罱 +З +選 +話 +贏 +氪 +俵 +競 +瑩 +繡 +枱 +β +綉 +á +獅 +爾 +™ +麵 +戋 +淩 +徳 +個 +劇 +場 +務 +簡 +寵 +h +實 +膠 +轱 +圖 +築 +嘣 +樹 +㸃 +營 +耵 +孫 +饃 +鄺 +飯 +麯 +遠 +輸 +坫 +孃 +乚 +閃 +鏢 +㎡ +題 +廠 +關 +↑ +爺 +將 +軍 +連 +篦 +覌 +參 +箸 +- +窠 +棽 +寕 +夀 +爰 +歐 +呙 +閥 +頡 +熱 +雎 +垟 +裟 +凬 +勁 +帑 +馕 +夆 +疌 +枼 +馮 +貨 +蒤 +樸 +彧 +旸 +靜 +龢 +暢 +㐱 +鳥 +珺 +鏡 +灡 +爭 +堷 +廚 +Ó +騰 +診 +┅ +蘇 +褔 +凱 +頂 +豕 +亞 +帥 +嘬 +⊥ +仺 +桖 +複 +饣 +絡 +穂 +顏 +棟 +納 +▏ +濟 +親 +設 +計 +攵 +埌 +烺 +ò +頤 +燦 +蓮 +撻 +節 +講 +濱 +濃 +娽 +洳 +朿 +燈 +鈴 +護 +膚 +铔 +過 +補 +Z +U +5 +4 +坋 +闿 +䖝 +餘 +缐 +铞 +貿 +铪 +桼 +趙 +鍊 +[ +㐂 +垚 +菓 +揸 +捲 +鐘 +滏 +𣇉 +爍 +輪 +燜 +鴻 +鮮 +動 +鹞 +鷗 +丄 +慶 +鉌 +翥 +飮 +腸 +⇋ +漁 +覺 +來 +熘 +昴 +翏 +鲱 +圧 
+鄉 +萭 +頔 +爐 +嫚 +г +貭 +類 +聯 +幛 +輕 +訓 +鑒 +夋 +锨 +芃 +珣 +䝉 +扙 +嵐 +銷 +處 +ㄱ +語 +誘 +苝 +歸 +儀 +燒 +楿 +內 +粢 +葒 +奧 +麥 +礻 +滿 +蠔 +穵 +瞭 +態 +鱬 +榞 +硂 +鄭 +黃 +煙 +祐 +奓 +逺 +* +瑄 +獲 +聞 +薦 +讀 +這 +樣 +決 +問 +啟 +們 +執 +説 +轉 +單 +隨 +唘 +帶 +倉 +庫 +還 +贈 +尙 +皺 +■ +餅 +產 +○ +∈ +報 +狀 +楓 +賠 +琯 +嗮 +禮 +` +傳 +> +≤ +嗞 +Φ +≥ +換 +咭 +∣ +↓ +曬 +ε +応 +寫 +″ +終 +様 +純 +費 +療 +聨 +凍 +壐 +郵 +ü +黒 +∫ +製 +塊 +調 +軽 +確 +撃 +級 +馴 +Ⅲ +涇 +繹 +數 +碼 +證 +狒 +処 +劑 +< +晧 +賀 +衆 +] +櫥 +兩 +陰 +絶 +對 +鯉 +憶 +◎ +p +e +Y +蕒 +煖 +頓 +測 +試 +鼽 +僑 +碩 +妝 +帯 +≈ +鐡 +舖 +權 +喫 +倆 +ˋ +該 +悅 +ā +俫 +. +f +s +b +m +k +g +u +j +貼 +淨 +濕 +針 +適 +備 +l +/ +給 +謢 +強 +觸 +衛 +與 +⊙ +$ +緯 +變 +⑴ +⑵ +⑶ +㎏ +殺 +∩ +幚 +─ +價 +▲ +離 +ú +ó +飄 +烏 +関 +閟 +﹝ +﹞ +邏 +輯 +鍵 +驗 +訣 +導 +歷 +屆 +層 +▼ +儱 +錄 +熳 +ē +艦 +吋 +錶 +辧 +飼 +顯 +④ +禦 +販 +気 +対 +枰 +閩 +紀 +幹 +瞓 +貊 +淚 +△ +眞 +墊 +Ω +獻 +褲 +縫 +緑 +亜 +鉅 +餠 +{ +} +◆ +蘆 +薈 +█ +◇ +溫 +彈 +晳 +粧 +犸 +穩 +訊 +崬 +凖 +熥 +П +舊 +條 +紋 +圍 +Ⅳ +筆 +尷 +難 +雜 +錯 +綁 +識 +頰 +鎖 +艶 +□ +殁 +殼 +⑧ +├ +▕ +鵬 +ǐ +ō +ǒ +糝 +綱 +▎ +μ +盜 +饅 +醬 +籤 +蓋 +釀 +鹽 +據 +à +ɡ +辦 +◥ +彐 +┌ +婦 +獸 +鲩 +伱 +ī +蒟 +蒻 +齊 +袆 +腦 +寧 +凈 +妳 +煥 +詢 +偽 +謹 +啫 +鯽 +騷 +鱸 +損 +傷 +鎻 +髮 +買 +冏 +儥 +両 +﹢ +∞ +載 +喰 +z +羙 +悵 +燙 +曉 +員 +組 +徹 +艷 +痠 +鋼 +鼙 +縮 +細 +嚒 +爯 +≠ +維 +" +鱻 +壇 +厍 +帰 +浥 +犇 +薡 +軎 +² +應 +醜 +刪 +緻 +鶴 +賜 +噁 +軌 +尨 +镔 +鷺 +槗 +彌 +葚 +濛 +請 +溇 +緹 +賢 +訪 +獴 +瑅 +資 +縤 +陣 +蕟 +栢 +韻 +祼 +恁 +伢 +謝 +劃 +涑 +總 +衖 +踺 +砋 +凉 +籃 +駿 +苼 +瘋 +昽 +紡 +驊 +腎 +﹗ +響 +杋 +剛 +嚴 +禪 +歓 +槍 +傘 +檸 +檫 +炣 +勢 +鏜 +鎢 +銑 +尐 +減 +奪 +惡 +θ +僮 +婭 +臘 +ū +ì +殻 +鉄 +∑ +蛲 +焼 +緖 +續 +紹 +懮 \ No newline at end of file diff --git a/deploy/pipeline/ppvehicle/vehicle_attr.py b/deploy/pipeline/ppvehicle/vehicle_attr.py new file mode 100644 index 0000000000000000000000000000000000000000..82db40b3189571452b36a870049b5c17bdaf554d --- /dev/null +++ b/deploy/pipeline/ppvehicle/vehicle_attr.py @@ -0,0 +1,132 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import yaml +import glob + +import cv2 +import numpy as np +import math +import paddle +import sys +from collections.abc import Sequence + +# add deploy path of PaddleDetection to sys.path +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 3))) +sys.path.insert(0, parent_path) + +from paddle.inference import Config, create_predictor +from python.utils import argsparser, Timer, get_current_memory_mb +from python.benchmark_utils import PaddleInferBenchmark +from python.infer import Detector, print_arguments +from pipeline.pphuman.attr_infer import AttrDetector + + +class VehicleAttr(AttrDetector): + """ + Args: + model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml + device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU + run_mode (str): mode of running(paddle/trt_fp32/trt_fp16) + batch_size (int): size of each batch in inference + trt_min_shape (int): min shape for dynamic shape in trt + trt_max_shape (int): max shape for dynamic shape in trt + trt_opt_shape (int): opt shape for dynamic shape in trt + trt_calib_mode (bool): If the model is produced by TRT offline quantitative + calibration, trt_calib_mode needs to be set to True + cpu_threads (int): cpu threads + enable_mkldnn (bool): whether to enable MKLDNN + type_threshold (float): The threshold of score for vehicle type recognition. + color_threshold (float): The threshold of score for vehicle color recognition. 
+ """ + + def __init__(self, + model_dir, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1280, + trt_opt_shape=640, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + color_threshold=0.5, + type_threshold=0.5): + super(VehicleAttr, self).__init__( + model_dir=model_dir, + device=device, + run_mode=run_mode, + batch_size=batch_size, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn, + output_dir=output_dir) + self.color_threshold = color_threshold + self.type_threshold = type_threshold + self.result_history = {} + self.color_list = [ + "yellow", "orange", "green", "gray", "red", "blue", "white", + "golden", "brown", "black" + ] + self.type_list = [ + "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck", + "estate" + ] + + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + batch_res = [] + for res in im_results: + res = res.tolist() + attr_res = [] + color_res_str = "Color: " + type_res_str = "Type: " + color_idx = np.argmax(res[:10]) + type_idx = np.argmax(res[10:]) + + if res[color_idx] >= self.color_threshold: + color_res_str += self.color_list[color_idx] + else: + color_res_str += "Unknown" + attr_res.append(color_res_str) + + if res[type_idx + 10] >= self.type_threshold: + type_res_str += self.type_list[type_idx] + else: + type_res_str += "Unknown" + attr_res.append(type_res_str) + + batch_res.append(attr_res) + result = {'output': batch_res} + return result + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + print_arguments(FLAGS) + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use 
--device" + + main() diff --git a/deploy/pipeline/ppvehicle/vehicle_plate.py b/deploy/pipeline/ppvehicle/vehicle_plate.py new file mode 100644 index 0000000000000000000000000000000000000000..cfb831c55cc0a76e6bc26328636e978040054a80 --- /dev/null +++ b/deploy/pipeline/ppvehicle/vehicle_plate.py @@ -0,0 +1,298 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import yaml +import glob +from functools import reduce + +import time +import cv2 +import numpy as np +import math +import paddle + +import sys +parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 3))) +sys.path.insert(0, parent_path) + +from python.infer import get_test_images +from python.preprocess import preprocess, NormalizeImage, Permute, Resize_Mult32 +from pipeline.ppvehicle.vehicle_plateutils import create_predictor, get_infer_gpuid, get_rotate_crop_image, draw_boxes +from pipeline.ppvehicle.vehicleplate_postprocess import build_post_process +from pipeline.cfg_utils import merge_cfg, print_arguments, argsparser + + +class PlateDetector(object): + def __init__(self, args, cfg): + self.args = args + self.pre_process_list = { + 'Resize_Mult32': { + 'limit_side_len': cfg['det_limit_side_len'], + 'limit_type': cfg['det_limit_type'], + }, + 'NormalizeImage': { + 'mean': [0.485, 0.456, 0.406], + 'std': [0.229, 0.224, 0.225], + 'is_scale': True, + }, + 'Permute': {} + } + postprocess_params = {} + postprocess_params['name'] 
= 'DBPostProcess' + postprocess_params["thresh"] = 0.3 + postprocess_params["box_thresh"] = 0.6 + postprocess_params["max_candidates"] = 1000 + postprocess_params["unclip_ratio"] = 1.5 + postprocess_params["use_dilation"] = False + postprocess_params["score_mode"] = "fast" + + self.postprocess_op = build_post_process(postprocess_params) + self.predictor, self.input_tensor, self.output_tensors, self.config = create_predictor( + args, cfg, 'det') + + def preprocess(self, im_path): + preprocess_ops = [] + for op_type, new_op_info in self.pre_process_list.items(): + preprocess_ops.append(eval(op_type)(**new_op_info)) + + input_im_lst = [] + input_im_info_lst = [] + + im, im_info = preprocess(im_path, preprocess_ops) + input_im_lst.append(im) + input_im_info_lst.append(im_info['im_shape'] / im_info['scale_factor']) + + return np.stack(input_im_lst, axis=0), input_im_info_lst + + def order_points_clockwise(self, pts): + rect = np.zeros((4, 2), dtype="float32") + s = pts.sum(axis=1) + rect[0] = pts[np.argmin(s)] + rect[2] = pts[np.argmax(s)] + diff = np.diff(pts, axis=1) + rect[1] = pts[np.argmin(diff)] + rect[3] = pts[np.argmax(diff)] + return rect + + def clip_det_res(self, points, img_height, img_width): + for pno in range(points.shape[0]): + points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1)) + points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1)) + return points + + def filter_tag_det_res(self, dt_boxes, image_shape): + img_height, img_width = image_shape[0:2] + dt_boxes_new = [] + for box in dt_boxes: + box = self.order_points_clockwise(box) + box = self.clip_det_res(box, img_height, img_width) + rect_width = int(np.linalg.norm(box[0] - box[1])) + rect_height = int(np.linalg.norm(box[0] - box[3])) + if rect_width <= 3 or rect_height <= 3: + continue + dt_boxes_new.append(box) + dt_boxes = np.array(dt_boxes_new) + return dt_boxes + + def filter_tag_det_res_only_clip(self, dt_boxes, image_shape): + img_height, img_width = image_shape[0:2] + 
dt_boxes_new = [] + for box in dt_boxes: + box = self.clip_det_res(box, img_height, img_width) + dt_boxes_new.append(box) + dt_boxes = np.array(dt_boxes_new) + return dt_boxes + + def predict_image(self, img_list): + st = time.time() + + dt_batch_boxes = [] + for image in img_list: + img, shape_list = self.preprocess(image) + if img is None: + return None, 0 + + self.input_tensor.copy_from_cpu(img) + self.predictor.run() + outputs = [] + for output_tensor in self.output_tensors: + output = output_tensor.copy_to_cpu() + outputs.append(output) + + preds = {} + preds['maps'] = outputs[0] + + #self.predictor.try_shrink_memory() + post_result = self.postprocess_op(preds, shape_list) + # print("post_result length:{}".format(len(post_result))) + + org_shape = image.shape + dt_boxes = post_result[0]['points'] + dt_boxes = self.filter_tag_det_res(dt_boxes, org_shape) + dt_batch_boxes.append(dt_boxes) + + et = time.time() + return dt_batch_boxes, et - st + + +class TextRecognizer(object): + def __init__(self, args, cfg, use_gpu=True): + self.rec_image_shape = cfg['rec_image_shape'] + self.rec_batch_num = cfg['rec_batch_num'] + word_dict_path = cfg['word_dict_path'] + use_space_char = True + + postprocess_params = { + 'name': 'CTCLabelDecode', + "character_dict_path": word_dict_path, + "use_space_char": use_space_char + } + self.postprocess_op = build_post_process(postprocess_params) + self.predictor, self.input_tensor, self.output_tensors, self.config = \ + create_predictor(args, cfg, 'rec') + self.use_onnx = False + + def resize_norm_img(self, img, max_wh_ratio): + imgC, imgH, imgW = self.rec_image_shape + + assert imgC == img.shape[2] + imgW = int((imgH * max_wh_ratio)) + if self.use_onnx: + w = self.input_tensor.shape[3:][0] + if w is not None and w > 0: + imgW = w + + h, w = img.shape[:2] + ratio = w / float(h) + if math.ceil(imgH * ratio) > imgW: + resized_w = imgW + else: + resized_w = int(math.ceil(imgH * ratio)) + resized_image = cv2.resize(img, (resized_w, imgH)) + 
resized_image = resized_image.astype('float32') + resized_image = resized_image.transpose((2, 0, 1)) / 255 + resized_image -= 0.5 + resized_image /= 0.5 + padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) + padding_im[:, :, 0:resized_w] = resized_image + return padding_im + + def predict_text(self, img_list): + img_num = len(img_list) + # Calculate the aspect ratio of all text bars + width_list = [] + for img in img_list: + width_list.append(img.shape[1] / float(img.shape[0])) + # Sorting can speed up the recognition process + indices = np.argsort(np.array(width_list)) + rec_res = [['', 0.0]] * img_num + batch_num = self.rec_batch_num + st = time.time() + for beg_img_no in range(0, img_num, batch_num): + end_img_no = min(img_num, beg_img_no + batch_num) + norm_img_batch = [] + imgC, imgH, imgW = self.rec_image_shape + max_wh_ratio = imgW / imgH + # max_wh_ratio = 0 + for ino in range(beg_img_no, end_img_no): + h, w = img_list[indices[ino]].shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + for ino in range(beg_img_no, end_img_no): + norm_img = self.resize_norm_img(img_list[indices[ino]], + max_wh_ratio) + norm_img = norm_img[np.newaxis, :] + norm_img_batch.append(norm_img) + norm_img_batch = np.concatenate(norm_img_batch) + norm_img_batch = norm_img_batch.copy() + if self.use_onnx: + input_dict = {} + input_dict[self.input_tensor.name] = norm_img_batch + outputs = self.predictor.run(self.output_tensors, input_dict) + preds = outputs[0] + else: + self.input_tensor.copy_from_cpu(norm_img_batch) + self.predictor.run() + outputs = [] + for output_tensor in self.output_tensors: + output = output_tensor.copy_to_cpu() + outputs.append(output) + if len(outputs) != 1: + preds = outputs + else: + preds = outputs[0] + rec_result = self.postprocess_op(preds) + for rno in range(len(rec_result)): + rec_res[indices[beg_img_no + rno]] = rec_result[rno] + return rec_res, time.time() - st + + +class PlateRecognizer(object): + def 
__init__(self, args, cfg): + use_gpu = args.device.lower() == "gpu" + self.platedetector = PlateDetector(args, cfg) + self.textrecognizer = TextRecognizer(args, cfg, use_gpu=use_gpu) + + def get_platelicense(self, image_list): + plate_text_list = [] + plateboxes, det_time = self.platedetector.predict_image(image_list) + for idx, boxes_pcar in enumerate(plateboxes): + plate_pcar_list = [] + for box in boxes_pcar: + plate_images = get_rotate_crop_image(image_list[idx], box) + plate_texts = self.textrecognizer.predict_text([plate_images]) + plate_pcar_list.append(plate_texts) + plate_text_list.append(plate_pcar_list) + return self.check_plate(plate_text_list) + + def check_plate(self, text_list): + simcode = [ + '浙', '粤', '京', '津', '冀', '晋', '蒙', '辽', '黑', '沪', '吉', '苏', '皖', + '赣', '鲁', '豫', '鄂', '湘', '桂', '琼', '渝', '川', '贵', '云', '藏', '陕', + '甘', '青', '宁' + ] + plate_all = {"plate": []} + for text_pcar in text_list: + platelicense = "" + for text_info in text_pcar: + text = text_info[0][0][0] + if len(text) > 2 and len(text) < 10: + platelicense = text + plate_all["plate"].append(platelicense) + return plate_all + + +def main(): + cfg = merge_cfg(FLAGS) + print_arguments(cfg) + vehicleplate_cfg = cfg['VEHICLE_PLATE'] + detector = PlateRecognizer(FLAGS, vehicleplate_cfg) + # predict from image + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + for img in img_list: + image = cv2.imread(img) + results = detector.get_platelicense([image]) + print(results) + + +if __name__ == '__main__': + paddle.enable_static() + parser = argsparser() + FLAGS = parser.parse_args() + FLAGS.device = FLAGS.device.upper() + assert FLAGS.device in ['CPU', 'GPU', 'XPU' + ], "device should be CPU, GPU or XPU" + + main() diff --git a/deploy/pipeline/ppvehicle/vehicle_plateutils.py b/deploy/pipeline/ppvehicle/vehicle_plateutils.py new file mode 100644 index 0000000000000000000000000000000000000000..431b647206fe4539f71d45350586dfdb51e2731c --- /dev/null +++ 
b/deploy/pipeline/ppvehicle/vehicle_plateutils.py @@ -0,0 +1,505 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import os +import sys +import platform +import cv2 +import numpy as np +import paddle +from PIL import Image, ImageDraw, ImageFont +import math +from paddle import inference +import time +import ast + + +def create_predictor(args, cfg, mode): + if mode == "det": + model_dir = cfg['det_model_dir'] + else: + model_dir = cfg['rec_model_dir'] + + if model_dir is None: + print("not find {} model file path {}".format(mode, model_dir)) + sys.exit(0) + + model_file_path = model_dir + "/inference.pdmodel" + params_file_path = model_dir + "/inference.pdiparams" + if not os.path.exists(model_file_path): + raise ValueError("not find model file path {}".format(model_file_path)) + if not os.path.exists(params_file_path): + raise ValueError("not find params file path {}".format( + params_file_path)) + + config = inference.Config(model_file_path, params_file_path) + + batch_size = 1 + + if args.device == "GPU": + gpu_id = get_infer_gpuid() + if gpu_id is None: + print( + "GPU is not found in current device by nvidia-smi. Please check your device or ignore it if run on jetson." 
+ ) + config.enable_use_gpu(500, 0) + + precision_map = { + 'trt_int8': inference.PrecisionType.Int8, + 'trt_fp32': inference.PrecisionType.Float32, + 'trt_fp16': inference.PrecisionType.Half + } + if args.run_mode in precision_map.keys(): + config.enable_tensorrt_engine( + workspace_size=(1 << 25) * batch_size, + max_batch_size=batch_size, + min_subgraph_size=min_subgraph_size, + precision_mode=precision_map[args.run_mode], + use_static=False, + use_calib_mode=trt_calib_mode) + use_dynamic_shape = True + + if mode == "det": + min_input_shape = { + "x": [1, 3, 50, 50], + "conv2d_92.tmp_0": [1, 120, 20, 20], + "conv2d_91.tmp_0": [1, 24, 10, 10], + "conv2d_59.tmp_0": [1, 96, 20, 20], + "nearest_interp_v2_1.tmp_0": [1, 256, 10, 10], + "nearest_interp_v2_2.tmp_0": [1, 256, 20, 20], + "conv2d_124.tmp_0": [1, 256, 20, 20], + "nearest_interp_v2_3.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_4.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_5.tmp_0": [1, 64, 20, 20], + "elementwise_add_7": [1, 56, 2, 2], + "nearest_interp_v2_0.tmp_0": [1, 256, 2, 2] + } + max_input_shape = { + "x": [1, 3, 1536, 1536], + "conv2d_92.tmp_0": [1, 120, 400, 400], + "conv2d_91.tmp_0": [1, 24, 200, 200], + "conv2d_59.tmp_0": [1, 96, 400, 400], + "nearest_interp_v2_1.tmp_0": [1, 256, 200, 200], + "conv2d_124.tmp_0": [1, 256, 400, 400], + "nearest_interp_v2_2.tmp_0": [1, 256, 400, 400], + "nearest_interp_v2_3.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_4.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_5.tmp_0": [1, 64, 400, 400], + "elementwise_add_7": [1, 56, 400, 400], + "nearest_interp_v2_0.tmp_0": [1, 256, 400, 400] + } + opt_input_shape = { + "x": [1, 3, 640, 640], + "conv2d_92.tmp_0": [1, 120, 160, 160], + "conv2d_91.tmp_0": [1, 24, 80, 80], + "conv2d_59.tmp_0": [1, 96, 160, 160], + "nearest_interp_v2_1.tmp_0": [1, 256, 80, 80], + "nearest_interp_v2_2.tmp_0": [1, 256, 160, 160], + "conv2d_124.tmp_0": [1, 256, 160, 160], + "nearest_interp_v2_3.tmp_0": [1, 64, 160, 160], + 
"nearest_interp_v2_4.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_5.tmp_0": [1, 64, 160, 160], + "elementwise_add_7": [1, 56, 40, 40], + "nearest_interp_v2_0.tmp_0": [1, 256, 40, 40] + } + min_pact_shape = { + "nearest_interp_v2_26.tmp_0": [1, 256, 20, 20], + "nearest_interp_v2_27.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_28.tmp_0": [1, 64, 20, 20], + "nearest_interp_v2_29.tmp_0": [1, 64, 20, 20] + } + max_pact_shape = { + "nearest_interp_v2_26.tmp_0": [1, 256, 400, 400], + "nearest_interp_v2_27.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_28.tmp_0": [1, 64, 400, 400], + "nearest_interp_v2_29.tmp_0": [1, 64, 400, 400] + } + opt_pact_shape = { + "nearest_interp_v2_26.tmp_0": [1, 256, 160, 160], + "nearest_interp_v2_27.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_28.tmp_0": [1, 64, 160, 160], + "nearest_interp_v2_29.tmp_0": [1, 64, 160, 160] + } + min_input_shape.update(min_pact_shape) + max_input_shape.update(max_pact_shape) + opt_input_shape.update(opt_pact_shape) + elif mode == "rec": + imgH = int(cfg['rec_image_shape'][-2]) + min_input_shape = {"x": [1, 3, imgH, 10]} + max_input_shape = {"x": [batch_size, 3, imgH, 2304]} + opt_input_shape = {"x": [batch_size, 3, imgH, 320]} + elif mode == "cls": + min_input_shape = {"x": [1, 3, 48, 10]} + max_input_shape = {"x": [batch_size, 3, 48, 1024]} + opt_input_shape = {"x": [batch_size, 3, 48, 320]} + else: + use_dynamic_shape = False + if use_dynamic_shape: + config.set_trt_dynamic_shape_info( + min_input_shape, max_input_shape, opt_input_shape) + + else: + config.disable_gpu() + if hasattr(args, "cpu_threads"): + config.set_cpu_math_library_num_threads(args.cpu_threads) + else: + # default cpu threads as 10 + config.set_cpu_math_library_num_threads(10) + if args.enable_mkldnn: + # cache 10 different shapes for mkldnn to avoid memory leak + config.set_mkldnn_cache_capacity(10) + config.enable_mkldnn() + if args.run_mode == "fp16": + config.enable_mkldnn_bfloat16() + # enable memory optim + 
config.enable_memory_optim() + config.disable_glog_info() + config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") + config.delete_pass("matmul_transpose_reshape_fuse_pass") + if mode == 'table': + config.delete_pass("fc_fuse_pass") # not supported for table + config.switch_use_feed_fetch_ops(False) + config.switch_ir_optim(True) + + # create predictor + predictor = inference.create_predictor(config) + input_names = predictor.get_input_names() + for name in input_names: + input_tensor = predictor.get_input_handle(name) + output_tensors = get_output_tensors(cfg, mode, predictor) + return predictor, input_tensor, output_tensors, config + + +def get_output_tensors(cfg, mode, predictor): + output_names = predictor.get_output_names() + output_tensors = [] + output_name = 'softmax_0.tmp_0' + if output_name in output_names: + return [predictor.get_output_handle(output_name)] + else: + for output_name in output_names: + output_tensor = predictor.get_output_handle(output_name) + output_tensors.append(output_tensor) + return output_tensors + + +def get_infer_gpuid(): + sysstr = platform.system() + if sysstr == "Windows": + return 0 + + if not paddle.fluid.core.is_compiled_with_rocm(): + cmd = "env | grep CUDA_VISIBLE_DEVICES" + else: + cmd = "env | grep HIP_VISIBLE_DEVICES" + env_cuda = os.popen(cmd).readlines() + if len(env_cuda) == 0: + return 0 + else: + gpu_id = env_cuda[0].strip().split("=")[1] + return int(gpu_id[0]) + + +def draw_e2e_res(dt_boxes, strs, img_path): + src_im = cv2.imread(img_path) + for box, str in zip(dt_boxes, strs): + box = box.astype(np.int32).reshape((-1, 1, 2)) + cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2) + cv2.putText( + src_im, + str, + org=(int(box[0, 0, 0]), int(box[0, 0, 1])), + fontFace=cv2.FONT_HERSHEY_COMPLEX, + fontScale=0.7, + color=(0, 255, 0), + thickness=1) + return src_im + + +def draw_text_det_res(dt_boxes, img_path): + src_im = cv2.imread(img_path) + for box in dt_boxes: + box = 
np.array(box).astype(np.int32).reshape(-1, 2) + cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2) + return src_im + + +def resize_img(img, input_size=600): + """ + resize img and limit the longest side of the image to input_size + """ + img = np.array(img) + im_shape = img.shape + im_size_max = np.max(im_shape[0:2]) + im_scale = float(input_size) / float(im_size_max) + img = cv2.resize(img, None, None, fx=im_scale, fy=im_scale) + return img + + +def draw_ocr(image, + boxes, + txts=None, + scores=None, + drop_score=0.5, + font_path="./doc/fonts/simfang.ttf"): + """ + Visualize the results of OCR detection and recognition + args: + image(Image|array): RGB image + boxes(list): boxes with shape(N, 4, 2) + txts(list): the texts + scores(list): scores corresponding to txts + drop_score(float): only boxes with scores of at least drop_score will be visualized + font_path: the path of font which is used to draw text + return(array): + the visualized img + """ + if scores is None: + scores = [1] * len(boxes) + box_num = len(boxes) + for i in range(box_num): + if scores is not None and (scores[i] < drop_score or + math.isnan(scores[i])): + continue + box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64) + image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) + if txts is not None: + img = np.array(resize_img(image, input_size=600)) + txt_img = text_visual( + txts, + scores, + img_h=img.shape[0], + img_w=600, + threshold=drop_score, + font_path=font_path) + img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) + return img + return image + + +def draw_ocr_box_txt(image, + boxes, + txts, + scores=None, + drop_score=0.5, + font_path="./doc/simfang.ttf"): + h, w = image.height, image.width + img_left = image.copy() + img_right = Image.new('RGB', (w, h), (255, 255, 255)) + + import random + + random.seed(0) + draw_left = ImageDraw.Draw(img_left) + draw_right = ImageDraw.Draw(img_right) + for idx, (box, txt) in enumerate(zip(boxes, 
txts)): + if scores is not None and scores[idx] < drop_score: + continue + color = (random.randint(0, 255), random.randint(0, 255), + random.randint(0, 255)) + draw_left.polygon(box, fill=color) + draw_right.polygon( + [ + box[0][0], box[0][1], box[1][0], box[1][1], box[2][0], + box[2][1], box[3][0], box[3][1] + ], + outline=color) + box_height = math.sqrt((box[0][0] - box[3][0])**2 + (box[0][1] - box[3][ + 1])**2) + box_width = math.sqrt((box[0][0] - box[1][0])**2 + (box[0][1] - box[1][ + 1])**2) + if box_height > 2 * box_width: + font_size = max(int(box_width * 0.9), 10) + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") + cur_y = box[0][1] + for c in txt: + char_size = font.getsize(c) + draw_right.text( + (box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font) + cur_y += char_size[1] + else: + font_size = max(int(box_height * 0.8), 10) + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") + draw_right.text( + [box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font) + img_left = Image.blend(image, img_left, 0.5) + img_show = Image.new('RGB', (w * 2, h), (255, 255, 255)) + img_show.paste(img_left, (0, 0, w, h)) + img_show.paste(img_right, (w, 0, w * 2, h)) + return np.array(img_show) + + +def str_count(s): + """ + Count the number of Chinese characters, + a single English character and a single number + equal to half the length of Chinese characters. 
+ args: + s(string): the input string + return(int): + the number of Chinese characters + """ + import string + count_zh = count_pu = 0 + s_len = len(s) + en_dg_count = 0 + for c in s: + if c in string.ascii_letters or c.isdigit() or c.isspace(): + en_dg_count += 1 + elif c.isalpha(): + count_zh += 1 + else: + count_pu += 1 + return s_len - math.ceil(en_dg_count / 2) + + +def text_visual(texts, + scores, + img_h=400, + img_w=600, + threshold=0., + font_path="./doc/simfang.ttf"): + """ + create new blank img and draw txt on it + args: + texts(list): the texts to draw + scores(list|None): corresponding score of each txt + img_h(int): the height of blank img + img_w(int): the width of blank img + font_path: the path of font which is used to draw text + return(array): + """ + if scores is not None: + assert len(texts) == len( + scores), "The number of txts and corresponding scores must match" + + def create_blank_img(): + blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255 + blank_img[:, img_w - 1:] = 0 + blank_img = Image.fromarray(blank_img).convert("RGB") + draw_txt = ImageDraw.Draw(blank_img) + return blank_img, draw_txt + + blank_img, draw_txt = create_blank_img() + + font_size = 20 + txt_color = (0, 0, 0) + font = ImageFont.truetype(font_path, font_size, encoding="utf-8") + + gap = font_size + 5 + txt_img_list = [] + count, index = 1, 0 + for idx, txt in enumerate(texts): + index += 1 + if scores[idx] < threshold or math.isnan(scores[idx]): + index -= 1 + continue + first_line = True + while str_count(txt) >= img_w // font_size - 4: + tmp = txt + txt = tmp[:img_w // font_size - 4] + if first_line: + new_txt = str(index) + ': ' + txt + first_line = False + else: + new_txt = ' ' + txt + draw_txt.text((0, gap * count), new_txt, txt_color, font=font) + txt = tmp[img_w // font_size - 4:] + if count >= img_h // gap - 1: + txt_img_list.append(np.array(blank_img)) + blank_img, draw_txt = create_blank_img() + count = 0 + count += 1 + if first_line: + 
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx]) + else: + new_txt = " " + txt + " " + '%.3f' % (scores[idx]) + draw_txt.text((0, gap * count), new_txt, txt_color, font=font) + # decide whether to start a new blank img + if count >= img_h // gap - 1 and idx + 1 < len(texts): + txt_img_list.append(np.array(blank_img)) + blank_img, draw_txt = create_blank_img() + count = 0 + count += 1 + txt_img_list.append(np.array(blank_img)) + if len(txt_img_list) == 1: + blank_img = np.array(txt_img_list[0]) + else: + blank_img = np.concatenate(txt_img_list, axis=1) + return np.array(blank_img) + + +def base64_to_cv2(b64str): + import base64 + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +def draw_boxes(image, boxes, scores=None, drop_score=0.5): + if scores is None: + scores = [1] * len(boxes) + for (box, score) in zip(boxes, scores): + if score < drop_score: + continue + box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) + image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) + return image + + +def get_rotate_crop_image(img, points): + ''' + img_height, img_width = img.shape[0:2] + left = int(np.min(points[:, 0])) + right = int(np.max(points[:, 0])) + top = int(np.min(points[:, 1])) + bottom = int(np.max(points[:, 1])) + img_crop = img[top:bottom, left:right, :].copy() + points[:, 0] = points[:, 0] - left + points[:, 1] = points[:, 1] - top + ''' + assert len(points) == 4, "shape of points must be 4*2" + img_crop_width = int( + max( + np.linalg.norm(points[0] - points[1]), + np.linalg.norm(points[2] - points[3]))) + img_crop_height = int( + max( + np.linalg.norm(points[0] - points[3]), + np.linalg.norm(points[1] - points[2]))) + pts_std = np.float32([[0, 0], [img_crop_width, 0], + [img_crop_width, img_crop_height], + [0, img_crop_height]]) + M = cv2.getPerspectiveTransform(points, pts_std) + dst_img = cv2.warpPerspective( + img, + M, 
(img_crop_width, img_crop_height), + borderMode=cv2.BORDER_REPLICATE, + flags=cv2.INTER_CUBIC) + dst_img_height, dst_img_width = dst_img.shape[0:2] + if dst_img_height * 1.0 / dst_img_width >= 1.5: + dst_img = np.rot90(dst_img) + return dst_img + + +def check_gpu(use_gpu): + if use_gpu and not paddle.is_compiled_with_cuda(): + use_gpu = False + return use_gpu + + +if __name__ == '__main__': + pass diff --git a/deploy/pipeline/ppvehicle/vehicleplate_postprocess.py b/deploy/pipeline/ppvehicle/vehicleplate_postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..66a00a3410340995a058368abfc333a35f454b66 --- /dev/null +++ b/deploy/pipeline/ppvehicle/vehicleplate_postprocess.py @@ -0,0 +1,296 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import numpy as np +import paddle +from paddle.nn import functional as F +import re +from shapely.geometry import Polygon +import cv2 +import copy + + +def build_post_process(config, global_config=None): + support_dict = ['DBPostProcess', 'CTCLabelDecode'] + + config = copy.deepcopy(config) + module_name = config.pop('name') + if module_name == "None": + return + if global_config is not None: + config.update(global_config) + assert module_name in support_dict, Exception( + 'post process only support {}'.format(support_dict)) + module_class = eval(module_name)(**config) + return module_class + + +class DBPostProcess(object): + """ + The post process for Differentiable Binarization (DB). + """ + + def __init__(self, + thresh=0.3, + box_thresh=0.7, + max_candidates=1000, + unclip_ratio=2.0, + use_dilation=False, + score_mode="fast", + **kwargs): + self.thresh = thresh + self.box_thresh = box_thresh + self.max_candidates = max_candidates + self.unclip_ratio = unclip_ratio + self.min_size = 3 + self.score_mode = score_mode + assert score_mode in [ + "slow", "fast" + ], "Score mode must be in [slow, fast] but got: {}".format(score_mode) + + self.dilation_kernel = None if not use_dilation else np.array( + [[1, 1], [1, 1]]) + + def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height): + ''' + _bitmap: single map with shape (1, H, W), + whose values are binarized as {0, 1} + ''' + + bitmap = _bitmap + height, width = bitmap.shape + + outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, + cv2.CHAIN_APPROX_SIMPLE) + if len(outs) == 3: + img, contours, _ = outs[0], outs[1], outs[2] + elif len(outs) == 2: + contours, _ = outs[0], outs[1] + + num_contours = min(len(contours), self.max_candidates) + + boxes = [] + scores = [] + for index in range(num_contours): + contour = contours[index] + points, sside = self.get_mini_boxes(contour) + if sside < self.min_size: + continue + points = np.array(points) + if self.score_mode == "fast": + score = 
self.box_score_fast(pred, points.reshape(-1, 2)) + else: + score = self.box_score_slow(pred, contour) + if self.box_thresh > score: + continue + + box = self.unclip(points).reshape(-1, 1, 2) + box, sside = self.get_mini_boxes(box) + if sside < self.min_size + 2: + continue + box = np.array(box) + + box[:, 0] = np.clip( + np.round(box[:, 0] / width * dest_width), 0, dest_width) + box[:, 1] = np.clip( + np.round(box[:, 1] / height * dest_height), 0, dest_height) + boxes.append(box.astype(np.int16)) + scores.append(score) + return np.array(boxes, dtype=np.int16), scores + + def unclip(self, box): + try: + import pyclipper + except Exception as e: + raise RuntimeError( + 'Unable to use vehicleplate postprocess in PP-Vehicle, please install pyclipper, for example: `pip install pyclipper`, see https://github.com/fonttools/pyclipper' + ) + unclip_ratio = self.unclip_ratio + poly = Polygon(box) + distance = poly.area * unclip_ratio / poly.length + offset = pyclipper.PyclipperOffset() + offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON) + expanded = np.array(offset.Execute(distance)) + return expanded + + def get_mini_boxes(self, contour): + bounding_box = cv2.minAreaRect(contour) + points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0]) + + index_1, index_2, index_3, index_4 = 0, 1, 2, 3 + if points[1][1] > points[0][1]: + index_1 = 0 + index_4 = 1 + else: + index_1 = 1 + index_4 = 0 + if points[3][1] > points[2][1]: + index_2 = 2 + index_3 = 3 + else: + index_2 = 3 + index_3 = 2 + + box = [ + points[index_1], points[index_2], points[index_3], points[index_4] + ] + return box, min(bounding_box[1]) + + def box_score_fast(self, bitmap, _box): + ''' + box_score_fast: use bbox mean score as the mean score + ''' + h, w = bitmap.shape[:2] + box = _box.copy() + xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int32), 0, w - 1) + xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int32), 0, w - 1) + ymin = np.clip(np.floor(box[:, 
1].min()).astype(np.int32), 0, h - 1) + ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int32), 0, h - 1) + + mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8) + box[:, 0] = box[:, 0] - xmin + box[:, 1] = box[:, 1] - ymin + cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1) + return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0] + + def box_score_slow(self, bitmap, contour): + ''' + box_score_slow: use polygon mean score as the mean score + ''' + h, w = bitmap.shape[:2] + contour = contour.copy() + contour = np.reshape(contour, (-1, 2)) + + xmin = np.clip(np.min(contour[:, 0]), 0, w - 1) + xmax = np.clip(np.max(contour[:, 0]), 0, w - 1) + ymin = np.clip(np.min(contour[:, 1]), 0, h - 1) + ymax = np.clip(np.max(contour[:, 1]), 0, h - 1) + + mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8) + + contour[:, 0] = contour[:, 0] - xmin + contour[:, 1] = contour[:, 1] - ymin + + cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1) + return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0] + + def __call__(self, outs_dict, shape_list): + pred = outs_dict['maps'] + if isinstance(pred, paddle.Tensor): + pred = pred.numpy() + pred = pred[:, 0, :, :] + segmentation = pred > self.thresh + + boxes_batch = [] + for batch_index in range(pred.shape[0]): + src_h, src_w = shape_list[batch_index] + if self.dilation_kernel is not None: + mask = cv2.dilate( + np.array(segmentation[batch_index]).astype(np.uint8), + self.dilation_kernel) + else: + mask = segmentation[batch_index] + boxes, scores = self.boxes_from_bitmap(pred[batch_index], mask, + src_w, src_h) + + boxes_batch.append({'points': boxes}) + return boxes_batch + + +class BaseRecLabelDecode(object): + """ Convert between text-label and text-index """ + + def __init__(self, character_dict_path=None, use_space_char=False): + self.beg_str = "sos" + self.end_str = "eos" + + self.character_str = [] + if character_dict_path is None: + self.character_str = 
"0123456789abcdefghijklmnopqrstuvwxyz" + dict_character = list(self.character_str) + else: + with open(character_dict_path, "rb") as fin: + lines = fin.readlines() + for line in lines: + line = line.decode('utf-8').strip("\n").strip("\r\n") + self.character_str.append(line) + if use_space_char: + self.character_str.append(" ") + dict_character = list(self.character_str) + + dict_character = self.add_special_char(dict_character) + self.dict = {} + for i, char in enumerate(dict_character): + self.dict[char] = i + self.character = dict_character + + def add_special_char(self, dict_character): + return dict_character + + def decode(self, text_index, text_prob=None, is_remove_duplicate=False): + """ convert text-index into text-label. """ + result_list = [] + ignored_tokens = self.get_ignored_tokens() + batch_size = len(text_index) + for batch_idx in range(batch_size): + selection = np.ones(len(text_index[batch_idx]), dtype=bool) + if is_remove_duplicate: + selection[1:] = text_index[batch_idx][1:] != text_index[ + batch_idx][:-1] + for ignored_token in ignored_tokens: + selection &= text_index[batch_idx] != ignored_token + + char_list = [ + self.character[text_id] + for text_id in text_index[batch_idx][selection] + ] + if text_prob is not None: + conf_list = text_prob[batch_idx][selection] + else: + conf_list = [1] * len(selection) + if len(conf_list) == 0: + conf_list = [0] + + text = ''.join(char_list) + result_list.append((text, np.mean(conf_list).tolist())) + return result_list + + def get_ignored_tokens(self): + return [0] # for ctc blank + + +class CTCLabelDecode(BaseRecLabelDecode): + """ Convert between text-label and text-index """ + + def __init__(self, character_dict_path=None, use_space_char=False, + **kwargs): + super(CTCLabelDecode, self).__init__(character_dict_path, + use_space_char) + + def __call__(self, preds, label=None, *args, **kwargs): + if isinstance(preds, tuple) or isinstance(preds, list): + preds = preds[-1] + if isinstance(preds, 
paddle.Tensor): + preds = preds.numpy() + preds_idx = preds.argmax(axis=2) + preds_prob = preds.max(axis=2) + text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True) + if label is None: + return text + label = self.decode(label) + return text, label + + def add_special_char(self, dict_character): + dict_character = ['blank'] + dict_character + return dict_character diff --git a/deploy/pipeline/tools/clip_video.py b/deploy/pipeline/tools/clip_video.py new file mode 100644 index 0000000000000000000000000000000000000000..fbfb9cd08169b90bc71a436f6a414c4d6d1f480f --- /dev/null +++ b/deploy/pipeline/tools/clip_video.py @@ -0,0 +1,36 @@ +import cv2 + + +def cut_video(video_path, frameToStart, frametoStop, saved_video_path): + cap = cv2.VideoCapture(video_path) + FPS = cap.get(cv2.CAP_PROP_FPS) + + TOTAL_FRAME = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) # total number of frames in the video + + size = (cap.get(cv2.CAP_PROP_FRAME_WIDTH), + cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + + videoWriter = cv2.VideoWriter( + saved_video_path, + apiPreference=0, + fourcc=cv2.VideoWriter_fourcc(* 'mp4v'), + fps=FPS, + frameSize=(int(size[0]), int(size[1]))) + + COUNT = 0 + while True: + success, frame = cap.read() + if success: + COUNT += 1 + if COUNT <= frametoStop and COUNT > frameToStart: # keep frames after the start frame, up to the stop frame + videoWriter.write(frame) + else: + print("cap.read failed!") + break + if COUNT > frametoStop: + break + + cap.release() + videoWriter.release() + + print(saved_video_path) diff --git a/deploy/pipeline/tools/get_video_info.py b/deploy/pipeline/tools/get_video_info.py new file mode 100644 index 0000000000000000000000000000000000000000..39aa30d81212577666f25d4e14a147113197b1ed --- /dev/null +++ b/deploy/pipeline/tools/get_video_info.py @@ -0,0 +1,71 @@ +import os +import sys +import cv2 +import numpy as np +import argparse + + +def argsparser(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--video_file", + type=str, + default=None, + help="Path of video file, `video_file` or 
`camera_id` has a highest priority." + ) + parser.add_argument( + '--region_polygon', + nargs='+', + type=int, + default=[], + help="Clockwise point coords (x0,y0,x1,y1...) of polygon of area when " + "do_break_in_counting. Note that only support single-class MOT and " + "the video should be taken by a static camera.") + return parser + + +def get_video_info(video_file, region_polygon): + entrance = [] + assert len(region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points." + for i in range(0, len(region_polygon), 2): + entrance.append([region_polygon[i], region_polygon[i + 1]]) + + if not os.path.exists(video_file): + print("video path '{}' not exists".format(video_file)) + sys.exit(-1) + capture = cv2.VideoCapture(video_file) + width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)) + print("video width: %d, height: %d" % (width, height)) + np_masks = np.zeros((height, width, 1), np.uint8) + + entrance = np.array(entrance) + cv2.fillPoly(np_masks, [entrance], 255) + + fps = int(capture.get(cv2.CAP_PROP_FPS)) + frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) + print("video fps: %d, frame_count: %d" % (fps, frame_count)) + cnt = 0 + while (1): + ret, frame = capture.read() + cnt += 1 + if cnt == 3: break + + alpha = 0.3 + img = np.array(frame).astype('float32') + mask = np_masks[:, :, 0] + color_mask = [0, 0, 255] + idx = np.nonzero(mask) + color_mask = np.array(color_mask) + img[idx[0], idx[1], :] *= 1.0 - alpha + img[idx[0], idx[1], :] += alpha * color_mask + cv2.imwrite('region_vis.jpg', img) + + +if __name__ == "__main__": + parser = argsparser() + FLAGS = parser.parse_args() + get_video_info(FLAGS.video_file, FLAGS.region_polygon) + + # python get_video_info.py --video_file=demo.mp4 --region_polygon 200 200 400 200 300 400 100 400 diff --git a/deploy/pipeline/tools/split_fight_train_test_dataset.py b/deploy/pipeline/tools/split_fight_train_test_dataset.py new file mode 100644 index 
0000000000000000000000000000000000000000..5ca8fce64d00ccefee965c899a0d2b96863ff1dc --- /dev/null +++ b/deploy/pipeline/tools/split_fight_train_test_dataset.py @@ -0,0 +1,80 @@ +import os +import glob +import random +import fnmatch +import re +import sys + +class_id = {"nofight": 0, "fight": 1} + + +def get_list(path, key_func=lambda x: x[-11:], rgb_prefix='img_', level=1): + if level == 1: + frame_folders = glob.glob(os.path.join(path, '*')) + elif level == 2: + frame_folders = glob.glob(os.path.join(path, '*', '*')) + else: + raise ValueError('level can be only 1 or 2') + + def count_files(directory): + lst = os.listdir(directory) + cnt = len(fnmatch.filter(lst, rgb_prefix + '*')) + return cnt + + # check RGB + video_dict = {} + for f in frame_folders: + cnt = count_files(f) + k = key_func(f) + if level == 2: + k = k.split("/")[0] + + video_dict[f] = str(cnt) + " " + str(class_id[k]) + + return video_dict + + +def fight_splits(video_dict, train_percent=0.8): + videos = list(video_dict.keys()) + + train_num = int(len(videos) * train_percent) + + train_list = [] + val_list = [] + + random.shuffle(videos) + + for i in range(train_num): + train_list.append(videos[i] + " " + str(video_dict[videos[i]])) + for i in range(train_num, len(videos)): + val_list.append(videos[i] + " " + str(video_dict[videos[i]])) + + print("train:", len(train_list), ",val:", len(val_list)) + + with open("fight_train_list.txt", "w") as f: + for item in train_list: + f.write(item + "\n") + + with open("fight_val_list.txt", "w") as f: + for item in val_list: + f.write(item + "\n") + + +if __name__ == "__main__": + frame_dir = sys.argv[1] # "rawframes" + level = int(sys.argv[2]) # 2 + train_percent = float(sys.argv[3]) # 0.8 + + if level == 2: + + def key_func(x): + return '/'.join(x.split('/')[-2:]) + else: + + def key_func(x): + return x.split('/')[-1] + + video_dict = get_list(frame_dir, key_func=key_func, level=level) + print("number:", len(video_dict)) + + fight_splits(video_dict, train_percent) diff 
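--git a/deploy/pphuman/README.md b/deploy/pphuman/README.md deleted file mode 100644 index 008cb69b170b5e60dbf071c9ad0c6ca9a466e4b8..0000000000000000000000000000000000000000 --- a/deploy/pphuman/README.md +++ /dev/null @@ -1,164 +0,0 @@ -[English](README_en.md) | 简体中文 - -# PP-Human: Real-Time Pedestrian Analysis - -PP-Human is the industry's first open-source real-time pedestrian analysis tool built on the PaddlePaddle deep learning framework. Its three main strengths are rich functionality, broad applicability, and efficient deployment. PP-Human -supports image, single-camera video, and multi-camera video input, and covers multi-object tracking, attribute recognition, and behavior analysis. It can be applied widely in smart transportation, smart communities, industrial inspection, and similar fields. It supports server-side deployment with TensorRT acceleration and reaches real-time speed on a T4 server. - -PP-Human enables intelligent, fine-grained community management; see the AIStudio quick-start tutorial [link](https://aistudio.baidu.com/aistudio/projectdetail/3679564) - -## I. Environment Preparation - -Requirement: PaddleDetection version >= release/2.4 or the develop version - -Installing PaddlePaddle and PaddleDetection - -``` -# PaddlePaddle CUDA10.1 -python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html - -# PaddlePaddle CPU -python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple - -# Clone the PaddleDetection repository -cd -git clone https://github.com/PaddlePaddle/PaddleDetection.git - -# Install other dependencies -cd PaddleDetection -pip install -r requirements.txt -``` - -For detailed installation instructions, see this [document](docs/tutorials/INSTALL_cn.md) - -## II. Quick Start - -### 1. 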
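Model Download - -PP-Human provides pre-trained models for object detection, attribute recognition, action recognition, and ReID to cover different usage scenarios. They can be downloaded and used directly. - -| Task | Scenario | Accuracy | Inference Speed (ms) | Inference Deployment Model | -| :---------: |:---------: |:--------------- | :-------: | :------: | -| Object Detection | Image input | mAP: 56.3 | 28.0ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | -| Object Tracking | Video input | MOTA: 72.0 | 33.1ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | -| Attribute Recognition | Image/video input, attribute recognition | mA: 94.86 | 2ms per person | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | -| Keypoint Detection | Video input, action recognition | AP: 87.1 | 2.9ms per person | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) -| Action Recognition | Video input, action recognition | Accuracy: 96.43 | 2.7ms per person | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | -| ReID | Video input, cross-camera tracking | mAP: 99.7 | - | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | - -After downloading a model, unzip it into the `./output_inference` folder. - -**Note:** - -- Model accuracy is measured on a fused dataset that combines open-source and enterprise datasets. -- ReID accuracy is measured on the Market1501 dataset. -- Inference speed is measured on a T4 with TensorRT FP16 enabled, and covers the full pipeline: data preprocessing, model inference, and postprocessing. - -### 2. Configuration File - -PP-Human configuration lives in ```deploy/pphuman/config/infer_cfg.yml```, which stores the model paths. Different functions require different task types to be set. - -The mapping between functions and task types is as follows: - -| Input | Function | Task Type | Config | -|-------|-------|----------|-----| -| Image | Attribute Recognition | Object Detection, Attribute Recognition | DET ATTR | -| Single-Camera Video | Attribute Recognition | Multi-Object Tracking, Attribute Recognition | MOT ATTR | -| Single-Camera Video | Action Recognition | Multi-Object Tracking, Keypoint Detection, Action Recognition | MOT KPT ACTION | - -For example, attribute recognition on video input involves the multi-object tracking and attribute recognition task types, configured as follows: - -``` -crop_thresh: 0.5 -attr_thresh: 0.5 -visual: True - -MOT: - model_dir: output_inference/mot_ppyoloe_l_36e_pipeline/ - tracker_config: deploy/pphuman/config/tracker_config.yml - batch_size: 1 - -ATTR: - model_dir: output_inference/strongbaseline_r50_30e_pa100k/ - batch_size: 8 -``` - -**Note:** - -- To switch between tasks, add `--enable_attr=True` or `--enable_action=True` on the command line; the config file does not need to change. -- To change only the model path, add `--model_dir det=ppyoloe/` on the command line; the config file does not need to change. See the parameter description below for details. - - -### 3. 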
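Inference and Deployment - -``` -# Specify the config file path and a test image -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --image_file=test_image.jpg --device=gpu - -# Specify the config file path and a test video, and run attribute recognition -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --enable_attr=True - -# Specify the config file path and a test video, and run action recognition -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --enable_action=True - -# Specify the config file path, the model path, and a test video, and run multi-object tracking -# A model path given on the command line takes priority over the config file -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --model_dir det=ppyoloe/ -``` - -#### 3.1 Parameters - -| Parameter | Required | Meaning | -|-------|-------|----------| -| --config | Yes | Config file path | -| --model_dir | Option | Model paths for each task in PP-Human; takes priority over the config file, e.g. `--model_dir det=better_det/ attr=better_attr/`| -| --image_file | Option | Image to run inference on | -| --image_dir | Option | Path to a folder of images to run inference on | -| --video_file | Option | Video to run inference on | -| --camera_id | Option | ID of the camera used for inference; the default is -1 (no camera; valid values are 0 to (number of cameras - 1)). During inference, press `q` in the visualization window to quit and write the result to output/output.mp4| -| --enable_attr| Option | Whether to run attribute recognition; the default is False (disabled) | -| --enable_action| Option | Whether to run action recognition; the default is False (disabled) | -| --device | Option | Device to run on, one of `CPU/GPU/XPU`; the default is `CPU`| -| --output_dir | Option| Root directory for saved visualization results; the default is output/| -| --run_mode | Option | Run mode when using GPU; the default is paddle, options: paddle/trt_fp32/trt_fp16/trt_int8| -| --enable_mkldnn | Option | Whether to enable MKLDNN acceleration for CPU inference; the default is False | -| --cpu_threads | Option| Number of CPU threads; the default is 1 | -| --trt_calib_mode | Option| Whether TensorRT uses calibration; the default is False. Set it to True when using TensorRT int8, and to False when using a model quantized with PaddleSlim | -| --do_entrance_counting | Option | Whether to count entrance/exit traffic; the default is False | -| --draw_center_traj | Option | Whether to draw tracking trajectories; the default is False | - -## III. Introduction to the Solution - -The overall PP-Human solution is shown in the figure below. - -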
    - -
    - - -### 1. Object Detection -- Uses PP-YOLOE L as the object detection model -- For details, see [PP-YOLOE](../../configs/ppyoloe/) and the [detection and tracking document](docs/mot.md) - -### 2. Multi-Object Tracking -- Uses the SDE approach for multi-object tracking -- The detection model is PP-YOLOE L -- The tracking module uses Bytetrack -- For details, see [Bytetrack](../../configs/mot/bytetrack) and the [detection and tracking document](docs/mot.md) - -### 3. Cross-Camera Tracking -- Uses PP-YOLOE + Bytetrack to obtain single-camera multi-object tracks -- Uses ReID (centroid network) to extract features from the detections in each frame -- Matches track features across cameras to produce the cross-camera tracking result -- For details, see [Cross-Camera Tracking](docs/mtmct.md) - -### 4. Attribute Recognition -- Uses PP-YOLOE + Bytetrack to track people -- Uses StrongBaseline (a multi-class model) for attribute recognition; the main attributes include age, gender, hat, eyes, upper- and lower-body clothing style, backpack, and so on -- For details, see [Attribute Recognition](docs/attribute.md) - -### 5. Action Recognition -- Uses PP-YOLOE + Bytetrack to track people -- Uses HRNet for keypoint detection to obtain 17 skeleton keypoints per person -- Combines the changes in one person's keypoints over 50 frames and uses ST-GCN to judge whether the action in those 50 frames is a fall -- For details, see [Action Recognition](docs/action.md) diff --git a/deploy/pphuman/README_en.md b/deploy/pphuman/README_en.md deleted file mode 100644 index d45481b4d5f8da088b251997bbe8266146b72c9f..0000000000000000000000000000000000000000 --- a/deploy/pphuman/README_en.md +++ /dev/null @@ -1,156 +0,0 @@ -English | [简体中文](README.md) - -# PP-Human: a Real-Time Pedestrian Analysis Tool - -PP-Human is the first open-source real-time pedestrian analysis tool built on the PaddlePaddle deep learning framework. Versatile and efficient to deploy, it has been used in a variety of scenarios. PP-Human -offers several input options, including image, single-camera video, and multi-camera video, and covers multi-object tracking, attribute recognition, and action recognition. PP-Human can be applied to intelligent traffic, intelligent communities, industrial inspection, and so on. It supports server-side deployment and TensorRT acceleration, and can achieve real-time analysis on a T4 server. - -## I. 
Environment Preparation - -Requirement: PaddleDetection version >= release/2.4 - - -Installing PaddlePaddle and PaddleDetection - -``` -# PaddlePaddle CUDA10.1 -python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html - -# PaddlePaddle CPU -python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple - -# Clone the PaddleDetection repository -cd -git clone https://github.com/PaddlePaddle/PaddleDetection.git - -# Install other dependencies -cd PaddleDetection -pip install -r requirements.txt -``` - -For installation details, please refer to this [document](docs/tutorials/INSTALL_cn.md) - -## II. Quick Start - -### 1. Model Download - -To cover different scenarios, PP-Human provides pre-trained models for object detection, attribute recognition, behavior recognition, and ReID. - -| Task | Scenario | Precision | Inference Speed(FPS) | Model Inference and Deployment | -| :---------: |:---------: |:--------------- | :-------: | :------: | -| Object Detection | Image/Video Input | - | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | -| Attribute Recognition | Image/Video Input Attribute Recognition | - | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.tar) | -| Keypoint Detection | Video Input Action Recognition | - | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) -| Behavior Recognition | Video Input Behavior Recognition | - | - | [Link](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | -| ReID | Multi-Target Multi-Camera Tracking | - | - | [Link]() | - -Then, unzip the downloaded model to the folder `./output_inference`. - -**Note:** - -- Model precision is measured on a fused dataset that combines open-source and enterprise datasets. -- Inference speed is measured on a T4 with TensorRT FP16 enabled. 
- -### 2. Preparation of Configuration Files - -The PP-Human configuration is stored in ```deploy/pphuman/config/infer_cfg.yml```. Different functions require different task types, so set the task type beforehand. - -The correspondence between functions and task types is as follows: - -| Input | Function | Task Type | Config | -|-------|-------|----------|-----| -| Image | Attribute Recognition | Object Detection Attribute Recognition | DET ATTR | -| Single-Camera Video | Attribute Recognition | Multi-Object Tracking Attribute Recognition | MOT ATTR | -| Single-Camera Video | Behavior Recognition | Multi-Object Tracking Keypoint Detection Action Recognition | MOT KPT ACTION | - -For example, attribute recognition with video input involves the multi-object tracking and attribute recognition task types, and the config is: - -``` -crop_thresh: 0.5 -attr_thresh: 0.5 -visual: True - -MOT: - model_dir: output_inference/mot_ppyoloe_l_36e_pipeline/ - tracker_config: deploy/pphuman/config/tracker_config.yml - batch_size: 1 - -ATTR: - model_dir: output_inference/strongbaseline_r50_30e_pa100k/ - batch_size: 8 -``` - - - -### 3. 
Inference and Deployment - -``` -# Specify the config file path and a test image -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --image_file=test_image.jpg --device=gpu - -# Specify the config file path and a test video, and run attribute recognition -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --enable_attr=True - -# Specify the config file path and a test video, and run action recognition -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --enable_action=True - -# Specify the config file path, the model path, and a test video, and run multi-object tracking -# The model path specified on the command line takes priority over the config file -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --model_dir det=ppyoloe/ -``` - -### 3.1 Description of Parameters - -| Parameter | Required | Meaning | -|-------|-------|----------| -| --config | Yes | Config file path | -| --model_dir | Option | The model paths of the different tasks in PP-Human, with a higher priority than the config file | -| --image_file | Option | Image to run inference on | -| --image_dir | Option | Path to a folder of images to run inference on | -| --video_file | Option | Video to run inference on | -| --camera_id | Option | ID of the camera used for inference; the default is -1 (no camera; it can be set to 0 - (number of cameras-1)). During inference, press `q` in the visualization window to exit and write the result to output/output.mp4| -| --enable_attr| Option | Enable attribute recognition or not | -| --enable_action| Option | Enable action recognition or not | -| --device | Option | Device to run on; available devices are `CPU/GPU/XPU`, and the default is `CPU`| -| --output_dir | Option| The default root directory 
which stores the visualization result is output/| -| --run_mode | Option | When using GPU, the default is paddle; the available options are paddle/trt_fp32/trt_fp16/trt_int8.| -| --enable_mkldnn | Option | Enable MKLDNN acceleration for CPU inference or not; the default is False | -| --cpu_threads | Option| Number of CPU threads; the default is 1 | -| --trt_calib_mode | Option| Enable calibration on TensorRT or not; the default is False. When using TensorRT int8, it should be set to True; when using a model quantized by PaddleSlim, it should be set to False. | - - -## III. Introduction to the Solution - -The overall solution of PP-Human is as follows: - -
    - -
    - - -### 1. Object Detection -- Use PP-YOLOE L as the object detection model -- For details, please refer to [PP-YOLOE](../../configs/ppyoloe/) - -### 2. Multi-Object Tracking -- Conduct multi-object tracking with the SDE solution -- Use PP-YOLOE L as the detection model -- Use the Bytetrack solution as the tracking module -- For details, refer to [Bytetrack](configs/mot/bytetrack) - -### 3. Cross-Camera Tracking -- Use PP-YOLOE + Bytetrack to obtain the tracks of single-camera multi-object tracking -- Use ReID (centroid network) to extract features of the detection result of each frame -- Match the features of multi-camera tracks to get the cross-camera tracking result -- For details, please refer to [Cross-Camera Tracking](docs/mtmct_en.md) - -### 4. Attribute Recognition -- Use PP-YOLOE + Bytetrack to track humans -- Use StrongBaseline (a multi-class model) to conduct attribute recognition; the main attributes include age, gender, hats, eyes, clothing, and backpacks. -- For details, please refer to [Attribute Recognition](docs/attribute_en.md) - -### 5. 
Action Recognition -- Use PP-YOLOE + Bytetrack to track humans -- Use HRNet for keypoint detection to obtain the 17 keypoints of the human body -- Based on how one person's keypoints change over 50 frames, use ST-GCN to judge whether the action within those 50 frames is a fall -- For details, please refer to [Action Recognition](docs/action_en.md) diff --git a/deploy/pphuman/config/infer_cfg.yml b/deploy/pphuman/config/infer_cfg.yml deleted file mode 100644 index 0d4de94c2bfec0b05db1a90691528808d051bc28..0000000000000000000000000000000000000000 --- a/deploy/pphuman/config/infer_cfg.yml +++ /dev/null @@ -1,33 +0,0 @@ -crop_thresh: 0.5 -attr_thresh: 0.5 -kpt_thresh: 0.2 -visual: True -warmup_frame: 50 - -DET: - model_dir: output_inference/mot_ppyoloe_l_36e_pipeline/ - batch_size: 1 - -ATTR: - model_dir: output_inference/strongbaseline_r50_30e_pa100k/ - batch_size: 8 - -MOT: - model_dir: output_inference/mot_ppyoloe_l_36e_pipeline/ - tracker_config: deploy/pphuman/config/tracker_config.yml - batch_size: 1 - -KPT: - model_dir: output_inference/dark_hrnet_w32_256x192/ - batch_size: 8 - -ACTION: - model_dir: output_inference/STGCN - batch_size: 1 - max_frames: 50 - display_frames: 80 - coord_size: [384, 512] - -REID: - model_dir: output_inference/reid_model/ - batch_size: 16 diff --git a/deploy/pphuman/docs/action.md b/deploy/pphuman/docs/action.md deleted file mode 100644 index 82d1ac0e3a40ccb3745460d27ec94df425616033..0000000000000000000000000000000000000000 --- a/deploy/pphuman/docs/action.md +++ /dev/null @@ -1,76 +0,0 @@ -# PP-Human Action Recognition Module - -Action recognition is widely used in smart communities, security surveillance, and similar areas. PP-Human integrates a skeleton-based action recognition module. - -
    - -
    Data source and copyright: 天覆科技. Thanks for providing and open-sourcing real-scene data, which is limited to academic research use.
    -
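    - -## Model Zoo -Here we provide pre-trained models for detection/tracking, keypoint detection, and fall recognition, which can be downloaded and used directly. - -| Task | Algorithm | Accuracy | Inference Speed (ms) | Download | -|:---------------------|:---------:|:------:|:------:| :---------------------------------------------------------------------------------: | -| Pedestrian Detection/Tracking | PP-YOLOE | mAP: 56.3; MOTA: 72.0 | Detection: 28ms; Tracking: 33.1ms | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | -| Keypoint Detection | HRNet | AP: 87.1 | 2.9ms per person | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip)| -| Action Recognition | ST-GCN | Accuracy: 96.43 | 2.7ms per person | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | - - -Notes: -1. The detection/tracking model accuracy is obtained by training and testing on a fusion of MOT17, CrowdHuman, HIEVE, and some business data. -2. The keypoint model is trained on a fusion of COCO, UAVHuman, and some business data; accuracy is measured on the business-data test set. -3. The action recognition model is trained on a fusion of NTU-RGB+D, the UR Fall Detection Dataset, and some business data; accuracy is measured on the business-data test set. -4. Inference speed is measured on an NVIDIA T4 machine with TensorRT FP16, and covers the full pipeline of data preprocessing, model inference, and postprocessing. - -## Configuration -The parameters related to action recognition in the [config file](../config/infer_cfg.yml) are as follows: -``` -ACTION: - model_dir: output_inference/STGCN # Path to the model - batch_size: 1 # Inference batch size. Only 1 is currently supported - max_frames: 50 # Number of frames per action clip. Once a pedestrian ID's time-series keypoint results reach this many frames, the action recognition model classifies the sequence. Works best when it matches the training setting. - display_frames: 80 # Number of display frames. When a fall is predicted, how long the status stays displayed on the corresponding person ID. - coord_size: [384, 512] # Size to which coordinates are uniformly scaled. Works best when it matches the training setting. -``` - -## Usage -1. Download the models from the links in the table above and unzip them into ```./output_inference```. -2. The action recognition module currently supports video input only. The launch command is: -```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ - --video_file=test_video.mp4 \ - --device=gpu \ - --enable_action=True -``` -3. To change the model path, there are two options: - - - Different model paths can be configured in ```./deploy/pphuman/config/infer_cfg.yml```. The keypoint model and the action recognition model correspond to the `KPT` and `ACTION` fields; change the path under the relevant field to the desired path. - - Add `--model_dir` on the command line to change the model path: -```python -python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml \ - --video_file=test_video.mp4 \ - --device=gpu \ - --enable_action=True \ - --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN -``` - -## Solution Overview -1. Object detection and multi-object tracking produce the pedestrian bounding boxes and tracking IDs from the input video. The model is PP-YOLOE; for details see [PP-YOLOE](../../../configs/ppyoloe). -2. Each pedestrian is cropped from the corresponding frame using the bounding box coordinates, and the [keypoint detection model](../../../configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) produces the 17 skeleton keypoints. Their order and types match COCO; see the `COCO dataset` section of [How to prepare a keypoint dataset](../../../docs/tutorials/PrepareKeypointDataSet_cn.md). -3. 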
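Each tracked ID's pedestrian accumulates its own skeleton keypoint results, forming that person's time-series keypoint sequence. Once the preset number of frames is reached or the track is lost, the action recognition model classifies the action type of the sequence. The current model supports fall recognition, and the predicted `class id` mapping is: -``` -0: fall, -1: other -``` -4. The action recognition model uses [ST-GCN](https://arxiv.org/abs/1801.07455) and is trained with the [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) toolkit. - -## References -``` -@inproceedings{stgcn2018aaai, - title = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition}, - author = {Sijie Yan and Yuanjun Xiong and Dahua Lin}, - booktitle = {AAAI}, - year = {2018}, -} -``` diff --git a/deploy/pptracking/README_cn.md b/deploy/pptracking/README_cn.md index 4330e114205598e41282c9c937c4433e54487a81..cf25c9973feacf5458582cb7387c8377c894be8c 100644 --- a/deploy/pptracking/README_cn.md +++ b/deploy/pptracking/README_cn.md @@ -36,7 +36,7 @@ PP-Tracking provides a simple GUI; for a tutorial see [PP-Tracking PP-Tracking supports two modes: single-camera tracking (MOT) and multi-target multi-camera tracking (MTMCT). - Single-camera tracking supports both the **FairMOT** and **DeepSORT** multi-object tracking algorithms; multi-camera tracking supports only **DeepSORT**. - Single-camera tracking covers pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, and traffic counting. The models are mainly optimized from FairMOT to achieve real-time tracking, and dedicated pre-trained models are provided for different application scenarios. -- The DeepSORT solution (including the DeepSORT used in multi-camera tracking) uses as detectors PaddleDetection's own high-performance detection model [PP-YOLOv2](../../ppyolo/) and lightweight detection model [PP-PicoDet](../../picodet/), and as the ReID model PaddleClas's own ultra-lightweight backbone [PP-LCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/PP-LCNet.md) +- The DeepSORT solution (including the DeepSORT used in multi-camera tracking) uses as detectors PaddleDetection's own high-performance detection model [PP-YOLOv2](../../configs/ppyolo/) and lightweight detection model [PP-PicoDet](../../configs/picodet/), and as the ReID model PaddleClas's own ultra-lightweight backbone [PP-LCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/PP-LCNet.md) The pre-trained models for multiple scenarios provided in PP-Tracking, and the exported inference deployment models, are as follows: diff --git a/deploy/pptracking/cpp/src/tracker.cc b/deploy/pptracking/cpp/src/tracker.cc index 09b2dfa249ffbc90819d2dd5d3e419a27d23cd43..9540e39f6701750ae8af5229ecd9cfa264460095 100644 --- a/deploy/pptracking/cpp/src/tracker.cc +++ b/deploy/pptracking/cpp/src/tracker.cc @@ -56,8 +56,8 @@ bool JDETracker::update(const 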
cv::Mat &dets, ++timestamp; TrajectoryPool candidates(dets.rows); for (int i = 0; i < dets.rows; ++i) { - float score = *dets.ptr<float>(i, 4); - const cv::Mat &ltrb_ = dets(cv::Rect(0, i, 4, 1)); + float score = *dets.ptr<float>(i, 1); + const cv::Mat &ltrb_ = dets(cv::Rect(2, i, 4, 1)); cv::Vec4f ltrb = mat2vec4f(ltrb_); const cv::Mat &embedding = emb(cv::Rect(0, i, emb.cols, 1)); candidates[i] = Trajectory(ltrb, score, embedding); diff --git a/deploy/pptracking/python/README.md b/deploy/pptracking/python/README.md index d5c34cdf56efec0f0dd7686d2127c33e584eaf37..e56a69e3fe96aba11ed3a50d1173b39c30fc83cf 100644 --- a/deploy/pptracking/python/README.md +++ b/deploy/pptracking/python/README.md @@ -65,13 +65,12 @@ python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fa - The bdd100k vehicle tracking and multi-class demo video can be downloaded with: `wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4` + ## 2. Exporting and Running Inference with DeepSORT Models ### 2.1 Export the Inference Models Step 1: export the detection model ```bash -# Export the PPYOLOv2 pedestrian detection model -CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams -# Or export the PPYOLOe pedestrian detection model +# Export the PPYOLOe pedestrian detection model CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams ``` @@ -88,45 +87,41 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid # Download the pedestrian tracking demo video: wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 -# Use the exported PPYOLOv2 pedestrian detection model and PPLCNet ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyolov2_r50vd_dcn_365e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts 
--save_images -# Or use the exported PPYOLOe pedestrian detection model and PPLCNet ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images +# Use the exported PPYOLOE pedestrian detection model and PPLCNet ReID model +python3.7 deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts --threshold=0.5 ``` ### 2.3 Run Vehicle Tracking in Python with the Exported Models ```bash -# Download the exported PicoDet vehicle detection model: -wget https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_640_aic21mtmct_vehicle.tar -tar -xvf picodet_l_640_aic21mtmct_vehicle.tar -# Or the exported PP-YOLOv2 vehicle detection model: -wget https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle.tar -tar -xvf ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle.tar +# Download the vehicle demo video +wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4 + +# Download the exported PPYOLOE vehicle detection model: +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip +unzip mot_ppyoloe_l_36e_ppvehicle.zip # Download the exported vehicle ReID model: wget https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.tar tar -xvf deepsort_pplcnet_vehicle.tar -# Use the exported PicoDet vehicle detection model and PPLCNet vehicle ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=picodet_l_640_aic21mtmct_vehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --tracker_config=tracker_config.yml --device=GPU --threshold=0.5 --video_file={your video}.mp4 --save_mot_txts --save_images - -# Use the exported PP-YOLOv2 vehicle detection model and PPLCNet vehicle ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --tracker_config=tracker_config.yml --device=GPU 
--threshold=0.5 --video_file={your video}.mp4 --save_mot_txts --save_images +# Use the exported PPYOLOE vehicle detection model and the PPLCNet vehicle ReID model +python deploy/pptracking/python/mot_sde_infer.py --model_dir=mot_ppyoloe_l_36e_ppvehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --tracker_config=deploy/pptracking/python/tracker_config.yml --device=GPU --threshold=0.5 --video_file=bdd100k_demo.mp4 --save_mot_txts --save_images ``` **Note:** + - Before running, manually set the tracker type in `tracker_config.yml` to `type: DeepSORTTracker`. - The tracking model runs inference on videos and does not support single-image inference. By default it saves a video with the tracking results visualized; add `--save_mot_txts` (saves one txt file per video) or `--save_images` to save the visualized tracking frames as images. - Each line of the tracking result txt file has the format `frame,id,x1,y1,w,h,score,-1,-1,-1`. - `--threshold` is the confidence threshold for visualizing results, 0.5 by default. Results below this threshold are filtered out; adjust it as needed for better visualization. - The DeepSORT algorithm supports only single-class tracking, not multi-class tracking, and the ReID model should preferably be trained on the same object category as the detection model. For example, use a pedestrian ReID model for pedestrian tracking and a vehicle ReID model for vehicle tracking. - - You need to manually set the tracker type in `tracker_config.yml` to `type: DeepSORTTracker`. -## 3. Export and Inference of ByteTrack Models + +## 3. Export and Inference of ByteTrack and OC_SORT Models ### 3.1 Export the Inference Model ```bash # Export the PPYOLOe pedestrian detection model -CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams ``` ### 3.2 Pedestrian Tracking Inference in Python with the Exported Models @@ -135,27 +130,28 @@ CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/dete wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4 # Use the exported PPYOLOe pedestrian detection model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images +python deploy/pptracking/python/mot_sde_infer.py
--model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts # Use the exported PPYOLOe pedestrian detection model and the PPLCNet ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images +python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5 --save_mot_txts --save_images ``` **Note:** + - To run the ByteTrack model, make sure the tracker type in `tracker_config.yml` is `type: JDETracker`. + - Switch the tracker type in `tracker_config.yml` to `type: OCSORTTracker` to run the OC_SORT model. - The ByteTrack model runs with the exported detector and a separately configured `--tracker_config` file. For the sake of real-time tracking it does not need a ReID model. `--reid_model_dir` is the path to the exported ReID model and is empty by default; whether to add it depends on the resulting tracking quality. - The tracking model runs inference on videos and does not support single-image inference. By default it saves a video with the tracking results visualized; add `--save_mot_txts` (saves one txt file per video) or `--save_images` to save the visualized tracking frames as images. - Each line of the tracking result txt file has the format `frame,id,x1,y1,w,h,score,-1,-1,-1`. - `--threshold` is the confidence threshold for visualizing results, 0.5 by default. Results below this threshold are filtered out; adjust it as needed for better visualization. + ## 4.
Export and Inference of Cross-Camera Tracking Models ### 4.1 Export the Inference Model Step 1: Download the exported detection model ```bash -wget https://paddledet.bj.bcebos.com/models/mot/deepsort/picodet_l_640_aic21mtmct_vehicle.tar -tar -xvf picodet_l_640_aic21mtmct_vehicle.tar -# Or -wget https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle.tar -tar -xvf ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle.tar +# Download the exported PPYOLOE vehicle detection model: +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip +unzip mot_ppyoloe_l_36e_ppvehicle.zip ``` Step 2: Download the exported ReID model ```bash @@ -169,13 +165,10 @@ tar -xvf deepsort_pplcnet_vehicle.tar wget https://paddledet.bj.bcebos.com/data/mot/demo/mtmct-demo.tar tar -xvf mtmct-demo.tar -# Use the exported PicoDet vehicle detection model and the PPLCNet vehicle ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=picodet_l_640_aic21mtmct_vehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --mtmct_dir=mtmct-demo --mtmct_cfg=mtmct_cfg.yml --device=GPU --threshold=0.5 --save_mot_txts --save_images - -# Use the exported PP-YOLOv2 vehicle detection model and the PPLCNet vehicle ReID model -python deploy/pptracking/python/mot_sde_infer.py --model_dir=ppyolov2_r50vd_dcn_365e_aic21mtmct_vehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --mtmct_dir=mtmct-demo --mtmct_cfg=mtmct_cfg.yml --device=GPU --threshold=0.5 --save_mot_txts --save_images - +# Use the exported PPYOLOE vehicle detection model and the PPLCNet vehicle ReID model +python deploy/pptracking/python/mot_sde_infer.py --model_dir=mot_ppyoloe_l_36e_ppvehicle/ --reid_model_dir=deepsort_pplcnet_vehicle/ --mtmct_dir=mtmct-demo --mtmct_cfg=mtmct_cfg.yml --device=GPU --threshold=0.5 --save_mot_txts --save_images ``` + **Note:** - The tracking model runs inference on videos and does not support single-image inference. By default it saves a video with the tracking results visualized; add `--save_mot_txts` (saves one txt file per video) or `--save_images` to save the visualized tracking frames as images. - Each line of the cross-camera tracking result txt file has the format `camera_id,frame,id,x1,y1,w,h,-1,-1`. @@ -190,6 +183,7 @@ python deploy/pptracking/python/mot_sde_infer.py --model_dir=ppyolov2_r50vd_dcn_ | Parameter | Required | Description | |-------|-------|----------| | --model_dir | Yes| Path to the model exported above | +| --reid_model_dir | Option| Path to the exported ReID model | | --image_file | Option | Image to run inference on | |
--image_dir | Option | Path to the folder of images to run inference on | | --video_file | Option | Video to run inference on | @@ -203,8 +197,10 @@ python deploy/pptracking/python/mot_sde_infer.py --model_dir=ppyolov2_r50vd_dcn_ | --enable_mkldnn | Option | Whether to enable MKLDNN acceleration for CPU inference, default False | | --cpu_threads | Option| Number of CPU threads, default 1 | | --trt_calib_mode | Option| Whether TensorRT uses calibration, default False. Set to True when using TensorRT int8; set to False when using a model quantized with PaddleSlim | -| --do_entrance_counting | Option | Whether to count entrance/exit traffic, default False | -| --draw_center_traj | Option | Whether to draw tracking trajectories, default False | +| --save_mot_txts | Option | Whether the tracking task saves txt result files, default False | +| --save_images | Option | Whether the tracking task saves visualized frames of the video as images, default False | +| --do_entrance_counting | Option | Whether the tracking task counts entrance/exit traffic, default False | +| --draw_center_traj | Option | Whether the tracking task draws tracking trajectories, default False | | --mtmct_dir | Option | Path to the image folder for MTMCT cross-camera tracking inference, default None | | --mtmct_cfg | Option | Path to the config file for MTMCT cross-camera tracking inference, default None | diff --git a/deploy/pptracking/python/det_infer.py b/deploy/pptracking/python/det_infer.py index 90a391e07209951cc80671c97f898b5cdd4bc0a9..c52879453027e099d25cd0388083eb57d1f8aeea 100644 --- a/deploy/pptracking/python/det_infer.py +++ b/deploy/pptracking/python/det_infer.py @@ -32,7 +32,7 @@ sys.path.insert(0, parent_path) from benchmark_utils import PaddleInferBenchmark from picodet_postprocess import PicoDetPostProcess -from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, decode_image +from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, Pad, decode_image from mot.visualize import visualize_box_mask from mot_utils import argsparser, Timer, get_current_memory_mb @@ -416,9 +416,15 @@ def load_predictor(model_dir, raise ValueError( "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}" .format(run_mode, device)) - config = Config( - os.path.join(model_dir, 'model.pdmodel'), - os.path.join(model_dir, 'model.pdiparams')) + infer_model = os.path.join(model_dir, 'model.pdmodel') + infer_params = os.path.join(model_dir,
'model.pdiparams') + if not os.path.exists(infer_model): + infer_model = os.path.join(model_dir, 'inference.pdmodel') + infer_params = os.path.join(model_dir, 'inference.pdiparams') + if not os.path.exists(infer_model): + raise ValueError( + "Cannot find any inference model in dir: {},".format(model_dir)) + config = Config(infer_model, infer_params) if device == 'GPU': # initial GPU memory(M), device ID config.enable_use_gpu(200, 0) diff --git a/deploy/pptracking/python/mot/matching/__init__.py b/deploy/pptracking/python/mot/matching/__init__.py index 54c6680f79f16247c562a9da1024dd3e1de4c57f..f6a88c5673a50452415b1f86f7b18bac12297f49 100644 --- a/deploy/pptracking/python/mot/matching/__init__.py +++ b/deploy/pptracking/python/mot/matching/__init__.py @@ -14,6 +14,8 @@ from . import jde_matching from . import deepsort_matching +from . import ocsort_matching from .jde_matching import * from .deepsort_matching import * +from .ocsort_matching import * diff --git a/deploy/pptracking/python/mot/matching/jde_matching.py b/deploy/pptracking/python/mot/matching/jde_matching.py index eb3749885b0ad8e563e32cf3ca1b89c3364700bc..3b1cf02edd75cb960e433926274b761d49136033 100644 --- a/deploy/pptracking/python/mot/matching/jde_matching.py +++ b/deploy/pptracking/python/mot/matching/jde_matching.py @@ -15,7 +15,14 @@ This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/matching.py """ -import lap +try: + import lap +except: + print( + 'Warning: Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) + pass + import scipy import numpy as np from scipy.spatial.distance import cdist @@ -26,7 +33,7 @@ warnings.filterwarnings("ignore") __all__ = [ 'merge_matches', 'linear_assignment', - 'cython_bbox_ious', + 'bbox_ious', 'iou_distance', 'embedding_distance', 'fuse_motion', @@ -53,6 +60,12 @@ def merge_matches(m1, m2, shape): def linear_assignment(cost_matrix, thresh): + try: + 
import lap + except Exception as e: + raise RuntimeError( + 'Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) if cost_matrix.size == 0: return np.empty( (0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple( @@ -68,22 +81,28 @@ def linear_assignment(cost_matrix, thresh): return matches, unmatched_a, unmatched_b -def cython_bbox_ious(atlbrs, btlbrs): - ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float) - if ious.size == 0: +def bbox_ious(atlbrs, btlbrs): + boxes = np.ascontiguousarray(atlbrs, dtype=float) + query_boxes = np.ascontiguousarray(btlbrs, dtype=float) + N = boxes.shape[0] + K = query_boxes.shape[0] + ious = np.zeros((N, K), dtype=boxes.dtype) + if N * K == 0: return ious - try: - import cython_bbox - except Exception as e: - print('cython_bbox not found, please install cython_bbox.' - 'for example: `pip install cython_bbox`.') - exit() - - ious = cython_bbox.bbox_overlaps( - np.ascontiguousarray( - atlbrs, dtype=np.float), - np.ascontiguousarray( - btlbrs, dtype=np.float)) + + for k in range(K): + box_area = ((query_boxes[k, 2] - query_boxes[k, 0] + 1) * + (query_boxes[k, 3] - query_boxes[k, 1] + 1)) + for n in range(N): + iw = (min(boxes[n, 2], query_boxes[k, 2]) - max( + boxes[n, 0], query_boxes[k, 0]) + 1) + if iw > 0: + ih = (min(boxes[n, 3], query_boxes[k, 3]) - max( + boxes[n, 1], query_boxes[k, 1]) + 1) + if ih > 0: + ua = float((boxes[n, 2] - boxes[n, 0] + 1) * (boxes[ + n, 3] - boxes[n, 1] + 1) + box_area - iw * ih) + ious[n, k] = iw * ih / ua return ious @@ -98,7 +117,7 @@ def iou_distance(atracks, btracks): else: atlbrs = [track.tlbr for track in atracks] btlbrs = [track.tlbr for track in btracks] - _ious = cython_bbox_ious(atlbrs, btlbrs) + _ious = bbox_ious(atlbrs, btlbrs) cost_matrix = 1 - _ious return cost_matrix diff --git a/deploy/pptracking/python/mot/matching/ocsort_matching.py b/deploy/pptracking/python/mot/matching/ocsort_matching.py
new file mode 100644 index 0000000000000000000000000000000000000000..b2428d020731a6f91e28f9962168390cf1b3a12f --- /dev/null +++ b/deploy/pptracking/python/mot/matching/ocsort_matching.py @@ -0,0 +1,127 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/association.py +""" + +import os +import numpy as np + + +def iou_batch(bboxes1, bboxes2): + """ + From SORT: Computes IOU between two bboxes in the form [x1,y1,x2,y2] + """ + bboxes2 = np.expand_dims(bboxes2, 0) + bboxes1 = np.expand_dims(bboxes1, 1) + + xx1 = np.maximum(bboxes1[..., 0], bboxes2[..., 0]) + yy1 = np.maximum(bboxes1[..., 1], bboxes2[..., 1]) + xx2 = np.minimum(bboxes1[..., 2], bboxes2[..., 2]) + yy2 = np.minimum(bboxes1[..., 3], bboxes2[..., 3]) + w = np.maximum(0., xx2 - xx1) + h = np.maximum(0., yy2 - yy1) + wh = w * h + o = wh / ((bboxes1[..., 2] - bboxes1[..., 0]) * + (bboxes1[..., 3] - bboxes1[..., 1]) + + (bboxes2[..., 2] - bboxes2[..., 0]) * + (bboxes2[..., 3] - bboxes2[..., 1]) - wh) + return (o) + + +def speed_direction_batch(dets, tracks): + tracks = tracks[..., np.newaxis] + CX1, CY1 = (dets[:, 0] + dets[:, 2]) / 2.0, (dets[:, 1] + dets[:, 3]) / 2.0 + CX2, CY2 = (tracks[:, 0] + tracks[:, 2]) / 2.0, ( + tracks[:, 1] + tracks[:, 3]) / 2.0 + dx = CX1 - CX2 + dy = CY1 - CY2 + norm = np.sqrt(dx**2 + dy**2) + 1e-6 + dx = dx / norm + dy = 
dy / norm + return dy, dx # size: num_track x num_det + + +def linear_assignment(cost_matrix): + try: + import lap + _, x, y = lap.lapjv(cost_matrix, extend_cost=True) + return np.array([[y[i], i] for i in x if i >= 0]) + except ImportError: + from scipy.optimize import linear_sum_assignment + x, y = linear_sum_assignment(cost_matrix) + return np.array(list(zip(x, y))) + + +def associate(detections, trackers, iou_threshold, velocities, previous_obs, + vdc_weight): + if (len(trackers) == 0): + return np.empty( + (0, 2), dtype=int), np.arange(len(detections)), np.empty( + (0, 5), dtype=int) + + Y, X = speed_direction_batch(detections, previous_obs) + inertia_Y, inertia_X = velocities[:, 0], velocities[:, 1] + inertia_Y = np.repeat(inertia_Y[:, np.newaxis], Y.shape[1], axis=1) + inertia_X = np.repeat(inertia_X[:, np.newaxis], X.shape[1], axis=1) + diff_angle_cos = inertia_X * X + inertia_Y * Y + diff_angle_cos = np.clip(diff_angle_cos, a_min=-1, a_max=1) + diff_angle = np.arccos(diff_angle_cos) + diff_angle = (np.pi / 2.0 - np.abs(diff_angle)) / np.pi + + valid_mask = np.ones(previous_obs.shape[0]) + valid_mask[np.where(previous_obs[:, 4] < 0)] = 0 + + iou_matrix = iou_batch(detections, trackers) + scores = np.repeat( + detections[:, -1][:, np.newaxis], trackers.shape[0], axis=1) + # iou_matrix = iou_matrix * scores # a trick that sometimes works, but we don't encourage it + valid_mask = np.repeat(valid_mask[:, np.newaxis], X.shape[1], axis=1) + + angle_diff_cost = (valid_mask * diff_angle) * vdc_weight + angle_diff_cost = angle_diff_cost.T + angle_diff_cost = angle_diff_cost * scores + + if min(iou_matrix.shape) > 0: + a = (iou_matrix > iou_threshold).astype(np.int32) + if a.sum(1).max() == 1 and a.sum(0).max() == 1: + matched_indices = np.stack(np.where(a), axis=1) + else: + matched_indices = linear_assignment(-(iou_matrix + angle_diff_cost)) + else: + matched_indices = np.empty(shape=(0, 2)) + + unmatched_detections = [] + for d, det in enumerate(detections): + if (d
not in matched_indices[:, 0]): + unmatched_detections.append(d) + unmatched_trackers = [] + for t, trk in enumerate(trackers): + if (t not in matched_indices[:, 1]): + unmatched_trackers.append(t) + + # filter out matched with low IOU + matches = [] + for m in matched_indices: + if (iou_matrix[m[0], m[1]] < iou_threshold): + unmatched_detections.append(m[0]) + unmatched_trackers.append(m[1]) + else: + matches.append(m.reshape(1, 2)) + if (len(matches) == 0): + matches = np.empty((0, 2), dtype=int) + else: + matches = np.concatenate(matches, axis=0) + + return matches, np.array(unmatched_detections), np.array(unmatched_trackers) diff --git a/deploy/pptracking/python/mot/mtmct/camera_utils.py b/deploy/pptracking/python/mot/mtmct/camera_utils.py index e11472637c8615a370f8edb3b4d58ae2735fa509..445e6386cff826742e8f7f7d5171ca247e148b67 100644 --- a/deploy/pptracking/python/mot/mtmct/camera_utils.py +++ b/deploy/pptracking/python/mot/mtmct/camera_utils.py @@ -19,7 +19,13 @@ Note: The following codes are strongly related to camera parameters of the AIC21 """ import numpy as np -from sklearn.cluster import AgglomerativeClustering +try: + from sklearn.cluster import AgglomerativeClustering +except: + print( + 'Warning: Unable to use MTMCT in PP-Tracking, please install scikit-learn, for example: `pip install scikit-learn`' + ) + pass from .utils import get_dire, get_match, get_cid_tid, combin_feature, combin_cluster from .utils import normalize, intracam_ignore, visual_rerank diff --git a/deploy/pptracking/python/mot/mtmct/postprocess.py b/deploy/pptracking/python/mot/mtmct/postprocess.py index 7e338b901fa75716dacc2dc0560bbbeafe53573a..32be0466d0ed2899d01d192f82561f5e30cc9ad9 100644 --- a/deploy/pptracking/python/mot/mtmct/postprocess.py +++ b/deploy/pptracking/python/mot/mtmct/postprocess.py @@ -20,7 +20,13 @@ import re import cv2 from tqdm import tqdm import numpy as np -import motmetrics as mm +try: + import motmetrics as mm +except: + print( + 'Warning: Unable to use motmetrics in
MTMCT in PP-Tracking, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + pass from functools import reduce from .utils import parse_pt_gt, parse_pt, compare_dataframes_mtmc @@ -201,6 +207,12 @@ def print_mtmct_result(gt_file, pred_file): summary.loc[:, 'idr'] *= 100 summary.loc[:, 'idf1'] *= 100 summary.loc[:, 'mota'] *= 100 + try: + import motmetrics as mm + except Exception as e: + raise RuntimeError( + 'Unable to use motmetrics in MTMCT in PP-Tracking, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) print( mm.io.render_summary( summary, diff --git a/deploy/pptracking/python/mot/mtmct/utils.py b/deploy/pptracking/python/mot/mtmct/utils.py index ef7ec8be73fbd218363cc02695a1626fc66d71ae..f0b52aa67638b6659e18b470ae384720a68f5294 100644 --- a/deploy/pptracking/python/mot/mtmct/utils.py +++ b/deploy/pptracking/python/mot/mtmct/utils.py @@ -20,9 +20,6 @@ import re import cv2 import gc import numpy as np -from sklearn import preprocessing -from sklearn.cluster import AgglomerativeClustering -import motmetrics as mm import pandas as pd from tqdm import tqdm import warnings @@ -195,10 +192,10 @@ def find_topk(a, k, axis=-1, largest=True, sorted=True): a = np.asanyarray(a) if largest: - index_array = np.argpartition(a, axis_size-k, axis=axis) - topk_indices = np.take(index_array, -np.arange(k)-1, axis=axis) + index_array = np.argpartition(a, axis_size - k, axis=axis) + topk_indices = np.take(index_array, -np.arange(k) - 1, axis=axis) else: - index_array = np.argpartition(a, k-1, axis=axis) + index_array = np.argpartition(a, k - 1, axis=axis) topk_indices = np.take(index_array, np.arange(k), axis=axis) topk_values = np.take_along_axis(a, topk_indices, axis=axis) if sorted: @@ -228,7 +225,8 @@ def batch_numpy_topk(qf, gf, k1, N=6000): temp_qd = temp_qd / (np.max(temp_qd, axis=0)[0]) temp_qd = temp_qd.T initial_rank.append( - find_topk(temp_qd, 
k=k1, axis=1, largest=False, sorted=True)[1]) + find_topk( + temp_qd, k=k1, axis=1, largest=False, sorted=True)[1]) del temp_qd del temp_gf del temp_qf @@ -374,6 +372,12 @@ def visual_rerank(prb_feats, def normalize(nparray, axis=0): + try: + from sklearn import preprocessing + except Exception as e: + raise RuntimeError( + 'Unable to use sklearn in MTMCT in PP-Tracking, please install scikit-learn, for example: `pip install scikit-learn`' + ) nparray = preprocessing.normalize(nparray, norm='l2', axis=axis) return nparray @@ -453,6 +457,12 @@ def parse_pt_gt(mot_feature): # eval result def compare_dataframes_mtmc(gts, ts): + try: + import motmetrics as mm + except Exception as e: + raise RuntimeError( + 'Unable to use motmetrics in MTMCT in PP-Tracking, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) """Compute ID-based evaluation metrics for MTMCT Return: df (pandas.DataFrame): Results of the evaluations in a df with only the 'idf1', 'idp', and 'idr' columns.
@@ -528,6 +538,12 @@ def get_labels(cid_tid_dict, use_ff=True, use_rerank=True, use_st_filter=False): + try: + from sklearn.cluster import AgglomerativeClustering + except Exception as e: + raise RuntimeError( + 'Unable to use sklearn in MTMCT in PP-Tracking, please install scikit-learn, for example: `pip install scikit-learn`' + ) # 1st cluster sim_matrix = get_sim_matrix( cid_tid_dict, diff --git a/deploy/pptracking/python/mot/mtmct/zone.py b/deploy/pptracking/python/mot/mtmct/zone.py index f079f64c1e99ca083ba34664a73c0f6ecbfbc655..f3fa162e573550a4f497fb7ddd3e578796df6176 100644 --- a/deploy/pptracking/python/mot/mtmct/zone.py +++ b/deploy/pptracking/python/mot/mtmct/zone.py @@ -21,7 +21,13 @@ Note: The following codes are strongly related to zone of the AIC21 test-set S06 import os import cv2 import numpy as np -from sklearn.cluster import AgglomerativeClustering +try: + from sklearn.cluster import AgglomerativeClustering +except: + print( + 'Warning: Unable to use MTMCT in PP-Tracking, please install scikit-learn, for example: `pip install scikit-learn`' + ) + pass BBOX_B = 10 / 15 diff --git a/deploy/pptracking/python/mot/tracker/__init__.py b/deploy/pptracking/python/mot/tracker/__init__.py index b74593b4126d878cd655326e58369f5b6f76a2ae..03a5dd0a169203b86edbc6c81a44a095ebe9b3cc 100644 --- a/deploy/pptracking/python/mot/tracker/__init__.py +++ b/deploy/pptracking/python/mot/tracker/__init__.py @@ -16,8 +16,10 @@ from . import base_jde_tracker from . import base_sde_tracker from . import jde_tracker from . import deepsort_tracker +from .
import ocsort_tracker from .base_jde_tracker import * from .base_sde_tracker import * from .jde_tracker import * from .deepsort_tracker import * +from .ocsort_tracker import * diff --git a/deploy/pptracking/python/mot/tracker/jde_tracker.py b/deploy/pptracking/python/mot/tracker/jde_tracker.py index 8549801febb198fadece87ed043bf106436134ae..f412842a0205036e02de60018fe86331a2e5d9b7 100644 --- a/deploy/pptracking/python/mot/tracker/jde_tracker.py +++ b/deploy/pptracking/python/mot/tracker/jde_tracker.py @@ -38,7 +38,7 @@ class JDETracker(object): track_buffer (int): buffer for tracker min_box_area (int): min box area to filter out low quality boxes vertical_ratio (float): w/h, the vertical ratio of the bbox to filter - bad results. If set <0 means no need to filter bboxes,usually set + bad results. If set <= 0 means no need to filter bboxes,usually set 1.6 for pedestrian tracking. tracked_thresh (float): linear assignment threshold of tracked stracks and detections @@ -64,8 +64,8 @@ class JDETracker(object): num_classes=1, det_thresh=0.3, track_buffer=30, - min_box_area=200, - vertical_ratio=1.6, + min_box_area=0, + vertical_ratio=0, tracked_thresh=0.7, r_tracked_thresh=0.5, unconfirmed_thresh=0.7, @@ -116,7 +116,7 @@ class JDETracker(object): Return: output_stracks_dict (dict(list)): The list contains information - regarding the online_tracklets for the recieved image tensor. + regarding the online_tracklets for the received image tensor. 
""" self.frame_id += 1 if self.frame_id == 1: @@ -161,9 +161,8 @@ class JDETracker(object): detections = [ STrack( STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], cls_id, - 30, temp_feat) - for (tlbrs, temp_feat - ) in zip(pred_dets_cls, pred_embs_cls) + 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls, pred_embs_cls) ] else: detections = [] @@ -238,15 +237,13 @@ class JDETracker(object): for tlbrs in pred_dets_cls_second ] else: - pred_embs_cls_second = pred_embs_dict[cls_id][inds_second] + pred_embs_cls_second = pred_embs_dict[cls_id][ + inds_second] detections_second = [ STrack( - STrack.tlbr_to_tlwh(tlbrs[2:6]), - tlbrs[1], - cls_id, - 30, - temp_feat) - for (tlbrs, temp_feat) in zip(pred_dets_cls_second, pred_embs_cls_second) + STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], + cls_id, 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls_second, pred_embs_cls_second) ] else: detections_second = [] diff --git a/deploy/pptracking/python/mot/tracker/ocsort_tracker.py b/deploy/pptracking/python/mot/tracker/ocsort_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..919af5b23349b7423f9bee45b38941a4024ac777 --- /dev/null +++ b/deploy/pptracking/python/mot/tracker/ocsort_tracker.py @@ -0,0 +1,366 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/ocsort.py +""" + +import numpy as np +try: + from filterpy.kalman import KalmanFilter +except: + print( + 'Warning: Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy' + ) + pass + +from ..matching.ocsort_matching import associate, linear_assignment, iou_batch + + +def k_previous_obs(observations, cur_age, k): + if len(observations) == 0: + return [-1, -1, -1, -1, -1] + for i in range(k): + dt = k - i + if cur_age - dt in observations: + return observations[cur_age - dt] + max_age = max(observations.keys()) + return observations[max_age] + + +def convert_bbox_to_z(bbox): + """ + Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form + [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is + the aspect ratio + """ + w = bbox[2] - bbox[0] + h = bbox[3] - bbox[1] + x = bbox[0] + w / 2. + y = bbox[1] + h / 2. 
+ s = w * h # scale is just area + r = w / float(h + 1e-6) + return np.array([x, y, s, r]).reshape((4, 1)) + + +def convert_x_to_bbox(x, score=None): + """ + Takes a bounding box in the centre form [x,y,s,r] and returns it in the form + [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right + """ + w = np.sqrt(x[2] * x[3]) + h = x[2] / w + if score is None: + return np.array( + [x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., + x[1] + h / 2.]).reshape((1, 4)) + else: + score = np.array([score]) + return np.array([ + x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score + ]).reshape((1, 5)) + + +def speed_direction(bbox1, bbox2): + cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0 + cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0 + speed = np.array([cy2 - cy1, cx2 - cx1]) + norm = np.sqrt((cy2 - cy1)**2 + (cx2 - cx1)**2) + 1e-6 + return speed / norm + + +class KalmanBoxTracker(object): + """ + This class represents the internal state of individual tracked objects observed as bbox. + + Args: + bbox (np.array): bbox in [x1,y1,x2,y2,score] format. + delta_t (int): delta_t of previous observation + """ + count = 0 + + def __init__(self, bbox, delta_t=3): + try: + from filterpy.kalman import KalmanFilter + except Exception as e: + raise RuntimeError( + 'Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy' + ) + self.kf = KalmanFilter(dim_x=7, dim_z=4) + self.kf.F = np.array([[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0], + [0, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0], + [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1, 0], + [0, 0, 0, 0, 0, 0, 1]]) + self.kf.H = np.array([[1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], + [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]]) + self.kf.R[2:, 2:] *= 10. + self.kf.P[4:, 4:] *= 1000. + # give high uncertainty to the unobservable initial velocities + self.kf.P *= 10.
+ self.kf.Q[-1, -1] *= 0.01 + self.kf.Q[4:, 4:] *= 0.01 + + self.score = bbox[4] + self.kf.x[:4] = convert_bbox_to_z(bbox) + self.time_since_update = 0 + self.id = KalmanBoxTracker.count + KalmanBoxTracker.count += 1 + self.history = [] + self.hits = 0 + self.hit_streak = 0 + self.age = 0 + """ + NOTE: [-1,-1,-1,-1,-1] is a compromise placeholder for non-observation status, the same as the return of + function k_previous_obs. It is ugly and I do not like it. But to support generating the observation array in a + fast and unified way, which you would see below k_observations = np.array([k_previous_obs(...)]), let's bear it for now. + """ + self.last_observation = np.array([-1, -1, -1, -1, -1]) # placeholder + self.observations = dict() + self.history_observations = [] + self.velocity = None + self.delta_t = delta_t + + def update(self, bbox): + """ + Updates the state vector with observed bbox. + """ + if bbox is not None: + if self.last_observation.sum() >= 0: # a previous observation exists + previous_box = None + for i in range(self.delta_t): + dt = self.delta_t - i + if self.age - dt in self.observations: + previous_box = self.observations[self.age - dt] + break + if previous_box is None: + previous_box = self.last_observation + """ + Estimate the track speed direction with observations delta_t steps away + """ + self.velocity = speed_direction(previous_box, bbox) + """ + Insert new observations. This is an ugly way to maintain both self.observations + and self.history_observations. Bear it for the moment. + """ + self.last_observation = bbox + self.observations[self.age] = bbox + self.history_observations.append(bbox) + + self.time_since_update = 0 + self.history = [] + self.hits += 1 + self.hit_streak += 1 + self.kf.update(convert_bbox_to_z(bbox)) + else: + self.kf.update(bbox) + + def predict(self): + """ + Advances the state vector and returns the predicted bounding box estimate.
+ """ + if ((self.kf.x[6] + self.kf.x[2]) <= 0): + self.kf.x[6] *= 0.0 + + self.kf.predict() + self.age += 1 + if (self.time_since_update > 0): + self.hit_streak = 0 + self.time_since_update += 1 + self.history.append(convert_x_to_bbox(self.kf.x, score=self.score)) + return self.history[-1] + + def get_state(self): + return convert_x_to_bbox(self.kf.x, score=self.score) + + +class OCSORTTracker(object): + """ + OCSORT tracker, supports a single class only + + Args: + det_thresh (float): threshold of detection score + max_age (int): maximum number of consecutive missed frames before a track is deleted + min_hits (int): minimum hits for association + iou_threshold (float): iou threshold for association + delta_t (int): delta_t of previous observation + inertia (float): vdc_weight of angle_diff_cost for association + vertical_ratio (float): w/h, the vertical ratio of the bbox to filter + bad results. If set <= 0, bboxes are not filtered; usually set to + 1.6 for pedestrian tracking. + min_box_area (int): min box area to filter out low quality boxes + use_byte (bool): Whether to use ByteTracker, default False + """ + + def __init__(self, + det_thresh=0.6, + max_age=30, + min_hits=3, + iou_threshold=0.3, + delta_t=3, + inertia=0.2, + vertical_ratio=-1, + min_box_area=0, + use_byte=False): + self.det_thresh = det_thresh + self.max_age = max_age + self.min_hits = min_hits + self.iou_threshold = iou_threshold + self.delta_t = delta_t + self.inertia = inertia + self.vertical_ratio = vertical_ratio + self.min_box_area = min_box_area + self.use_byte = use_byte + + self.trackers = [] + self.frame_count = 0 + KalmanBoxTracker.count = 0 + + def update(self, pred_dets, pred_embs=None): + """ + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128] or [N, 512], default as None. + + Return: + tracking boxes (np.array): [M, 6], means 'x0, y0, x1, y1, score, id'.
+ """ + if pred_dets is None: + return np.empty((0, 6)) + + self.frame_count += 1 + + bboxes = pred_dets[:, 2:] + scores = pred_dets[:, 1:2] + dets = np.concatenate((bboxes, scores), axis=1) + scores = scores.squeeze(-1) + + inds_low = scores > 0.1 + inds_high = scores < self.det_thresh + inds_second = np.logical_and(inds_low, inds_high) + # self.det_thresh > score > 0.1, for second matching + dets_second = dets[inds_second] # detections for second matching + remain_inds = scores > self.det_thresh + dets = dets[remain_inds] + + # get predicted locations from existing trackers. + trks = np.zeros((len(self.trackers), 5)) + to_del = [] + ret = [] + for t, trk in enumerate(trks): + pos = self.trackers[t].predict()[0] + trk[:] = [pos[0], pos[1], pos[2], pos[3], 0] + if np.any(np.isnan(pos)): + to_del.append(t) + trks = np.ma.compress_rows(np.ma.masked_invalid(trks)) + for t in reversed(to_del): + self.trackers.pop(t) + + velocities = np.array([ + trk.velocity if trk.velocity is not None else np.array((0, 0)) + for trk in self.trackers + ]) + last_boxes = np.array([trk.last_observation for trk in self.trackers]) + k_observations = np.array([ + k_previous_obs(trk.observations, trk.age, self.delta_t) + for trk in self.trackers + ]) + """ + First round of association + """ + matched, unmatched_dets, unmatched_trks = associate( + dets, trks, self.iou_threshold, velocities, k_observations, + self.inertia) + for m in matched: + self.trackers[m[1]].update(dets[m[0], :]) + """ + Second round of associaton by OCR + """ + # BYTE association + if self.use_byte and len(dets_second) > 0 and unmatched_trks.shape[ + 0] > 0: + u_trks = trks[unmatched_trks] + iou_left = iou_batch( + dets_second, + u_trks) # iou between low score detections and unmatched tracks + iou_left = np.array(iou_left) + if iou_left.max() > self.iou_threshold: + """ + NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may + get a higher performance especially on MOT17/MOT20 datasets. 
But we keep it + uniform here for simplicity + """ + matched_indices = linear_assignment(-iou_left) + to_remove_trk_indices = [] + for m in matched_indices: + det_ind, trk_ind = m[0], unmatched_trks[m[1]] + if iou_left[m[0], m[1]] < self.iou_threshold: + continue + self.trackers[trk_ind].update(dets_second[det_ind, :]) + to_remove_trk_indices.append(trk_ind) + unmatched_trks = np.setdiff1d(unmatched_trks, + np.array(to_remove_trk_indices)) + + if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0: + left_dets = dets[unmatched_dets] + left_trks = last_boxes[unmatched_trks] + iou_left = iou_batch(left_dets, left_trks) + iou_left = np.array(iou_left) + if iou_left.max() > self.iou_threshold: + """ + NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may + get a higher performance especially on MOT17/MOT20 datasets. But we keep it + uniform here for simplicity + """ + rematched_indices = linear_assignment(-iou_left) + to_remove_det_indices = [] + to_remove_trk_indices = [] + for m in rematched_indices: + det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[ + 1]] + if iou_left[m[0], m[1]] < self.iou_threshold: + continue + self.trackers[trk_ind].update(dets[det_ind, :]) + to_remove_det_indices.append(det_ind) + to_remove_trk_indices.append(trk_ind) + unmatched_dets = np.setdiff1d(unmatched_dets, + np.array(to_remove_det_indices)) + unmatched_trks = np.setdiff1d(unmatched_trks, + np.array(to_remove_trk_indices)) + + for m in unmatched_trks: + self.trackers[m].update(None) + + # create and initialise new trackers for unmatched detections + for i in unmatched_dets: + trk = KalmanBoxTracker(dets[i, :], delta_t=self.delta_t) + self.trackers.append(trk) + i = len(self.trackers) + for trk in reversed(self.trackers): + if trk.last_observation.sum() < 0: + d = trk.get_state()[0] + else: + d = trk.last_observation # tlbr + score + if (trk.time_since_update < 1) and ( + trk.hit_streak >= self.min_hits or + self.frame_count <= self.min_hits): + # 
+1 as MOT benchmark requires positive + ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1)) + i -= 1 + # remove dead tracklet + if (trk.time_since_update > self.max_age): + self.trackers.pop(i) + if (len(ret) > 0): + return np.concatenate(ret) + return np.empty((0, 6)) diff --git a/deploy/pptracking/python/mot/utils.py b/deploy/pptracking/python/mot/utils.py index 8bb380af0874e9ee795f7616cc14c0abf55eb320..503589d8185aad91dfb2bad7f9032eacabefac4b 100644 --- a/deploy/pptracking/python/mot/utils.py +++ b/deploy/pptracking/python/mot/utils.py @@ -211,6 +211,8 @@ def preprocess_reid(imgs, def flow_statistic(result, secs_interval, do_entrance_counting, + do_break_in_counting, + region_type, video_fps, entrance, id_set, @@ -221,39 +223,84 @@ def flow_statistic(result, records, data_type='mot', num_classes=1): - # Count in and out number: - # Use horizontal center line as the entrance just for simplification. - # If a person located in the above the horizontal center line - # at the previous frame and is in the below the line at the current frame, - # the in number is increased by one. - # If a person was in the below the horizontal center line - # at the previous frame and locates in the below the line at the current frame, - # the out number is increased by one. - # TODO: if the entrance is not the horizontal center line, - # the counting method should be optimized. + # Count in/out number: + # Note that 'region_type' should be one of ['horizontal', 'vertical', 'custom'], + # 'horizontal' and 'vertical' means entrance is the center line as the entrance when do_entrance_counting, + # 'custom' means entrance is a region defined by users when do_break_in_counting. + if do_entrance_counting: - entrance_y = entrance[1] # xmin, ymin, xmax, ymax + assert region_type in [ + 'horizontal', 'vertical' + ], "region_type should be 'horizontal' or 'vertical' when do entrance counting." 
+ entrance_x, entrance_y = entrance[0], entrance[1] frame_id, tlwhs, tscores, track_ids = result for tlwh, score, track_id in zip(tlwhs, tscores, track_ids): if track_id < 0: continue if data_type == 'kitti': frame_id -= 1 - x1, y1, w, h = tlwh center_x = x1 + w / 2. center_y = y1 + h / 2. if track_id in prev_center: - if prev_center[track_id][1] <= entrance_y and \ - center_y > entrance_y: - in_id_list.append(track_id) - if prev_center[track_id][1] >= entrance_y and \ - center_y < entrance_y: - out_id_list.append(track_id) + if region_type == 'horizontal': + # horizontal center line + if prev_center[track_id][1] <= entrance_y and \ + center_y > entrance_y: + in_id_list.append(track_id) + if prev_center[track_id][1] >= entrance_y and \ + center_y < entrance_y: + out_id_list.append(track_id) + else: + # vertical center line + if prev_center[track_id][0] <= entrance_x and \ + center_x > entrance_x: + in_id_list.append(track_id) + if prev_center[track_id][0] >= entrance_x and \ + center_x < entrance_x: + out_id_list.append(track_id) prev_center[track_id][0] = center_x prev_center[track_id][1] = center_y else: prev_center[track_id] = [center_x, center_y] - # Count totol number, number at a manual-setting interval + + if do_break_in_counting: + assert region_type in [ + 'custom' + ], "region_type should be 'custom' when do break_in counting." + assert len( + entrance + ) >= 4, "entrance should be at least 3 points and (w,h) of image when do break_in counting." 
+ im_w, im_h = entrance[-1][:] + entrance = np.array(entrance[:-1]) + + frame_id, tlwhs, tscores, track_ids = result + for tlwh, score, track_id in zip(tlwhs, tscores, track_ids): + if track_id < 0: continue + if data_type == 'kitti': + frame_id -= 1 + x1, y1, w, h = tlwh + center_x = min(x1 + w / 2., im_w - 1) + center_down_y = min(y1 + h, im_h - 1) + + # counting objects in region of the first frame + if frame_id == 1: + if in_quadrangle([center_x, center_down_y], entrance, im_h, + im_w): + in_id_list.append(-1) + else: + prev_center[track_id] = [center_x, center_down_y] + else: + if track_id in prev_center: + if not in_quadrangle(prev_center[track_id], entrance, im_h, + im_w) and in_quadrangle( + [center_x, center_down_y], + entrance, im_h, im_w): + in_id_list.append(track_id) + prev_center[track_id] = [center_x, center_down_y] + else: + prev_center[track_id] = [center_x, center_down_y] + +# Count total number, number at a manual-setting interval frame_id, tlwhs, tscores, track_ids = result for tlwh, score, track_id in zip(tlwhs, tscores, track_ids): if track_id < 0: continue @@ -268,6 +315,8 @@ def flow_statistic(result, if do_entrance_counting: info += ", In count: {}, Out count: {}".format( len(in_id_list), len(out_id_list)) + if do_break_in_counting: + info += ", Break_in count: {}".format(len(in_id_list)) if frame_id % video_fps == 0 and frame_id / video_fps % secs_interval == 0: info += ", Count during {} secs: {}".format(secs_interval, curr_interval_count) @@ -282,5 +331,15 @@ def flow_statistic(result, "in_id_list": in_id_list, "out_id_list": out_id_list, "prev_center": prev_center, - "records": records + "records": records, } + + +def in_quadrangle(point, entrance, im_h, im_w): + mask = np.zeros((im_h, im_w, 1), np.uint8) + cv2.fillPoly(mask, [entrance], 255) + p = tuple(map(int, point)) + if mask[p[1], p[0], :] > 0: + return True + else: + return False diff --git a/deploy/pptracking/python/mot/visualize.py b/deploy/pptracking/python/mot/visualize.py
index bf39d6db8157175259b8940c5a74a8e500fe38f1..9a4cb5806c769fb8c97177cfd9b06a1163f78944 100644 --- a/deploy/pptracking/python/mot/visualize.py +++ b/deploy/pptracking/python/mot/visualize.py @@ -193,11 +193,14 @@ def plot_tracking_dict(image, fps=0., ids2names=[], do_entrance_counting=False, + do_break_in_counting=False, entrance=None, records=None, center_traj=None): im = np.ascontiguousarray(np.copy(image)) im_h, im_w = im.shape[:2] + if do_break_in_counting: + entrance = np.array(entrance[:-1]) # last pair is [im_w, im_h] text_scale = max(0.5, image.shape[1] / 3000.) text_thickness = 2 @@ -231,6 +234,30 @@ def plot_tracking_dict(image, text_scale, (0, 0, 255), thickness=text_thickness) + if num_classes == 1 and do_break_in_counting: + np_masks = np.zeros((im_h, im_w, 1), np.uint8) + cv2.fillPoly(np_masks, [entrance], 255) + + # Draw region mask + alpha = 0.3 + im = np.array(im).astype('float32') + mask = np_masks[:, :, 0] + color_mask = [0, 0, 255] + idx = np.nonzero(mask) + color_mask = np.array(color_mask) + im[idx[0], idx[1], :] *= 1.0 - alpha + im[idx[0], idx[1], :] += alpha * color_mask + im = np.array(im).astype('uint8') + + # find start location for break in counting data + start = records[-1].find('Break_in') + cv2.putText( + im, + records[-1][start:-1], (entrance[0][0] - 10, entrance[0][1] - 10), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + for cls_id in range(num_classes): tlwhs = tlwhs_dict[cls_id] obj_ids = obj_ids_dict[cls_id] @@ -262,7 +289,17 @@ def plot_tracking_dict(image, id_text = 'class{}_{}'.format(cls_id, id_text) _line_thickness = 1 if obj_id <= 0 else line_thickness - color = get_color(abs(obj_id)) + + in_region = False + if do_break_in_counting: + center_x = min(x1 + w / 2., im_w - 1) + center_down_y = min(y1 + h, im_h - 1) + if in_quadrangle([center_x, center_down_y], entrance, im_h, + im_w): + in_region = True + + color = get_color(abs(obj_id)) if in_region == False else (0, 0, + 255) cv2.rectangle( im, 
intbox[0:2], @@ -273,16 +310,26 @@ def plot_tracking_dict(image, im, id_text, (intbox[0], intbox[1] - 25), cv2.FONT_ITALIC, - text_scale, (0, 255, 255), + text_scale, + color, thickness=text_thickness) + if do_break_in_counting and in_region: + cv2.putText( + im, + 'Break in now.', (intbox[0], intbox[1] - 50), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + if scores is not None: text = 'score: {:.2f}'.format(float(scores[i])) cv2.putText( im, text, (intbox[0], intbox[1] - 6), cv2.FONT_ITALIC, - text_scale, (0, 255, 0), + text_scale, + color, thickness=text_thickness) if center_traj is not None: for traj in center_traj: @@ -292,3 +339,13 @@ def plot_tracking_dict(image, for point in traj[i]: cv2.circle(im, point, 3, (0, 0, 255), -1) return im + + +def in_quadrangle(point, entrance, im_h, im_w): + mask = np.zeros((im_h, im_w, 1), np.uint8) + cv2.fillPoly(mask, [entrance], 255) + p = tuple(map(int, point)) + if mask[p[1], p[0], :] > 0: + return True + else: + return False diff --git a/deploy/pptracking/python/mot_jde_infer.py b/deploy/pptracking/python/mot_jde_infer.py index afabf5f4b6a573cb8a97af757dc92dafb29a76b2..8809b2497fcabb43396509e702d09afcc8b1ee79 100644 --- a/deploy/pptracking/python/mot_jde_infer.py +++ b/deploy/pptracking/python/mot_jde_infer.py @@ -64,28 +64,39 @@ class JDE_Detector(Detector): do_entrance_counting(bool): Whether counting the numbers of identifiers entering or getting out from the entrance, default as False,only support single class counting in MOT. + do_break_in_counting(bool): Whether counting the numbers of identifiers break in + the area, default as False,only support single class counting in MOT, + and the video should be taken by a static camera. + region_type (str): Area type for entrance counting or break in counting, 'horizontal' + and 'vertical' used when do entrance counting. 'custom' used when do break in counting. 
+ Note that only single-class MOT is supported, and the video should be taken by a static camera. + region_polygon (list): Clockwise point coords (x0,y0,x1,y1...) of the polygon region used when + do_break_in_counting is enabled. Note that only single-class MOT is supported and + the video should be taken by a static camera. """ - def __init__( - self, - model_dir, - tracker_config=None, - device='CPU', - run_mode='paddle', - batch_size=1, - trt_min_shape=1, - trt_max_shape=1088, - trt_opt_shape=608, - trt_calib_mode=False, - cpu_threads=1, - enable_mkldnn=False, - output_dir='output', - threshold=0.5, - save_images=False, - save_mot_txts=False, - draw_center_traj=False, - secs_interval=10, - do_entrance_counting=False, ): + def __init__(self, + model_dir, + tracker_config=None, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1088, + trt_opt_shape=608, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, + draw_center_traj=False, + secs_interval=10, + do_entrance_counting=False, + do_break_in_counting=False, + region_type='horizontal', + region_polygon=[]): super(JDE_Detector, self).__init__( model_dir=model_dir, device=device, @@ -104,6 +115,13 @@ class JDE_Detector(Detector): self.draw_center_traj = draw_center_traj self.secs_interval = secs_interval self.do_entrance_counting = do_entrance_counting + self.do_break_in_counting = do_break_in_counting + self.region_type = region_type + self.region_polygon = region_polygon + if self.region_type == 'custom': + assert len( + self.region_polygon + ) >= 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.' assert batch_size == 1, "MOT model only supports batch_size=1." self.det_times = Timer(with_tracker=True) @@ -112,8 +130,8 @@ class JDE_Detector(Detector): # tracker config assert self.pred_config.tracker, "The exported JDE Detector model should have tracker."
cfg = self.pred_config.tracker - min_box_area = cfg.get('min_box_area', 200) - vertical_ratio = cfg.get('vertical_ratio', 1.6) + min_box_area = cfg.get('min_box_area', 0.0) + vertical_ratio = cfg.get('vertical_ratio', 0.0) conf_thres = cfg.get('conf_thres', 0.0) tracked_thresh = cfg.get('tracked_thresh', 0.7) metric_type = cfg.get('metric_type', 'euclidean') @@ -164,7 +182,7 @@ class JDE_Detector(Detector): repeats (int): repeats number for prediction Returns: result (dict): include 'pred_dets': np.ndarray: shape:[N,6], N: number of box, - matix element:[x_min, y_min, x_max, y_max, score, class] + matix element:[class, score, x_min, y_min, x_max, y_max] FairMOT(JDE)'s result include 'pred_embs': np.ndarray: shape: [N, 128] ''' @@ -310,7 +328,24 @@ class JDE_Detector(Detector): out_id_list = list() prev_center = dict() records = list() - entrance = [0, height / 2., width, height / 2.] + if self.do_entrance_counting or self.do_break_in_counting: + if self.region_type == 'horizontal': + entrance = [0, height / 2., width, height / 2.] + elif self.region_type == 'vertical': + entrance = [width / 2, 0., width / 2, height] + elif self.region_type == 'custom': + entrance = [] + assert len( + self.region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting." 
+ for i in range(0, len(self.region_polygon), 2): + entrance.append([ + self.region_polygon[i], self.region_polygon[i + 1] + ]) + entrance.append([width, height]) + else: + raise ValueError("region_type:{} is not supported.".format( + self.region_type)) video_fps = fps @@ -340,8 +375,9 @@ class JDE_Detector(Detector): online_ids[0]) statistic = flow_statistic( result, self.secs_interval, self.do_entrance_counting, - video_fps, entrance, id_set, interval_id_set, in_id_list, - out_id_list, prev_center, records, data_type, num_classes) + self.do_break_in_counting, self.region_type, video_fps, + entrance, id_set, interval_id_set, in_id_list, out_id_list, + prev_center, records, data_type, num_classes) records = statistic['records'] fps = 1. / timer.duration @@ -403,7 +439,10 @@ def main(): save_mot_txts=FLAGS.save_mot_txts, draw_center_traj=FLAGS.draw_center_traj, secs_interval=FLAGS.secs_interval, - do_entrance_counting=FLAGS.do_entrance_counting, ) + do_entrance_counting=FLAGS.do_entrance_counting, + do_break_in_counting=FLAGS.do_break_in_counting, + region_type=FLAGS.region_type, + region_polygon=FLAGS.region_polygon) # predict from video file or camera video stream if FLAGS.video_file is not None or FLAGS.camera_id != -1: diff --git a/deploy/pptracking/python/mot_sde_infer.py b/deploy/pptracking/python/mot_sde_infer.py index 62907ba240a34facc1264ebd3b1092c66dcdef99..b62905e2aecfc4aaacb8bfc452198b50bf94ef88 100644 --- a/deploy/pptracking/python/mot_sde_infer.py +++ b/deploy/pptracking/python/mot_sde_infer.py @@ -32,7 +32,7 @@ sys.path.insert(0, parent_path) from det_infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor from mot_utils import argsparser, Timer, get_current_memory_mb, video2frames, _is_valid_video -from mot.tracker import JDETracker, DeepSORTTracker +from mot.tracker import JDETracker, DeepSORTTracker, OCSORTTracker from mot.utils import MOTTimer, write_mot_results, get_crops, clip_box, flow_statistic from 
mot.visualize import plot_tracking, plot_tracking_dict @@ -64,7 +64,16 @@ class SDE_Detector(Detector): secs_interval (int): The seconds interval to count after tracking, default as 10 do_entrance_counting(bool): Whether counting the numbers of identifiers entering or getting out from the entrance, default as False,only support single class - counting in MOT. + counting in MOT, and the video should be taken by a static camera. + do_break_in_counting(bool): Whether counting the numbers of identifiers break in + the area, default as False,only support single class counting in MOT, + and the video should be taken by a static camera. + region_type (str): Area type for entrance counting or break in counting, 'horizontal' + and 'vertical' used when do entrance counting. 'custom' used when do break in counting. + Note that only support single-class MOT, and the video should be taken by a static camera. + region_polygon (list): Clockwise point coords (x0,y0,x1,y1...) of polygon of area when + do_break_in_counting. Note that only support single-class MOT and + the video should be taken by a static camera. 
reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT mtmct_dir (str): MTMCT dir, default None, set for doing MTMCT """ @@ -88,6 +97,9 @@ class SDE_Detector(Detector): draw_center_traj=False, secs_interval=10, do_entrance_counting=False, + do_break_in_counting=False, + region_type='horizontal', + region_polygon=[], reid_model_dir=None, mtmct_dir=None): super(SDE_Detector, self).__init__( @@ -108,6 +120,13 @@ class SDE_Detector(Detector): self.draw_center_traj = draw_center_traj self.secs_interval = secs_interval self.do_entrance_counting = do_entrance_counting + self.do_break_in_counting = do_break_in_counting + self.region_type = region_type + self.region_polygon = region_polygon + if self.region_type == 'custom': + assert len( + self.region_polygon + ) >= 6, 'region_type is custom, region_polygon should be at least 3 pairs of point coords.' assert batch_size == 1, "MOT model only supports batch_size=1." self.det_times = Timer(with_tracker=True) @@ -142,8 +161,10 @@ class SDE_Detector(Detector): # tracker config self.use_deepsort_tracker = True if tracker_cfg[ 'type'] == 'DeepSORTTracker' else False + self.use_ocsort_tracker = True if tracker_cfg[ + 'type'] == 'OCSORTTracker' else False + if self.use_deepsort_tracker: - # use DeepSORTTracker if self.reid_pred_config is not None and hasattr( self.reid_pred_config, 'tracker'): cfg = self.reid_pred_config.tracker @@ -161,12 +182,34 @@ class SDE_Detector(Detector): matching_threshold=matching_threshold, min_box_area=min_box_area, vertical_ratio=vertical_ratio, ) + + elif self.use_ocsort_tracker: + det_thresh = cfg.get('det_thresh', 0.4) + max_age = cfg.get('max_age', 30) + min_hits = cfg.get('min_hits', 3) + iou_threshold = cfg.get('iou_threshold', 0.3) + delta_t = cfg.get('delta_t', 3) + inertia = cfg.get('inertia', 0.2) + min_box_area = cfg.get('min_box_area', 0) + vertical_ratio = cfg.get('vertical_ratio', 0) + use_byte = cfg.get('use_byte', False) + + self.tracker = OCSORTTracker( +
det_thresh=det_thresh, + max_age=max_age, + min_hits=min_hits, + iou_threshold=iou_threshold, + delta_t=delta_t, + inertia=inertia, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, + use_byte=use_byte) else: # use ByteTracker use_byte = cfg.get('use_byte', False) det_thresh = cfg.get('det_thresh', 0.3) - min_box_area = cfg.get('min_box_area', 200) - vertical_ratio = cfg.get('vertical_ratio', 1.6) + min_box_area = cfg.get('min_box_area', 0) + vertical_ratio = cfg.get('vertical_ratio', 0) match_thres = cfg.get('match_thres', 0.9) conf_thres = cfg.get('conf_thres', 0.6) low_conf_thres = cfg.get('low_conf_thres', 0.1) @@ -186,7 +229,9 @@ class SDE_Detector(Detector): def postprocess(self, inputs, result): # postprocess output of predictor - np_boxes_num = result['boxes_num'] + keep_idx = result['boxes'][:, 1] > self.threshold + result['boxes'] = result['boxes'][keep_idx] + np_boxes_num = [len(result['boxes'])] if np_boxes_num[0] <= 0: print('[WARNNING] No object detected.') result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]} @@ -194,7 +239,7 @@ class SDE_Detector(Detector): return result def reidprocess(self, det_results, repeats=1): - pred_dets = det_results['boxes'] + pred_dets = det_results['boxes'] # cls_id, score, x0, y0, x1, y1 pred_xyxys = pred_dets[:, 2:6] ori_image = det_results['ori_image'] @@ -234,7 +279,7 @@ class SDE_Detector(Detector): return det_results def tracking(self, det_results): - pred_dets = det_results['boxes'] + pred_dets = det_results['boxes'] # cls_id, score, x0, y0, x1, y1 pred_embs = det_results.get('embeddings', None) if self.use_deepsort_tracker: @@ -281,6 +326,32 @@ class SDE_Detector(Detector): feat_data['feat'] = _feat tracking_outs['feat_data'].update({_imgname: feat_data}) return tracking_outs + + elif self.use_ocsort_tracker: + # use OCSORTTracker, only support singe class + online_targets = self.tracker.update(pred_dets, pred_embs) + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = 
defaultdict(list) + for t in online_targets: + tlwh = [t[0], t[1], t[2] - t[0], t[3] - t[1]] + tscore = float(t[4]) + tid = int(t[5]) + if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: continue + if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > self.tracker.vertical_ratio: + continue + if tlwh[2] * tlwh[3] > 0: + online_tlwhs[0].append(tlwh) + online_ids[0].append(tid) + online_scores[0].append(tscore) + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + else: # use ByteTracker, support multiple class online_tlwhs = defaultdict(list) @@ -441,14 +512,15 @@ class SDE_Detector(Detector): online_ids, online_scores, frame_id=frame_id, - ids2names=[]) + ids2names=ids2names) else: im = plot_tracking( frame, online_tlwhs, online_ids, online_scores, - frame_id=frame_id) + frame_id=frame_id, + ids2names=ids2names) save_dir = os.path.join(self.output_dir, seq_name) if not os.path.exists(save_dir): os.makedirs(save_dir) @@ -500,7 +572,25 @@ class SDE_Detector(Detector): out_id_list = list() prev_center = dict() records = list() - entrance = [0, height / 2., width, height / 2.] + if self.do_entrance_counting or self.do_break_in_counting: + if self.region_type == 'horizontal': + entrance = [0, height / 2., width, height / 2.] + elif self.region_type == 'vertical': + entrance = [width / 2, 0., width / 2, height] + elif self.region_type == 'custom': + entrance = [] + assert len( + self.region_polygon + ) % 2 == 0, "region_polygon should be pairs of coords points when do break_in counting." 
+ for i in range(0, len(self.region_polygon), 2): + entrance.append([ + self.region_polygon[i], self.region_polygon[i + 1] + ]) + entrance.append([width, height]) + else: + raise ValueError("region_type:{} is not supported.".format( + self.region_type)) + video_fps = fps while (1): @@ -520,19 +610,20 @@ class SDE_Detector(Detector): # bs=1 in MOT model online_tlwhs, online_scores, online_ids = mot_results[0] - # NOTE: just implement flow statistic for one class - if num_classes == 1: + # flow statistic for one class, and only for bytetracker + if num_classes == 1 and not self.use_deepsort_tracker and not self.use_ocsort_tracker: result = (frame_id + 1, online_tlwhs[0], online_scores[0], online_ids[0]) statistic = flow_statistic( result, self.secs_interval, self.do_entrance_counting, - video_fps, entrance, id_set, interval_id_set, in_id_list, - out_id_list, prev_center, records, data_type, num_classes) + self.do_break_in_counting, self.region_type, video_fps, + entrance, id_set, interval_id_set, in_id_list, out_id_list, + prev_center, records, data_type, num_classes) records = statistic['records'] fps = 1. 
/ timer.duration - if self.use_deepsort_tracker: - # use DeepSORTTracker, only support singe class + if self.use_deepsort_tracker or self.use_ocsort_tracker: + # use DeepSORTTracker or OCSORTTracker, only support singe class results[0].append( (frame_id + 1, online_tlwhs, online_scores, online_ids)) im = plot_tracking( @@ -542,6 +633,7 @@ class SDE_Detector(Detector): online_scores, frame_id=frame_id, fps=fps, + ids2names=ids2names, do_entrance_counting=self.do_entrance_counting, entrance=entrance) else: @@ -712,6 +804,9 @@ def main(): draw_center_traj=FLAGS.draw_center_traj, secs_interval=FLAGS.secs_interval, do_entrance_counting=FLAGS.do_entrance_counting, + do_break_in_counting=FLAGS.do_break_in_counting, + region_type=FLAGS.region_type, + region_polygon=FLAGS.region_polygon, reid_model_dir=FLAGS.reid_model_dir, mtmct_dir=FLAGS.mtmct_dir, ) diff --git a/deploy/pptracking/python/mot_utils.py b/deploy/pptracking/python/mot_utils.py index 3c2d31c89115b656f54cc6579516c873ad0698cc..0cf9ab3198d3e6576efb2ba7eb9a8369187d4cda 100644 --- a/deploy/pptracking/python/mot_utils.py +++ b/deploy/pptracking/python/mot_utils.py @@ -141,8 +141,30 @@ def argsparser(): "--do_entrance_counting", action='store_true', help="Whether counting the numbers of identifiers entering " - "or getting out from the entrance. Note that only support one-class" - "counting, multi-class counting is coming soon.") + "or getting out from the entrance. Note that only support single-class MOT." + ) + parser.add_argument( + "--do_break_in_counting", + action='store_true', + help="Whether counting the numbers of identifiers break in " + "the area. Note that only support single-class MOT and " + "the video should be taken by a static camera.") + parser.add_argument( + "--region_type", + type=str, + default='horizontal', + help="Area type for entrance counting or break in counting, 'horizontal' and " + "'vertical' used when do entrance counting. 'custom' used when do break in counting. 
" + "Note that only support single-class MOT, and the video should be taken by a static camera." + ) + parser.add_argument( + '--region_polygon', + nargs='+', + type=int, + default=[], + help="Clockwise point coords (x0,y0,x1,y1...) of polygon of area when " + "do_break_in_counting. Note that only support single-class MOT and " + "the video should be taken by a static camera.") parser.add_argument( "--secs_interval", type=int, diff --git a/deploy/pptracking/python/preprocess.py b/deploy/pptracking/python/preprocess.py index 2df5df9c3c3dc0dcb90b0224bf0d8e022a47903e..427479c814d6b3250921ead6b7b2a07ea6352173 100644 --- a/deploy/pptracking/python/preprocess.py +++ b/deploy/pptracking/python/preprocess.py @@ -245,6 +245,34 @@ class LetterBoxResize(object): return im, im_info +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. + Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + def preprocess(im, preprocess_ops): # process image by preprocess_ops im_info = { diff --git a/deploy/pptracking/python/tracker_config.yml b/deploy/pptracking/python/tracker_config.yml index 5182da93e3f19bd49421e09d1ec01be0c0f11643..464337702a4a6b5d8d32fc87eeb8df9bcc01c57a 100644 --- a/deploy/pptracking/python/tracker_config.yml +++ b/deploy/pptracking/python/tracker_config.yml @@ -2,7 +2,7 @@ # The tracker of MOT JDE Detector (such as FairMOT) is exported together 
with the model. # Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking. -type: JDETracker # 'JDETracker' or 'DeepSORTTracker' +type: OCSORTTracker # choose one tracker in ['JDETracker', 'OCSORTTracker', 'DeepSORTTracker'] # BYTETracker JDETracker: @@ -11,8 +11,21 @@ JDETracker: conf_thres: 0.6 low_conf_thres: 0.1 match_thres: 0.9 - min_box_area: 100 - vertical_ratio: 1.6 # for pedestrian + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian + + +OCSORTTracker: + det_thresh: 0.4 + max_age: 30 + min_hits: 3 + iou_threshold: 0.3 + delta_t: 3 + inertia: 0.2 + min_box_area: 0 + vertical_ratio: 0 + use_byte: False + DeepSORTTracker: input_size: [64, 192] diff --git a/deploy/python/README.md b/deploy/python/README.md index 8b672c84da4467ba2802e5e1b39aecba6779516d..0c7fc24c8510fb7f5f96d30269d4d2371cf00ed6 100644 --- a/deploy/python/README.md +++ b/deploy/python/README.md @@ -3,27 +3,76 @@ In PaddlePaddle, the inference engine and the training engine are optimized differently under the hood. The inference engine uses AnalysisPredictor, which is optimized specifically for inference and is the Python interface to the [C++ inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html). It can apply multiple graph optimizations to a model and reduce unnecessary memory copies. For users with high performance requirements when deploying a trained model, we provide inference scripts that are independent of PaddleDetection, so they can be integrated and deployed directly. -It mainly consists of two steps: - +Python inference deployment mainly consists of two steps: - Export the inference model - Run inference with Python ## 1.
Export the inference model -During training, PaddleDetection keeps both the network's forward parameters and the optimizer-related parameters, whereas deployment only needs the forward parameters. For details, see [Exporting models](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/EXPORT_MODEL.md) +During training, PaddleDetection keeps both the network's forward parameters and the optimizer-related parameters, whereas deployment only needs the forward parameters. For details, see [Exporting models](../EXPORT_MODEL.md). For example: + +```bash +# Export the YOLOv3 detection model +python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \ + -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams + +# Export the HigherHRNet (bottom-up) keypoint detection model +python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams + +# Export the HRNet (top-down) keypoint detection model +python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams + +# Export the FairMOT multi-object tracking model +python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams + +# Export the ByteTrack multi-object tracking model (equivalent to exporting only the detector) +python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams +``` After export, the output directory contains four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. -## 2. Python-based inference +## 2.
Python-based inference +### 2.1 General detection +Run the following command in a terminal: +```bash +python deploy/python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco --image_file=./demo/000000014439.jpg --device=GPU +``` +### 2.2 Keypoint detection Run the following command in a terminal: +```bash +# Standalone inference with a keypoint top-down (HRNet) or bottom-up (HigherHRNet) model; in this mode the top-down HRNet model only supports cropped single-person images +python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=GPU --threshold=0.5 +python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=GPU --threshold=0.5 + +# Joint deployment of a detector + keypoint top-down model (joint inference only supports top-down keypoint models) +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/yolov3_darknet53_270e_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU +``` +**Note:** + - For details on exporting and running keypoint detection models, see [keypoint](../../configs/keypoint/README.md); specific usage is covered in each model's documentation. + - The keypoint detection deployment in this directory provides only basic forward inference; for more keypoint detection features, use the PP-Human project, see [pipeline](../pipeline/README.md). + +### 2.3 Multi-object tracking +Run the following command in a terminal: ```bash -python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign --image_file=./demo/road554.png --device=GPU +# FairMOT tracking +python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU + +# ByteTrack tracking +python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True + +# FairMOT multi-object tracking combined with HRNet keypoint detection (joint inference only supports top-down keypoint models) +python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU ``` +**Note:** + - 
For details on exporting and running multi-object tracking models, see [mot](../../configs/mot/README.md); specific usage is covered in each model's documentation. + - The tracking deployment in this directory provides basic forward inference and joint keypoint deployment; for more tracking features, use the PP-Human project, see [pipeline](../pipeline/README.md), or the PP-Tracking project (trajectory drawing, entrance/exit flow counting), see [pptracking](../pptracking/README.md). + + The parameters are described below: | Parameter | Required | Description | @@ -42,6 +91,8 @@ python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1 | --enable_mkldnn | Option | Whether to enable MKLDNN acceleration for CPU inference; defaults to False | | --cpu_threads | Option | Number of CPU threads; defaults to 1 | | --trt_calib_mode | Option | Whether TensorRT uses calibration; defaults to False. Set to True when using TensorRT int8; set to False when using a model quantized with PaddleSlim | +| --save_results | Option | Whether to save the image prediction results as JSON in the output directory | + Notes: diff --git a/deploy/python/det_keypoint_unite_infer.py b/deploy/python/det_keypoint_unite_infer.py index a82c2c58cefb0926958e911c4f2f8ee2fc5bfd75..7b57714d18c7a34bc8b740ec39ed7443da9a10d6 100644 --- a/deploy/python/det_keypoint_unite_infer.py +++ b/deploy/python/det_keypoint_unite_infer.py @@ -36,10 +36,16 @@ KEYPOINT_SUPPORT_MODELS = { def predict_with_given_det(image, det_res, keypoint_detector, - keypoint_batch_size, det_threshold, - keypoint_threshold, run_benchmark): + keypoint_batch_size, run_benchmark): + keypoint_res = {} + rec_images, records, det_rects = keypoint_detector.get_person_from_rect( - image, det_res, det_threshold) + image, det_res) + + if len(det_rects) == 0: + keypoint_res['keypoint'] = [[], []] + return keypoint_res + keypoint_vector = [] score_vector = [] @@ -48,7 +54,6 @@ def predict_with_given_det(image, det_res, keypoint_detector, rec_images, run_benchmark, repeats=10, visual=False) keypoint_vector, score_vector = translate_to_ori_images(keypoint_results, np.array(records)) - keypoint_res = {} keypoint_res['keypoint'] = [ keypoint_vector.tolist(), score_vector.tolist() ] if len(keypoint_vector) > 0 else [[], []] @@ -79,19 +84,21 @@ def topdown_unite_predict(detector, detector.gpu_util += gu else: results = detector.predict_image([image], visual=False) + results = detector.filter_box(results, FLAGS.det_threshold) + 
if results['boxes_num'] > 0: + keypoint_res = predict_with_given_det( + image, results, topdown_keypoint_detector, keypoint_batch_size, + FLAGS.run_benchmark) - if results['boxes_num'] == 0: - continue - - keypoint_res = predict_with_given_det( - image, results, topdown_keypoint_detector, keypoint_batch_size, - FLAGS.det_threshold, FLAGS.keypoint_threshold, FLAGS.run_benchmark) - - if save_res: - store_res.append([ - i, keypoint_res['bbox'], - [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]] - ]) + if save_res: + save_name = img_file if isinstance(img_file, str) else i + store_res.append([ + save_name, keypoint_res['bbox'], + [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]] + ]) + else: + results["keypoint"] = [[], []] + keypoint_res = results if FLAGS.run_benchmark: cm, gm, gu = get_current_memory_mb() topdown_keypoint_detector.cpu_mem += cm @@ -138,10 +145,13 @@ def topdown_unite_predict_video(detector, if not os.path.exists(FLAGS.output_dir): os.makedirs(FLAGS.output_dir) out_path = os.path.join(FLAGS.output_dir, video_name) - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) index = 0 store_res = [] + keypoint_smoothing = KeypointSmoothing( + width, height, filter_type=FLAGS.filter_type, beta=0.05) + while (1): ret, frame = capture.read() if not ret: @@ -152,16 +162,28 @@ def topdown_unite_predict_video(detector, frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) results = detector.predict_image([frame2], visual=False) + results = detector.filter_box(results, FLAGS.det_threshold) + if results['boxes_num'] == 0: + writer.write(frame) + continue keypoint_res = predict_with_given_det( frame2, results, topdown_keypoint_detector, keypoint_batch_size, - FLAGS.det_threshold, FLAGS.keypoint_threshold, FLAGS.run_benchmark) + FLAGS.run_benchmark) + + if FLAGS.smooth and len(keypoint_res['keypoint'][0]) == 1: + current_keypoints = 
np.array(keypoint_res['keypoint'][0][0]) + smooth_keypoints = keypoint_smoothing.smooth_process( + current_keypoints) + + keypoint_res['keypoint'][0][0] = smooth_keypoints.tolist() im = visualize_pose( frame, keypoint_res, visual_thresh=FLAGS.keypoint_threshold, returnimg=True) + if save_res: store_res.append([ index, keypoint_res['bbox'], @@ -187,6 +209,93 @@ def topdown_unite_predict_video(detector, json.dump(store_res, wf, indent=4) +class KeypointSmoothing(object): + # The following code is modified from: + # https://github.com/jaantollander/OneEuroFilter + + def __init__(self, + width, + height, + filter_type, + alpha=0.5, + fc_d=0.1, + fc_min=0.1, + beta=0.1, + thres_mult=0.3): + super(KeypointSmoothing, self).__init__() + self.image_width = width + self.image_height = height + self.threshold = np.array([ + 0.005, 0.005, 0.005, 0.005, 0.005, 0.01, 0.01, 0.01, 0.01, 0.01, + 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01 + ]) * thres_mult + self.filter_type = filter_type + self.alpha = alpha + self.dx_prev_hat = None + self.x_prev_hat = None + self.fc_d = fc_d + self.fc_min = fc_min + self.beta = beta + + if self.filter_type == 'OneEuro': + self.smooth_func = self.one_euro_filter + elif self.filter_type == 'EMA': + self.smooth_func = self.ema_filter + else: + raise ValueError('filter type must be OneEuro or EMA') + + def smooth_process(self, current_keypoints): + if self.x_prev_hat is None: + self.x_prev_hat = current_keypoints[:, :2] + self.dx_prev_hat = np.zeros(current_keypoints[:, :2].shape) + return current_keypoints + else: + result = current_keypoints + num_keypoints = len(current_keypoints) + for i in range(num_keypoints): + result[i, :2] = self.smooth(current_keypoints[i, :2], + self.threshold[i], i) + return result + + def smooth(self, current_keypoint, threshold, index): + distance = np.sqrt( + np.square((current_keypoint[0] - self.x_prev_hat[index][0]) / + self.image_width) + np.square((current_keypoint[ + 1] - self.x_prev_hat[index][1]) / 
self.image_height)) + if distance < threshold: + result = self.x_prev_hat[index] + else: + result = self.smooth_func(current_keypoint, self.x_prev_hat[index], + index) + + return result + + def one_euro_filter(self, x_cur, x_pre, index): + te = 1 + self.alpha = self.smoothing_factor(te, self.fc_d) + dx_cur = (x_cur - x_pre) / te + dx_cur_hat = self.exponential_smoothing(dx_cur, self.dx_prev_hat[index]) + + fc = self.fc_min + self.beta * np.abs(dx_cur_hat) + self.alpha = self.smoothing_factor(te, fc) + x_cur_hat = self.exponential_smoothing(x_cur, x_pre) + self.dx_prev_hat[index] = dx_cur_hat + self.x_prev_hat[index] = x_cur_hat + return x_cur_hat + + def ema_filter(self, x_cur, x_pre, index): + x_cur_hat = self.exponential_smoothing(x_cur, x_pre) + self.x_prev_hat[index] = x_cur_hat + return x_cur_hat + + def smoothing_factor(self, te, fc): + r = 2 * math.pi * fc * te + return r / (r + 1) + + def exponential_smoothing(self, x_cur, x_pre, index=0): + return self.alpha * x_cur + (1 - self.alpha) * x_pre + + def main(): deploy_file = os.path.join(FLAGS.det_model_dir, 'infer_cfg.yml') with open(deploy_file) as f: diff --git a/deploy/python/det_keypoint_unite_utils.py b/deploy/python/det_keypoint_unite_utils.py index 26344628a3e10457a394f351fc64f7049a4245bb..7de1295128d9151cf55a7b1e427d6ee946db8bc4 100644 --- a/deploy/python/det_keypoint_unite_utils.py +++ b/deploy/python/det_keypoint_unite_utils.py @@ -126,4 +126,16 @@ def argsparser(): "3) rects: list of rect [xmin, ymin, xmax, ymax]" "4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list" "5) scores: mean of all joint conf")) + parser.add_argument( + '--smooth', + type=ast.literal_eval, + default=False, + help='smoothing keypoints for each frame, new incoming keypoints will be more stable.' + ) + parser.add_argument( + '--filter_type', + type=str, + default='OneEuro', + help='when set --smooth True, choose filter type you want to use, it can be [OneEuro] or [EMA].' 
+ ) return parser diff --git a/deploy/python/infer.py b/deploy/python/infer.py index 84c643935f3d3b20acd910b0fa7412b46e7d1b72..a2199a2be62f04af5e1e940704a1dce426596f46 100644 --- a/deploy/python/infer.py +++ b/deploy/python/infer.py @@ -15,6 +15,8 @@ import os import yaml import glob +import json +from pathlib import Path from functools import reduce import cv2 @@ -31,7 +33,7 @@ sys.path.insert(0, parent_path) from benchmark_utils import PaddleInferBenchmark from picodet_postprocess import PicoDetPostProcess -from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, WarpAffine, decode_image +from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, LetterBoxResize, WarpAffine, Pad, decode_image from keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop from visualize import visualize_box_mask from utils import argsparser, Timer, get_current_memory_mb @@ -39,8 +41,8 @@ from utils import argsparser, Timer, get_current_memory_mb # Global dictionary SUPPORT_MODELS = { 'YOLO', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', 'S2ANet', 'JDE', - 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', 'TOOD', - 'StrongBaseline', 'STGCN' + 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', 'TOOD', 'RetinaNet', + 'StrongBaseline', 'STGCN', 'YOLOX', 'PPHGNet', 'PPLCNet' } @@ -140,14 +142,17 @@ class Detector(object): input_names = self.predictor.get_input_names() for i in range(len(input_names)): input_tensor = self.predictor.get_input_handle(input_names[i]) - input_tensor.copy_from_cpu(inputs[input_names[i]]) + if input_names[i] == 'x': + input_tensor.copy_from_cpu(inputs['image']) + else: + input_tensor.copy_from_cpu(inputs[input_names[i]]) return inputs def postprocess(self, inputs, result): # postprocess output of predictor np_boxes_num = result['boxes_num'] - if np_boxes_num[0] <= 0: + if sum(np_boxes_num) <= 0: print('[WARNNING] No object detected.') result = {'boxes': np.zeros([0, 6]), 
'boxes_num': [0]} result = {k: v for k, v in result.items() if v is not None} @@ -206,7 +211,8 @@ class Detector(object): for k, v in res.items(): results[k].append(v) for k, v in results.items(): - results[k] = np.concatenate(v) + if k not in ['masks', 'segm']: + results[k] = np.concatenate(v) return results def get_timer(self): @@ -216,7 +222,8 @@ class Detector(object): image_list, run_benchmark=False, repeats=1, - visual=True): + visual=True, + save_file=None): batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size) results = [] for i in range(batch_loop_cnt): @@ -231,7 +238,7 @@ class Detector(object): self.det_times.preprocess_time_s.end() # model prediction - result = self.predict(repeats=repeats) # warmup + result = self.predict(repeats=50) # warmup self.det_times.inference_time_s.start() result = self.predict(repeats=repeats) self.det_times.inference_time_s.end(repeats=repeats) @@ -276,6 +283,10 @@ class Detector(object): if visual: print('Test iter {}'.format(i)) + if save_file is not None: + Path(self.output_dir).mkdir(exist_ok=True) + self.format_coco_results(image_list, results, save_file=save_file) + results = self.merge_batch_result(results) return results @@ -305,7 +316,7 @@ class Detector(object): break print('detect frame: %d' % (index)) index += 1 - results = self.predict_image([frame], visual=False) + results = self.predict_image([frame[:, :, ::-1]], visual=False) im = visualize_box_mask( frame, @@ -320,6 +331,68 @@ class Detector(object): break writer.release() + @staticmethod + def format_coco_results(image_list, results, save_file=None): + coco_results = [] + image_id = 0 + + for result in results: + start_idx = 0 + for box_num in result['boxes_num']: + idx_slice = slice(start_idx, start_idx + box_num) + start_idx += box_num + + image_file = image_list[image_id] + image_id += 1 + + if 'boxes' in result: + boxes = result['boxes'][idx_slice, :] + per_result = [ + { + 'image_file': image_file, + 'bbox': + [box[2], box[3], box[4] - 
box[2], + box[5] - box[3]], # xyxy -> xywh + 'score': box[1], + 'category_id': int(box[0]), + } for k, box in enumerate(boxes.tolist()) + ] + + elif 'segm' in result: + import pycocotools.mask as mask_util + + scores = result['score'][idx_slice].tolist() + category_ids = result['label'][idx_slice].tolist() + segms = result['segm'][idx_slice, :] + rles = [ + mask_util.encode( + np.array( + mask[:, :, np.newaxis], + dtype=np.uint8, + order='F'))[0] for mask in segms + ] + for rle in rles: + rle['counts'] = rle['counts'].decode('utf-8') + + per_result = [{ + 'image_file': image_file, + 'segmentation': rle, + 'score': scores[k], + 'category_id': category_ids[k], + } for k, rle in enumerate(rles)] + + else: + raise RuntimeError('') + + # per_result = [item for item in per_result if item['score'] > threshold] + coco_results.extend(per_result) + + if save_file: + with open(os.path.join(save_file), 'w') as f: + json.dump(coco_results, f) + + return coco_results + class DetectorSOLOv2(Detector): """ @@ -618,9 +691,15 @@ def load_predictor(model_dir, raise ValueError( "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}" .format(run_mode, device)) - config = Config( - os.path.join(model_dir, 'model.pdmodel'), - os.path.join(model_dir, 'model.pdiparams')) + infer_model = os.path.join(model_dir, 'model.pdmodel') + infer_params = os.path.join(model_dir, 'model.pdiparams') + if not os.path.exists(infer_model): + infer_model = os.path.join(model_dir, 'inference.pdmodel') + infer_params = os.path.join(model_dir, 'inference.pdiparams') + if not os.path.exists(infer_model): + raise ValueError( + "Cannot find any inference model in dir: {},".format(model_dir)) + config = Config(infer_model, infer_params) if device == 'GPU': # initial GPU memory(M), device ID config.enable_use_gpu(200, 0) @@ -652,7 +731,7 @@ def load_predictor(model_dir, } if run_mode in precision_map.keys(): config.enable_tensorrt_engine( - workspace_size=1 << 25, + workspace_size=(1 << 25) * 
batch_size, max_batch_size=batch_size, min_subgraph_size=min_subgraph_size, precision_mode=precision_map[run_mode], @@ -690,7 +769,7 @@ def get_test_images(infer_dir, infer_img): Get image path list in TEST mode """ assert infer_img is not None or infer_dir is not None, \ - "--infer_img or --infer_dir should be set" + "--image_file or --image_dir should be set" assert infer_img is None or os.path.isfile(infer_img), \ "{} is not a file".format(infer_img) assert infer_dir is None or os.path.isdir(infer_dir), \ @@ -790,7 +869,10 @@ def main(): if FLAGS.image_dir is None and FLAGS.image_file is not None: assert FLAGS.batch_size == 1, "batch_size should be 1, when image_file is not None" img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) - detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + save_file = os.path.join(FLAGS.output_dir, + 'results.json') if FLAGS.save_results else None + detector.predict_image( + img_list, FLAGS.run_benchmark, repeats=100, save_file=save_file) if not FLAGS.run_benchmark: detector.det_times.info(average=True) else: diff --git a/deploy/python/keypoint_infer.py b/deploy/python/keypoint_infer.py index e16ddd647cf58a58bb9b4c8cb239fd9e3d472673..70b81375621bdd49ec916f246d90f5d5a243aa33 100644 --- a/deploy/python/keypoint_infer.py +++ b/deploy/python/keypoint_infer.py @@ -95,12 +95,10 @@ class KeyPointDetector(Detector): def set_config(self, model_dir): return PredictConfig_KeyPoint(model_dir) - def get_person_from_rect(self, image, results, det_threshold=0.5): + def get_person_from_rect(self, image, results): # crop the person result from image self.det_times.preprocess_time_s.start() - det_results = results['boxes'] - mask = det_results[:, 1] > det_threshold - valid_rects = det_results[mask] + valid_rects = results['boxes'] rect_images = [] new_rects = [] org_rects = [] @@ -268,9 +266,11 @@ class KeyPointDetector(Detector): break print('detect frame: %d' % (index)) index += 1 - results = self.predict_image([frame], 
visual=False) + results = self.predict_image([frame[:, :, ::-1]], visual=False) + im_results = {} + im_results['keypoint'] = [results['keypoint'], results['score']] im = visualize_pose( - frame, results, visual_thresh=self.threshold, returnimg=True) + frame, im_results, visual_thresh=self.threshold, returnimg=True) writer.write(im) if camera_id != -1: cv2.imshow('Mask Detection', im) diff --git a/deploy/python/keypoint_postprocess.py b/deploy/python/keypoint_postprocess.py index 2275df78a3be412ac78ea8f26f55211fb1f028bd..69f1d3fd9dcc83278d331cd361b36d50e64ef508 100644 --- a/deploy/python/keypoint_postprocess.py +++ b/deploy/python/keypoint_postprocess.py @@ -35,7 +35,7 @@ class HrHRNetPostProcess(object): heat_thresh (float): value of topk below this threshhold will be ignored tag_thresh (float): coord's value sampled in tagmap below this threshold belong to same people for init - inputs(list[heatmap]): the output list of modle, [heatmap, heatmap_maxpool, tagmap], heatmap_maxpool used to get topk + inputs(list[heatmap]): the output list of model, [heatmap, heatmap_maxpool, tagmap], heatmap_maxpool used to get topk original_height, original_width (float): the original image size """ diff --git a/deploy/python/mot_jde_infer.py b/deploy/python/mot_jde_infer.py index 99033cabaa13164eb300c7e0e4800bffb12f462b..51a2562ee554a3eeb2489821d376486dcba0985c 100644 --- a/deploy/python/mot_jde_infer.py +++ b/deploy/python/mot_jde_infer.py @@ -32,7 +32,7 @@ sys.path.insert(0, parent_path) from pptracking.python.mot import JDETracker from pptracking.python.mot.utils import MOTTimer, write_mot_results -from pptracking.python.visualize import plot_tracking, plot_tracking_dict +from pptracking.python.mot.visualize import plot_tracking_dict # Global dictionary MOT_JDE_SUPPORT_MODELS = { @@ -54,23 +54,30 @@ class JDE_Detector(Detector): trt_calib_mode (bool): If the model is produced by TRT offline quantitative calibration, trt_calib_mode need to set True cpu_threads (int): cpu threads - 
enable_mkldnn (bool): whether to open MKLDNN + enable_mkldnn (bool): whether to open MKLDNN + output_dir (string): The path of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False """ - def __init__(self, - model_dir, - tracker_config=None, - device='CPU', - run_mode='paddle', - batch_size=1, - trt_min_shape=1, - trt_max_shape=1088, - trt_opt_shape=608, - trt_calib_mode=False, - cpu_threads=1, - enable_mkldnn=False, - output_dir='output', - threshold=0.5): + def __init__( + self, + model_dir, + tracker_config=None, + device='CPU', + run_mode='paddle', + batch_size=1, + trt_min_shape=1, + trt_max_shape=1088, + trt_opt_shape=608, + trt_calib_mode=False, + cpu_threads=1, + enable_mkldnn=False, + output_dir='output', + threshold=0.5, + save_images=False, + save_mot_txts=False, ): super(JDE_Detector, self).__init__( model_dir=model_dir, device=device, @@ -84,6 +91,8 @@ class JDE_Detector(Detector): enable_mkldnn=enable_mkldnn, output_dir=output_dir, threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts assert batch_size == 1, "MOT model only supports batch_size=1." self.det_times = Timer(with_tracker=True) self.num_classes = len(self.pred_config.labels) @@ -91,8 +100,8 @@ class JDE_Detector(Detector): # tracker config assert self.pred_config.tracker, "The exported JDE Detector model should have tracker." 
cfg = self.pred_config.tracker - min_box_area = cfg.get('min_box_area', 200) - vertical_ratio = cfg.get('vertical_ratio', 1.6) + min_box_area = cfg.get('min_box_area', 0.0) + vertical_ratio = cfg.get('vertical_ratio', 0.0) conf_thres = cfg.get('conf_thres', 0.0) tracked_thresh = cfg.get('tracked_thresh', 0.7) metric_type = cfg.get('metric_type', 'euclidean') @@ -115,7 +124,7 @@ class JDE_Detector(Detector): return result def tracking(self, det_results): - pred_dets = det_results['pred_dets'] # 'cls_id, score, x0, y0, x1, y1' + pred_dets = det_results['pred_dets'] # cls_id, score, x0, y0, x1, y1 pred_embs = det_results['pred_embs'] online_targets_dict = self.tracker.update(pred_dets, pred_embs) @@ -164,7 +173,8 @@ class JDE_Detector(Detector): image_list, run_benchmark=False, repeats=1, - visual=True): + visual=True, + seq_name=None): mot_results = [] num_classes = self.num_classes image_list.sort() @@ -225,7 +235,7 @@ class JDE_Detector(Detector): self.det_times.img_num += 1 if visual: - if frame_id % 10 == 0: + if len(image_list) > 1 and frame_id % 10 == 0: print('Tracking frame {}'.format(frame_id)) frame, _ = decode_image(img_file, {}) @@ -237,7 +247,8 @@ class JDE_Detector(Detector): online_scores, frame_id=frame_id, ids2names=ids2names) - seq_name = image_list[0].split('/')[-2] + if seq_name is None: + seq_name = image_list[0].split('/')[-2] save_dir = os.path.join(self.output_dir, seq_name) if not os.path.exists(save_dir): os.makedirs(save_dir) @@ -264,7 +275,8 @@ class JDE_Detector(Detector): if not os.path.exists(self.output_dir): os.makedirs(self.output_dir) out_path = os.path.join(self.output_dir, video_out_name) - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) frame_id = 1 @@ -282,7 +294,9 @@ class JDE_Detector(Detector): frame_id += 1 timer.tic() - mot_results = self.predict_image([frame], visual=False) + seq_name = 
video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame[:, :, ::-1]], visual=False, seq_name=seq_name) timer.toc() online_tlwhs, online_scores, online_ids = mot_results[0] @@ -307,20 +321,33 @@ class JDE_Detector(Detector): cv2.imshow('Mask Detection', im) if cv2.waitKey(1) & 0xFF == ord('q'): break + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + + write_mot_results(result_filename, results, data_type, num_classes) + writer.release() def main(): detector = JDE_Detector( FLAGS.model_dir, + tracker_config=None, device=FLAGS.device, run_mode=FLAGS.run_mode, + batch_size=1, trt_min_shape=FLAGS.trt_min_shape, trt_max_shape=FLAGS.trt_max_shape, trt_opt_shape=FLAGS.trt_opt_shape, trt_calib_mode=FLAGS.trt_calib_mode, cpu_threads=FLAGS.cpu_threads, - enable_mkldnn=FLAGS.enable_mkldnn) + enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, + threshold=FLAGS.threshold, + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts) # predict from video file or camera video stream if FLAGS.video_file is not None or FLAGS.camera_id != -1: diff --git a/deploy/python/mot_keypoint_unite_infer.py b/deploy/python/mot_keypoint_unite_infer.py index 3eea4bd6b22c148b09be9cf794116725bf6e89d6..edf394152c28d682cdb5845050b78dbb27e8b22f 100644 --- a/deploy/python/mot_keypoint_unite_infer.py +++ b/deploy/python/mot_keypoint_unite_infer.py @@ -24,7 +24,7 @@ from collections import defaultdict from mot_keypoint_unite_utils import argsparser from preprocess import decode_image -from infer import print_arguments, get_test_images +from infer import print_arguments, get_test_images, bench_log from mot_sde_infer import SDE_Detector from mot_jde_infer import JDE_Detector, MOT_JDE_SUPPORT_MODELS from keypoint_infer import KeyPointDetector, KEYPOINT_SUPPORT_MODELS @@ -39,7 +39,7 @@ import sys parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) sys.path.insert(0, parent_path) 
-from pptracking.python.visualize import plot_tracking, plot_tracking_dict +from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict from pptracking.python.mot.utils import MOTTimer as FPSTimer @@ -92,11 +92,12 @@ def mot_topdown_unite_predict(mot_detector, keypoint_res = predict_with_given_det( image, results, topdown_keypoint_detector, keypoint_batch_size, - FLAGS.mot_threshold, FLAGS.keypoint_threshold, FLAGS.run_benchmark) + FLAGS.run_benchmark) if save_res: + save_name = img_file if isinstance(img_file, str) else i store_res.append([ - i, keypoint_res['bbox'], + save_name, keypoint_res['bbox'], [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]] ]) if FLAGS.run_benchmark: @@ -146,7 +147,7 @@ def mot_topdown_unite_predict_video(mot_detector, if not os.path.exists(FLAGS.output_dir): os.makedirs(FLAGS.output_dir) out_path = os.path.join(FLAGS.output_dir, video_name) - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + fourcc = cv2.VideoWriter_fourcc(* 'mp4v') writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) frame_id = 0 timer_mot, timer_kp, timer_mot_kp = FPSTimer(), FPSTimer(), FPSTimer() @@ -166,7 +167,10 @@ def mot_topdown_unite_predict_video(mot_detector, # mot model timer_mot.tic() - mot_results = mot_detector.predict_image([frame], visual=False) + + frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) + + mot_results = mot_detector.predict_image([frame2], visual=False) timer_mot.toc() online_tlwhs, online_scores, online_ids = mot_results[0] results = convert_mot_to_det( @@ -178,8 +182,8 @@ def mot_topdown_unite_predict_video(mot_detector, # keypoint model timer_kp.tic() keypoint_res = predict_with_given_det( - frame, results, topdown_keypoint_detector, keypoint_batch_size, - FLAGS.mot_threshold, FLAGS.keypoint_threshold, FLAGS.run_benchmark) + frame2, results, topdown_keypoint_detector, keypoint_batch_size, + FLAGS.run_benchmark) timer_kp.toc() timer_mot_kp.toc() diff --git a/deploy/python/mot_sde_infer.py 
b/deploy/python/mot_sde_infer.py index 4dd66dda0812d7143bc5f31cb033f8ac8305828c..b4a487facddc4a6eb9492ba367c5333b0c77d9a9 100644 --- a/deploy/python/mot_sde_infer.py +++ b/deploy/python/mot_sde_infer.py @@ -1,4 +1,4 @@ -# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -23,15 +23,15 @@ import paddle from benchmark_utils import PaddleInferBenchmark from preprocess import decode_image from utils import argsparser, Timer, get_current_memory_mb -from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig +from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor # add python path import sys parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2))) sys.path.insert(0, parent_path) -from pptracking.python.mot import JDETracker -from pptracking.python.mot.utils import MOTTimer, write_mot_results +from pptracking.python.mot import JDETracker, DeepSORTTracker +from pptracking.python.mot.utils import MOTTimer, write_mot_results, get_crops, clip_box from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict @@ -50,7 +50,11 @@ class SDE_Detector(Detector): calibration, trt_calib_mode need to set True cpu_threads (int): cpu threads enable_mkldnn (bool): whether to open MKLDNN - use_dark(bool): whether to use postprocess in DarkPose + output_dir (string): The path of output, default as 'output' + threshold (float): Score threshold of the detected bbox, default as 0.5 + save_images (bool): Whether to save visualization image results, default as False + save_mot_txts (bool): Whether to save tracking results (txt), default as False + reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT """ def __init__(self, @@ -66,7 +70,10 @@ class 
SDE_Detector(Detector): cpu_threads=1, enable_mkldnn=False, output_dir='output', - threshold=0.5): + threshold=0.5, + save_images=False, + save_mot_txts=False, + reid_model_dir=None): super(SDE_Detector, self).__init__( model_dir=model_dir, device=device, @@ -80,65 +87,198 @@ class SDE_Detector(Detector): enable_mkldnn=enable_mkldnn, output_dir=output_dir, threshold=threshold, ) + self.save_images = save_images + self.save_mot_txts = save_mot_txts assert batch_size == 1, "MOT model only supports batch_size=1." self.det_times = Timer(with_tracker=True) self.num_classes = len(self.pred_config.labels) - # tracker config + # reid config + self.use_reid = False if reid_model_dir is None else True + if self.use_reid: + self.reid_pred_config = self.set_config(reid_model_dir) + self.reid_predictor, self.config = load_predictor( + reid_model_dir, + run_mode=run_mode, + batch_size=50, # reid_batch_size + min_subgraph_size=self.reid_pred_config.min_subgraph_size, + device=device, + use_dynamic_shape=self.reid_pred_config.use_dynamic_shape, + trt_min_shape=trt_min_shape, + trt_max_shape=trt_max_shape, + trt_opt_shape=trt_opt_shape, + trt_calib_mode=trt_calib_mode, + cpu_threads=cpu_threads, + enable_mkldnn=enable_mkldnn) + else: + self.reid_pred_config = None + self.reid_predictor = None + + assert tracker_config is not None, 'Note that tracker_config should be set.' 
self.tracker_config = tracker_config - cfg = yaml.safe_load(open(self.tracker_config))['tracker'] - min_box_area = cfg.get('min_box_area', 200) - vertical_ratio = cfg.get('vertical_ratio', 1.6) - use_byte = cfg.get('use_byte', True) - match_thres = cfg.get('match_thres', 0.9) - conf_thres = cfg.get('conf_thres', 0.6) - low_conf_thres = cfg.get('low_conf_thres', 0.1) - - self.tracker = JDETracker( - use_byte=use_byte, - num_classes=self.num_classes, - min_box_area=min_box_area, - vertical_ratio=vertical_ratio, - match_thres=match_thres, - conf_thres=conf_thres, - low_conf_thres=low_conf_thres) + tracker_cfg = yaml.safe_load(open(self.tracker_config)) + cfg = tracker_cfg[tracker_cfg['type']] + + # tracker config + self.use_deepsort_tracker = True if tracker_cfg[ + 'type'] == 'DeepSORTTracker' else False + if self.use_deepsort_tracker: + # use DeepSORTTracker + if self.reid_pred_config is not None and hasattr( + self.reid_pred_config, 'tracker'): + cfg = self.reid_pred_config.tracker + budget = cfg.get('budget', 100) + max_age = cfg.get('max_age', 30) + max_iou_distance = cfg.get('max_iou_distance', 0.7) + matching_threshold = cfg.get('matching_threshold', 0.2) + min_box_area = cfg.get('min_box_area', 0) + vertical_ratio = cfg.get('vertical_ratio', 0) + + self.tracker = DeepSORTTracker( + budget=budget, + max_age=max_age, + max_iou_distance=max_iou_distance, + matching_threshold=matching_threshold, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, ) + else: + # use ByteTracker + use_byte = cfg.get('use_byte', False) + det_thresh = cfg.get('det_thresh', 0.3) + min_box_area = cfg.get('min_box_area', 0) + vertical_ratio = cfg.get('vertical_ratio', 0) + match_thres = cfg.get('match_thres', 0.9) + conf_thres = cfg.get('conf_thres', 0.6) + low_conf_thres = cfg.get('low_conf_thres', 0.1) + + self.tracker = JDETracker( + use_byte=use_byte, + det_thresh=det_thresh, + num_classes=self.num_classes, + min_box_area=min_box_area, + vertical_ratio=vertical_ratio, + 
match_thres=match_thres, + conf_thres=conf_thres, + low_conf_thres=low_conf_thres, ) + + def postprocess(self, inputs, result): + # postprocess output of predictor + np_boxes_num = result['boxes_num'] + if np_boxes_num[0] <= 0: + print('[WARNNING] No object detected.') + result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]} + result = {k: v for k, v in result.items() if v is not None} + return result + + def reidprocess(self, det_results, repeats=1): + pred_dets = det_results['boxes'] + pred_xyxys = pred_dets[:, 2:6] + + ori_image = det_results['ori_image'] + ori_image_shape = ori_image.shape[:2] + pred_xyxys, keep_idx = clip_box(pred_xyxys, ori_image_shape) + + if len(keep_idx[0]) == 0: + det_results['boxes'] = np.zeros((1, 6), dtype=np.float32) + det_results['embeddings'] = None + return det_results + + pred_dets = pred_dets[keep_idx[0]] + pred_xyxys = pred_dets[:, 2:6] + + w, h = self.tracker.input_size + crops = get_crops(pred_xyxys, ori_image, w, h) + + # to keep fast speed, only use topk crops + crops = crops[:50] # reid_batch_size + det_results['crops'] = np.array(crops).astype('float32') + det_results['boxes'] = pred_dets[:50] + + input_names = self.reid_predictor.get_input_names() + for i in range(len(input_names)): + input_tensor = self.reid_predictor.get_input_handle(input_names[i]) + input_tensor.copy_from_cpu(det_results[input_names[i]]) + + # model prediction + for i in range(repeats): + self.reid_predictor.run() + output_names = self.reid_predictor.get_output_names() + feature_tensor = self.reid_predictor.get_output_handle(output_names[ + 0]) + pred_embs = feature_tensor.copy_to_cpu() + + det_results['embeddings'] = pred_embs + return det_results def tracking(self, det_results): pred_dets = det_results['boxes'] # 'cls_id, score, x0, y0, x1, y1' - pred_embs = None - - online_targets_dict = self.tracker.update(pred_dets, pred_embs) - online_tlwhs = defaultdict(list) - online_scores = defaultdict(list) - online_ids = defaultdict(list) - for cls_id in 
range(self.num_classes): - online_targets = online_targets_dict[cls_id] + pred_embs = det_results.get('embeddings', None) + + if self.use_deepsort_tracker: + # use DeepSORTTracker, only support singe class + self.tracker.predict() + online_targets = self.tracker.update(pred_dets, pred_embs) + online_tlwhs, online_scores, online_ids = [], [], [] for t in online_targets: - tlwh = t.tlwh - tid = t.track_id - tscore = t.score - if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: + if not t.is_confirmed() or t.time_since_update > 1: continue + tlwh = t.to_tlwh() + tscore = t.score + tid = t.track_id if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ 3] > self.tracker.vertical_ratio: continue - online_tlwhs[cls_id].append(tlwh) - online_ids[cls_id].append(tid) - online_scores[cls_id].append(tscore) - - return online_tlwhs, online_scores, online_ids + online_tlwhs.append(tlwh) + online_scores.append(tscore) + online_ids.append(tid) + + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs + else: + # use ByteTracker, support multiple class + online_tlwhs = defaultdict(list) + online_scores = defaultdict(list) + online_ids = defaultdict(list) + online_targets_dict = self.tracker.update(pred_dets, pred_embs) + for cls_id in range(self.num_classes): + online_targets = online_targets_dict[cls_id] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + tscore = t.score + if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: + continue + if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[ + 3] > self.tracker.vertical_ratio: + continue + online_tlwhs[cls_id].append(tlwh) + online_ids[cls_id].append(tid) + online_scores[cls_id].append(tscore) + + tracking_outs = { + 'online_tlwhs': online_tlwhs, + 'online_scores': online_scores, + 'online_ids': online_ids, + } + return tracking_outs def predict_image(self, image_list, run_benchmark=False, repeats=1, - visual=True): - mot_results = [] + 
visual=True, + seq_name=None): num_classes = self.num_classes image_list.sort() ids2names = self.pred_config.labels + mot_results = [] for frame_id, img_file in enumerate(image_list): batch_image_list = [img_file] # bs=1 in MOT model + frame, _ = decode_image(img_file, {}) if run_benchmark: # preprocess inputs = self.preprocess(batch_image_list) # warmup @@ -159,10 +299,16 @@ class SDE_Detector(Detector): self.det_times.postprocess_time_s.end() # tracking + if self.use_reid: + det_result['frame_id'] = frame_id + det_result['seq_name'] = seq_name + det_result['ori_image'] = frame + det_result = self.reidprocess(det_result) result_warmup = self.tracking(det_result) self.det_times.tracking_time_s.start() - online_tlwhs, online_scores, online_ids = self.tracking( - det_result) + if self.use_reid: + det_result = self.reidprocess(det_result) + tracking_outs = self.tracking(det_result) self.det_times.tracking_time_s.end() self.det_times.img_num += 1 @@ -186,32 +332,48 @@ class SDE_Detector(Detector): # tracking process self.det_times.tracking_time_s.start() - online_tlwhs, online_scores, online_ids = self.tracking( - det_result) + if self.use_reid: + det_result['frame_id'] = frame_id + det_result['seq_name'] = seq_name + det_result['ori_image'] = frame + det_result = self.reidprocess(det_result) + tracking_outs = self.tracking(det_result) self.det_times.tracking_time_s.end() self.det_times.img_num += 1 + online_tlwhs = tracking_outs['online_tlwhs'] + online_scores = tracking_outs['online_scores'] + online_ids = tracking_outs['online_ids'] + + mot_results.append([online_tlwhs, online_scores, online_ids]) + if visual: - if frame_id % 10 == 0: + if len(image_list) > 1 and frame_id % 10 == 0: print('Tracking frame {}'.format(frame_id)) frame, _ = decode_image(img_file, {}) - - im = plot_tracking_dict( - frame, - num_classes, - online_tlwhs, - online_ids, - online_scores, - frame_id=frame_id, - ids2names=[]) - seq_name = image_list[0].split('/')[-2] + if 
isinstance(online_tlwhs, defaultdict): + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) + else: + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + ids2names=ids2names) save_dir = os.path.join(self.output_dir, seq_name) if not os.path.exists(save_dir): os.makedirs(save_dir) cv2.imwrite( os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im) - mot_results.append([online_tlwhs, online_scores, online_ids]) return mot_results def predict_video(self, video_file, camera_id): @@ -231,13 +393,17 @@ class SDE_Detector(Detector): if not os.path.exists(self.output_dir): os.makedirs(self.output_dir) out_path = os.path.join(self.output_dir, video_out_name) - fourcc = cv2.VideoWriter_fourcc(* 'mp4v') + video_format = 'mp4v' + fourcc = cv2.VideoWriter_fourcc(*video_format) writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height)) frame_id = 1 timer = MOTTimer() - results = defaultdict(list) # support single class and multi classes + results = defaultdict(list) num_classes = self.num_classes + data_type = 'mcmot' if num_classes > 1 else 'mot' + ids2names = self.pred_config.labels + while (1): ret, frame = capture.read() if not ret: @@ -247,31 +413,54 @@ class SDE_Detector(Detector): frame_id += 1 timer.tic() - mot_results = self.predict_image([frame], visual=False) + seq_name = video_out_name.split('.')[0] + mot_results = self.predict_image( + [frame[:, :, ::-1]], visual=False, seq_name=seq_name) timer.toc() + # bs=1 in MOT model online_tlwhs, online_scores, online_ids = mot_results[0] - for cls_id in range(num_classes): - results[cls_id].append( - (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id], - online_ids[cls_id])) fps = 1. 
/ timer.duration - im = plot_tracking_dict( - frame, - num_classes, - online_tlwhs, - online_ids, - online_scores, - frame_id=frame_id, - fps=fps, - ids2names=[]) + if self.use_deepsort_tracker: + # use DeepSORTTracker, only support singe class + results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + im = plot_tracking( + frame, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names) + else: + # use ByteTracker, support multiple class + for cls_id in range(num_classes): + results[cls_id].append( + (frame_id + 1, online_tlwhs[cls_id], + online_scores[cls_id], online_ids[cls_id])) + im = plot_tracking_dict( + frame, + num_classes, + online_tlwhs, + online_ids, + online_scores, + frame_id=frame_id, + fps=fps, + ids2names=ids2names) writer.write(im) if camera_id != -1: cv2.imshow('Mask Detection', im) if cv2.waitKey(1) & 0xFF == ord('q'): break + + if self.save_mot_txts: + result_filename = os.path.join( + self.output_dir, video_out_name.split('.')[-2] + '.txt') + write_mot_results(result_filename, results) + writer.release() @@ -282,18 +471,20 @@ def main(): arch = yml_conf['arch'] detector = SDE_Detector( FLAGS.model_dir, - FLAGS.tracker_config, + tracker_config=FLAGS.tracker_config, device=FLAGS.device, run_mode=FLAGS.run_mode, - batch_size=FLAGS.batch_size, + batch_size=1, trt_min_shape=FLAGS.trt_min_shape, trt_max_shape=FLAGS.trt_max_shape, trt_opt_shape=FLAGS.trt_opt_shape, trt_calib_mode=FLAGS.trt_calib_mode, cpu_threads=FLAGS.cpu_threads, enable_mkldnn=FLAGS.enable_mkldnn, + output_dir=FLAGS.output_dir, threshold=FLAGS.threshold, - output_dir=FLAGS.output_dir) + save_images=FLAGS.save_images, + save_mot_txts=FLAGS.save_mot_txts, ) # predict from video file or camera video stream if FLAGS.video_file is not None or FLAGS.camera_id != -1: @@ -303,7 +494,9 @@ def main(): if FLAGS.image_dir is None and FLAGS.image_file is not None: assert FLAGS.batch_size == 1, "--batch_size should be 1 in MOT 
models." img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) - detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10) + seq_name = FLAGS.image_dir.split('/')[-1] + detector.predict_image( + img_list, FLAGS.run_benchmark, repeats=10, seq_name=seq_name) if not FLAGS.run_benchmark: detector.det_times.info(average=True) diff --git a/deploy/python/preprocess.py b/deploy/python/preprocess.py index 998367a349b3e14045991c2e1d2af2e6ec94e03d..d447c744b75600886075c669e47d05036a93eae7 100644 --- a/deploy/python/preprocess.py +++ b/deploy/python/preprocess.py @@ -15,6 +15,7 @@ import cv2 import numpy as np from keypoint_preprocess import get_affine_transform +from PIL import Image def decode_image(im_file, im_info): @@ -39,6 +40,85 @@ def decode_image(im_file, im_info): return im, im_info +class Resize_Mult32(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, limit_side_len, limit_type, interp=cv2.INTER_LINEAR): + self.limit_side_len = limit_side_len + self.limit_type = limit_type + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, img): + """ + Args: + img (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + limit_side_len = 
self.limit_side_len + h, w, c = img.shape + + # limit the max side + if self.limit_type == 'max': + if max(h, w) > limit_side_len: + if h > w: + ratio = float(limit_side_len) / h + else: + ratio = float(limit_side_len) / w + else: + ratio = 1. + elif self.limit_type == 'min': + if min(h, w) < limit_side_len: + if h < w: + ratio = float(limit_side_len) / h + else: + ratio = float(limit_side_len) / w + else: + ratio = 1. + elif self.limit_type == 'resize_long': + ratio = float(limit_side_len) / max(h, w) + else: + raise Exception('not support limit type, image ') + resize_h = int(h * ratio) + resize_w = int(w * ratio) + + resize_h = max(int(round(resize_h / 32) * 32), 32) + resize_w = max(int(round(resize_w / 32) * 32), 32) + + im_scale_y = resize_h / float(h) + im_scale_x = resize_w / float(w) + return im_scale_y, im_scale_x + + class Resize(object): """resize image by target_size and max_size Args: @@ -106,6 +186,95 @@ class Resize(object): return im_scale_y, im_scale_x +class ShortSizeScale(object): + """ + Scale images by short size. + Args: + short_size(float | int): Short size of an image will be scaled to the short_size. + fixed_ratio(bool): Set whether to zoom according to a fixed ratio. default: True + do_round(bool): Whether to round up when calculating the zoom ratio. default: False + backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow' + """ + + def __init__(self, + short_size, + fixed_ratio=True, + keep_ratio=None, + do_round=False, + backend='pillow'): + self.short_size = short_size + assert (fixed_ratio and not keep_ratio) or ( + not fixed_ratio + ), "fixed_ratio and keep_ratio cannot be true at the same time" + self.fixed_ratio = fixed_ratio + self.keep_ratio = keep_ratio + self.do_round = do_round + + assert backend in [ + 'pillow', 'cv2' + ], "Scale's backend must be pillow or cv2, but get {backend}" + + self.backend = backend + + def __call__(self, img): + """ + Performs resize operations. 
+ Args: + img (PIL.Image): a PIL.Image. + return: + resized_img: a PIL.Image after scaling. + """ + + result_img = None + + if isinstance(img, np.ndarray): + h, w, _ = img.shape + elif isinstance(img, Image.Image): + w, h = img.size + else: + raise NotImplementedError + + if w <= h: + ow = self.short_size + if self.fixed_ratio: # default is True + oh = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + oh = self.short_size + else: + scale_factor = self.short_size / w + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * self.short_size / h) + else: + oh = self.short_size + if self.fixed_ratio: + ow = int(self.short_size * 4.0 / 3.0) + elif not self.keep_ratio: # no + ow = self.short_size + else: + scale_factor = self.short_size / h + oh = int(h * float(scale_factor) + + 0.5) if self.do_round else int(h * self.short_size / w) + ow = int(w * float(scale_factor) + + 0.5) if self.do_round else int(w * self.short_size / h) + + if type(img) == np.ndarray: + img = Image.fromarray(img, mode='RGB') + + if self.backend == 'pillow': + result_img = img.resize((ow, oh), Image.BILINEAR) + elif self.backend == 'cv2' and (self.keep_ratio is not None): + result_img = cv2.resize( + img, (ow, oh), interpolation=cv2.INTER_LINEAR) + else: + result_img = Image.fromarray( + cv2.resize( + np.asarray(img), (ow, oh), interpolation=cv2.INTER_LINEAR)) + + return result_img + + class NormalizeImage(object): """normalize image Args: @@ -246,6 +415,34 @@ class LetterBoxResize(object): return im, im_info +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. 
+ Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + class WarpAffine(object): """Warp affine the image """ diff --git a/deploy/python/tracker_config.yml b/deploy/python/tracker_config.yml index d92510148ec175d9dd7c19fd191e43f13cebe2ce..ddd55e8653870ed9bdfe9734995e8af5b56f49e2 100644 --- a/deploy/python/tracker_config.yml +++ b/deploy/python/tracker_config.yml @@ -1,10 +1,26 @@ -# config of tracker for MOT SDE Detector, use ByteTracker as default. -# The tracker of MOT JDE Detector is exported together with the model. +# config of tracker for MOT SDE Detector, use 'JDETracker' as default. +# The tracker of MOT JDE Detector (such as FairMOT) is exported together with the model. # Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking. 
-tracker: - use_byte: true + +type: JDETracker # 'JDETracker' or 'DeepSORTTracker' + +# BYTETracker +JDETracker: + use_byte: True + det_thresh: 0.3 conf_thres: 0.6 low_conf_thres: 0.1 match_thres: 0.9 - min_box_area: 100 - vertical_ratio: 1.6 + min_box_area: 0 + vertical_ratio: 0 # 1.6 for pedestrian + +DeepSORTTracker: + input_size: [64, 192] + min_box_area: 0 + vertical_ratio: -1 + budget: 100 + max_age: 70 + n_init: 3 + metric_type: cosine + matching_threshold: 0.2 + max_iou_distance: 0.9 diff --git a/deploy/python/utils.py b/deploy/python/utils.py index c542f0176494e03312516574077815fbdd2d6d4c..41dc7ae9e81f49bdd08e0917d50b21ac00f2e527 100644 --- a/deploy/python/utils.py +++ b/deploy/python/utils.py @@ -156,6 +156,12 @@ def argsparser(): type=ast.literal_eval, default=False, help="Whether do random padding for action recognition.") + parser.add_argument( + "--save_results", + type=ast.literal_eval, + default=False, + help="Whether to save detection results to a file in COCO format.") + return parser diff --git a/deploy/python/visualize.py b/deploy/python/visualize.py index 9c07b8491d6790ddd2303d9abe1c45070f8c5657..626da02555985a5568f3bdaff20705d8a8dd1c11 100644 --- a/deploy/python/visualize.py +++ b/deploy/python/visualize.py @@ -96,6 +96,8 @@ def draw_mask(im, np_boxes, np_masks, labels, threshold=0.5): expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1) np_boxes = np_boxes[expect_boxes, :] np_masks = np_masks[expect_boxes, :, :] + im_h, im_w = im.shape[:2] + np_masks = np_masks[:, :im_h, :im_w] for i in range(len(np_masks)): clsid, score = int(np_boxes[i][0]), np_boxes[i][1] mask = np_masks[i] @@ -329,7 +331,7 @@ def visualize_pose(imgfile, plt.close() -def visualize_attr(im, results, boxes=None): +def visualize_attr(im, results, boxes=None, is_mtmct=False): if isinstance(im, str): im = Image.open(im) im = np.ascontiguousarray(np.copy(im)) @@ -346,8 +348,12 @@ def visualize_attr(im, results, boxes=None): if boxes is None: text_w = 3 text_h = 1 + elif is_mtmct: 
+ box = boxes[i] # multi camera, bbox shape is x,y, w,h + text_w = int(box[0]) + 3 + text_h = int(box[1]) else: - box = boxes[i] + box = boxes[i] # single camera, bbox shape is 0, 0, x,y, w,h text_w = int(box[2]) + 3 text_h = int(box[3]) for text in res: @@ -363,15 +369,76 @@ def visualize_attr(im, results, boxes=None): return im -def visualize_action(im, mot_boxes, action_visual_collector, action_text=""): +def visualize_action(im, + mot_boxes, + action_visual_collector=None, + action_text="", + video_action_score=None, + video_action_text=""): im = cv2.imread(im) if isinstance(im, str) else im - id_detected = action_visual_collector.get_visualize_ids() + im_h, im_w = im.shape[:2] + text_scale = max(1, im.shape[1] / 1600.) - for mot_box in mot_boxes: - # mot_box is a format with [mot_id, class, score, xmin, ymin, w, h] - if mot_box[0] in id_detected: - text_position = (int(mot_box[3] + mot_box[5] * 0.75), - int(mot_box[4] - 10)) - cv2.putText(im, action_text, text_position, cv2.FONT_HERSHEY_PLAIN, - text_scale, (0, 0, 255), 2) + text_thickness = 2 + + if action_visual_collector: + id_action_dict = {} + for collector, action_type in zip(action_visual_collector, action_text): + id_detected = collector.get_visualize_ids() + for pid in id_detected: + id_action_dict[pid] = id_action_dict.get(pid, []) + id_action_dict[pid].append(action_type) + for mot_box in mot_boxes: + # mot_box is a format with [mot_id, class, score, xmin, ymin, w, h] + if mot_box[0] in id_action_dict: + text_position = (int(mot_box[3] + mot_box[5] * 0.75), + int(mot_box[4] - 10)) + display_text = ', '.join(id_action_dict[mot_box[0]]) + cv2.putText(im, display_text, text_position, + cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), 2) + + if video_action_score: + cv2.putText( + im, + video_action_text + ': %.2f' % video_action_score, + (int(im_w / 2), int(15 * text_scale) + 5), + cv2.FONT_ITALIC, + text_scale, (0, 0, 255), + thickness=text_thickness) + + return im + + +def visualize_vehicleplate(im, 
results, boxes=None): + if isinstance(im, str): + im = Image.open(im) + im = np.ascontiguousarray(np.copy(im)) + im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + else: + im = np.ascontiguousarray(np.copy(im)) + + im_h, im_w = im.shape[:2] + text_scale = max(1.0, im.shape[0] / 1600.) + text_thickness = 1 + + line_inter = im.shape[0] / 40. + for i, res in enumerate(results): + if boxes is None: + text_w = 3 + text_h = 1 + else: + box = boxes[i] + text = res + if text == "": + continue + text_w = int(box[2]) + text_h = int(box[5] + box[3]) + text_loc = (text_w, text_h) + cv2.putText( + im, + text, + text_loc, + cv2.FONT_ITALIC, + text_scale, (0, 255, 255), + thickness=text_thickness) return im diff --git a/deploy/serving/cpp/build_server.sh b/deploy/serving/cpp/build_server.sh new file mode 100644 index 0000000000000000000000000000000000000000..803dce07c1cdb9c6a77f063b7b01391f3109667c --- /dev/null +++ b/deploy/serving/cpp/build_server.sh @@ -0,0 +1,70 @@ +# Docker image to use: +#registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 + +# Build the Serving server: + +# The client and app can use the release versions directly. + +# The server must be recompiled because it includes custom OPs. + +apt-get update +apt install -y libcurl4-openssl-dev libbz2-dev +wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && tar xf centos_ssl.tar && rm -rf centos_ssl.tar && mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so + +# Install the Go dependencies +rm -rf /usr/local/go +wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | tar -xz -C /usr/local +export GOROOT=/usr/local/go +export GOPATH=/root/gopath +export PATH=$PATH:$GOPATH/bin:$GOROOT/bin +go env -w GO111MODULE=on +go env -w GOPROXY=https://goproxy.cn,direct +go install 
github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway@v1.15.2 +go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger@v1.15.2 +go install github.com/golang/protobuf/protoc-gen-go@v1.4.3 +go install google.golang.org/grpc@v1.33.0 +go env -w GO111MODULE=auto + +# Download the OpenCV library +wget https://paddle-qa.bj.bcebos.com/PaddleServing/opencv3.tar.gz && tar -xvf opencv3.tar.gz && rm -rf opencv3.tar.gz +export OPENCV_DIR=$PWD/opencv3 + +# clone Serving +git clone https://github.com/PaddlePaddle/Serving.git -b develop --depth=1 +cd Serving +export Serving_repo_path=$PWD +git submodule update --init --recursive +python -m pip install -r python/requirements.txt + +# set env +export PYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") +export PYTHON_LIBRARIES=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))") +export PYTHON_EXECUTABLE=`which python` + +export CUDA_PATH='/usr/local/cuda' +export CUDNN_LIBRARY='/usr/local/cuda/lib64/' +export CUDA_CUDART_LIBRARY='/usr/local/cuda/lib64/' +export TENSORRT_LIBRARY_PATH='/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/' + +# Copy the custom OP code +\cp ../deploy/serving/cpp/preprocess/*.h ${Serving_repo_path}/core/general-server/op +\cp ../deploy/serving/cpp/preprocess/*.cpp ${Serving_repo_path}/core/general-server/op + +# Build the server and export SERVING_BIN +mkdir server-build-gpu-opencv && cd server-build-gpu-opencv +cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \ + -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \ + -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \ + -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \ + -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \ + -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \ + -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \ + -DOPENCV_DIR=${OPENCV_DIR} \ + -DWITH_OPENCV=ON \ + -DSERVER=ON \ + -DWITH_GPU=ON .. 
+make -j32 + +python -m pip install python/dist/paddle* +export SERVING_BIN=$PWD/core/general-server/serving +cd ../../ diff --git a/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.cpp b/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b60eb24ce288e349eec73a4bf6c6b7ce8983e7fe --- /dev/null +++ b/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.cpp @@ -0,0 +1,309 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
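Review note: the `mask_rcnn_r50_fpn_1x_coco` op accepts only a string input and treats it as a base64-encoded image (`Base2Mat` calls `base64Decode`, then `cv::imdecode`). The round trip a client would rely on can be sketched as below. This is a minimal sketch, assuming the standard base64 alphabet with `=` padding; it uses Python's standard `base64` module as a stand-in for the hand-written decode table, and `encode_image_bytes` and `decode_image_string` are hypothetical helper names, not part of Paddle Serving.

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Encode raw image-file bytes (e.g. JPEG or PNG) as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")


def decode_image_string(b64_text: str) -> bytes:
    """Decode a base64 string back to bytes.

    CR and LF characters are dropped first, mirroring the C++ decoder,
    which skips them while walking the input.
    """
    cleaned = b64_text.replace("\r", "").replace("\n", "")
    return base64.b64decode(cleaned)


if __name__ == "__main__":
    payload = bytes(range(256))  # stand-in for the contents of an image file
    text = encode_image_bytes(payload)
    assert decode_image_string(text) == payload
```

The string must encode the image file's bytes, not raw pixels, since the decoded buffer is passed to `cv::imdecode`.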
+ +#include "core/general-server/op/mask_rcnn_r50_fpn_1x_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int mask_rcnn_r50_fpn_1x_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + 
int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + Resize(&img, scale_factor_h, scale_factor_w, im_shape_h, im_shape_w); + Normalize(&img, mean_, scale_, is_scale_); + PadStride(&img, 32); + int input_shape_h = img.rows; + int input_shape_w = img.cols; + std::vector input(1 * 3 * input_shape_h * input_shape_w, 0.0f); + Permute(img, input.data()); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // im_shape + std::vector im_shape{static_cast(im_shape_h), + static_cast(im_shape_w)}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, im_shape.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_0(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_0; + tensor_in_0.name = "im_shape"; + tensor_in_0.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_0.shape = {1, 2}; + tensor_in_0.lod = in->at(0).lod; + tensor_in_0.data = paddleBuf_0; + real_in->push_back(tensor_in_0); + + // image + in_num = 1 * 3 * input_shape_h * input_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_1(databuf_char, databuf_size); 
+ paddle::PaddleTensor tensor_in_1; + tensor_in_1.name = "image"; + tensor_in_1.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_1.shape = {1, 3, input_shape_h, input_shape_w}; + tensor_in_1.lod = in->at(0).lod; + tensor_in_1.data = paddleBuf_1; + real_in->push_back(tensor_in_1); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void mask_rcnn_r50_fpn_1x_coco::Resize(cv::Mat *img, float &scale_factor_h, + float &scale_factor_w, int &im_shape_h, + int &im_shape_w) { + // keep_ratio + int im_size_max = std::max(img->rows, img->cols); + int im_size_min = std::min(img->rows, img->cols); + int target_size_max = std::max(im_shape_h, im_shape_w); + int target_size_min = std::min(im_shape_h, im_shape_w); + float scale_min = + static_cast(target_size_min) / static_cast(im_size_min); + float scale_max = + static_cast(target_size_max) / static_cast(im_size_max); + float scale_ratio = std::min(scale_min, scale_max); + + // scale_factor + scale_factor_h 
= scale_ratio; + scale_factor_w = scale_ratio; + + // Resize + cv::resize(*img, *img, cv::Size(), scale_ratio, scale_ratio, 2); + im_shape_h = img->rows; + im_shape_w = img->cols; +} + +void mask_rcnn_r50_fpn_1x_coco::Normalize(cv::Mat *img, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + (*img).convertTo(*img, CV_32FC3, e); + for (int h = 0; h < img->rows; h++) { + for (int w = 0; w < img->cols; w++) { + img->at(h, w)[0] = + (img->at(h, w)[0] - mean[0]) / scale[0]; + img->at(h, w)[1] = + (img->at(h, w)[1] - mean[1]) / scale[1]; + img->at(h, w)[2] = + (img->at(h, w)[2] - mean[2]) / scale[2]; + } + } +} + +void mask_rcnn_r50_fpn_1x_coco::PadStride(cv::Mat *img, int stride_) { + // PadStride + if (stride_ <= 0) + return; + int rh = img->rows; + int rw = img->cols; + int nh = (rh / stride_) * stride_ + (rh % stride_ != 0) * stride_; + int nw = (rw / stride_) * stride_ + (rw % stride_ != 0) * stride_; + cv::copyMakeBorder(*img, *img, 0, nh - rh, 0, nw - rw, cv::BORDER_CONSTANT, + cv::Scalar(0)); +} + +void mask_rcnn_r50_fpn_1x_coco::Permute(const cv::Mat &img, float *data) { + // Permute + int rh = img.rows; + int rw = img.cols; + int rc = img.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i); + } +} + +cv::Mat mask_rcnn_r50_fpn_1x_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string mask_rcnn_r50_fpn_1x_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, 
// '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // CR or LF: skip + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(mask_rcnn_r50_fpn_1x_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.h b/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..5b2b8377a88b0cbcc313a3dd8a96c35dd9f57f91 --- /dev/null +++ b/deploy/serving/cpp/preprocess/mask_rcnn_r50_fpn_1x_coco.h @@ -0,0 +1,72 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#pragma once +#include "core/general-server/general_model_service.pb.h" +#include "core/general-server/op/general_infer_helper.h" +#include "paddle_inference_api.h" // NOLINT +#include +#include + +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +class mask_rcnn_r50_fpn_1x_coco + : public baidu::paddle_serving::predictor::OpWithChannel { +public: + typedef std::vector TensorVector; + + DECLARE_OP(mask_rcnn_r50_fpn_1x_coco); + + int inference(); + +private: + // preprocess + std::vector mean_ = {0.485f, 0.456f, 0.406f}; + std::vector scale_ = {0.229f, 0.224f, 0.225f}; + bool is_scale_ = true; + int im_shape_h = 1333; + int im_shape_w = 800; + float scale_factor_h = 1.0f; + float scale_factor_w = 1.0f; + + void Resize(cv::Mat *img, float &scale_factor_h, float &scale_factor_w, + int &im_shape_h, int &im_shape_w); + void Normalize(cv::Mat *img, const std::vector &mean, + const std::vector &scale, const bool is_scale); + void PadStride(cv::Mat *img, int stride_ = -1); + void Permute(const cv::Mat &img, float *data); + + // read pics + cv::Mat Base2Mat(std::string &base64_data); + std::string base64Decode(const char *Data, int DataByte); +}; + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.cpp b/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.cpp new file mode 100644 index 0000000000000000000000000000000000000000..66bfeaef21189e395c2f15d716468723465c24b6 --- /dev/null +++ b/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.cpp @@ -0,0 +1,258 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "core/general-server/op/picodet_lcnet_1_5x_416_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int picodet_lcnet_1_5x_416_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); 
+ + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in; + tensor_in.name = "image"; + tensor_in.dtype = paddle::PaddleDType::FLOAT32; + tensor_in.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in.lod = in->at(0).lod; + tensor_in.data = paddleBuf; + real_in->push_back(tensor_in); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = 
MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void picodet_lcnet_1_5x_416_coco::preprocess_det( + const cv::Mat &img, float *data, float &scale_factor_h, + float &scale_factor_w, int im_shape_h, int im_shape_w, + const std::vector &mean, const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for 
(int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat picodet_lcnet_1_5x_416_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector<char> base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string picodet_lcnet_1_5x_416_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // CR or LF: skip + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(picodet_lcnet_1_5x_416_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.h b/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..4db649a27b2dbd408b1984511cbb184c112bf1fe --- /dev/null +++ 
b/deploy/serving/cpp/preprocess/picodet_lcnet_1_5x_416_coco.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once +#include "core/general-server/general_model_service.pb.h" +#include "core/general-server/op/general_infer_helper.h" +#include "paddle_inference_api.h" // NOLINT +#include +#include + +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +class picodet_lcnet_1_5x_416_coco + : public baidu::paddle_serving::predictor::OpWithChannel { +public: + typedef std::vector TensorVector; + + DECLARE_OP(picodet_lcnet_1_5x_416_coco); + + int inference(); + +private: + // preprocess + std::vector mean_ = {0.485f, 0.456f, 0.406f}; + std::vector scale_ = {0.229f, 0.224f, 0.225f}; + bool is_scale_ = true; + int im_shape_h = 416; + int im_shape_w = 416; + float scale_factor_h = 1.0f; + float scale_factor_w = 1.0f; + void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h, + float &scale_factor_w, int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, const bool is_scale); + + // read pics + cv::Mat Base2Mat(std::string &base64_data); + std::string base64Decode(const char *Data, int DataByte); +}; + +} // namespace serving +} // 
namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.cpp b/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.cpp new file mode 100644 index 0000000000000000000000000000000000000000..2d2d62cd321bf1d2d5055b827552337e86b4aa15 --- /dev/null +++ b/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.cpp @@ -0,0 +1,282 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#include "core/general-server/op/ppyolo_mbv3_large_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int ppyolo_mbv3_large_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t 
start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // im_shape + std::vector im_shape{static_cast(im_shape_h), + static_cast(im_shape_w)}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, im_shape.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_0(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_0; + tensor_in_0.name = "im_shape"; + tensor_in_0.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_0.shape = {1, 2}; + tensor_in_0.lod = in->at(0).lod; + tensor_in_0.data = paddleBuf_0; + real_in->push_back(tensor_in_0); + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_1(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_1; + tensor_in_1.name = "image"; + tensor_in_1.dtype = paddle::PaddleDType::FLOAT32; + 
tensor_in_1.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in_1.lod = in->at(0).lod; + tensor_in_1.data = paddleBuf_1; + real_in->push_back(tensor_in_1); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void ppyolo_mbv3_large_coco::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, + int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = 
+ (img_fp.at<cv::Vec3f>(h, w)[0] - mean[0]) / scale[0]; + img_fp.at<cv::Vec3f>(h, w)[1] = + (img_fp.at<cv::Vec3f>(h, w)[1] - mean[1]) / scale[1]; + img_fp.at<cv::Vec3f>(h, w)[2] = + (img_fp.at<cv::Vec3f>(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat ppyolo_mbv3_large_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector<char> base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string ppyolo_mbv3_large_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // CR or LF: skip + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(ppyolo_mbv3_large_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git 
a/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.h b/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..5f55e18f51eae4c3f5588594b2db05773d529987 --- /dev/null +++ b/deploy/serving/cpp/preprocess/ppyolo_mbv3_large_coco.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once +#include "core/general-server/general_model_service.pb.h" +#include "core/general-server/op/general_infer_helper.h" +#include "paddle_inference_api.h" // NOLINT +#include +#include + +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +class ppyolo_mbv3_large_coco + : public baidu::paddle_serving::predictor::OpWithChannel { +public: + typedef std::vector TensorVector; + + DECLARE_OP(ppyolo_mbv3_large_coco); + + int inference(); + +private: + // preprocess + std::vector mean_ = {0.485f, 0.456f, 0.406f}; + std::vector scale_ = {0.229f, 0.224f, 0.225f}; + bool is_scale_ = true; + int im_shape_h = 320; + int im_shape_w = 320; + float scale_factor_h = 1.0f; + float scale_factor_w = 1.0f; + void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h, + float &scale_factor_w, int im_shape_h, int im_shape_w, + const 
std::vector &mean, + const std::vector &scale, const bool is_scale); + + // read pics + cv::Mat Base2Mat(std::string &base64_data); + std::string base64Decode(const char *Data, int DataByte); +}; + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.cpp b/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f59c4f341539db3a7b777051c49da6d6f6919166 --- /dev/null +++ b/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.cpp @@ -0,0 +1,260 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#include "core/general-server/op/ppyoloe_crn_s_300e_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int ppyoloe_crn_s_300e_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + 
int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in; + tensor_in.name = "image"; + tensor_in.dtype = paddle::PaddleDType::FLOAT32; + tensor_in.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in.lod = in->at(0).lod; + tensor_in.data = paddleBuf; + real_in->push_back(tensor_in); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + 
tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void ppyoloe_crn_s_300e_coco::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, + int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat ppyoloe_crn_s_300e_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + 
+std::string ppyoloe_crn_s_300e_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // carriage return or line feed: skip + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(ppyoloe_crn_s_300e_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.h b/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..cb3e68476998d7fadaafba8e2bc9282c4479a5f8 --- /dev/null +++ b/deploy/serving/cpp/preprocess/ppyoloe_crn_s_300e_coco.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#pragma once +#include "core/general-server/general_model_service.pb.h" +#include "core/general-server/op/general_infer_helper.h" +#include "paddle_inference_api.h" // NOLINT +#include +#include + +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +class ppyoloe_crn_s_300e_coco + : public baidu::paddle_serving::predictor::OpWithChannel { +public: + typedef std::vector TensorVector; + + DECLARE_OP(ppyoloe_crn_s_300e_coco); + + int inference(); + +private: + // preprocess + std::vector mean_ = {0.485f, 0.456f, 0.406f}; + std::vector scale_ = {0.229f, 0.224f, 0.225f}; + bool is_scale_ = true; + int im_shape_h = 640; + int im_shape_w = 640; + float scale_factor_h = 1.0f; + float scale_factor_w = 1.0f; + void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h, + float &scale_factor_w, int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, const bool is_scale); + + // read pics + cv::Mat Base2Mat(std::string &base64_data); + std::string base64Decode(const char *Data, int DataByte); +}; + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/tinypose_128x96.cpp b/deploy/serving/cpp/preprocess/tinypose_128x96.cpp new file mode 100644 index 0000000000000000000000000000000000000000..ccc94d2c4a35ed9f47f65fab6e74301e35c801d6 --- /dev/null +++ 
b/deploy/serving/cpp/preprocess/tinypose_128x96.cpp @@ -0,0 +1,232 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "core/general-server/op/tinypose_128x96.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int tinypose_128x96::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = 
input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + + if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in; + tensor_in.name = "image"; + tensor_in.dtype = paddle::PaddleDType::FLOAT32; + tensor_in.shape = {1, 3, 
im_shape_h, im_shape_w}; + tensor_in.lod = in->at(0).lod; + tensor_in.data = paddleBuf; + real_in->push_back(tensor_in); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void tinypose_128x96::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, int im_shape_h, + int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 1); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat tinypose_128x96::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string tinypose_128x96::base64Decode(const char *Data, int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode += (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // carriage return or line feed: skip + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(tinypose_128x96); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/tinypose_128x96.h b/deploy/serving/cpp/preprocess/tinypose_128x96.h new file mode 100644 index 0000000000000000000000000000000000000000..83bf9bf7d17de5fd03407f73bf7e96b512a6fe3e --- /dev/null +++ b/deploy/serving/cpp/preprocess/tinypose_128x96.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#pragma once +#include "core/general-server/general_model_service.pb.h" +#include "core/general-server/op/general_infer_helper.h" +#include "paddle_inference_api.h" // NOLINT +#include +#include + +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +class tinypose_128x96 + : public baidu::paddle_serving::predictor::OpWithChannel { +public: + typedef std::vector TensorVector; + + DECLARE_OP(tinypose_128x96); + + int inference(); + +private: + // preprocess + std::vector mean_ = {0.485f, 0.456f, 0.406f}; + std::vector scale_ = {0.229f, 0.224f, 0.225f}; + bool is_scale_ = true; + int im_shape_h = 128; + int im_shape_w = 96; + float scale_factor_h = 1.0f; + float scale_factor_w = 1.0f; + void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h, + float &scale_factor_w, int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, const bool is_scale); + + // read pics + cv::Mat Base2Mat(std::string &base64_data); + std::string base64Decode(const char *Data, int DataByte); +}; + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.cpp b/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5937be46c0ffffe07651e7c8ed13be369d03bf7c --- /dev/null +++ b/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.cpp @@ -0,0 +1,282 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include "core/general-server/op/yolov3_darknet53_270e_coco.h" +#include "core/predictor/framework/infer.h" +#include "core/predictor/framework/memory.h" +#include "core/predictor/framework/resource.h" +#include "core/util/include/timer.h" +#include +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +using baidu::paddle_serving::Timer; +using baidu::paddle_serving::predictor::InferManager; +using baidu::paddle_serving::predictor::MempoolWrapper; +using baidu::paddle_serving::predictor::PaddleGeneralModelConfig; +using baidu::paddle_serving::predictor::general_model::Request; +using baidu::paddle_serving::predictor::general_model::Response; +using baidu::paddle_serving::predictor::general_model::Tensor; + +int yolov3_darknet53_270e_coco::inference() { + VLOG(2) << "Going to run inference"; + const std::vector pre_node_names = pre_names(); + if (pre_node_names.size() != 1) { + LOG(ERROR) << "This op(" << op_name() + << ") can only have one predecessor op, but received " + << pre_node_names.size(); + return -1; + } + const std::string pre_name = pre_node_names[0]; + + const GeneralBlob *input_blob = get_depend_argument(pre_name); + if (!input_blob) { + LOG(ERROR) << "input_blob is nullptr,error"; + return -1; + } + uint64_t log_id = input_blob->GetLogId(); + VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name; + + GeneralBlob *output_blob = mutable_data(); + if (!output_blob) { + LOG(ERROR) << "output_blob is nullptr,error"; + return -1; + } + output_blob->SetLogId(log_id); + 
+ if (!input_blob) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed mutable depended argument, op:" << pre_name; + return -1; + } + + const TensorVector *in = &input_blob->tensor_vector; + TensorVector *out = &output_blob->tensor_vector; + + int batch_size = input_blob->_batch_size; + output_blob->_batch_size = batch_size; + VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size; + + Timer timeline; + int64_t start = timeline.TimeStampUS(); + timeline.Start(); + + // only support string type + char *total_input_ptr = static_cast(in->at(0).data.data()); + std::string base64str = total_input_ptr; + + cv::Mat img = Base2Mat(base64str); + cv::cvtColor(img, img, cv::COLOR_BGR2RGB); + + // preprocess + std::vector input(1 * 3 * im_shape_h * im_shape_w, 0.0f); + preprocess_det(img, input.data(), scale_factor_h, scale_factor_w, im_shape_h, + im_shape_w, mean_, scale_, is_scale_); + + // create real_in + TensorVector *real_in = new TensorVector(); + if (!real_in) { + LOG(ERROR) << "real_in is nullptr,error"; + return -1; + } + + int in_num = 0; + size_t databuf_size = 0; + void *databuf_data = NULL; + char *databuf_char = NULL; + + // im_shape + std::vector im_shape{static_cast(im_shape_h), + static_cast(im_shape_w)}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, im_shape.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_0(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_0; + tensor_in_0.name = "im_shape"; + tensor_in_0.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_0.shape = {1, 2}; + tensor_in_0.lod = in->at(0).lod; + tensor_in_0.data = paddleBuf_0; + real_in->push_back(tensor_in_0); + + // image + in_num = 1 * 3 * im_shape_h * im_shape_w; + databuf_size = in_num * sizeof(float); + + databuf_data = 
MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, input.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_1(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_1; + tensor_in_1.name = "image"; + tensor_in_1.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_1.shape = {1, 3, im_shape_h, im_shape_w}; + tensor_in_1.lod = in->at(0).lod; + tensor_in_1.data = paddleBuf_1; + real_in->push_back(tensor_in_1); + + // scale_factor + std::vector scale_factor{scale_factor_h, scale_factor_w}; + databuf_size = 2 * sizeof(float); + + databuf_data = MempoolWrapper::instance().malloc(databuf_size); + if (!databuf_data) { + LOG(ERROR) << "Malloc failed, size: " << databuf_size; + return -1; + } + + memcpy(databuf_data, scale_factor.data(), databuf_size); + databuf_char = reinterpret_cast(databuf_data); + paddle::PaddleBuf paddleBuf_2(databuf_char, databuf_size); + paddle::PaddleTensor tensor_in_2; + tensor_in_2.name = "scale_factor"; + tensor_in_2.dtype = paddle::PaddleDType::FLOAT32; + tensor_in_2.shape = {1, 2}; + tensor_in_2.lod = in->at(0).lod; + tensor_in_2.data = paddleBuf_2; + real_in->push_back(tensor_in_2); + + if (InferManager::instance().infer(engine_name().c_str(), real_in, out, + batch_size)) { + LOG(ERROR) << "(logid=" << log_id + << ") Failed do infer in fluid model: " << engine_name().c_str(); + return -1; + } + + int64_t end = timeline.TimeStampUS(); + CopyBlobInfo(input_blob, output_blob); + AddBlobInfo(output_blob, start); + AddBlobInfo(output_blob, end); + return 0; +} + +void yolov3_darknet53_270e_coco::preprocess_det(const cv::Mat &img, float *data, + float &scale_factor_h, + float &scale_factor_w, + int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, + const bool is_scale) { + // scale_factor + scale_factor_h = + static_cast(im_shape_h) / 
static_cast(img.rows); + scale_factor_w = + static_cast(im_shape_w) / static_cast(img.cols); + + // Resize + cv::Mat resize_img; + cv::resize(img, resize_img, cv::Size(im_shape_w, im_shape_h), 0, 0, 2); + + // Normalize + double e = 1.0; + if (is_scale) { + e /= 255.0; + } + cv::Mat img_fp; + (resize_img).convertTo(img_fp, CV_32FC3, e); + for (int h = 0; h < im_shape_h; h++) { + for (int w = 0; w < im_shape_w; w++) { + img_fp.at(h, w)[0] = + (img_fp.at(h, w)[0] - mean[0]) / scale[0]; + img_fp.at(h, w)[1] = + (img_fp.at(h, w)[1] - mean[1]) / scale[1]; + img_fp.at(h, w)[2] = + (img_fp.at(h, w)[2] - mean[2]) / scale[2]; + } + } + + // Permute + int rh = img_fp.rows; + int rw = img_fp.cols; + int rc = img_fp.channels(); + for (int i = 0; i < rc; ++i) { + cv::extractChannel(img_fp, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), + i); + } +} + +cv::Mat yolov3_darknet53_270e_coco::Base2Mat(std::string &base64_data) { + cv::Mat img; + std::string s_mat; + s_mat = base64Decode(base64_data.data(), base64_data.size()); + std::vector base64_img(s_mat.begin(), s_mat.end()); + img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR + return img; +} + +std::string yolov3_darknet53_270e_coco::base64Decode(const char *Data, + int DataByte) { + const char DecodeTable[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, + 62, // '+' + 0, 0, 0, + 63, // '/' + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9' + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z' + 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, + 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z' + }; + + std::string strDecode; + int nValue; + int i = 0; + while (i < DataByte) { + if (*Data != '\r' && *Data != '\n') { + nValue = DecodeTable[*Data++] << 18; + nValue += DecodeTable[*Data++] << 12; + strDecode 
+= (nValue & 0x00FF0000) >> 16; + if (*Data != '=') { + nValue += DecodeTable[*Data++] << 6; + strDecode += (nValue & 0x0000FF00) >> 8; + if (*Data != '=') { + nValue += DecodeTable[*Data++]; + strDecode += nValue & 0x000000FF; + } + } + i += 4; + } else // carriage return or line feed: skip + { + Data++; + i++; + } + } + return strDecode; +} + +DEFINE_OP(yolov3_darknet53_270e_coco); + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.h b/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.h new file mode 100644 index 0000000000000000000000000000000000000000..67593040eadd664d49981c66f37d4e689807ec8f --- /dev/null +++ b/deploy/serving/cpp/preprocess/yolov3_darknet53_270e_coco.h @@ -0,0 +1,69 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#pragma once +#include "core/general-server/general_model_service.pb.h" +#include "core/general-server/op/general_infer_helper.h" +#include "paddle_inference_api.h" // NOLINT +#include +#include + +#include "opencv2/core.hpp" +#include "opencv2/imgcodecs.hpp" +#include "opencv2/imgproc.hpp" +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace baidu { +namespace paddle_serving { +namespace serving { + +class yolov3_darknet53_270e_coco + : public baidu::paddle_serving::predictor::OpWithChannel { +public: + typedef std::vector TensorVector; + + DECLARE_OP(yolov3_darknet53_270e_coco); + + int inference(); + +private: + // preprocess + std::vector mean_ = {0.485f, 0.456f, 0.406f}; + std::vector scale_ = {0.229f, 0.224f, 0.225f}; + bool is_scale_ = true; + int im_shape_h = 608; + int im_shape_w = 608; + float scale_factor_h = 1.0f; + float scale_factor_w = 1.0f; + void preprocess_det(const cv::Mat &img, float *data, float &scale_factor_h, + float &scale_factor_w, int im_shape_h, int im_shape_w, + const std::vector &mean, + const std::vector &scale, const bool is_scale); + + // read pics + cv::Mat Base2Mat(std::string &base64_data); + std::string base64Decode(const char *Data, int DataByte); +}; + +} // namespace serving +} // namespace paddle_serving +} // namespace baidu diff --git a/deploy/serving/cpp/serving_client.py b/deploy/serving/cpp/serving_client.py new file mode 100644 index 0000000000000000000000000000000000000000..49134c30569d60533b131b8a25d6584ab782329c --- /dev/null +++ b/deploy/serving/cpp/serving_client.py @@ -0,0 +1,125 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import glob +import base64 +import argparse +from paddle_serving_client import Client +from paddle_serving_client.proto import general_model_config_pb2 as m_config +import google.protobuf.text_format + +parser = argparse.ArgumentParser(description="args for paddleserving") +parser.add_argument( + "--serving_client", type=str, help="the directory of serving_client") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) +parser.add_argument("--http_port", type=int, default=9997) +parser.add_argument( + "--threshold", type=float, default=0.5, help="Threshold of score.") +args = parser.parse_args() + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} 
inference images in total.".format(len(images))) + + return images + + +def postprocess(fetch_dict, fetch_vars, draw_threshold=0.5): + result = [] + if "conv2d_441.tmp_1" in fetch_dict: + heatmap = fetch_dict["conv2d_441.tmp_1"] + print(heatmap) + result.append(heatmap) + else: + bboxes = fetch_dict[fetch_vars[0]] + for bbox in bboxes: + if bbox[0] > -1 and bbox[1] > draw_threshold: + print(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}") + result.append(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}") + return result + + +def get_model_vars(client_config_dir): + # read original serving_client_conf.prototxt + client_config_file = os.path.join(client_config_dir, + "serving_client_conf.prototxt") + with open(client_config_file, 'r') as f: + model_var = google.protobuf.text_format.Merge( + str(f.read()), m_config.GeneralModelConfig()) + # modify feed_var to run core/general-server/op/ + [model_var.feed_var.pop() for _ in range(len(model_var.feed_var))] + feed_var = m_config.FeedVar() + feed_var.name = "input" + feed_var.alias_name = "input" + feed_var.is_lod_tensor = False + feed_var.feed_type = 20 + feed_var.shape.extend([1]) + model_var.feed_var.extend([feed_var]) + with open( + os.path.join(client_config_dir, "serving_client_conf_cpp.prototxt"), + "w") as f: + f.write(str(model_var)) + # get feed_vars/fetch_vars + feed_vars = [var.name for var in model_var.feed_var] + fetch_vars = [var.name for var in model_var.fetch_var] + return feed_vars, fetch_vars + + +if __name__ == '__main__': + url = f"127.0.0.1:{args.http_port}" + logid = 10000 + img_list = get_test_images(args.image_dir, args.image_file) + feed_vars, fetch_vars = get_model_vars(args.serving_client) + + client = Client() + client.load_client_config( + os.path.join(args.serving_client, "serving_client_conf_cpp.prototxt")) + client.connect([url]) + + for img_file in img_list: + with open(img_file, 'rb') as file: + image_data = file.read() + image = 
base64.b64encode(image_data).decode('utf8') + fetch_dict = client.predict( + feed={feed_vars[0]: image}, fetch=fetch_vars) + result = postprocess(fetch_dict, fetch_vars, args.threshold) diff --git a/deploy/serving/cpp/serving_client_conf.prototxt b/deploy/serving/cpp/serving_client_conf.prototxt new file mode 100644 index 0000000000000000000000000000000000000000..fb069003ab8a6b8163d7e06d7760b1c6c42b196a --- /dev/null +++ b/deploy/serving/cpp/serving_client_conf.prototxt @@ -0,0 +1,20 @@ +feed_var { + name: "input" + alias_name: "input" + is_lod_tensor: false + feed_type: 20 + shape: 1 +} +fetch_var { + name: "multiclass_nms3_0.tmp_0" + alias_name: "multiclass_nms3_0.tmp_0" + is_lod_tensor: true + fetch_type: 1 + shape: -1 +} +fetch_var { + name: "multiclass_nms3_0.tmp_2" + alias_name: "multiclass_nms3_0.tmp_2" + is_lod_tensor: false + fetch_type: 2 +} \ No newline at end of file diff --git a/deploy/serving/python/config.yml b/deploy/serving/python/config.yml new file mode 100644 index 0000000000000000000000000000000000000000..5ec4285257d618f6c5a7ed02aab5c34dae9a96e1 --- /dev/null +++ b/deploy/serving/python/config.yml @@ -0,0 +1,31 @@ +#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own gRPC server and DAG +##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread +worker_num: 20 + +#HTTP port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, no http_port is generated automatically +http_port: 18093 +rpc_port: 9993 + +dag: + #op resource type: True selects the thread model, False selects the process model + is_thread_op: False +op: + #op name, matching the name argument used to initialize TIPCExampleService in web_service + ppdet: + #concurrency: thread-level when is_thread_op=True, otherwise process-level + concurrency: 1 + + #when the op config has no server_endpoints, the local service config is read from local_service_conf + local_service_conf: + + #uci model path + model_config: "./serving_server" + + #compute device type: when empty, determined by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu + device_type: + + #compute device IDs: when devices is "" or omitted, inference runs on CPU; when devices is "0" or "0,1,2", inference runs on the listed GPU cards + devices: "0" # "0,1" + + #client type: brpc, 
grpc, or local_predictor. local_predictor does not start a Serving service and runs inference in-process + client_type: local_predictor diff --git a/deploy/serving/python/pipeline_http_client.py b/deploy/serving/python/pipeline_http_client.py new file mode 100644 index 0000000000000000000000000000000000000000..fa9b30c0d79bf5a7e0d5da7a2538580e7452f8bb --- /dev/null +++ b/deploy/serving/python/pipeline_http_client.py @@ -0,0 +1,76 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import glob +import requests +import json +import base64 +import os +import argparse + +parser = argparse.ArgumentParser(description="args for paddleserving") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) +parser.add_argument("--http_port", type=int, default=18093) +parser.add_argument("--service_name", type=str, default="ppdet") +args = parser.parse_args() + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +if __name__ == "__main__": + url = f"http://127.0.0.1:{args.http_port}/{args.service_name}/prediction" + logid = 10000 + img_list = get_test_images(args.image_dir, args.image_file) + + for img_file in img_list: + with open(img_file, 'rb') as file: + image_data = file.read() + + # base64 encode + image = base64.b64encode(image_data).decode('utf8') + + data = {"key": ["image_0"], "value": [image], "logid": logid} + # send requests + r = requests.post(url=url, data=json.dumps(data)) + print(r.json()) diff --git a/deploy/serving/python/postprocess_ops.py 
b/deploy/serving/python/postprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..1836f7de776921c4dae97d42e927834a3d2d8613 --- /dev/null +++ b/deploy/serving/python/postprocess_ops.py @@ -0,0 +1,171 @@ +import cv2 +import math +import numpy as np +from preprocess_ops import get_affine_transform + + +class HRNetPostProcess(object): + def __init__(self, use_dark=True): + self.use_dark = use_dark + + def flip_back(self, output_flipped, matched_parts): + assert output_flipped.ndim == 4,\ + 'output_flipped should be [batch_size, num_joints, height, width]' + + output_flipped = output_flipped[:, :, :, ::-1] + + for pair in matched_parts: + tmp = output_flipped[:, pair[0], :, :].copy() + output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] + output_flipped[:, pair[1], :, :] = tmp + + return output_flipped + + def get_max_preds(self, heatmaps): + """get predictions from score maps + + Args: + heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) + + Returns: + preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords + maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints + """ + assert isinstance(heatmaps, + np.ndarray), 'heatmaps should be numpy.ndarray' + assert heatmaps.ndim == 4, 'heatmaps should be 4-ndim' + + batch_size = heatmaps.shape[0] + num_joints = heatmaps.shape[1] + width = heatmaps.shape[3] + heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1)) + idx = np.argmax(heatmaps_reshaped, 2) + maxvals = np.amax(heatmaps_reshaped, 2) + + maxvals = maxvals.reshape((batch_size, num_joints, 1)) + idx = idx.reshape((batch_size, num_joints, 1)) + + preds = np.tile(idx, (1, 1, 2)).astype(np.float32) + + preds[:, :, 0] = (preds[:, :, 0]) % width + preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) + + pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) + pred_mask = pred_mask.astype(np.float32) + + preds *= pred_mask + + return preds, maxvals + + 
def gaussian_blur(self, heatmap, kernel): + border = (kernel - 1) // 2 + batch_size = heatmap.shape[0] + num_joints = heatmap.shape[1] + height = heatmap.shape[2] + width = heatmap.shape[3] + for i in range(batch_size): + for j in range(num_joints): + origin_max = np.max(heatmap[i, j]) + dr = np.zeros((height + 2 * border, width + 2 * border)) + dr[border:-border, border:-border] = heatmap[i, j].copy() + dr = cv2.GaussianBlur(dr, (kernel, kernel), 0) + heatmap[i, j] = dr[border:-border, border:-border].copy() + heatmap[i, j] *= origin_max / np.max(heatmap[i, j]) + return heatmap + + def dark_parse(self, hm, coord): + heatmap_height = hm.shape[0] + heatmap_width = hm.shape[1] + px = int(coord[0]) + py = int(coord[1]) + if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2: + dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1]) + dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px]) + dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2]) + dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \ + + hm[py-1][px-1]) + dyy = 0.25 * ( + hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px]) + derivative = np.matrix([[dx], [dy]]) + hessian = np.matrix([[dxx, dxy], [dxy, dyy]]) + if dxx * dyy - dxy**2 != 0: + hessianinv = hessian.I + offset = -hessianinv * derivative + offset = np.squeeze(np.array(offset.T), axis=0) + coord += offset + return coord + + def dark_postprocess(self, hm, coords, kernelsize): + """ + refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py + + """ + hm = self.gaussian_blur(hm, kernelsize) + hm = np.maximum(hm, 1e-10) + hm = np.log(hm) + for n in range(coords.shape[0]): + for p in range(coords.shape[1]): + coords[n, p] = self.dark_parse(hm[n][p], coords[n][p]) + return coords + + def get_final_preds(self, heatmaps, center, scale, kernelsize=3): + """the highest heatvalue location with a quarter offset in the + direction from the highest response to the second highest response. 
+ + Args: + heatmaps (numpy.ndarray): The predicted heatmaps + center (numpy.ndarray): The boxes center + scale (numpy.ndarray): The scale factor + + Returns: + preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords + maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints + """ + + coords, maxvals = self.get_max_preds(heatmaps) + + heatmap_height = heatmaps.shape[2] + heatmap_width = heatmaps.shape[3] + + if self.use_dark: + coords = self.dark_postprocess(heatmaps, coords, kernelsize) + else: + for n in range(coords.shape[0]): + for p in range(coords.shape[1]): + hm = heatmaps[n][p] + px = int(math.floor(coords[n][p][0] + 0.5)) + py = int(math.floor(coords[n][p][1] + 0.5)) + if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1: + diff = np.array([ + hm[py][px + 1] - hm[py][px - 1], + hm[py + 1][px] - hm[py - 1][px] + ]) + coords[n][p] += np.sign(diff) * .25 + preds = coords.copy() + + # Transform back + for i in range(coords.shape[0]): + preds[i] = transform_preds(coords[i], center[i], scale[i], + [heatmap_width, heatmap_height]) + + return preds, maxvals + + def __call__(self, output, center, scale): + preds, maxvals = self.get_final_preds(output, center, scale) + return np.concatenate( + (preds, maxvals), axis=-1), np.mean( + maxvals, axis=1) + + +def transform_preds(coords, center, scale, output_size): + target_coords = np.zeros(coords.shape) + trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1) + for p in range(coords.shape[0]): + target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) + return target_coords + + +def affine_transform(pt, t): + new_pt = np.array([pt[0], pt[1], 1.]).T + new_pt = np.dot(t, new_pt) + return new_pt[:2] diff --git a/deploy/serving/python/preprocess_ops.py b/deploy/serving/python/preprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..0a419688b1db243179082487d1c19b2a9a4e3d4b --- /dev/null +++ 
b/deploy/serving/python/preprocess_ops.py @@ -0,0 +1,487 @@ +import numpy as np +import cv2 +import copy + + +def decode_image(im): + im = np.array(im) + img_info = { + "im_shape": np.array( + im.shape[:2], dtype=np.float32), + "scale_factor": np.array( + [1., 1.], dtype=np.float32) + } + return im, img_info + + +class Resize(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + """ + Args: + im (np.ndarray): image (np.ndarray) + Returns: + im_scale_x: the resize ratio of X + im_scale_y: the resize ratio of Y + """ + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = 
float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, im_scale_x + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + Note: expects an HWC image; mean and std are broadcast over the channel axis + """ + + def __init__(self, mean, std, is_scale=True): + self.mean = mean + self.std = std + self.is_scale = is_scale + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + + if self.is_scale: + im = im / 255.0 + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Converts the image from HWC to CHW layout. + The constructor takes no arguments and + no RGB/BGR conversion is performed. + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ pad image for models with FPN, instead of PadBatch(pad_to_stride) in the original config + Args: + stride (int): models with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): 
processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. + """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and 
self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. + Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + + Returns: + list[float]: Rotated point. + """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. + + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + + Returns: + np.ndarray: The 3rd point. 
+ """ + assert len(a) == 2 + assert len(b) == 2 + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/input_size/rot/output_size. + + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + input_size (float | list | np.ndarray[2, ]): Size of the source box + wrt [width, height]; a scalar is used for both. + rot (float): Rotation angle (degree). + output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +class WarpAffine(object): + """Warp affine the image + """ + + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + 
input_w=512, + scale=0.4, + shift=0.1): + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + input_h = (h | self.pad) + 1 + input_w = (w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + + else: + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR) + return inp, im_info + + +# keypoint preprocess +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + + Returns: + matrix (np.ndarray): A matrix for transformation. 
+ """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +class TopDownEvalAffine(object): + """apply affine transform to image and coords + + Args: + trainsize (list): [w, h], the standard size used to train + use_udp (bool): whether to use Unbiased Data Processing. + records (dict): the dict containing the image and coords + + Returns: + records (dict): contains the image and coords after the transform + + """ + + def __init__(self, trainsize, use_udp=False): + self.trainsize = trainsize + self.use_udp = use_udp + + def __call__(self, image, im_info): + rot = 0 + imshape = im_info['im_shape'][::-1] + center = im_info['center'] if 'center' in im_info else imshape / 2. 
+ scale = im_info['scale'] if 'scale' in im_info else imshape + if self.use_udp: + trans = get_warp_matrix( + rot, center * 2.0, + [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + else: + trans = get_affine_transform(center, scale, rot, self.trainsize) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + + return image, im_info + + +class Compose: + def __init__(self, transforms): + self.transforms = [] + for op_info in transforms: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + self.transforms.append(eval(op_type)(**new_op_info)) + + def __call__(self, img): + img, im_info = decode_image(img) + for t in self.transforms: + img, im_info = t(img, im_info) + inputs = copy.deepcopy(im_info) + inputs['image'] = img + return inputs diff --git a/deploy/serving/python/web_service.py b/deploy/serving/python/web_service.py new file mode 100644 index 0000000000000000000000000000000000000000..2fbb4ed839d15d102f157a5f5cc780d1efebd267 --- /dev/null +++ b/deploy/serving/python/web_service.py @@ -0,0 +1,255 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import copy + +from paddle_serving_server.web_service import WebService, Op +from paddle_serving_server.proto import general_model_config_pb2 as m_config +import google.protobuf.text_format + +import os +import numpy as np +import base64 +from PIL import Image +import io +from preprocess_ops import Compose +from postprocess_ops import HRNetPostProcess + +from argparse import ArgumentParser, RawDescriptionHelpFormatter +import yaml + +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', 'S2ANet', 'JDE', + 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', 'TOOD', 'RetinaNet', + 'StrongBaseline', 'STGCN', 'YOLOX', 'HRNet' +} + +GLOBAL_VAR = {} + + +class ArgsParser(ArgumentParser): + def __init__(self): + super(ArgsParser, self).__init__( + formatter_class=RawDescriptionHelpFormatter) + self.add_argument( + "-c", + "--config", + default="deploy/serving/python/config.yml", + help="configuration file to use") + self.add_argument( + "--model_dir", + type=str, + default=None, + help=("Directory include:'model.pdiparams', 'model.pdmodel', " + "'infer_cfg.yml', created by tools/export_model.py."), + required=True) + self.add_argument( + "-o", "--opt", nargs='+', help="set configuration options") + + def parse_args(self, argv=None): + args = super(ArgsParser, self).parse_args(argv) + assert args.config is not None, \ + "Please specify --config=configure_file_path." + args.service_config = self._parse_opt(args.opt, args.config) + print("args config:", args.service_config) + args.model_config = PredictConfig(args.model_dir) + return args + + def _parse_helper(self, v): + if v.replace(".", "", 1).isdecimal(): + if "." 
in v: + v = float(v) + else: + v = int(v) + elif v == "True" or v == "False": + v = (v == "True") + return v + + def _parse_opt(self, opts, conf_path): + f = open(conf_path) + config = yaml.load(f, Loader=yaml.Loader) + if not opts: + return config + for s in opts: + s = s.strip() + k, v = s.split('=') + v = self._parse_helper(v) + if "devices" in k: + v = str(v) + print(k, v, type(v)) + cur = config + parent = cur + for kk in k.split("."): + if kk not in cur: + cur[kk] = {} + parent = cur + cur = cur[kk] + else: + parent = cur + cur = cur[kk] + parent[k.split(".")[-1]] = v + return config + + +class PredictConfig(object): + """set config of preprocess, postprocess and visualize + Args: + model_dir (str): root path of infer_cfg.yml + """ + + def __init__(self, model_dir): + # parsing Yaml config for Preprocess + deploy_file = os.path.join(model_dir, 'infer_cfg.yml') + with open(deploy_file) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.label_list = yml_conf['label_list'] + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + self.draw_threshold = yml_conf.get("draw_threshold", 0.5) + self.mask = yml_conf.get("mask", False) + self.tracker = yml_conf.get("tracker", None) + self.nms = yml_conf.get("NMS", None) + self.fpn_stride = yml_conf.get("fpn_stride", None) + if self.arch == 'RCNN' and yml_conf.get('export_onnx', False): + print( + 'The RCNN export model is used for ONNX and it only supports batch_size = 1' + ) + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model 
Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +class DetectorOp(Op): + def init_op(self): + self.preprocess_pipeline = Compose(GLOBAL_VAR['preprocess_ops']) + + def preprocess(self, input_dicts, data_id, log_id): + (_, input_dict), = input_dicts.items() + inputs = [] + for key, data in input_dict.items(): + data = base64.b64decode(data.encode('utf8')) + byte_stream = io.BytesIO(data) + img = Image.open(byte_stream).convert("RGB") + inputs.append(self.preprocess_pipeline(img)) + inputs = self.collate_inputs(inputs) + return inputs, False, None, "" + + def postprocess(self, input_dicts, fetch_dict, data_id, log_id): + (_, input_dict), = input_dicts.items() + if GLOBAL_VAR['model_config'].arch in ["HRNet"]: + result = self.parse_keypoint_result(input_dict, fetch_dict) + else: + result = self.parse_detection_result(input_dict, fetch_dict) + return result, None, "" + + def collate_inputs(self, inputs): + collate_inputs = {k: [] for k in inputs[0].keys()} + for info in inputs: + for k in collate_inputs.keys(): + collate_inputs[k].append(info[k]) + return { + k: np.stack(v) + for k, v in collate_inputs.items() if k in GLOBAL_VAR['feed_vars'] + } + + def parse_detection_result(self, input_dict, fetch_dict): + bboxes = fetch_dict[GLOBAL_VAR['fetch_vars'][0]] + bboxes_num = fetch_dict[GLOBAL_VAR['fetch_vars'][1]] + if GLOBAL_VAR['model_config'].mask: + masks = fetch_dict[GLOBAL_VAR['fetch_vars'][2]] + idx = 0 + results = {} + for img_name, num in zip(input_dict.keys(), bboxes_num): + result = [] + bbox = bboxes[idx:idx + num] + for line in bbox: + if line[0] > -1 and line[1] > GLOBAL_VAR[ + 'model_config'].draw_threshold: + result.append(f"{int(line[0])} {line[1]} " + f"{line[2]} {line[3]} {line[4]} {line[5]}") + results[img_name] = result + idx += num + 
return results + + def parse_keypoint_result(self, input_dict, fetch_dict): + heatmap = fetch_dict["conv2d_441.tmp_1"] + keypoint_postprocess = HRNetPostProcess() + im_shape = [] + for key, data in input_dict.items(): + data = base64.b64decode(data.encode('utf8')) + byte_stream = io.BytesIO(data) + img = Image.open(byte_stream).convert("RGB") + im_shape.append([img.width, img.height]) + im_shape = np.array(im_shape) + center = np.round(im_shape / 2.) + scale = im_shape / 200. + kpts, scores = keypoint_postprocess(heatmap, center, scale) + results = {"keypoint": kpts, "scores": scores} + return results + + +class DetectorService(WebService): + def get_pipeline_response(self, read_op): + return DetectorOp(name="ppdet", input_ops=[read_op]) + + +def get_model_vars(model_dir, service_config): + serving_server_dir = os.path.join(model_dir, "serving_server") + # rewrite model_config + service_config['op']['ppdet']['local_service_conf'][ + 'model_config'] = serving_server_dir + serving_server_conf = os.path.join(serving_server_dir, + "serving_server_conf.prototxt") + with open(serving_server_conf, 'r') as f: + model_var = google.protobuf.text_format.Merge( + str(f.read()), m_config.GeneralModelConfig()) + feed_vars = [var.name for var in model_var.feed_var] + fetch_vars = [var.name for var in model_var.fetch_var] + return feed_vars, fetch_vars + + +if __name__ == '__main__': + # load config and prepare the service + FLAGS = ArgsParser().parse_args() + feed_vars, fetch_vars = get_model_vars(FLAGS.model_dir, + FLAGS.service_config) + GLOBAL_VAR['feed_vars'] = feed_vars + GLOBAL_VAR['fetch_vars'] = fetch_vars + GLOBAL_VAR['preprocess_ops'] = FLAGS.model_config.preprocess_infos + GLOBAL_VAR['model_config'] = FLAGS.model_config + # define the service + uci_service = DetectorService(name="ppdet") + uci_service.prepare_pipeline_config(yml_dict=FLAGS.service_config) + # start the service + uci_service.run_service() diff --git a/deploy/third_engine/demo_avh/Makefile 
b/deploy/third_engine/demo_avh/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..4ea570578809f3d86d8bc75b7c1d41ad50819b58 --- /dev/null +++ b/deploy/third_engine/demo_avh/Makefile @@ -0,0 +1,114 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +# Makefile to build demo + +# Setup build environment +BUILD_DIR := build + +ARM_CPU = ARMCM55 +ETHOSU_PATH = /opt/arm/ethosu +CMSIS_PATH ?= ${ETHOSU_PATH}/cmsis +ETHOSU_PLATFORM_PATH ?= ${ETHOSU_PATH}/core_platform +STANDALONE_CRT_PATH := $(abspath $(BUILD_DIR))/runtime +CORSTONE_300_PATH = ${ETHOSU_PLATFORM_PATH}/targets/corstone-300 +PKG_COMPILE_OPTS = -g -Wall -O2 -Wno-incompatible-pointer-types -Wno-format -mcpu=cortex-m55 -mthumb -mfloat-abi=hard -std=gnu99 +CMAKE ?= cmake +CC = arm-none-eabi-gcc +AR = arm-none-eabi-ar +RANLIB = arm-none-eabi-ranlib +PKG_CFLAGS = ${PKG_COMPILE_OPTS} \ + -I${STANDALONE_CRT_PATH}/include \ + -I${STANDALONE_CRT_PATH}/src/runtime/crt/include \ + -I${PWD}/include \ + -I${CORSTONE_300_PATH} \ + -I${CMSIS_PATH}/Device/ARM/${ARM_CPU}/Include/ \ + -I${CMSIS_PATH}/CMSIS/Core/Include \ + -I${CMSIS_PATH}/CMSIS/NN/Include \ + -I${CMSIS_PATH}/CMSIS/DSP/Include \ + -I$(abspath $(BUILD_DIR))/codegen/host/include +CMSIS_NN_CMAKE_FLAGS = -DCMAKE_TOOLCHAIN_FILE=$(abspath $(BUILD_DIR))/../arm-none-eabi-gcc.cmake \ + -DTARGET_CPU=cortex-m55 \ + -DBUILD_CMSIS_NN_FUNCTIONS=YES +PKG_LDFLAGS = -lm -specs=nosys.specs -static -T corstone300.ld + +ifeq ($(VERBOSE),1) +QUIET ?= +else +QUIET ?= @ +endif + +DEMO_MAIN = src/demo_bare_metal.c +CODEGEN_SRCS = $(wildcard $(abspath $(BUILD_DIR))/codegen/host/src/*.c) +CODEGEN_OBJS = $(subst .c,.o,$(CODEGEN_SRCS)) +CMSIS_STARTUP_SRCS = $(wildcard ${CMSIS_PATH}/Device/ARM/${ARM_CPU}/Source/*.c) +UART_SRCS = $(wildcard ${CORSTONE_300_PATH}/*.c) + +demo: $(BUILD_DIR)/demo + +$(BUILD_DIR)/stack_allocator.o: $(STANDALONE_CRT_PATH)/src/runtime/crt/memory/stack_allocator.c + $(QUIET)mkdir -p $(@D) + $(QUIET)$(CC) -c $(PKG_CFLAGS) -o $@ $^ + +$(BUILD_DIR)/crt_backend_api.o: $(STANDALONE_CRT_PATH)/src/runtime/crt/common/crt_backend_api.c + $(QUIET)mkdir -p $(@D) + $(QUIET)$(CC) -c $(PKG_CFLAGS) -o $@ $^ + +# Build generated code +$(BUILD_DIR)/libcodegen.a: $(CODEGEN_SRCS) + $(QUIET)cd $(abspath 
$(BUILD_DIR)/codegen/host/src) && $(CC) -c $(PKG_CFLAGS) $(CODEGEN_SRCS) + $(QUIET)$(AR) -cr $(abspath $(BUILD_DIR)/libcodegen.a) $(CODEGEN_OBJS) + $(QUIET)$(RANLIB) $(abspath $(BUILD_DIR)/libcodegen.a) + +# Build CMSIS startup code +${BUILD_DIR}/libcmsis_startup.a: $(CMSIS_STARTUP_SRCS) + $(QUIET)mkdir -p $(abspath $(BUILD_DIR)/libcmsis_startup) + $(QUIET)cd $(abspath $(BUILD_DIR)/libcmsis_startup) && $(CC) -c $(PKG_CFLAGS) -D${ARM_CPU} $^ + $(QUIET)$(AR) -cr $(abspath $(BUILD_DIR)/libcmsis_startup.a) $(abspath $(BUILD_DIR))/libcmsis_startup/*.o + $(QUIET)$(RANLIB) $(abspath $(BUILD_DIR)/libcmsis_startup.a) + +# Build CMSIS-NN +${BUILD_DIR}/cmsis_nn/Source/SoftmaxFunctions/libCMSISNNSoftmax.a: + $(QUIET)mkdir -p $(@D) + $(QUIET)cd $(CMSIS_PATH)/CMSIS/NN && $(CMAKE) -B $(abspath $(BUILD_DIR)/cmsis_nn) $(CMSIS_NN_CMAKE_FLAGS) + $(QUIET)cd $(abspath $(BUILD_DIR)/cmsis_nn) && $(MAKE) all + +# Build demo application +$(BUILD_DIR)/demo: $(DEMO_MAIN) $(UART_SRCS) $(BUILD_DIR)/stack_allocator.o $(BUILD_DIR)/crt_backend_api.o \ + ${BUILD_DIR}/libcodegen.a ${BUILD_DIR}/libcmsis_startup.a \ + ${BUILD_DIR}/cmsis_nn/Source/SoftmaxFunctions/libCMSISNNSoftmax.a \ + ${BUILD_DIR}/cmsis_nn/Source/FullyConnectedFunctions/libCMSISNNFullyConnected.a \ + ${BUILD_DIR}/cmsis_nn/Source/SVDFunctions/libCMSISNNSVDF.a \ + ${BUILD_DIR}/cmsis_nn/Source/ReshapeFunctions/libCMSISNNReshape.a \ + ${BUILD_DIR}/cmsis_nn/Source/ActivationFunctions/libCMSISNNActivation.a \ + ${BUILD_DIR}/cmsis_nn/Source/NNSupportFunctions/libCMSISNNSupport.a \ + ${BUILD_DIR}/cmsis_nn/Source/ConcatenationFunctions/libCMSISNNConcatenation.a \ + ${BUILD_DIR}/cmsis_nn/Source/BasicMathFunctions/libCMSISNNBasicMaths.a \ + ${BUILD_DIR}/cmsis_nn/Source/ConvolutionFunctions/libCMSISNNConvolutions.a \ + ${BUILD_DIR}/cmsis_nn/Source/PoolingFunctions/libCMSISNNPooling.a + $(QUIET)mkdir -p $(@D) + $(QUIET)$(CC) $(PKG_CFLAGS) $(FREERTOS_FLAGS) -o $@ -Wl,--whole-archive $^ -Wl,--no-whole-archive $(PKG_LDFLAGS) + +clean: + $(QUIET)rm 
-rf $(BUILD_DIR)/codegen + +cleanall: + $(QUIET)rm -rf $(BUILD_DIR) + +.SUFFIXES: + +.DEFAULT: demo diff --git a/deploy/third_engine/demo_avh/README.md b/deploy/third_engine/demo_avh/README.md new file mode 100644 index 0000000000000000000000000000000000000000..69250e5f90a0e48a21e5f360af21e9809f6750db --- /dev/null +++ b/deploy/third_engine/demo_avh/README.md @@ -0,0 +1,90 @@ + + + + + + + + + + + + + + +Running PP-PicoDet via TVM on bare metal Arm(R) Cortex(R)-M55 CPU and CMSIS-NN +=============================================================== + +This folder contains an example of how to use TVM to run a PP-PicoDet model +on bare metal Cortex(R)-M55 CPU and CMSIS-NN. + +Prerequisites +------------- +If the demo is run in the ci_cpu Docker container provided with TVM, then the following +software will already be installed. + +If the demo is not run in the ci_cpu Docker container, then you will need the following: +- Software required to build and run the demo (These can all be installed by running + tvm/docker/install/ubuntu_install_ethosu_driver_stack.sh.) 
+ - [Fixed Virtual Platform (FVP) based on Arm(R) Corstone(TM)-300 software](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps) + - [cmake 3.19.5](https://github.com/Kitware/CMake/releases/) + - [GCC toolchain from Arm(R)](https://developer.arm.com/-/media/Files/downloads/gnu-rm/10-2020q4/gcc-arm-none-eabi-10-2020-q4-major-x86_64-linux.tar.bz2) + - [Arm(R) Ethos(TM)-U NPU driver stack](https://review.mlplatform.org) + - [CMSIS](https://github.com/ARM-software/CMSIS_5) +- The Python libraries listed in the requirements.txt of this directory + - These can be installed by running the following from the current directory: + ```bash + pip install -r ./requirements.txt + ``` + +You will also need TVM, which can be either: + - Built from source (see [Install from Source](https://tvm.apache.org/docs/install/from_source.html)) + - When building from source, the following need to be set in config.cmake: + - set(USE_CMSISNN ON) + - set(USE_MICRO ON) + - set(USE_LLVM ON) + - Installed from TLCPack (see [TLCPack](https://tlcpack.ai/)) + +You will need to update your PATH environment variable to include the path to cmake 3.19.5 and the FVP. 
+For example, if you've installed these in `/opt/arm`, then you would do the following: +```bash +export PATH=/opt/arm/FVP_Corstone_SSE-300/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH +``` + +Running the demo application +---------------------------- +Type the following command to run the bare metal object detection application ([src/demo_bare_metal.c](./src/demo_bare_metal.c)): +```bash +./run_demo.sh +``` +If the Ethos(TM)-U platform and/or CMSIS have not been installed in /opt/arm/ethosu then +the locations for these can be specified as arguments to run_demo.sh, for example: + +```bash +./run_demo.sh --cmsis_path /home/tvm-user/cmsis \ +--ethosu_platform_path /home/tvm-user/ethosu/core_platform +``` + +This will: +- Download a PP-PicoDet object detection model +- Use tvmc to compile the object detection model for Cortex(R)-M55 CPU and CMSIS-NN +- Create a C header file inputs.h containing the image data as a C array +- Create a C header file outputs.h containing the C arrays where the output of inference will be stored +- Build the demo application +- Run the demo application on a Fixed Virtual Platform (FVP) based on Arm(R) Corstone(TM)-300 software +- The application will report the detected boxes with their classes and scores. + +Using your own image +-------------------- +The convert_image.py script takes a single argument on the command line which is the path of the +image to be converted into a C array for consumption by the model. + +The demo can be modified to use an image of your choice by changing the following line in run_demo.sh: + +```bash +python3 ./convert_image.py ../../demo/000000014439_640x640.jpg +``` + +Model description +----------------- +In this demo, the model we used is based on [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/picodet). Because of its excellent performance, PP-PicoDet is well suited to deployment on mobile devices or CPUs. 
And it is released by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). diff --git a/deploy/third_engine/demo_avh/arm-none-eabi-gcc.cmake b/deploy/third_engine/demo_avh/arm-none-eabi-gcc.cmake new file mode 100644 index 0000000000000000000000000000000000000000..415b3139be1b7f891c017dff0dc299b67f7ef2fe --- /dev/null +++ b/deploy/third_engine/demo_avh/arm-none-eabi-gcc.cmake @@ -0,0 +1,79 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +if (__TOOLCHAIN_LOADED) + return() +endif() +set(__TOOLCHAIN_LOADED TRUE) + +set(CMAKE_SYSTEM_NAME Generic) +set(CMAKE_C_COMPILER "arm-none-eabi-gcc") +set(CMAKE_CXX_COMPILER "arm-none-eabi-g++") +set(CMAKE_SYSTEM_PROCESSOR "cortex-m55" CACHE STRING "Select Arm(R) Cortex(R)-M architecture. (cortex-m0, cortex-m3, cortex-m33, cortex-m4, cortex-m55, cortex-m7, etc)") + +set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY) + +SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER) +SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY) +SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY) + +set(CMAKE_C_STANDARD 99) +set(CMAKE_CXX_STANDARD 14) + +# The system processor could for example be set to cortex-m33+nodsp+nofp. 
+set(__CPU_COMPILE_TARGET ${CMAKE_SYSTEM_PROCESSOR}) +string(REPLACE "+" ";" __CPU_FEATURES ${__CPU_COMPILE_TARGET}) +list(POP_FRONT __CPU_FEATURES CMAKE_SYSTEM_PROCESSOR) + +string(FIND ${__CPU_COMPILE_TARGET} "+" __OFFSET) +if(__OFFSET GREATER_EQUAL 0) + string(SUBSTRING ${__CPU_COMPILE_TARGET} ${__OFFSET} -1 CPU_FEATURES) +endif() + +# Add -mcpu to the compile options to override the -mcpu the CMake toolchain adds +add_compile_options(-mcpu=${__CPU_COMPILE_TARGET}) + +# Set floating point unit +if("${__CPU_COMPILE_TARGET}" MATCHES "\\+fp") + set(FLOAT hard) +elseif("${__CPU_COMPILE_TARGET}" MATCHES "\\+nofp") + set(FLOAT soft) +elseif("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "cortex-m33" OR + "${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "cortex-m55") + set(FLOAT hard) +else() + set(FLOAT soft) +endif() + +add_compile_options(-mfloat-abi=${FLOAT}) +add_link_options(-mfloat-abi=${FLOAT}) + +# Link target +add_link_options(-mcpu=${__CPU_COMPILE_TARGET}) +add_link_options(-Xlinker -Map=output.map) + +# +# Compile options +# +set(cxx_flags "-fno-unwind-tables;-fno-rtti;-fno-exceptions") + +add_compile_options("-Wall;-Wextra;-Wsign-compare;-Wunused;-Wswitch-default;\ +-Wdouble-promotion;-Wredundant-decls;-Wshadow;-Wnull-dereference;\ +-Wno-format-extra-args;-Wno-unused-function;-Wno-unused-label;\ +-Wno-missing-field-initializers;-Wno-return-type;-Wno-format;-Wno-int-conversion" + "$<$:${cxx_flags}>" +) diff --git a/deploy/third_engine/demo_avh/convert_image.py b/deploy/third_engine/demo_avh/convert_image.py new file mode 100755 index 0000000000000000000000000000000000000000..a335b5aa7296db6a364baaf51f98b073c5de1429 --- /dev/null +++ b/deploy/third_engine/demo_avh/convert_image.py @@ -0,0 +1,97 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import os +import pathlib +import re +import sys +import cv2 +import math +from PIL import Image +import numpy as np + +def resize_norm_img(img, image_shape, padding=True): + imgC, imgH, imgW = image_shape + img = cv2.resize( + img, (imgW, imgH), interpolation=cv2.INTER_LINEAR) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + img = np.transpose(img, [2, 0, 1]) / 255 + img = np.expand_dims(img, 0) + img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) + img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) + img -= img_mean + img /= img_std + return img.astype(np.float32) + + +def create_header_file(name, tensor_name, tensor_data, output_path): + """ + This function generates a header file containing the data from the numpy array provided. 
+ """ + file_path = pathlib.Path(f"{output_path}/" + name).resolve() + # Create header file with npy_data as a C array + raw_path = file_path.with_suffix(".h").resolve() + with open(raw_path, "a") as header_file: + header_file.write( + "\n" + + f"const size_t {tensor_name}_len = {tensor_data.size};\n" + + f'__attribute__((section(".data.tvm"), aligned(16))) float {tensor_name}[] = ' + ) + + header_file.write("{") + for i in np.ndindex(tensor_data.shape): + header_file.write(f"{tensor_data[i]}, ") + header_file.write("};\n\n") + + +def create_headers(image_name): + """ + This function generates C header files for the input and output arrays required to run inferences + """ + img_path = os.path.join("./", f"{image_name}") + + # Resize image to 320x320 + img = cv2.imread(img_path) + img = resize_norm_img(img, [3, 320, 320]) + img_data = img.astype("float32") + + # Add the batch dimension, as we are expecting 4-dimensional input: NCHW. + img_data = np.expand_dims(img_data, axis=0) + + os.remove("./include/inputs.h") + os.remove("./include/outputs.h") + # Create input header file + create_header_file("inputs", "input", img_data, "./include") + # Create output header file + output_data = np.zeros([8500], np.float32) + create_header_file( + "outputs", + "output0", + output_data, + "./include", + ) + output_data = np.zeros([170000], np.float32) + create_header_file( + "outputs", + "output1", + output_data, + "./include", + ) + + +if __name__ == "__main__": + create_headers(sys.argv[1]) diff --git a/deploy/third_engine/demo_avh/corstone300.ld b/deploy/third_engine/demo_avh/corstone300.ld new file mode 100644 index 0000000000000000000000000000000000000000..1d2dd8805799fe78ee8b2696fe4ff7fab3d01f38 --- /dev/null +++ b/deploy/third_engine/demo_avh/corstone300.ld @@ -0,0 +1,295 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*------------------ Reference System Memories ------------- + +===================+============+=======+============+============+ + | Memory | Address | Size | CPU Access | NPU Access | + +===================+============+=======+============+============+ + | ITCM | 0x00000000 | 512KB | Yes (RO) | No | + +-------------------+------------+-------+------------+------------+ + | DTCM | 0x20000000 | 512KB | Yes (R/W) | No | + +-------------------+------------+-------+------------+------------+ + | SSE-300 SRAM | 0x21000000 | 2MB | Yes (R/W) | Yes (R/W) | + +-------------------+------------+-------+------------+------------+ + | Data SRAM | 0x01000000 | 2MB | Yes (R/W) | Yes (R/W) | + +-------------------+------------+-------+------------+------------+ + | DDR | 0x60000000 | 32MB | Yes (R/W) | Yes (R/W) | + +-------------------+------------+-------+------------+------------+ */ + +/*---------------------- ITCM Configuration ---------------------------------- + Flash Configuration + Flash Base Address <0x0-0xFFFFFFFF:8> + Flash Size (in Bytes) <0x0-0xFFFFFFFF:8> + + -----------------------------------------------------------------------------*/ +__ROM_BASE = 0x00000000; +__ROM_SIZE = 0x00080000; + +/*--------------------- DTCM RAM Configuration 
---------------------------- + RAM Configuration + RAM Base Address <0x0-0xFFFFFFFF:8> + RAM Size (in Bytes) <0x0-0xFFFFFFFF:8> + + -----------------------------------------------------------------------------*/ +__RAM_BASE = 0x20000000; +__RAM_SIZE = 0x00080000; + +/*----------------------- Data SRAM Configuration ------------------------------ + Data SRAM Configuration + DATA_SRAM Base Address <0x0-0xFFFFFFFF:8> + DATA_SRAM Size (in Bytes) <0x0-0xFFFFFFFF:8> + + -----------------------------------------------------------------------------*/ +__DATA_SRAM_BASE = 0x01000000; +__DATA_SRAM_SIZE = 0x00200000; + +/*--------------------- Embedded SRAM Configuration ---------------------------- + SRAM Configuration + SRAM Base Address <0x0-0xFFFFFFFF:8> + SRAM Size (in Bytes) <0x0-0xFFFFFFFF:8> + + -----------------------------------------------------------------------------*/ +__SRAM_BASE = 0x21000000; +__SRAM_SIZE = 0x00200000; + +/*--------------------- Stack / Heap Configuration ---------------------------- + Stack / Heap Configuration + Stack Size (in Bytes) <0x0-0xFFFFFFFF:8> + Heap Size (in Bytes) <0x0-0xFFFFFFFF:8> + + -----------------------------------------------------------------------------*/ +__STACK_SIZE = 0x00008000; +__HEAP_SIZE = 0x00008000; + +/*--------------------- Embedded RAM Configuration ---------------------------- + DDR Configuration + DDR Base Address <0x0-0xFFFFFFFF:8> + DDR Size (in Bytes) <0x0-0xFFFFFFFF:8> + + -----------------------------------------------------------------------------*/ +__DDR_BASE = 0x60000000; +__DDR_SIZE = 0x02000000; + +/* + *-------------------- <<< end of configuration section >>> ------------------- + */ + +MEMORY +{ + ITCM (rx) : ORIGIN = __ROM_BASE, LENGTH = __ROM_SIZE + DTCM (rwx) : ORIGIN = __RAM_BASE, LENGTH = __RAM_SIZE + DATA_SRAM (rwx) : ORIGIN = __DATA_SRAM_BASE, LENGTH = __DATA_SRAM_SIZE + SRAM (rwx) : ORIGIN = __SRAM_BASE, LENGTH = __SRAM_SIZE + DDR (rwx) : ORIGIN = __DDR_BASE, LENGTH = __DDR_SIZE +} + 
+/* Linker script to place sections and symbol values. Should be used together + * with other linker script that defines memory regions ITCM and RAM. + * It references following symbols, which must be defined in code: + * Reset_Handler : Entry of reset handler + * + * It defines following symbols, which code can use without definition: + * __exidx_start + * __exidx_end + * __copy_table_start__ + * __copy_table_end__ + * __zero_table_start__ + * __zero_table_end__ + * __etext + * __data_start__ + * __preinit_array_start + * __preinit_array_end + * __init_array_start + * __init_array_end + * __fini_array_start + * __fini_array_end + * __data_end__ + * __bss_start__ + * __bss_end__ + * __end__ + * end + * __HeapLimit + * __StackLimit + * __StackTop + * __stack + */ +ENTRY(Reset_Handler) + +SECTIONS +{ + /* .ddr is placed before .text so that .rodata.tvm is encountered before .rodata* */ + .ddr : + { + . = ALIGN (16); + *(.rodata.tvm) + . = ALIGN (16); + *(.data.tvm); + . = ALIGN(16); + } > DDR + + .text : + { + KEEP(*(.vectors)) + *(.text*) + + KEEP(*(.init)) + KEEP(*(.fini)) + + /* .ctors */ + *crtbegin.o(.ctors) + *crtbegin?.o(.ctors) + *(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors) + *(SORT(.ctors.*)) + *(.ctors) + + /* .dtors */ + *crtbegin.o(.dtors) + *crtbegin?.o(.dtors) + *(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors) + *(SORT(.dtors.*)) + *(.dtors) + + *(.rodata*) + + KEEP(*(.eh_frame*)) + } > ITCM + + .ARM.extab : + { + *(.ARM.extab* .gnu.linkonce.armextab.*) + } > ITCM + + __exidx_start = .; + .ARM.exidx : + { + *(.ARM.exidx* .gnu.linkonce.armexidx.*) + } > ITCM + __exidx_end = .; + + .copy.table : + { + . = ALIGN(4); + __copy_table_start__ = .; + LONG (__etext) + LONG (__data_start__) + LONG (__data_end__ - __data_start__) + /* Add each additional data section here */ + __copy_table_end__ = .; + } > ITCM + + .zero.table : + { + . 
= ALIGN(4); + __zero_table_start__ = .; + __zero_table_end__ = .; + } > ITCM + + /** + * Location counter can end up 2byte aligned with narrow Thumb code but + * __etext is assumed by startup code to be the LMA of a section in DTCM + * which must be 4byte aligned + */ + __etext = ALIGN (4); + + .sram : + { + . = ALIGN(16); + } > SRAM AT > SRAM + + .data : AT (__etext) + { + __data_start__ = .; + *(vtable) + *(.data) + *(.data.*) + + . = ALIGN(4); + /* preinit data */ + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP(*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + + . = ALIGN(4); + /* init data */ + PROVIDE_HIDDEN (__init_array_start = .); + KEEP(*(SORT(.init_array.*))) + KEEP(*(.init_array)) + PROVIDE_HIDDEN (__init_array_end = .); + + + . = ALIGN(4); + /* finit data */ + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP(*(SORT(.fini_array.*))) + KEEP(*(.fini_array)) + PROVIDE_HIDDEN (__fini_array_end = .); + + KEEP(*(.jcr*)) + . = ALIGN(4); + /* All data end */ + __data_end__ = .; + + } > DTCM + + .bss.NoInit : + { + . = ALIGN(16); + *(.bss.NoInit) + . = ALIGN(16); + } > DDR AT > DDR + + .bss : + { + . = ALIGN(4); + __bss_start__ = .; + *(.bss) + *(.bss.*) + *(COMMON) + . = ALIGN(4); + __bss_end__ = .; + } > DTCM AT > DTCM + + .data_sram : + { + . = ALIGN(16); + } > DATA_SRAM + + .heap (COPY) : + { + . = ALIGN(8); + __end__ = .; + PROVIDE(end = .); + . = . + __HEAP_SIZE; + . = ALIGN(8); + __HeapLimit = .; + } > DTCM + + .stack (ORIGIN(DTCM) + LENGTH(DTCM) - __STACK_SIZE) (COPY) : + { + . = ALIGN(8); + __StackLimit = .; + . = . + __STACK_SIZE; + . 
= ALIGN(8); + __StackTop = .; + } > DTCM + PROVIDE(__stack = __StackTop); + + /* Check if data + stack exceeds DTCM limit */ + ASSERT(__StackLimit >= __bss_end__, "region DTCM overflowed with stack") +} diff --git a/deploy/third_engine/demo_avh/include/crt_config.h b/deploy/third_engine/demo_avh/include/crt_config.h new file mode 100644 index 0000000000000000000000000000000000000000..2fd0ead60697beb86d55dfadde5070b7ae5afd3e --- /dev/null +++ b/deploy/third_engine/demo_avh/include/crt_config.h @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef TVM_RUNTIME_CRT_CONFIG_H_ +#define TVM_RUNTIME_CRT_CONFIG_H_ + +/*! Log level of the CRT runtime */ +#define TVM_CRT_LOG_LEVEL TVM_CRT_LOG_LEVEL_DEBUG + +#endif // TVM_RUNTIME_CRT_CONFIG_H_ diff --git a/deploy/third_engine/demo_avh/include/tvm_runtime.h b/deploy/third_engine/demo_avh/include/tvm_runtime.h new file mode 100644 index 0000000000000000000000000000000000000000..0978d7adfa039bf188aa8d17a43a7e61f1adecc6 --- /dev/null +++ b/deploy/third_engine/demo_avh/include/tvm_runtime.h @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +void __attribute__((noreturn)) TVMPlatformAbort(tvm_crt_error_t error_code) { + printf("TVMPlatformAbort: %d\n", error_code); + printf("EXITTHESIM\n"); + exit(-1); +} + +tvm_crt_error_t TVMPlatformMemoryAllocate(size_t num_bytes, DLDevice dev, + void **out_ptr) { + return kTvmErrorFunctionCallNotImplemented; +} + +tvm_crt_error_t TVMPlatformMemoryFree(void *ptr, DLDevice dev) { + return kTvmErrorFunctionCallNotImplemented; +} + +void TVMLogf(const char *msg, ...) 
{ + va_list args; + va_start(args, msg); + vfprintf(stdout, msg, args); + va_end(args); +} + +TVM_DLL int TVMFuncRegisterGlobal(const char *name, TVMFunctionHandle f, + int override) { + return 0; +} + +#ifdef __cplusplus +} +#endif diff --git a/deploy/third_engine/demo_avh/requirements.txt b/deploy/third_engine/demo_avh/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..992002efbe1e197bb33d07e54a2722950911313e --- /dev/null +++ b/deploy/third_engine/demo_avh/requirements.txt @@ -0,0 +1,3 @@ +paddlepaddle +numpy +opencv-python diff --git a/deploy/third_engine/demo_avh/run_demo.sh b/deploy/third_engine/demo_avh/run_demo.sh new file mode 100755 index 0000000000000000000000000000000000000000..86607492629b251089f596b1411b0db0f52bb88c --- /dev/null +++ b/deploy/third_engine/demo_avh/run_demo.sh @@ -0,0 +1,151 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+export PATH=/opt/arm/FVP_Corstone_SSE-300/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH +set -e +set -u +set -o pipefail + +# Show usage +function show_usage() { + cat <&2 + show_usage >&2 + exit 1 + fi + ;; + + --ethosu_platform_path) + if [ $# -gt 1 ] + then + export ETHOSU_PLATFORM_PATH="$2" + shift 2 + else + echo 'ERROR: --ethosu_platform_path requires a non-empty argument' >&2 + show_usage >&2 + exit 1 + fi + ;; + + --fvp_path) + if [ $# -gt 1 ] + then + export PATH="$2/models/Linux64_GCC-6.4:$PATH" + shift 2 + else + echo 'ERROR: --fvp_path requires a non-empty argument' >&2 + show_usage >&2 + exit 1 + fi + ;; + + --cmake_path) + if [ $# -gt 1 ] + then + export CMAKE="$2" + shift 2 + else + echo 'ERROR: --cmake_path requires a non-empty argument' >&2 + show_usage >&2 + exit 1 + fi + ;; + + -*|--*) + echo "Error: Unknown flag: $1" >&2 + show_usage >&2 + exit 1 + ;; + esac +done + + +# Directories +script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" + +# Make build directory +make cleanall +mkdir -p build +cd build + +# Compile model for Arm(R) Cortex(R)-M55 CPU and CMSIS-NN +# An alternative to using "python3 -m tvm.driver.tvmc" is to call +# "tvmc" directly once TVM has been pip installed. +python3 -m tvm.driver.tvmc compile --target=cmsis-nn,c \ + --target-cmsis-nn-mcpu=cortex-m55 \ + --target-c-mcpu=cortex-m55 \ + --runtime=crt \ + --executor=aot \ + --executor-aot-interface-api=c \ + --executor-aot-unpacked-api=1 \ + --pass-config tir.usmp.enable=1 \ + --pass-config tir.usmp.algorithm=hill_climb \ + --pass-config tir.disable_storage_rewrite=1 \ + --pass-config tir.disable_vectorize=1 ../models/picodet_s_320_coco_lcnet_no_nms/model \ + --output-format=mlf \ + --model-format=paddle \ + --module-name=picodet \ + --input-shapes image:[1,3,320,320] \ + --output=picodet.tar +tar -xf picodet.tar + + +# Create C header files +cd .. 
+python3 ./convert_image.py ../../demo/000000014439_640x640.jpg + +# Build demo executable +echo "Build demo executable..." +cd ${script_dir} +echo ${script_dir} +make +echo "End build demo executable..." + +# Run demo executable on the FVP +FVP_Corstone_SSE-300_Ethos-U55 -C cpu0.CFGDTCMSZ=15 \ +-C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \ +-C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \ +-C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \ +./build/demo diff --git a/deploy/third_engine/demo_avh/src/demo_bare_metal.c b/deploy/third_engine/demo_avh/src/demo_bare_metal.c new file mode 100644 index 0000000000000000000000000000000000000000..07ed5bebe2c266bde5b59b101c1df1a54ba2ef28 --- /dev/null +++ b/deploy/third_engine/demo_avh/src/demo_bare_metal.c @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include +#include +#include + +#include "uart.h" + +// Header files generated by convert_image.py +#include "inputs.h" +#include "outputs.h" + +int main(int argc, char **argv) { + uart_init(); + printf("Starting PicoDet inference:\n"); + struct tvmgen_picodet_outputs rec_outputs = { + .output0 = output0, .output1 = output1, + }; + struct tvmgen_picodet_inputs rec_inputs = { + .image = input, + }; + + tvmgen_picodet_run(&rec_inputs, &rec_outputs); + + // post process + for (int i = 0; i < output0_len / 4; i++) { + float score = 0; + int32_t class = 0; + for (int j = 0; j < 80; j++) { + if (output1[i + j * 2125] > score) { + score = output1[i + j * 2125]; + class = j; + } + } + if (score > 0.1 && output0[i * 4] > 0 && output0[i * 4 + 1] > 0) { + printf("box: %f, %f, %f, %f, class: %d, score: %f\n", output0[i * 4] * 2, + output0[i * 4 + 1] * 2, output0[i * 4 + 2] * 2, + output0[i * 4 + 3] * 2, class, score); + } + } + return 0; +} diff --git a/deploy/third_engine/demo_mnn/CMakeLists.txt b/deploy/third_engine/demo_mnn/CMakeLists.txt index 07d9b7f868136ba09cf8f3468cce5b42895b8d95..9afa8cfc011587977b4ef3ed13bb0b050e990fa0 100644 --- a/deploy/third_engine/demo_mnn/CMakeLists.txt +++ b/deploy/third_engine/demo_mnn/CMakeLists.txt @@ -2,13 +2,14 @@ cmake_minimum_required(VERSION 3.9) project(picodet-mnn) set(CMAKE_CXX_STANDARD 17) +set(MNN_DIR PATHS "./mnn") # find_package(OpenCV REQUIRED PATHS "/work/dependence/opencv/opencv-3.4.3/build") find_package(OpenCV REQUIRED) include_directories( - /path/to/MNN/include/MNN - /path/to/MNN/include - . 
+ ${MNN_DIR}/include + ${MNN_DIR}/include/MNN + ${CMAKE_SOURCE_DIR} ) link_directories(mnn/lib) diff --git a/deploy/third_engine/demo_mnn/README.md b/deploy/third_engine/demo_mnn/README.md index 78a0f3a79febce170e06058c6c5f6233d0a5c201..ac11a8e18fdc53aa7eebb57fa1ba2d4680a9dcf3 100644 --- a/deploy/third_engine/demo_mnn/README.md +++ b/deploy/third_engine/demo_mnn/README.md @@ -1,105 +1,89 @@ # PicoDet MNN Demo -This fold provides PicoDet inference code using -[Alibaba's MNN framework](https://github.com/alibaba/MNN). Most of the implements in -this fold are same as *demo_ncnn*. +This demo provides PicoDet inference code built on [Alibaba's MNN framework](https://github.com/alibaba/MNN). -## Install MNN +## C++ Demo -### Python library - -Just run: - -``` shell -pip install MNN +- Step 1: Build the inference library by following the [official MNN build guide](https://www.yuque.com/mnn/en/build_linux). +- Step 2: Build or download the OpenCV library (see the OpenCV website). For convenience, if your environment is gcc 8.2 on x86, you can download the following prebuilt library directly: +```shell +wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz +tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz ``` -### C++ library - -Please follow the [official document](https://www.yuque.com/mnn/en/build_linux) to build MNN engine. 
-- Create picodet_m_416_coco.onnx +- Step 3: Prepare the model ```shell - modelName=picodet_m_416_coco - # export model + modelName=picodet_s_320_coco_lcnet + # Export the inference model python tools/export_model.py \ -c configs/picodet/${modelName}.yml \ -o weights=${modelName}.pdparams \ --output_dir=inference_model - # convert to onnx + # Convert to ONNX paddle2onnx --model_dir inference_model/${modelName} \ --model_filename model.pdmodel \ --params_filename model.pdiparams \ --opset_version 11 \ --save_file ${modelName}.onnx - # onnxsim + # Simplify the model python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx + # Convert the model to MNN format + python -m MNN.tools.mnnconvert -f ONNX --modelFile ${modelName}_processed.onnx --MNNModel picodet_s_320_lcnet.mnn ``` +For a quick test, you can download [picodet_s_320_lcnet.mnn](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet.mnn) directly (without post-processing). -- Convert model - ``` shell - python -m MNN.tools.mnnconvert -f ONNX --modelFile picodet-416.onnx --MNNModel picodet-416.mnn - ``` -Here are converted model [download link](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416.mnn). +**Note:** Because MNN's Matmul operator computes incorrectly when its input shapes do not match, the demo with post-processing is still being upgraded and will be released soon. -## Build - -The python code *demo_mnn.py* can run directly and independently without main PicoDet repo. -`PicoDetONNX` and `PicoDetTorch` are two classes used to check the similarity of MNN inference results -with ONNX model and Pytorch model. They can be remove with no side effects. - -For C++ code, replace `libMNN.so` under *./mnn/lib* with the one you just compiled, modify OpenCV path and MNN path at CMake file, -and run +## Build the executable +- Step 1: Import the library files +``` +mkdir mnn && cd mnn && mkdir lib +cp /path/to/MNN/build/libMNN.so . +cd .. +cp -r /path/to/MNN/include . 
+``` +- Step 2: Update the OpenCV and MNN paths in CMakeLists.txt +- Step 3: Build ``` shell mkdir build && cd build cmake .. make ``` +If the `picodet-mnn` executable is generated in the build directory, the build succeeded. -Note that a flag at `main.cpp` is used to control whether to show the detection result or save it into a fold. 
- -``` c++ -#define __SAVE_RESULT__ // if defined save drawed results to ../results, else show it in windows -``` - -## Run - -### Python - -`demo_mnn.py` provide an inference class `PicoDetMNN` that combines preprocess, post process, visualization. -Besides it can be used in command line with the form: +## Run +First, create a directory to store the prediction results: ```shell -demo_mnn.py [-h] [--model_path MODEL_PATH] [--cfg_path CFG_PATH] - [--img_fold IMG_FOLD] [--result_fold RESULT_FOLD] - [--input_shape INPUT_SHAPE INPUT_SHAPE] - [--backend {MNN,ONNX,torch}] +cp -r ../demo_onnxruntime/imgs . +cd build +mkdir ../results ``` -For example: - +- Run inference on a single image ``` shell -# run MNN 416 model -python ./demo_mnn.py --model_path ../model/picodet-416.mnn --img_fold ../imgs --result_fold ../results -# run MNN 320 model -python ./demo_mnn.py --model_path ../model/picodet-320.mnn --input_shape 320 320 --backend MNN -# run onnx model -python ./demo_mnn.py --model_path ../model/sim.onnx --backend ONNX +./picodet-mnn 0 ../picodet_s_320_lcnet.mnn 320 320 ../imgs/dog.jpg ``` -### C++ - -C++ inference interface is same with NCNN code, to detect images in a fold, run: +- Speed benchmark ``` shell -./picodet-mnn "1" "../imgs/test.jpg" +./picodet-mnn 1 ../picodet_s_320_lcnet.mnn 320 320 ``` -For speed benchmark +## FAQ -``` shell -./picodet-mnn "3" "0" +- The prediction accuracy is wrong: +First check that the model input shape matches and that the model output names match. The output names of the enhanced PicoDet model without post-processing are: +```shell +# classification branch | detection branch +{"transpose_0.tmp_0", "transpose_1.tmp_0"}, +{"transpose_2.tmp_0", "transpose_3.tmp_0"}, +{"transpose_4.tmp_0", "transpose_5.tmp_0"}, +{"transpose_6.tmp_0", "transpose_7.tmp_0"}, ``` +Use [netron](https://netron.app) to inspect the actual names, then update the corresponding `non_postprocess_heads_info` array in `picodet_mnn.hpp`. ## Reference [MNN](https://github.com/alibaba/MNN) diff --git a/deploy/third_engine/demo_mnn/main.cpp b/deploy/third_engine/demo_mnn/main.cpp index 52c977343b55b8ff4c7be305729ebe23c80a565e..5737368d5473a75ced391ad2e28883427a942795 100644 --- a/deploy/third_engine/demo_mnn/main.cpp +++
b/deploy/third_engine/demo_mnn/main.cpp @@ -11,7 +11,6 @@ // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. -// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn #include "picodet_mnn.hpp" #include @@ -19,354 +18,186 @@ #include #include -#define __SAVE_RESULT__ // if defined save drawed results to ../results, else show it in windows +#define __SAVE_RESULT__ // if defined save drawed results to ../results, else + // show it in windows struct object_rect { - int x; - int y; - int width; - int height; + int x; + int y; + int width; + int height; }; -int resize_uniform(cv::Mat& src, cv::Mat& dst, cv::Size dst_size, object_rect& effect_area) -{ - int w = src.cols; - int h = src.rows; - int dst_w = dst_size.width; - int dst_h = dst_size.height; - dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0)); - - float ratio_src = w * 1.0 / h; - float ratio_dst = dst_w * 1.0 / dst_h; - - int tmp_w = 0; - int tmp_h = 0; - if (ratio_src > ratio_dst) { - tmp_w = dst_w; - tmp_h = floor((dst_w * 1.0 / w) * h); - } - else if (ratio_src < ratio_dst) { - tmp_h = dst_h; - tmp_w = floor((dst_h * 1.0 / h) * w); - } - else { - cv::resize(src, dst, dst_size); - effect_area.x = 0; - effect_area.y = 0; - effect_area.width = dst_w; - effect_area.height = dst_h; - return 0; - } - cv::Mat tmp; - cv::resize(src, tmp, cv::Size(tmp_w, tmp_h)); - - if (tmp_w != dst_w) { - int index_w = floor((dst_w - tmp_w) / 2.0); - for (int i = 0; i < dst_h; i++) { - memcpy(dst.data + i * dst_w * 3 + index_w * 3, tmp.data + i * tmp_w * 3, tmp_w * 3); - } - effect_area.x = index_w; - effect_area.y = 0; - effect_area.width = tmp_w; - effect_area.height = tmp_h; - } - else if (tmp_h != dst_h) { - int index_h = floor((dst_h - tmp_h) / 2.0); - memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3); - effect_area.x = 0; - effect_area.y = index_h; - 
effect_area.width = tmp_w; - effect_area.height = tmp_h; - } - else { - printf("error\n"); +std::vector GenerateColorMap(int num_class) { + auto colormap = std::vector(3 * num_class, 0); + for (int i = 0; i < num_class; ++i) { + int j = 0; + int lab = i; + while (lab) { + colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j)); + colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)); + colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)); + ++j; + lab >>= 3; } - return 0; + } + return colormap; } -const int color_list[80][3] = -{ - {216 , 82 , 24}, - {236 ,176 , 31}, - {125 , 46 ,141}, - {118 ,171 , 47}, - { 76 ,189 ,237}, - {238 , 19 , 46}, - { 76 , 76 , 76}, - {153 ,153 ,153}, - {255 , 0 , 0}, - {255 ,127 , 0}, - {190 ,190 , 0}, - { 0 ,255 , 0}, - { 0 , 0 ,255}, - {170 , 0 ,255}, - { 84 , 84 , 0}, - { 84 ,170 , 0}, - { 84 ,255 , 0}, - {170 , 84 , 0}, - {170 ,170 , 0}, - {170 ,255 , 0}, - {255 , 84 , 0}, - {255 ,170 , 0}, - {255 ,255 , 0}, - { 0 , 84 ,127}, - { 0 ,170 ,127}, - { 0 ,255 ,127}, - { 84 , 0 ,127}, - { 84 , 84 ,127}, - { 84 ,170 ,127}, - { 84 ,255 ,127}, - {170 , 0 ,127}, - {170 , 84 ,127}, - {170 ,170 ,127}, - {170 ,255 ,127}, - {255 , 0 ,127}, - {255 , 84 ,127}, - {255 ,170 ,127}, - {255 ,255 ,127}, - { 0 , 84 ,255}, - { 0 ,170 ,255}, - { 0 ,255 ,255}, - { 84 , 0 ,255}, - { 84 , 84 ,255}, - { 84 ,170 ,255}, - { 84 ,255 ,255}, - {170 , 0 ,255}, - {170 , 84 ,255}, - {170 ,170 ,255}, - {170 ,255 ,255}, - {255 , 0 ,255}, - {255 , 84 ,255}, - {255 ,170 ,255}, - { 42 , 0 , 0}, - { 84 , 0 , 0}, - {127 , 0 , 0}, - {170 , 0 , 0}, - {212 , 0 , 0}, - {255 , 0 , 0}, - { 0 , 42 , 0}, - { 0 , 84 , 0}, - { 0 ,127 , 0}, - { 0 ,170 , 0}, - { 0 ,212 , 0}, - { 0 ,255 , 0}, - { 0 , 0 , 42}, - { 0 , 0 , 84}, - { 0 , 0 ,127}, - { 0 , 0 ,170}, - { 0 , 0 ,212}, - { 0 , 0 ,255}, - { 0 , 0 , 0}, - { 36 , 36 , 36}, - { 72 , 72 , 72}, - {109 ,109 ,109}, - {145 ,145 ,145}, - {182 ,182 ,182}, - {218 ,218 ,218}, - { 0 ,113 ,188}, - { 80 ,182 ,188}, - {127 ,127 , 0}, -}; - -void 
draw_bboxes(const cv::Mat& bgr, const std::vector& bboxes, object_rect effect_roi, std::string save_path="None") -{ - static const char* class_names[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", - "train", "truck", "boat", "traffic light", "fire hydrant", - "stop sign", "parking meter", "bench", "bird", "cat", "dog", - "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", - "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", - "skis", "snowboard", "sports ball", "kite", "baseball bat", - "baseball glove", "skateboard", "surfboard", "tennis racket", - "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", - "banana", "apple", "sandwich", "orange", "broccoli", "carrot", - "hot dog", "pizza", "donut", "cake", "chair", "couch", - "potted plant", "bed", "dining table", "toilet", "tv", "laptop", - "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", - "toaster", "sink", "refrigerator", "book", "clock", "vase", - "scissors", "teddy bear", "hair drier", "toothbrush" - }; - - cv::Mat image = bgr.clone(); - int src_w = image.cols; - int src_h = image.rows; - int dst_w = effect_roi.width; - int dst_h = effect_roi.height; - float width_ratio = (float)src_w / (float)dst_w; - float height_ratio = (float)src_h / (float)dst_h; - - - for (size_t i = 0; i < bboxes.size(); i++) - { - const BoxInfo& bbox = bboxes[i]; - cv::Scalar color = cv::Scalar(color_list[bbox.label][0], color_list[bbox.label][1], color_list[bbox.label][2]); - cv::rectangle(image, cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, (bbox.y1 - effect_roi.y) * height_ratio), - cv::Point((bbox.x2 - effect_roi.x) * width_ratio, (bbox.y2 - effect_roi.y) * height_ratio)), color); - - char text[256]; - sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); - - int baseLine = 0; - cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); - - int x = (bbox.x1 - effect_roi.x) * width_ratio; - int y = 
(bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine; - if (y < 0) - y = 0; - if (x + label_size.width > image.cols) - x = image.cols - label_size.width; - - cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)), - color, -1); - - cv::putText(image, text, cv::Point(x, y + label_size.height), - cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255)); - } - - if (save_path == "None") - { - cv::imshow("image", image); - } - else - { - cv::imwrite(save_path, image); - std::cout << save_path << std::endl; - } -} - - -int image_demo(PicoDet &detector, const char* imagepath) -{ - std::vector<cv::String> filenames; - cv::glob(imagepath, filenames, false); - - for (auto img_name : filenames) - { - cv::Mat image = cv::imread(img_name); - if (image.empty()) - { - fprintf(stderr, "cv::imread %s failed\n", img_name.c_str()); - return -1; - } - object_rect effect_roi; - cv::Mat resized_img; - resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); - std::vector<BoxInfo> results; - detector.detect(resized_img, results); - - #ifdef __SAVE_RESULT__ - std::string save_path = img_name; - draw_bboxes(image, results, effect_roi, save_path.replace(3, 4, "results")); - #else - draw_bboxes(image, results, effect_roi); - cv::waitKey(0); - #endif - - } - return 0; +void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes, + std::string save_path = "None") { + static const char *class_names[] = { + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + "bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana",
"apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + "couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", "mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; + + cv::Mat image = im.clone(); + int src_w = image.cols; + int src_h = image.rows; + int thickness = 2; + // Element count, not byte size: sizeof(class_names) alone is 80 * sizeof(char *) + auto colormap = GenerateColorMap(sizeof(class_names) / sizeof(class_names[0])); + + for (size_t i = 0; i < bboxes.size(); i++) { + const BoxInfo &bbox = bboxes[i]; + std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". " + << bbox.y2 << ". " << std::endl; + int c1 = colormap[3 * bbox.label + 0]; + int c2 = colormap[3 * bbox.label + 1]; + int c3 = colormap[3 * bbox.label + 2]; + cv::Scalar color = cv::Scalar(c1, c2, c3); + // cv::Scalar color = cv::Scalar(0, 0, 255); + cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1), + cv::Point(bbox.x2, bbox.y2)), + color, 1, cv::LINE_AA); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + + int x = bbox.x1; + int y = bbox.y1 - label_size.height - baseLine; + if (y < 0) + y = 0; + if (x + label_size.width > image.cols) + x = image.cols - label_size.width; + + cv::rectangle(image, cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, + label_size.height + baseLine)), + color, -1); + + cv::putText(image, text, cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1, + cv::LINE_AA); + } + + if (save_path == "None") { + cv::imshow("image", image); + } else { + cv::imwrite(save_path, image); + std::cout << save_path << std::endl; + } } -int webcam_demo(PicoDet& detector, int cam_id) -{ - cv::Mat image; - cv::VideoCapture cap(cam_id); +int image_demo(PicoDet
&detector, const char *imagepath) { + std::vector<cv::String> filenames; + cv::glob(imagepath, filenames, false); - while (true) - { - cap >> image; - object_rect effect_roi; - cv::Mat resized_img; - resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); - std::vector<BoxInfo> results; - detector.detect(resized_img, results); - draw_bboxes(image, results, effect_roi); - cv::waitKey(1); + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR); + if (image.empty()) { + fprintf(stderr, "cv::imread %s failed\n", img_name.c_str()); + return -1; } - return 0; + std::vector<BoxInfo> results; + detector.detect(image, results, false); + std::cout << "detect done." << std::endl; + +#ifdef __SAVE_RESULT__ + std::string save_path = img_name; + draw_bboxes(image, results, save_path.replace(3, 4, "results")); +#else + draw_bboxes(image, results); + cv::waitKey(0); +#endif + } + return 0; } -int video_demo(PicoDet& detector, const char* path) -{ - cv::Mat image; - cv::VideoCapture cap(path); - - while (true) - { - cap >> image; - object_rect effect_roi; - cv::Mat resized_img; - resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); - std::vector<BoxInfo> results; - detector.detect(resized_img, results); - draw_bboxes(image, results, effect_roi); - cv::waitKey(1); +int benchmark(PicoDet &detector, int width, int height) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + // cv::Mat takes (rows, cols), i.e. (height, width) + cv::Mat image(height, width, CV_8UC3, cv::Scalar(1, 1, 1)); + for (int i = 0; i < warm_up + loop_num; i++) { + auto start = std::chrono::steady_clock::now(); + std::vector<BoxInfo> results; + detector.detect(image, results, false); + auto end = std::chrono::steady_clock::now(); + + std::chrono::duration<double, std::milli> elapsed = end - start; + double time = elapsed.count(); + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; } - return 0; + } + time_avg /= loop_num; +
fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", + time_min, time_max, time_avg); + return 0; } -int benchmark(PicoDet& detector) -{ - int loop_num = 100; - int warm_up = 8; - - double time_min = DBL_MAX; - double time_max = -DBL_MAX; - double time_avg = 0; - cv::Mat image(320, 320, CV_8UC3, cv::Scalar(1, 1, 1)); - for (int i = 0; i < warm_up + loop_num; i++) - { - auto start = std::chrono::steady_clock::now(); - std::vector results; - detector.detect(image, results); - auto end = std::chrono::steady_clock::now(); - - std::chrono::duration elapsed = end - start; - double time = elapsed.count(); - if (i >= warm_up) - { - time_min = (std::min)(time_min, time); - time_max = (std::max)(time_max, time); - time_avg += time; - } - } - time_avg /= loop_num; - fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", time_min, time_max, time_avg); - return 0; -} - - -int main(int argc, char** argv) -{ - if (argc != 3) - { - fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]); - return -1; - } - PicoDet detector = PicoDet("../weight/picodet-416.mnn", 416, 416, 4, 0.45, 0.3); - int mode = atoi(argv[1]); - switch (mode) - { - case 0:{ - int cam_id = atoi(argv[2]); - webcam_demo(detector, cam_id); - break; - } - case 1:{ - const char* images = argv[2]; - image_demo(detector, images); - break; - } - case 2:{ - const char* path = argv[2]; - video_demo(detector, path); - break; - } - case 3:{ - benchmark(detector); - break; - } - default:{ - fprintf(stderr, "usage: %s [mode] [path]. 
\n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]); - break; - } +int main(int argc, char **argv) { + // argv[1] (mode) and argv[2] (model path) are always required + if (argc < 3) { + std::cout << "Usage: ./picodet-mnn [mode] [model_path] [height] [width] [image_path]" + << std::endl; + return -1; + } + int mode = atoi(argv[1]); + std::string model_path = argv[2]; + int height = 320; + int width = 320; + // argv[3] and argv[4] only exist when argc >= 5 + if (argc >= 5) { + height = atoi(argv[3]); + width = atoi(argv[4]); + } + PicoDet detector = PicoDet(model_path, width, height, 4, 0.45, 0.3); + if (mode == 1) { + benchmark(detector, width, height); + } else { + // argv[5] (image path) only exists when argc == 6 + if (argc != 6) { + std::cout << "Must set image file, such as ./picodet-mnn 0 " + "../picodet_s_320_lcnet.mnn 320 320 img.jpg" + << std::endl; + return -1; } + const char *images = argv[5]; + image_demo(detector, images); + } } diff --git a/deploy/third_engine/demo_mnn/picodet_mnn.cpp b/deploy/third_engine/demo_mnn/picodet_mnn.cpp index d6cb9c9fd3f45eb2ef792819376001041c71e3df..a315f14a9e29f0958a2707a4e09fcdb78bd12b6c 100644 --- a/deploy/third_engine/demo_mnn/picodet_mnn.cpp +++ b/deploy/third_engine/demo_mnn/picodet_mnn.cpp @@ -44,7 +44,8 @@ PicoDet::~PicoDet() { PicoDet_interpreter->releaseSession(PicoDet_session); } -int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list) { +int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list, + bool has_postprocess) { if (raw_image.empty()) { std::cout << "image is empty ,please check!"
<< std::endl; return -1; @@ -70,22 +71,57 @@ int PicoDet::detect(cv::Mat &raw_image, std::vector &result_list) { std::vector> results; results.resize(num_class); - for (const auto &head_info : heads_info) { - MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput( - PicoDet_session, head_info.cls_layer.c_str()); - MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput( - PicoDet_session, head_info.dis_layer.c_str()); - - MNN::Tensor tensor_scores_host(tensor_scores, - tensor_scores->getDimensionType()); - tensor_scores->copyToHostTensor(&tensor_scores_host); - - MNN::Tensor tensor_boxes_host(tensor_boxes, - tensor_boxes->getDimensionType()); - tensor_boxes->copyToHostTensor(&tensor_boxes_host); - - decode_infer(&tensor_scores_host, &tensor_boxes_host, head_info.stride, - score_threshold, results); + if (has_postprocess) { + auto bbox_out_tensor = PicoDet_interpreter->getSessionOutput( + PicoDet_session, nms_heads_info[0].c_str()); + auto class_out_tensor = PicoDet_interpreter->getSessionOutput( + PicoDet_session, nms_heads_info[1].c_str()); + // bbox branch + auto tensor_bbox_host = + new MNN::Tensor(bbox_out_tensor, MNN::Tensor::CAFFE); + bbox_out_tensor->copyToHostTensor(tensor_bbox_host); + auto bbox_output_shape = tensor_bbox_host->shape(); + int output_size = 1; + for (int j = 0; j < bbox_output_shape.size(); ++j) { + output_size *= bbox_output_shape[j]; + } + std::cout << "output_size:" << output_size << std::endl; + bbox_output_data_.resize(output_size); + std::copy_n(tensor_bbox_host->host(), output_size, + bbox_output_data_.data()); + delete tensor_bbox_host; + // class branch + auto tensor_class_host = + new MNN::Tensor(class_out_tensor, MNN::Tensor::CAFFE); + class_out_tensor->copyToHostTensor(tensor_class_host); + auto class_output_shape = tensor_class_host->shape(); + output_size = 1; + for (int j = 0; j < class_output_shape.size(); ++j) { + output_size *= class_output_shape[j]; + } + std::cout << "output_size:" << output_size << 
std::endl; + class_output_data_.resize(output_size); + std::copy_n(tensor_class_host->host(), output_size, + class_output_data_.data()); + delete tensor_class_host; + } else { + for (const auto &head_info : non_postprocess_heads_info) { + MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput( + PicoDet_session, head_info.cls_layer.c_str()); + MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput( + PicoDet_session, head_info.dis_layer.c_str()); + + MNN::Tensor tensor_scores_host(tensor_scores, + tensor_scores->getDimensionType()); + tensor_scores->copyToHostTensor(&tensor_scores_host); + + MNN::Tensor tensor_boxes_host(tensor_boxes, + tensor_boxes->getDimensionType()); + tensor_boxes->copyToHostTensor(&tensor_boxes_host); + + decode_infer(&tensor_scores_host, &tensor_boxes_host, head_info.stride, + score_threshold, results); + } } auto end = chrono::steady_clock::now(); @@ -188,8 +224,6 @@ void PicoDet::nms(std::vector &input_boxes, float NMS_THRESH) { } } -string PicoDet::get_label_str(int label) { return labels[label]; } - inline float fast_exp(float x) { union { uint32_t i; diff --git a/deploy/third_engine/demo_mnn/picodet_mnn.hpp b/deploy/third_engine/demo_mnn/picodet_mnn.hpp index ecece8b17e30a3e4ab76f939a7bc0087ef2bdfa0..4744040e258498afd70ee587ffc0ae0b39d24faa 100644 --- a/deploy/third_engine/demo_mnn/picodet_mnn.hpp +++ b/deploy/third_engine/demo_mnn/picodet_mnn.hpp @@ -11,7 +11,6 @@ // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. 
-// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn #ifndef __PicoDet_H__ #define __PicoDet_H__ @@ -20,90 +19,84 @@ #include "Interpreter.hpp" +#include "ImageProcess.hpp" #include "MNNDefine.h" #include "Tensor.hpp" -#include "ImageProcess.hpp" -#include #include +#include #include +#include +#include #include #include -#include -#include - -typedef struct HeadInfo_ -{ - std::string cls_layer; - std::string dis_layer; - int stride; -} HeadInfo; - -typedef struct BoxInfo_ -{ - float x1; - float y1; - float x2; - float y2; - float score; - int label; +typedef struct NonPostProcessHeadInfo_ { + std::string cls_layer; + std::string dis_layer; + int stride; +} NonPostProcessHeadInfo; + +typedef struct BoxInfo_ { + float x1; + float y1; + float x2; + float y2; + float score; + int label; } BoxInfo; class PicoDet { public: - PicoDet(const std::string &mnn_path, - int input_width, int input_length, int num_thread_ = 4, float score_threshold_ = 0.5, float nms_threshold_ = 0.3); + PicoDet(const std::string &mnn_path, int input_width, int input_length, + int num_thread_ = 4, float score_threshold_ = 0.5, + float nms_threshold_ = 0.3); - ~PicoDet(); + ~PicoDet(); - int detect(cv::Mat &img, std::vector &result_list); - std::string get_label_str(int label); + int detect(cv::Mat &img, std::vector &result_list, + bool has_postprocess); private: - void decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred, int stride, float threshold, std::vector> &results); - BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, int y, int stride); - void nms(std::vector &input_boxes, float NMS_THRESH); + void decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred, int stride, + float threshold, + std::vector> &results); + BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, + int y, int stride); + void nms(std::vector &input_boxes, float NMS_THRESH); private: - - std::shared_ptr PicoDet_interpreter; - MNN::Session 
*PicoDet_session = nullptr; - MNN::Tensor *input_tensor = nullptr; - - int num_thread; - int image_w; - int image_h; - - int in_w = 320; - int in_h = 320; - - float score_threshold; - float nms_threshold; - - const float mean_vals[3] = { 103.53f, 116.28f, 123.675f }; - const float norm_vals[3] = { 0.017429f, 0.017507f, 0.017125f }; - - const int num_class = 80; - const int reg_max = 7; - - std::vector heads_info{ - // cls_pred|dis_pred|stride - {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8}, - {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16}, - {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32}, - {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64}, - }; - - std::vector - labels{"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", - "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", - "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", - "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", - "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", - "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", - "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", - "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", - "hair drier", "toothbrush"}; + std::shared_ptr PicoDet_interpreter; + MNN::Session *PicoDet_session = nullptr; + MNN::Tensor *input_tensor = nullptr; + + int num_thread; + int image_w; + int image_h; + + int in_w = 320; + int in_h = 320; + + float score_threshold; + float nms_threshold; + + const float mean_vals[3] = {103.53f, 116.28f, 123.675f}; + const float 
norm_vals[3] = {0.017429f, 0.017507f, 0.017125f}; + + const int num_class = 80; + const int reg_max = 7; + + std::vector bbox_output_data_; + std::vector class_output_data_; + + std::vector nms_heads_info{"tmp_16", "concat_4.tmp_0"}; + // If not export post-process, will use non_postprocess_heads_info + std::vector non_postprocess_heads_info{ + // cls_pred|dis_pred|stride + {"transpose_0.tmp_0", "transpose_1.tmp_0", 8}, + {"transpose_2.tmp_0", "transpose_3.tmp_0", 16}, + {"transpose_4.tmp_0", "transpose_5.tmp_0", 32}, + {"transpose_6.tmp_0", "transpose_7.tmp_0", 64}, + }; }; template diff --git a/deploy/third_engine/demo_mnn/python/demo_mnn.py b/deploy/third_engine/demo_mnn/python/demo_mnn.py deleted file mode 100644 index c5f88093869dfc7d20cff7a0f117a03b2477e8ab..0000000000000000000000000000000000000000 --- a/deploy/third_engine/demo_mnn/python/demo_mnn.py +++ /dev/null @@ -1,803 +0,0 @@ -# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. - -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at - -# http://www.apache.org/licenses/LICENSE-2.0 - -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn - -# -*- coding: utf-8 -*- -import argparse -from abc import ABCMeta, abstractmethod -from pathlib import Path - -import cv2 -import matplotlib.pyplot as plt -import numpy as np -from scipy.special import softmax -from tqdm import tqdm - -_COLORS = (np.array([ - 0.000, - 0.447, - 0.741, - 0.850, - 0.325, - 0.098, - 0.929, - 0.694, - 0.125, - 0.494, - 0.184, - 0.556, - 0.466, - 0.674, - 0.188, - 0.301, - 0.745, - 0.933, - 0.635, - 0.078, - 0.184, - 0.300, - 0.300, - 0.300, - 0.600, - 0.600, - 0.600, - 1.000, - 0.000, - 0.000, - 1.000, - 0.500, - 0.000, - 0.749, - 0.749, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 1.000, - 0.667, - 0.000, - 1.000, - 0.333, - 0.333, - 0.000, - 0.333, - 0.667, - 0.000, - 0.333, - 1.000, - 0.000, - 0.667, - 0.333, - 0.000, - 0.667, - 0.667, - 0.000, - 0.667, - 1.000, - 0.000, - 1.000, - 0.333, - 0.000, - 1.000, - 0.667, - 0.000, - 1.000, - 1.000, - 0.000, - 0.000, - 0.333, - 0.500, - 0.000, - 0.667, - 0.500, - 0.000, - 1.000, - 0.500, - 0.333, - 0.000, - 0.500, - 0.333, - 0.333, - 0.500, - 0.333, - 0.667, - 0.500, - 0.333, - 1.000, - 0.500, - 0.667, - 0.000, - 0.500, - 0.667, - 0.333, - 0.500, - 0.667, - 0.667, - 0.500, - 0.667, - 1.000, - 0.500, - 1.000, - 0.000, - 0.500, - 1.000, - 0.333, - 0.500, - 1.000, - 0.667, - 0.500, - 1.000, - 1.000, - 0.500, - 0.000, - 0.333, - 1.000, - 0.000, - 0.667, - 1.000, - 0.000, - 1.000, - 1.000, - 0.333, - 0.000, - 1.000, - 0.333, - 0.333, - 1.000, - 0.333, - 0.667, - 1.000, - 0.333, - 1.000, - 1.000, - 0.667, - 0.000, - 1.000, - 0.667, - 0.333, - 1.000, - 0.667, - 0.667, - 1.000, - 0.667, - 1.000, - 1.000, - 1.000, - 0.000, - 1.000, - 1.000, - 0.333, - 1.000, - 1.000, - 0.667, - 1.000, - 0.333, - 0.000, - 0.000, - 0.500, - 0.000, - 0.000, - 0.667, - 0.000, - 0.000, - 0.833, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 0.167, - 0.000, - 0.000, - 0.333, - 0.000, - 0.000, - 0.500, - 0.000, - 0.000, - 0.667, 
- 0.000, - 0.000, - 0.833, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 0.167, - 0.000, - 0.000, - 0.333, - 0.000, - 0.000, - 0.500, - 0.000, - 0.000, - 0.667, - 0.000, - 0.000, - 0.833, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 0.143, - 0.143, - 0.143, - 0.286, - 0.286, - 0.286, - 0.429, - 0.429, - 0.429, - 0.571, - 0.571, - 0.571, - 0.714, - 0.714, - 0.714, - 0.857, - 0.857, - 0.857, - 0.000, - 0.447, - 0.741, - 0.314, - 0.717, - 0.741, - 0.50, - 0.5, - 0, -]).astype(np.float32).reshape(-1, 3)) - - -def get_resize_matrix(raw_shape, dst_shape, keep_ratio): - """ - Get resize matrix for resizing raw img to input size - :param raw_shape: (width, height) of raw image - :param dst_shape: (width, height) of input image - :param keep_ratio: whether keep original ratio - :return: 3x3 Matrix - """ - r_w, r_h = raw_shape - d_w, d_h = dst_shape - Rs = np.eye(3) - if keep_ratio: - C = np.eye(3) - C[0, 2] = -r_w / 2 - C[1, 2] = -r_h / 2 - - if r_w / r_h < d_w / d_h: - ratio = d_h / r_h - else: - ratio = d_w / r_w - Rs[0, 0] *= ratio - Rs[1, 1] *= ratio - - T = np.eye(3) - T[0, 2] = 0.5 * d_w - T[1, 2] = 0.5 * d_h - return T @Rs @C - else: - Rs[0, 0] *= d_w / r_w - Rs[1, 1] *= d_h / r_h - return Rs - - -def warp_boxes(boxes, M, width, height): - """Apply transform to boxes - Copy from picodet/data/transform/warp.py - """ - n = len(boxes) - if n: - # warp points - xy = np.ones((n * 4, 3)) - xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( - n * 4, 2) # x1y1, x2y2, x1y2, x2y1 - xy = xy @M.T # transform - xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale - # create new boxes - x = xy[:, [0, 2, 4, 6]] - y = xy[:, [1, 3, 5, 7]] - xy = np.concatenate( - (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T - # clip boxes - xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) - xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) - return xy.astype(np.float32) - else: - return boxes - - -def overlay_bbox_cv(img, all_box, class_names): - """Draw result boxes - 
Copy from picodet/util/visualization.py - """ - # all_box array of [label, x0, y0, x1, y1, score] - all_box.sort(key=lambda v: v[5]) - for box in all_box: - label, x0, y0, x1, y1, score = box - color = (_COLORS[label] * 255).astype(np.uint8).tolist() - text = "{}:{:.1f}%".format(class_names[label], score * 100) - txt_color = (0, 0, 0) if np.mean(_COLORS[label]) > 0.5 else (255, 255, - 255) - font = cv2.FONT_HERSHEY_SIMPLEX - txt_size = cv2.getTextSize(text, font, 0.5, 2)[0] - cv2.rectangle(img, (x0, y0), (x1, y1), color, 2) - - cv2.rectangle( - img, - (x0, y0 - txt_size[1] - 1), - (x0 + txt_size[0] + txt_size[1], y0 - 1), - color, - -1, ) - cv2.putText(img, text, (x0, y0 - 1), font, 0.5, txt_color, thickness=1) - return img - - -def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): - """ - - Args: - box_scores (N, 5): boxes in corner-form and probabilities. - iou_threshold: intersection over union threshold. - top_k: keep top_k results. If k <= 0, keep all the results. - candidate_size: only consider the candidates with the highest scores. - Returns: - picked: a list of indexes of the kept boxes - """ - scores = box_scores[:, -1] - boxes = box_scores[:, :-1] - picked = [] - indexes = np.argsort(scores) - indexes = indexes[-candidate_size:] - while len(indexes) > 0: - current = indexes[-1] - picked.append(current) - if 0 < top_k == len(picked) or len(indexes) == 1: - break - current_box = boxes[current, :] - indexes = indexes[:-1] - rest_boxes = boxes[indexes, :] - iou = iou_of( - rest_boxes, - np.expand_dims( - current_box, axis=0), ) - indexes = indexes[iou <= iou_threshold] - - return box_scores[picked, :] - - -def iou_of(boxes0, boxes1, eps=1e-5): - """Return intersection-over-union (Jaccard index) of boxes. - - Args: - boxes0 (N, 4): ground truth boxes. - boxes1 (N or 1, 4): predicted boxes. - eps: a small number to avoid 0 as denominator. - Returns: - iou (N): IoU values. 
- """ - overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) - overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) - - overlap_area = area_of(overlap_left_top, overlap_right_bottom) - area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) - area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) - return overlap_area / (area0 + area1 - overlap_area + eps) - - -def area_of(left_top, right_bottom): - """Compute the areas of rectangles given two corners. - - Args: - left_top (N, 2): left top corner. - right_bottom (N, 2): right bottom corner. - - Returns: - area (N): return the area. - """ - hw = np.clip(right_bottom - left_top, 0.0, None) - return hw[..., 0] * hw[..., 1] - - -class PicoDetABC(metaclass=ABCMeta): - def __init__( - self, - input_shape=[416, 416], - reg_max=7, - strides=[8, 16, 32, 64], - prob_threshold=0.4, - iou_threshold=0.3, - num_candidate=1000, - top_k=-1, ): - self.strides = strides - self.input_shape = input_shape - self.reg_max = reg_max - self.prob_threshold = prob_threshold - self.iou_threshold = iou_threshold - self.num_candidate = num_candidate - self.top_k = top_k - self.img_mean = [103.53, 116.28, 123.675] - self.img_std = [57.375, 57.12, 58.395] - self.input_size = (self.input_shape[1], self.input_shape[0]) - self.class_names = [ - "person", - "bicycle", - "car", - "motorcycle", - "airplane", - "bus", - "train", - "truck", - "boat", - "traffic_light", - "fire_hydrant", - "stop_sign", - "parking_meter", - "bench", - "bird", - "cat", - "dog", - "horse", - "sheep", - "cow", - "elephant", - "bear", - "zebra", - "giraffe", - "backpack", - "umbrella", - "handbag", - "tie", - "suitcase", - "frisbee", - "skis", - "snowboard", - "sports_ball", - "kite", - "baseball_bat", - "baseball_glove", - "skateboard", - "surfboard", - "tennis_racket", - "bottle", - "wine_glass", - "cup", - "fork", - "knife", - "spoon", - "bowl", - "banana", - "apple", - "sandwich", - "orange", - "broccoli", - "carrot", - "hot_dog", - "pizza", - "donut", - 
"cake", - "chair", - "couch", - "potted_plant", - "bed", - "dining_table", - "toilet", - "tv", - "laptop", - "mouse", - "remote", - "keyboard", - "cell_phone", - "microwave", - "oven", - "toaster", - "sink", - "refrigerator", - "book", - "clock", - "vase", - "scissors", - "teddy_bear", - "hair_drier", - "toothbrush", - ] - - def preprocess(self, img): - # resize image - ResizeM = get_resize_matrix((img.shape[1], img.shape[0]), - self.input_size, True) - img_resize = cv2.warpPerspective(img, ResizeM, dsize=self.input_size) - # normalize image - img_input = img_resize.astype(np.float32) / 255 - img_mean = np.array( - self.img_mean, dtype=np.float32).reshape(1, 1, 3) / 255 - img_std = np.array( - self.img_std, dtype=np.float32).reshape(1, 1, 3) / 255 - img_input = (img_input - img_mean) / img_std - # expand dims - img_input = np.transpose(img_input, [2, 0, 1]) - img_input = np.expand_dims(img_input, axis=0) - return img_input, ResizeM - - def postprocess(self, scores, raw_boxes, ResizeM, raw_shape): - # generate centers - decode_boxes = [] - select_scores = [] - for stride, box_distribute, score in zip(self.strides, raw_boxes, - scores): - # centers - fm_h = self.input_shape[0] / stride - fm_w = self.input_shape[1] / stride - h_range = np.arange(fm_h) - w_range = np.arange(fm_w) - ww, hh = np.meshgrid(w_range, h_range) - ct_row = (hh.flatten() + 0.5) * stride - ct_col = (ww.flatten() + 0.5) * stride - center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1) - - # box distribution to distance - reg_range = np.arange(self.reg_max + 1) - box_distance = box_distribute.reshape((-1, self.reg_max + 1)) - box_distance = softmax(box_distance, axis=1) - box_distance = box_distance * np.expand_dims(reg_range, axis=0) - box_distance = np.sum(box_distance, axis=1).reshape((-1, 4)) - box_distance = box_distance * stride - - # top K candidate - topk_idx = np.argsort(score.max(axis=1))[::-1] - topk_idx = topk_idx[:C] - center = center[topk_idx] - score = score[topk_idx] - 
box_distance = box_distance[topk_idx] - - # decode box - decode_box = center + [-1, -1, 1, 1] * box_distance - - select_scores.append(score) - decode_boxes.append(decode_box) - - # nms - bboxes = np.concatenate(decode_boxes, axis=0) - confidences = np.concatenate(select_scores, axis=0) - picked_box_probs = [] - picked_labels = [] - for class_index in range(0, confidences.shape[1]): - probs = confidences[:, class_index] - mask = probs > self.prob_threshold - probs = probs[mask] - if probs.shape[0] == 0: - continue - subset_boxes = bboxes[mask, :] - box_probs = np.concatenate( - [subset_boxes, probs.reshape(-1, 1)], axis=1) - box_probs = hard_nms( - box_probs, - iou_threshold=self.iou_threshold, - top_k=self.top_k, ) - picked_box_probs.append(box_probs) - picked_labels.extend([class_index] * box_probs.shape[0]) - if not picked_box_probs: - return np.array([]), np.array([]), np.array([]) - picked_box_probs = np.concatenate(picked_box_probs) - - # resize output boxes - picked_box_probs[:, :4] = warp_boxes(picked_box_probs[:, :4], - np.linalg.inv(ResizeM), - raw_shape[1], raw_shape[0]) - return ( - picked_box_probs[:, :4].astype(np.int32), - np.array(picked_labels), - picked_box_probs[:, 4], ) - - @abstractmethod - def infer_image(self, img_input): - pass - - def detect(self, img): - raw_shape = img.shape - img_input, ResizeM = self.preprocess(img) - scores, raw_boxes = self.infer_image(img_input) - if scores[0].ndim == 1: # handling num_classes=1 case - scores = [x[:, None] for x in scores] - bbox, label, score = self.postprocess(scores, raw_boxes, ResizeM, - raw_shape) - - print(bbox, score) - return bbox, label, score - - def draw_box(self, raw_img, bbox, label, score): - img = raw_img.copy() - all_box = [[x, ] + y + [z, ] - for x, y, z in zip(label, bbox.tolist(), score)] - img_draw = overlay_bbox_cv(img, all_box, self.class_names) - return img_draw - - def detect_folder(self, img_fold, result_path): - img_fold = Path(img_fold) - result_path = Path(result_path) - 
result_path.mkdir(parents=True, exist_ok=True) - - img_name_list = filter( - lambda x: str(x).endswith(".png") or str(x).endswith(".jpg"), - img_fold.iterdir(), ) - img_name_list = list(img_name_list) - print(f"find {len(img_name_list)} images") - - for img_path in tqdm(img_name_list): - img = cv2.imread(str(img_path)) - bbox, label, score = self.detect(img) - img_draw = self.draw_box(img, bbox, label, score) - save_path = str(result_path / img_path.name.replace(".png", ".jpg")) - cv2.imwrite(save_path, img_draw) - - -class PicoDetMNN(PicoDetABC): - import MNN as MNNlib - - def __init__(self, model_path, *args, **kwargs): - super(PicoDetMNN, self).__init__(*args, **kwargs) - print("Using MNN as inference backend") - print(f"Using weight: {model_path}") - - # load model - self.model_path = model_path - self.interpreter = self.MNNlib.Interpreter(self.model_path) - self.session = self.interpreter.createSession() - self.input_tensor = self.interpreter.getSessionInput(self.session) - - def infer_image(self, img_input): - tmp_input = self.MNNlib.Tensor( - (1, 3, self.input_size[1], self.input_size[0]), - self.MNNlib.Halide_Type_Float, - img_input, - self.MNNlib.Tensor_DimensionType_Caffe, ) - self.input_tensor.copyFrom(tmp_input) - self.interpreter.runSession(self.session) - score_out_name = [ - "save_infer_model/scale_0.tmp_1", "save_infer_model/scale_1.tmp_1", - "save_infer_model/scale_2.tmp_1", "save_infer_model/scale_3.tmp_1" - ] - scores = [ - self.interpreter.getSessionOutput(self.session, x).getData() - for x in score_out_name - ] - scores = [np.reshape(x, (-1, 80)) for x in scores] - boxes_out_name = [ - "save_infer_model/scale_4.tmp_1", "save_infer_model/scale_5.tmp_1", - "save_infer_model/scale_6.tmp_1", "save_infer_model/scale_7.tmp_1" - ] - raw_boxes = [ - self.interpreter.getSessionOutput(self.session, x).getData() - for x in boxes_out_name - ] - raw_boxes = [np.reshape(x, (-1, 32)) for x in raw_boxes] - return scores, raw_boxes - - -class 
PicoDetONNX(PicoDetABC): - import onnxruntime as ort - - def __init__(self, model_path, *args, **kwargs): - super(PicoDetONNX, self).__init__(*args, **kwargs) - print("Using ONNX as inference backend") - print(f"Using weight: {model_path}") - - # load model - self.model_path = model_path - self.ort_session = self.ort.InferenceSession(self.model_path) - self.input_name = self.ort_session.get_inputs()[0].name - - def infer_image(self, img_input): - inference_results = self.ort_session.run(None, - {self.input_name: img_input}) - scores = [np.squeeze(x) for x in inference_results[:3]] - raw_boxes = [np.squeeze(x) for x in inference_results[3:]] - return scores, raw_boxes - - -class PicoDetTorch(PicoDetABC): - import torch - - def __init__(self, model_path, cfg_path, *args, **kwargs): - from picodet.model.arch import build_model - from picodet.util import Logger, cfg, load_config, load_model_weight - - super(PicoDetTorch, self).__init__(*args, **kwargs) - print("Using PyTorch as inference backend") - print(f"Using weight: {model_path}") - - # load model - self.model_path = model_path - self.cfg_path = cfg_path - load_config(cfg, cfg_path) - self.logger = Logger(-1, cfg.save_dir, False) - self.model = build_model(cfg.model) - checkpoint = self.torch.load( - model_path, map_location=lambda storage, loc: storage) - load_model_weight(self.model, checkpoint, self.logger) - - def infer_image(self, img_input): - self.model.train(False) - with self.torch.no_grad(): - inference_results = self.model(self.torch.from_numpy(img_input)) - scores = [ - x.permute(0, 2, 3, 1).reshape((-1, 80)).sigmoid().detach().numpy() - for x in inference_results[0] - ] - raw_boxes = [ - x.permute(0, 2, 3, 1).reshape((-1, 32)).detach().numpy() - for x in inference_results[1] - ] - return scores, raw_boxes - - -def main(): - parser = argparse.ArgumentParser() - parser.add_argument( - "--model_path", - dest="model_path", - type=str, - default="../model/picodet-320.mnn") - parser.add_argument( - 
"--cfg_path", dest="cfg_path", type=str, default="config/picodet-m.yml") - parser.add_argument( - "--img_fold", dest="img_fold", type=str, default="../imgs") - parser.add_argument( - "--result_fold", dest="result_fold", type=str, default="../results") - parser.add_argument( - "--input_shape", - dest="input_shape", - nargs=2, - type=int, - default=[320, 320]) - parser.add_argument( - "--backend", choices=["MNN", "ONNX", "torch"], default="MNN") - args = parser.parse_args() - - print(f"Detecting {args.img_fold}") - - # load detector - if args.backend == "MNN": - detector = PicoDetMNN(args.model_path, input_shape=args.input_shape) - elif args.backend == "ONNX": - detector = PicoDetONNX(args.model_path, input_shape=args.input_shape) - elif args.backend == "torch": - detector = PicoDetTorch( - args.model_path, args.cfg_path, input_shape=args.input_shape) - else: - raise ValueError - - # detect folder - detector.detect_folder(args.img_fold, args.result_fold) - - -def test_one(): - detector = PicoDetMNN("../weight/picodet-416.mnn") - img = cv2.imread("../imgs/000252.jpg") - bbox, label, score = detector.detect(img) - img_draw = detector.draw_box(img, bbox, label, score) - cv2.imwrite('picodet_infer.jpg', img_draw) - - -if __name__ == "__main__": - # main() - test_one() diff --git a/deploy/third_engine/demo_ncnn/CMakeLists.txt b/deploy/third_engine/demo_ncnn/CMakeLists.txt index 4f5cc65fc6d349c98f6490055bb38139a7296d05..0d4344c699d58082eb37ebe6089e16ad120bc87e 100644 --- a/deploy/third_engine/demo_ncnn/CMakeLists.txt +++ b/deploy/third_engine/demo_ncnn/CMakeLists.txt @@ -1,4 +1,4 @@ -cmake_minimum_required(VERSION 3.4.1) +cmake_minimum_required(VERSION 3.9) set(CMAKE_CXX_STANDARD 17) project(picodet_demo) @@ -11,9 +11,11 @@ if(OPENMP_FOUND) set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}") endif() -find_package(OpenCV REQUIRED) +# find_package(OpenCV REQUIRED) +find_package(OpenCV REQUIRED PATHS "/path/to/opencv-3.4.16_gcc8.2_ffmpeg") 
-find_package(ncnn REQUIRED) +# find_package(ncnn REQUIRED) +find_package(ncnn REQUIRED PATHS "/path/to/ncnn/build/install/lib/cmake/ncnn") if(NOT TARGET ncnn) message(WARNING "ncnn NOT FOUND! Please set ncnn_DIR environment variable") else() diff --git a/deploy/third_engine/demo_ncnn/README.md b/deploy/third_engine/demo_ncnn/README.md index b15052d9812784fbbec543a394960a78128012c7..f9867b8acc9652e22bfd891671da4f5429436c3c 100644 --- a/deploy/third_engine/demo_ncnn/README.md +++ b/deploy/third_engine/demo_ncnn/README.md @@ -1,10 +1,8 @@ # PicoDet NCNN Demo -This project provides PicoDet image inference, webcam inference and benchmark using -[Tencent's NCNN framework](https://github.com/Tencent/ncnn). - -# How to build +The inference code in this demo is built on [Tencent's NCNN framework](https://github.com/Tencent/ncnn). +# Step 1: Build ## Windows ### Step1. Download and Install Visual Studio from https://visualstudio.microsoft.com/vs/community/ @@ -12,11 +10,16 @@ Download and Install Visual Studio from https://visualstudio.microsoft.com/vs/co ### Step2. Download and install OpenCV from https://github.com/opencv/opencv/releases -### Step3(Optional). +For convenience, if your environment is gcc 8.2 on x86, you can download the following library directly: +```shell +wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz +tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz +``` + +### Step3 (optional). Download and install Vulkan SDK from https://vulkan.lunarg.com/sdk/home -### Step4. -Clone NCNN repository +### Step4: Build NCNN ``` shell script git clone --recursive https://github.com/Tencent/ncnn.git @@ -25,7 +28,7 @@ Build NCNN following this tutorial: [Build for Windows x64 using VS2017](https:/ ### Step5. -Add `ncnn_DIR` = `YOUR_NCNN_PATH/build/install/lib/cmake/ncnn` to system environment variables. +Add `ncnn_DIR` = `YOUR_NCNN_PATH/build/install/lib/cmake/ncnn` to the system variables Build project: Open x64 Native Tools Command Prompt for VS 2019 or 2017 @@ -42,10 +45,10 @@ msbuild picodet_demo.vcxproj /p:configuration=release /p:platform=x64 ### Step1. 
Build and install OpenCV from https://github.com/opencv/opencv -### Step2(Optional). +### Step2 (optional). Download Vulkan SDK from https://vulkan.lunarg.com/sdk/home -### Step3. +### Step3: Build NCNN Clone NCNN repository ``` shell script @@ -54,15 +57,7 @@ git clone --recursive https://github.com/Tencent/ncnn.git Build NCNN following this tutorial: [Build for Linux / NVIDIA Jetson / Raspberry Pi](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-linux) -### Step4. - -Set environment variables. Run: - -``` shell script -export ncnn_DIR=YOUR_NCNN_PATH/build/install/lib/cmake/ncnn -``` - -Build project +### Step4: Build the executable ``` shell script cd @@ -71,47 +66,64 @@ cd build cmake .. make ``` - # Run demo -Download PicoDet ncnn model. -* [PicoDet ncnn model download link](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_ncnn.zip) - - -## Webcam - -```shell script -picodet_demo 0 0 +- Prepare the model + ```shell + modelName=picodet_s_320_coco_lcnet + # Export the inference model + python tools/export_model.py \ + -c configs/picodet/${modelName}.yml \ + -o weights=${modelName}.pdparams \ + --output_dir=inference_model + # Convert to ONNX + paddle2onnx --model_dir inference_model/${modelName} \ + --model_filename model.pdmodel \ + --params_filename model.pdiparams \ + --opset_version 11 \ + --save_file ${modelName}.onnx + # Simplify the model + python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx + # Convert the model to NCNN format + # Run onnx2ncnn from the ncnn tools to generate the ncnn .param and .bin files. + ``` +To convert to an NCNN model, you can also use the online conversion tool [https://convertmodel.com](https://convertmodel.com/) + +For a quick test, you can directly download [picodet_s_320_coco_lcnet-opt.bin](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.bin) / [picodet_s_320_coco_lcnet-opt.param](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.param) (without post-processing). + +**Note:** With post-processing included, NCNN inference currently produces NaN, so use the demo without post-processing for now. The demo with post-processing is being upgraded and will be released soon. + + +## Run + +First, create a directory to store the prediction results: +```shell +cp -r ../demo_onnxruntime/imgs . 
+cd build +mkdir ../results ``` -## Inference images - -```shell script -picodet_demo 1 IMAGE_FOLDER/*.jpg +- Run inference on a single image +``` shell +./picodet_demo 0 ../picodet_s_320_coco_lcnet.bin ../picodet_s_320_coco_lcnet.param 320 320 ../imgs/dog.jpg 0 ``` +See `main.cpp` for how the arguments are parsed. -## Inference video +- Speed benchmark -```shell script -picodet_demo 2 VIDEO_PATH +``` shell +./picodet_demo 1 ../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param 320 320 0 ``` -## Benchmark - -```shell script -picodet_demo 3 0 - -result: picodet min = 17.74 max = 22.71 avg = 18.16 -``` - -**** - -Notice: - -If benchmark speed is slow, try to limit omp thread num. - -Linux: +## FAQ -```shell script -export OMP_THREAD_LIMIT=4 +- Prediction accuracy is wrong: +First check that the model input shape matches and that the model output names match. The output names of the enhanced PicoDet model without post-processing are as follows: +```shell +# classification branch | detection branch +{"transpose_0.tmp_0", "transpose_1.tmp_0"}, +{"transpose_2.tmp_0", "transpose_3.tmp_0"}, +{"transpose_4.tmp_0", "transpose_5.tmp_0"}, +{"transpose_6.tmp_0", "transpose_7.tmp_0"}, ``` +You can use [netron](https://netron.app) to check the actual names, then update the corresponding `non_postprocess_heads_info` array in `picodet_mnn.hpp`. diff --git a/deploy/third_engine/demo_ncnn/main.cpp b/deploy/third_engine/demo_ncnn/main.cpp index 2f98d82ae5fcdf8c18d3f2eebd30ae3cf31f0cc7..8f69af93b2de7f9404fb86d5112ce62056d936b4 100644 --- a/deploy/third_engine/demo_ncnn/main.cpp +++ b/deploy/third_engine/demo_ncnn/main.cpp @@ -13,353 +13,198 @@ // limitations under the License. 
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn +#include "picodet.h" +#include +#include +#include #include #include #include -#include -#include -#include "picodet.h" -#include +#define __SAVE_RESULT__ // if defined save drawed results to ../results, else + // show it in windows struct object_rect { - int x; - int y; - int width; - int height; -}; - -int resize_uniform(cv::Mat& src, cv::Mat& dst, cv::Size dst_size, object_rect& effect_area) -{ - int w = src.cols; - int h = src.rows; - int dst_w = dst_size.width; - int dst_h = dst_size.height; - dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0)); - - float ratio_src = w * 1.0 / h; - float ratio_dst = dst_w * 1.0 / dst_h; - - int tmp_w = 0; - int tmp_h = 0; - if (ratio_src > ratio_dst) { - tmp_w = dst_w; - tmp_h = floor((dst_w * 1.0 / w) * h); - } - else if (ratio_src < ratio_dst) { - tmp_h = dst_h; - tmp_w = floor((dst_h * 1.0 / h) * w); - } - else { - cv::resize(src, dst, dst_size); - effect_area.x = 0; - effect_area.y = 0; - effect_area.width = dst_w; - effect_area.height = dst_h; - return 0; - } - - cv::Mat tmp; - cv::resize(src, tmp, cv::Size(tmp_w, tmp_h)); - - if (tmp_w != dst_w) { - int index_w = floor((dst_w - tmp_w) / 2.0); - for (int i = 0; i < dst_h; i++) { - memcpy(dst.data + i * dst_w * 3 + index_w * 3, tmp.data + i * tmp_w * 3, tmp_w * 3); - } - effect_area.x = index_w; - effect_area.y = 0; - effect_area.width = tmp_w; - effect_area.height = tmp_h; - } - else if (tmp_h != dst_h) { - int index_h = floor((dst_h - tmp_h) / 2.0); - memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3); - effect_area.x = 0; - effect_area.y = index_h; - effect_area.width = tmp_w; - effect_area.height = tmp_h; - } - else { - printf("error\n"); - } - return 0; -} - -const int color_list[80][3] = -{ - {216 , 82 , 24}, - {236 ,176 , 31}, - {125 , 46 ,141}, - {118 ,171 , 47}, - { 76 ,189 ,237}, - {238 , 19 , 46}, - { 76 , 76 , 76}, - {153 ,153 ,153}, - {255 , 0 , 0}, - {255 
,127 , 0}, - {190 ,190 , 0}, - { 0 ,255 , 0}, - { 0 , 0 ,255}, - {170 , 0 ,255}, - { 84 , 84 , 0}, - { 84 ,170 , 0}, - { 84 ,255 , 0}, - {170 , 84 , 0}, - {170 ,170 , 0}, - {170 ,255 , 0}, - {255 , 84 , 0}, - {255 ,170 , 0}, - {255 ,255 , 0}, - { 0 , 84 ,127}, - { 0 ,170 ,127}, - { 0 ,255 ,127}, - { 84 , 0 ,127}, - { 84 , 84 ,127}, - { 84 ,170 ,127}, - { 84 ,255 ,127}, - {170 , 0 ,127}, - {170 , 84 ,127}, - {170 ,170 ,127}, - {170 ,255 ,127}, - {255 , 0 ,127}, - {255 , 84 ,127}, - {255 ,170 ,127}, - {255 ,255 ,127}, - { 0 , 84 ,255}, - { 0 ,170 ,255}, - { 0 ,255 ,255}, - { 84 , 0 ,255}, - { 84 , 84 ,255}, - { 84 ,170 ,255}, - { 84 ,255 ,255}, - {170 , 0 ,255}, - {170 , 84 ,255}, - {170 ,170 ,255}, - {170 ,255 ,255}, - {255 , 0 ,255}, - {255 , 84 ,255}, - {255 ,170 ,255}, - { 42 , 0 , 0}, - { 84 , 0 , 0}, - {127 , 0 , 0}, - {170 , 0 , 0}, - {212 , 0 , 0}, - {255 , 0 , 0}, - { 0 , 42 , 0}, - { 0 , 84 , 0}, - { 0 ,127 , 0}, - { 0 ,170 , 0}, - { 0 ,212 , 0}, - { 0 ,255 , 0}, - { 0 , 0 , 42}, - { 0 , 0 , 84}, - { 0 , 0 ,127}, - { 0 , 0 ,170}, - { 0 , 0 ,212}, - { 0 , 0 ,255}, - { 0 , 0 , 0}, - { 36 , 36 , 36}, - { 72 , 72 , 72}, - {109 ,109 ,109}, - {145 ,145 ,145}, - {182 ,182 ,182}, - {218 ,218 ,218}, - { 0 ,113 ,188}, - { 80 ,182 ,188}, - {127 ,127 , 0}, + int x; + int y; + int width; + int height; }; -void draw_bboxes(const cv::Mat& bgr, const std::vector& bboxes, object_rect effect_roi) -{ - static const char* class_names[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", - "train", "truck", "boat", "traffic light", "fire hydrant", - "stop sign", "parking meter", "bench", "bird", "cat", "dog", - "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", - "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", - "skis", "snowboard", "sports ball", "kite", "baseball bat", - "baseball glove", "skateboard", "surfboard", "tennis racket", - "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", - "banana", "apple", "sandwich", 
"orange", "broccoli", "carrot", - "hot dog", "pizza", "donut", "cake", "chair", "couch", - "potted plant", "bed", "dining table", "toilet", "tv", "laptop", - "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", - "toaster", "sink", "refrigerator", "book", "clock", "vase", - "scissors", "teddy bear", "hair drier", "toothbrush" - }; - - cv::Mat image = bgr.clone(); - int src_w = image.cols; - int src_h = image.rows; - int dst_w = effect_roi.width; - int dst_h = effect_roi.height; - float width_ratio = (float)src_w / (float)dst_w; - float height_ratio = (float)src_h / (float)dst_h; - - - for (size_t i = 0; i < bboxes.size(); i++) - { - const BoxInfo& bbox = bboxes[i]; - cv::Scalar color = cv::Scalar(color_list[bbox.label][0], color_list[bbox.label][1], color_list[bbox.label][2]); - - cv::rectangle(image, cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio, (bbox.y1 - effect_roi.y) * height_ratio), - cv::Point((bbox.x2 - effect_roi.x) * width_ratio, (bbox.y2 - effect_roi.y) * height_ratio)), color); - - char text[256]; - sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); - - int baseLine = 0; - cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); - - int x = (bbox.x1 - effect_roi.x) * width_ratio; - int y = (bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine; - if (y < 0) - y = 0; - if (x + label_size.width > image.cols) - x = image.cols - label_size.width; - - cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)), - color, -1); - - cv::putText(image, text, cv::Point(x, y + label_size.height), - cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255)); - } - cv::imwrite("../result/test_picodet.jpg", image); - printf("************infer image success!!!**********\n"); -} - - -int image_demo(PicoDet &detector, const char* imagepath) -{ - std::vector filenames; - cv::glob(imagepath, filenames, false); - - for (auto img_name : 
filenames) - { - cv::Mat image = cv::imread(img_name); - if (image.empty()) - { - fprintf(stderr, "cv::imread %s failed\n", img_name); - return -1; - } - object_rect effect_roi; - cv::Mat resized_img; - resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); - auto results = detector.detect(resized_img, 0.4, 0.5); - char imgName[20] = {}; - draw_bboxes(image, results, effect_roi); - cv::waitKey(0); - +std::vector GenerateColorMap(int num_class) { + auto colormap = std::vector(3 * num_class, 0); + for (int i = 0; i < num_class; ++i) { + int j = 0; + int lab = i; + while (lab) { + colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j)); + colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)); + colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)); + ++j; + lab >>= 3; } - return 0; + } + return colormap; } -int webcam_demo(PicoDet& detector, int cam_id) -{ - cv::Mat image; - cv::VideoCapture cap(cam_id); - - while (true) - { - cap >> image; - object_rect effect_roi; - cv::Mat resized_img; - resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi); - auto results = detector.detect(resized_img, 0.4, 0.5); - draw_bboxes(image, results, effect_roi); - cv::waitKey(1); - } - return 0; +void draw_bboxes(const cv::Mat &im, const std::vector &bboxes, + std::string save_path = "None") { + static const char *class_names[] = { + "person", "bicycle", "car", + "motorcycle", "airplane", "bus", + "train", "truck", "boat", + "traffic light", "fire hydrant", "stop sign", + "parking meter", "bench", "bird", + "cat", "dog", "horse", + "sheep", "cow", "elephant", + "bear", "zebra", "giraffe", + "backpack", "umbrella", "handbag", + "tie", "suitcase", "frisbee", + "skis", "snowboard", "sports ball", + "kite", "baseball bat", "baseball glove", + "skateboard", "surfboard", "tennis racket", + "bottle", "wine glass", "cup", + "fork", "knife", "spoon", + "bowl", "banana", "apple", + "sandwich", "orange", "broccoli", + "carrot", "hot dog", "pizza", + "donut", "cake", "chair", + 
"couch", "potted plant", "bed", + "dining table", "toilet", "tv", + "laptop", "mouse", "remote", + "keyboard", "cell phone", "microwave", + "oven", "toaster", "sink", + "refrigerator", "book", "clock", + "vase", "scissors", "teddy bear", + "hair drier", "toothbrush"}; + + cv::Mat image = im.clone(); + int src_w = image.cols; + int src_h = image.rows; + int thickness = 2; + auto colormap = GenerateColorMap(sizeof(class_names)); + + for (size_t i = 0; i < bboxes.size(); i++) { + const BoxInfo &bbox = bboxes[i]; + std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". " + << bbox.y2 << ". " << std::endl; + int c1 = colormap[3 * bbox.label + 0]; + int c2 = colormap[3 * bbox.label + 1]; + int c3 = colormap[3 * bbox.label + 2]; + cv::Scalar color = cv::Scalar(c1, c2, c3); + // cv::Scalar color = cv::Scalar(0, 0, 255); + cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1), + cv::Point(bbox.x2, bbox.y2)), + color, 1); + + char text[256]; + sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100); + + int baseLine = 0; + cv::Size label_size = + cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine); + + int x = bbox.x1; + int y = bbox.y1 - label_size.height - baseLine; + if (y < 0) + y = 0; + if (x + label_size.width > image.cols) + x = image.cols - label_size.width; + + cv::rectangle(image, cv::Rect(cv::Point(x, y), + cv::Size(label_size.width, + label_size.height + baseLine)), + color, -1); + + cv::putText(image, text, cv::Point(x, y + label_size.height), + cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1); + } + + if (save_path == "None") { + cv::imshow("image", image); + } else { + cv::imwrite(save_path, image); + std::cout << "Result save in: " << save_path << std::endl; + } } -int video_demo(PicoDet& detector, const char* path) -{ - cv::Mat image; - cv::VideoCapture cap(path); - - while (true) - { - cap >> image; - object_rect effect_roi; - cv::Mat resized_img; - resize_uniform(image, resized_img, cv::Size(320, 320), 
effect_roi); - auto results = detector.detect(resized_img, 0.4, 0.5); - draw_bboxes(image, results, effect_roi); - cv::waitKey(1); +int image_demo(PicoDet &detector, const char *imagepath, + int has_postprocess = 0) { + std::vector filenames; + cv::glob(imagepath, filenames, false); + bool is_postprocess = has_postprocess > 0 ? true : false; + for (auto img_name : filenames) { + cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR); + if (image.empty()) { + fprintf(stderr, "cv::imread %s failed\n", img_name.c_str()); + return -1; } - return 0; + std::vector results; + detector.detect(image, results, is_postprocess); + std::cout << "detect done." << std::endl; + +#ifdef __SAVE_RESULT__ + std::string save_path = img_name; + draw_bboxes(image, results, save_path.replace(3, 4, "results")); +#else + draw_bboxes(image, results); + cv::waitKey(0); +#endif + } + return 0; } -int benchmark(PicoDet& detector) -{ - int loop_num = 100; - int warm_up = 8; - - double time_min = DBL_MAX; - double time_max = -DBL_MAX; - double time_avg = 0; - ncnn::Mat input = ncnn::Mat(320, 320, 3); - input.fill(0.01f); - for (int i = 0; i < warm_up + loop_num; i++) - { - double start = ncnn::get_current_time(); - ncnn::Extractor ex = detector.Net->create_extractor(); - ex.input("image", input); // picodet - for (const auto& head_info : detector.heads_info) - { - ncnn::Mat dis_pred; - ncnn::Mat cls_pred; - ex.extract(head_info.dis_layer.c_str(), dis_pred); - ex.extract(head_info.cls_layer.c_str(), cls_pred); - } - double end = ncnn::get_current_time(); - - double time = end - start; - if (i >= warm_up) - { - time_min = (std::min)(time_min, time); - time_max = (std::max)(time_max, time); - time_avg += time; - } +int benchmark(PicoDet &detector, int width, int height, + int has_postprocess = 0) { + int loop_num = 100; + int warm_up = 8; + + double time_min = DBL_MAX; + double time_max = -DBL_MAX; + double time_avg = 0; + cv::Mat image(width, height, CV_8UC3, cv::Scalar(1, 1, 1)); + bool 
is_postprocess = has_postprocess > 0 ? true : false; + for (int i = 0; i < warm_up + loop_num; i++) { + double start = ncnn::get_current_time(); + std::vector results; + detector.detect(image, results, is_postprocess); + double end = ncnn::get_current_time(); + + double time = end - start; + if (i >= warm_up) { + time_min = (std::min)(time_min, time); + time_max = (std::max)(time_max, time); + time_avg += time; } - time_avg /= loop_num; - fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", time_min, time_max, time_avg); - return 0; + } + time_avg /= loop_num; + fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet", + time_min, time_max, time_avg); + return 0; } - -int main(int argc, char** argv) -{ - if (argc != 3) - { - fprintf(stderr, "usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]); - return -1; - } - PicoDet detector = PicoDet("../weight/picodet_m_416.param", "../weight/picodet_m_416.bin", true); - int mode = atoi(argv[1]); - switch (mode) - { - case 0:{ - int cam_id = atoi(argv[2]); - webcam_demo(detector, cam_id); - break; - } - case 1:{ - const char* images = argv[2]; - image_demo(detector, images); - break; - } - case 2:{ - const char* path = argv[2]; - video_demo(detector, path); - break; - } - case 3:{ - benchmark(detector); - break; - } - default:{ - fprintf(stderr, "usage: %s [mode] [path]. 
\n For webcam mode=0, path is cam id; \n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; \n For benchmark, mode=3 path=0.\n", argv[0]); - break; - } +int main(int argc, char **argv) { + int mode = atoi(argv[1]); + char *bin_model_path = argv[2]; + char *param_model_path = argv[3]; + int height = 320; + int width = 320; + if (argc == 5) { + height = atoi(argv[4]); + width = atoi(argv[5]); + } + PicoDet detector = + PicoDet(param_model_path, bin_model_path, width, height, true, 0.45, 0.3); + if (mode == 1) { + + benchmark(detector, width, height, atoi(argv[6])); + } else { + if (argc != 6) { + std::cout << "Must set image file, such as ./picodet_demo 0 " + "../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param " + "320 320 img.jpg" + << std::endl; } + const char *images = argv[6]; + image_demo(detector, images, atoi(argv[7])); + } } diff --git a/deploy/third_engine/demo_ncnn/picodet.cpp b/deploy/third_engine/demo_ncnn/picodet.cpp index c4dec46b2b927ad761b43eebef6bb2500dd31bc4..d5f0ba3c788b0813f85dc61e35ac543661212d1c 100644 --- a/deploy/third_engine/demo_ncnn/picodet.cpp +++ b/deploy/third_engine/demo_ncnn/picodet.cpp @@ -48,7 +48,9 @@ int activation_function_softmax(const _Tp *src, _Tp *dst, int length) { bool PicoDet::hasGPU = false; PicoDet *PicoDet::detector = nullptr; -PicoDet::PicoDet(const char *param, const char *bin, bool useGPU) { +PicoDet::PicoDet(const char *param, const char *bin, int input_width, + int input_hight, bool useGPU, float score_threshold_ = 0.5, + float nms_threshold_ = 0.3) { this->Net = new ncnn::Net(); #if NCNN_VULKAN this->hasGPU = ncnn::get_gpu_count() > 0; @@ -57,21 +59,28 @@ PicoDet::PicoDet(const char *param, const char *bin, bool useGPU) { this->Net->opt.use_fp16_arithmetic = true; this->Net->load_param(param); this->Net->load_model(bin); + this->in_w = input_width; + this->in_h = input_hight; + this->score_threshold = score_threshold_; + this->nms_threshold = nms_threshold_; } PicoDet::~PicoDet() { delete 
this->Net; } void PicoDet::preprocess(cv::Mat &image, ncnn::Mat &in) { + // cv::resize(image, image, cv::Size(this->in_w, this->in_h), 0.f, 0.f); int img_w = image.cols; int img_h = image.rows; - in = ncnn::Mat::from_pixels(image.data, ncnn::Mat::PIXEL_BGR, img_w, img_h); + in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_BGR, img_w, + img_h, this->in_w, this->in_h); const float mean_vals[3] = {103.53f, 116.28f, 123.675f}; const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f}; in.substract_mean_normalize(mean_vals, norm_vals); } -std::vector<BoxInfo> PicoDet::detect(cv::Mat image, float score_threshold, - float nms_threshold) { +int PicoDet::detect(cv::Mat image, std::vector<BoxInfo> &result_list, + bool has_postprocess) { + ncnn::Mat input; preprocess(image, input); auto ex = this->Net->create_extractor(); @@ -82,34 +91,76 @@ std::vector<BoxInfo> PicoDet::detect(cv::Mat image, float score_threshold, #endif ex.input("image", input); // picodet + this->image_h = image.rows; + this->image_w = image.cols; + std::vector<std::vector<BoxInfo>> results; results.resize(this->num_class); - for (const auto &head_info : this->heads_info) { + if (has_postprocess) { ncnn::Mat dis_pred; ncnn::Mat cls_pred; - ex.extract(head_info.dis_layer.c_str(), dis_pred); - ex.extract(head_info.cls_layer.c_str(), cls_pred); - this->decode_infer(cls_pred, dis_pred, head_info.stride, score_threshold, - results); + ex.extract(this->nms_heads_info[0].c_str(), dis_pred); + ex.extract(this->nms_heads_info[1].c_str(), cls_pred); + std::cout << dis_pred.h << " " << dis_pred.w << std::endl; + std::cout << cls_pred.h << " " << cls_pred.w << std::endl; + this->nms_boxes(cls_pred, dis_pred, this->score_threshold, results); + } else { + for (const auto &head_info : this->non_postprocess_heads_info) { + ncnn::Mat dis_pred; + ncnn::Mat cls_pred; + ex.extract(head_info.dis_layer.c_str(), dis_pred); + ex.extract(head_info.cls_layer.c_str(), cls_pred); + this->decode_infer(cls_pred, dis_pred, head_info.stride, + this->score_threshold, 
results); + } } - std::vector<BoxInfo> dets; for (int i = 0; i < (int)results.size(); i++) { - this->nms(results[i], nms_threshold); + this->nms(results[i], this->nms_threshold); for (auto box : results[i]) { - dets.push_back(box); + box.x1 = box.x1 / this->in_w * this->image_w; + box.x2 = box.x2 / this->in_w * this->image_w; + box.y1 = box.y1 / this->in_h * this->image_h; + box.y2 = box.y2 / this->in_h * this->image_h; + result_list.push_back(box); + } + } + return 0; +} + +void PicoDet::nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, + float score_threshold, + std::vector<std::vector<BoxInfo>> &result_list) { + BoxInfo bbox; + int i, j; + for (i = 0; i < dis_pred.h; i++) { + bbox.x1 = dis_pred.row(i)[0]; + bbox.y1 = dis_pred.row(i)[1]; + bbox.x2 = dis_pred.row(i)[2]; + bbox.y2 = dis_pred.row(i)[3]; + const float *scores = cls_pred.row(i); + float score = 0; + int cur_label = 0; + for (int label = 0; label < this->num_class; label++) { + float score_ = cls_pred.row(label)[i]; + if (score_ > score) { + score = score_; + cur_label = label; + } } + bbox.score = score; + bbox.label = cur_label; + result_list[cur_label].push_back(bbox); } - return dets; } void PicoDet::decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride, float threshold, std::vector<std::vector<BoxInfo>> &results) { - int feature_h = ceil((float)this->input_size[1] / stride); - int feature_w = ceil((float)this->input_size[0] / stride); + int feature_h = ceil((float)this->in_h / stride); + int feature_w = ceil((float)this->in_w / stride); for (int idx = 0; idx < feature_h * feature_w; idx++) { const float *scores = cls_pred.row(idx); @@ -151,8 +202,8 @@ BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score, } float xmin = (std::max)(ct_x - dis_pred[0], .0f); float ymin = (std::max)(ct_y - dis_pred[1], .0f); - float xmax = (std::min)(ct_x + dis_pred[2], (float)this->input_size[0]); - float ymax = (std::min)(ct_y + dis_pred[3], (float)this->input_size[1]); + float xmax = (std::min)(ct_x + dis_pred[2], 
(float)this->in_w); + float ymax = (std::min)(ct_y + dis_pred[3], (float)this->in_h); return BoxInfo{xmin, ymin, xmax, ymax, score, label}; } diff --git a/deploy/third_engine/demo_ncnn/picodet.h b/deploy/third_engine/demo_ncnn/picodet.h index dfb0967c99779567c5921efda00df75ddc8079c5..dd8c8f5af96aed9393e207b6e920259d95befbe7 100644 --- a/deploy/third_engine/demo_ncnn/picodet.h +++ b/deploy/third_engine/demo_ncnn/picodet.h @@ -16,66 +16,72 @@ #ifndef PICODET_H #define PICODET_H -#include #include +#include -typedef struct HeadInfo -{ - std::string cls_layer; - std::string dis_layer; - int stride; -}; +typedef struct NonPostProcessHeadInfo { + std::string cls_layer; + std::string dis_layer; + int stride; +} NonPostProcessHeadInfo; -typedef struct BoxInfo -{ - float x1; - float y1; - float x2; - float y2; - float score; - int label; +typedef struct BoxInfo { + float x1; + float y1; + float x2; + float y2; + float score; + int label; } BoxInfo; -class PicoDet -{ +class PicoDet { public: - PicoDet(const char* param, const char* bin, bool useGPU); - - ~PicoDet(); + PicoDet(const char *param, const char *bin, int input_width, int input_hight, + bool useGPU, float score_threshold_, float nms_threshold_); - static PicoDet* detector; - ncnn::Net* Net; - static bool hasGPU; + ~PicoDet(); - std::vector<HeadInfo> heads_info{ - // cls_pred|dis_pred|stride - {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8}, - {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16}, - {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32}, - {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64}, - }; + static PicoDet *detector; + ncnn::Net *Net; + static bool hasGPU; - std::vector<BoxInfo> detect(cv::Mat image, float score_threshold, float nms_threshold); + int detect(cv::Mat image, std::vector<BoxInfo> &result_list, + bool has_postprocess); - std::vector<std::string> labels{ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", 
"traffic light", - "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", - "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", - "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", - "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", - "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", - "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", - "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", - "hair drier", "toothbrush" }; private: - void preprocess(cv::Mat& image, ncnn::Mat& in); - void decode_infer(ncnn::Mat& cls_pred, ncnn::Mat& dis_pred, int stride, float threshold, std::vector<std::vector<BoxInfo>>& results); - BoxInfo disPred2Bbox(const float*& dfl_det, int label, float score, int x, int y, int stride); - static void nms(std::vector<BoxInfo>& result, float nms_threshold); - int input_size[2] = {320, 320}; - int num_class = 80; - int reg_max = 7; + void preprocess(cv::Mat &image, ncnn::Mat &in); + void decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride, + float threshold, + std::vector<std::vector<BoxInfo>> &results); + BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, + int y, int stride); + static void nms(std::vector<BoxInfo> &result, float nms_threshold); + void nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, + float score_threshold, + std::vector<std::vector<BoxInfo>> &result_list); -}; + int image_w; + int image_h; + int in_w = 320; + int in_h = 320; + int num_class = 80; + int reg_max = 7; + + float score_threshold; + float nms_threshold; + std::vector bbox_output_data_; + std::vector class_output_data_; + + std::vector<std::string> nms_heads_info{"tmp_16", "concat_4.tmp_0"}; + // If post-processing is not exported, non_postprocess_heads_info is used 
+ std::vector<NonPostProcessHeadInfo> non_postprocess_heads_info{ + // cls_pred|dis_pred|stride + {"transpose_0.tmp_0", "transpose_1.tmp_0", 8}, + {"transpose_2.tmp_0", "transpose_3.tmp_0", 16}, + {"transpose_4.tmp_0", "transpose_5.tmp_0", 32}, + {"transpose_6.tmp_0", "transpose_7.tmp_0", 64}, + }; +}; #endif diff --git a/deploy/third_engine/demo_ncnn/python/demo_ncnn.py b/deploy/third_engine/demo_ncnn/python/demo_ncnn.py deleted file mode 100644 index 492eb1e0dff46b3056219d6c33db505f668afb01..0000000000000000000000000000000000000000 --- a/deploy/third_engine/demo_ncnn/python/demo_ncnn.py +++ /dev/null @@ -1,808 +0,0 @@ -# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. - -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at - -# http://www.apache.org/licenses/LICENSE-2.0 - -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn - -# -*- coding: utf-8 -*- -import argparse -from abc import ABCMeta, abstractmethod -from pathlib import Path - -import cv2 -import matplotlib.pyplot as plt -import numpy as np -from scipy.special import softmax -from tqdm import tqdm - -_COLORS = (np.array([ - 0.000, - 0.447, - 0.741, - 0.850, - 0.325, - 0.098, - 0.929, - 0.694, - 0.125, - 0.494, - 0.184, - 0.556, - 0.466, - 0.674, - 0.188, - 0.301, - 0.745, - 0.933, - 0.635, - 0.078, - 0.184, - 0.300, - 0.300, - 0.300, - 0.600, - 0.600, - 0.600, - 1.000, - 0.000, - 0.000, - 1.000, - 0.500, - 0.000, - 0.749, - 0.749, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 1.000, - 0.667, - 0.000, - 1.000, - 0.333, - 0.333, - 0.000, - 0.333, - 0.667, - 0.000, - 0.333, - 1.000, - 0.000, - 0.667, - 0.333, - 0.000, - 0.667, - 0.667, - 0.000, - 0.667, - 1.000, - 0.000, - 1.000, - 0.333, - 0.000, - 1.000, - 0.667, - 0.000, - 1.000, - 1.000, - 0.000, - 0.000, - 0.333, - 0.500, - 0.000, - 0.667, - 0.500, - 0.000, - 1.000, - 0.500, - 0.333, - 0.000, - 0.500, - 0.333, - 0.333, - 0.500, - 0.333, - 0.667, - 0.500, - 0.333, - 1.000, - 0.500, - 0.667, - 0.000, - 0.500, - 0.667, - 0.333, - 0.500, - 0.667, - 0.667, - 0.500, - 0.667, - 1.000, - 0.500, - 1.000, - 0.000, - 0.500, - 1.000, - 0.333, - 0.500, - 1.000, - 0.667, - 0.500, - 1.000, - 1.000, - 0.500, - 0.000, - 0.333, - 1.000, - 0.000, - 0.667, - 1.000, - 0.000, - 1.000, - 1.000, - 0.333, - 0.000, - 1.000, - 0.333, - 0.333, - 1.000, - 0.333, - 0.667, - 1.000, - 0.333, - 1.000, - 1.000, - 0.667, - 0.000, - 1.000, - 0.667, - 0.333, - 1.000, - 0.667, - 0.667, - 1.000, - 0.667, - 1.000, - 1.000, - 1.000, - 0.000, - 1.000, - 1.000, - 0.333, - 1.000, - 1.000, - 0.667, - 1.000, - 0.333, - 0.000, - 0.000, - 0.500, - 0.000, - 0.000, - 0.667, - 0.000, - 0.000, - 0.833, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 0.167, - 0.000, - 0.000, - 0.333, - 0.000, - 0.000, - 0.500, - 0.000, - 0.000, - 0.667, 
- 0.000, - 0.000, - 0.833, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 0.167, - 0.000, - 0.000, - 0.333, - 0.000, - 0.000, - 0.500, - 0.000, - 0.000, - 0.667, - 0.000, - 0.000, - 0.833, - 0.000, - 0.000, - 1.000, - 0.000, - 0.000, - 0.000, - 0.143, - 0.143, - 0.143, - 0.286, - 0.286, - 0.286, - 0.429, - 0.429, - 0.429, - 0.571, - 0.571, - 0.571, - 0.714, - 0.714, - 0.714, - 0.857, - 0.857, - 0.857, - 0.000, - 0.447, - 0.741, - 0.314, - 0.717, - 0.741, - 0.50, - 0.5, - 0, -]).astype(np.float32).reshape(-1, 3)) - - -def get_resize_matrix(raw_shape, dst_shape, keep_ratio): - """ - Get resize matrix for resizing raw img to input size - :param raw_shape: (width, height) of raw image - :param dst_shape: (width, height) of input image - :param keep_ratio: whether keep original ratio - :return: 3x3 Matrix - """ - r_w, r_h = raw_shape - d_w, d_h = dst_shape - Rs = np.eye(3) - if keep_ratio: - C = np.eye(3) - C[0, 2] = -r_w / 2 - C[1, 2] = -r_h / 2 - - if r_w / r_h < d_w / d_h: - ratio = d_h / r_h - else: - ratio = d_w / r_w - Rs[0, 0] *= ratio - Rs[1, 1] *= ratio - - T = np.eye(3) - T[0, 2] = 0.5 * d_w - T[1, 2] = 0.5 * d_h - return T @Rs @C - else: - Rs[0, 0] *= d_w / r_w - Rs[1, 1] *= d_h / r_h - return Rs - - -def warp_boxes(boxes, M, width, height): - """Apply transform to boxes - Copy from picodet/data/transform/warp.py - """ - n = len(boxes) - if n: - # warp points - xy = np.ones((n * 4, 3)) - xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( - n * 4, 2) # x1y1, x2y2, x1y2, x2y1 - xy = xy @M.T # transform - xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale - # create new boxes - x = xy[:, [0, 2, 4, 6]] - y = xy[:, [1, 3, 5, 7]] - xy = np.concatenate( - (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T - # clip boxes - xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) - xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) - return xy.astype(np.float32) - else: - return boxes - - -def overlay_bbox_cv(img, all_box, class_names): - """Draw result boxes - 
Copy from picodet/util/visualization.py - """ - all_box.sort(key=lambda v: v[5]) - for box in all_box: - label, x0, y0, x1, y1, score = box - color = (_COLORS[label] * 255).astype(np.uint8).tolist() - text = "{}:{:.1f}%".format(class_names[label], score * 100) - txt_color = (0, 0, 0) if np.mean(_COLORS[label]) > 0.5 else (255, 255, - 255) - font = cv2.FONT_HERSHEY_SIMPLEX - txt_size = cv2.getTextSize(text, font, 0.5, 2)[0] - cv2.rectangle(img, (x0, y0), (x1, y1), color, 2) - - cv2.rectangle( - img, - (x0, y0 - txt_size[1] - 1), - (x0 + txt_size[0] + txt_size[1], y0 - 1), - color, - -1, ) - cv2.putText(img, text, (x0, y0 - 1), font, 0.5, txt_color, thickness=1) - return img - - -def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): - """ - - Args: - box_scores (N, 5): boxes in corner-form and probabilities. - iou_threshold: intersection over union threshold. - top_k: keep top_k results. If k <= 0, keep all the results. - candidate_size: only consider the candidates with the highest scores. - Returns: - picked: a list of indexes of the kept boxes - """ - scores = box_scores[:, -1] - boxes = box_scores[:, :-1] - picked = [] - indexes = np.argsort(scores) - indexes = indexes[-candidate_size:] - while len(indexes) > 0: - current = indexes[-1] - picked.append(current) - if 0 < top_k == len(picked) or len(indexes) == 1: - break - current_box = boxes[current, :] - indexes = indexes[:-1] - rest_boxes = boxes[indexes, :] - iou = iou_of( - rest_boxes, - np.expand_dims( - current_box, axis=0), ) - indexes = indexes[iou <= iou_threshold] - - return box_scores[picked, :] - - -def iou_of(boxes0, boxes1, eps=1e-5): - """Return intersection-over-union (Jaccard index) of boxes. - - Args: - boxes0 (N, 4): ground truth boxes. - boxes1 (N or 1, 4): predicted boxes. - eps: a small number to avoid 0 as denominator. - Returns: - iou (N): IoU values. 
- """ - overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) - overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) - - overlap_area = area_of(overlap_left_top, overlap_right_bottom) - area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) - area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) - return overlap_area / (area0 + area1 - overlap_area + eps) - - -def area_of(left_top, right_bottom): - """Compute the areas of rectangles given two corners. - - Args: - left_top (N, 2): left top corner. - right_bottom (N, 2): right bottom corner. - - Returns: - area (N): return the area. - """ - hw = np.clip(right_bottom - left_top, 0.0, None) - return hw[..., 0] * hw[..., 1] - - -class picodetABC(metaclass=ABCMeta): - def __init__( - self, - input_shape=[320, 320], - reg_max=7, - strides=[8, 16, 32], - prob_threshold=0.4, - iou_threshold=0.3, - num_candidate=1000, - top_k=-1, ): - self.strides = strides - self.input_shape = input_shape - self.reg_max = reg_max - self.prob_threshold = prob_threshold - self.iou_threshold = iou_threshold - self.num_candidate = num_candidate - self.top_k = top_k - self.img_mean = [103.53, 116.28, 123.675] - self.img_std = [57.375, 57.12, 58.395] - self.input_size = (self.input_shape[1], self.input_shape[0]) - self.class_names = [ - "person", - "bicycle", - "car", - "motorcycle", - "airplane", - "bus", - "train", - "truck", - "boat", - "traffic_light", - "fire_hydrant", - "stop_sign", - "parking_meter", - "bench", - "bird", - "cat", - "dog", - "horse", - "sheep", - "cow", - "elephant", - "bear", - "zebra", - "giraffe", - "backpack", - "umbrella", - "handbag", - "tie", - "suitcase", - "frisbee", - "skis", - "snowboard", - "sports_ball", - "kite", - "baseball_bat", - "baseball_glove", - "skateboard", - "surfboard", - "tennis_racket", - "bottle", - "wine_glass", - "cup", - "fork", - "knife", - "spoon", - "bowl", - "banana", - "apple", - "sandwich", - "orange", - "broccoli", - "carrot", - "hot_dog", - "pizza", - "donut", - "cake", 
- "chair", - "couch", - "potted_plant", - "bed", - "dining_table", - "toilet", - "tv", - "laptop", - "mouse", - "remote", - "keyboard", - "cell_phone", - "microwave", - "oven", - "toaster", - "sink", - "refrigerator", - "book", - "clock", - "vase", - "scissors", - "teddy_bear", - "hair_drier", - "toothbrush", - ] - - def preprocess(self, img): - # resize image - ResizeM = get_resize_matrix((img.shape[1], img.shape[0]), - self.input_size, True) - img_resize = cv2.warpPerspective(img, ResizeM, dsize=self.input_size) - # normalize image - img_input = img_resize.astype(np.float32) / 255 - img_mean = np.array( - self.img_mean, dtype=np.float32).reshape(1, 1, 3) / 255 - img_std = np.array( - self.img_std, dtype=np.float32).reshape(1, 1, 3) / 255 - img_input = (img_input - img_mean) / img_std - # expand dims - img_input = np.transpose(img_input, [2, 0, 1]) - img_input = np.expand_dims(img_input, axis=0) - return img_input, ResizeM - - def postprocess(self, scores, raw_boxes, ResizeM, raw_shape): - # generate centers - decode_boxes = [] - select_scores = [] - for stride, box_distribute, score in zip(self.strides, raw_boxes, - scores): - # centers - fm_h = self.input_shape[0] / stride - fm_w = self.input_shape[1] / stride - h_range = np.arange(fm_h) - w_range = np.arange(fm_w) - ww, hh = np.meshgrid(w_range, h_range) - ct_row = (hh.flatten() + 0.5) * stride - ct_col = (ww.flatten() + 0.5) * stride - center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1) - - # box distribution to distance - reg_range = np.arange(self.reg_max + 1) - box_distance = box_distribute.reshape((-1, self.reg_max + 1)) - box_distance = softmax(box_distance, axis=1) - box_distance = box_distance * np.expand_dims(reg_range, axis=0) - box_distance = np.sum(box_distance, axis=1).reshape((-1, 4)) - box_distance = box_distance * stride - - # top K candidate - topk_idx = np.argsort(score.max(axis=1))[::-1] - topk_idx = topk_idx[:self.num_candidate] - center = center[topk_idx] - score = score[topk_idx] - 
box_distance = box_distance[topk_idx] - - # decode box - decode_box = center + [-1, -1, 1, 1] * box_distance - - select_scores.append(score) - decode_boxes.append(decode_box) - - # nms - bboxes = np.concatenate(decode_boxes, axis=0) - confidences = np.concatenate(select_scores, axis=0) - picked_box_probs = [] - picked_labels = [] - for class_index in range(0, confidences.shape[1]): - probs = confidences[:, class_index] - mask = probs > self.prob_threshold - probs = probs[mask] - if probs.shape[0] == 0: - continue - subset_boxes = bboxes[mask, :] - box_probs = np.concatenate( - [subset_boxes, probs.reshape(-1, 1)], axis=1) - box_probs = hard_nms( - box_probs, - iou_threshold=self.iou_threshold, - top_k=self.top_k, ) - picked_box_probs.append(box_probs) - picked_labels.extend([class_index] * box_probs.shape[0]) - if not picked_box_probs: - return np.array([]), np.array([]), np.array([]) - picked_box_probs = np.concatenate(picked_box_probs) - - # resize output boxes - picked_box_probs[:, :4] = warp_boxes(picked_box_probs[:, :4], - np.linalg.inv(ResizeM), - raw_shape[1], raw_shape[0]) - return ( - picked_box_probs[:, :4].astype(np.int32), - np.array(picked_labels), - picked_box_probs[:, 4], ) - - @abstractmethod - def infer_image(self, img_input): - pass - - def detect(self, img): - raw_shape = img.shape - img_input, ResizeM = self.preprocess(img) - scores, raw_boxes = self.infer_image(img_input) - if scores[0].ndim == 1: # handling num_classes=1 case - scores = [x[:, None] for x in scores] - bbox, label, score = self.postprocess(scores, raw_boxes, ResizeM, - raw_shape) - return bbox, label, score - - def draw_box(self, raw_img, bbox, label, score): - img = raw_img.copy() - all_box = [[x, ] + y + [z, ] - for x, y, z in zip(label, bbox.tolist(), score)] - img_draw = overlay_bbox_cv(img, all_box, self.class_names) - return img_draw - - def detect_folder(self, img_fold, result_path): - img_fold = Path(img_fold) - result_path = Path(result_path) - 
result_path.mkdir(parents=True, exist_ok=True) - - img_name_list = filter( - lambda x: str(x).endswith(".png") or str(x).endswith(".jpg"), - img_fold.iterdir(), ) - img_name_list = list(img_name_list) - print(f"find {len(img_name_list)} images") - - for img_path in tqdm(img_name_list): - img = cv2.imread(str(img_path)) - bbox, label, score = self.detect(img) - img_draw = self.draw_box(img, bbox, label, score) - save_path = str(result_path / img_path.name.replace(".png", ".jpg")) - cv2.imwrite(save_path, img_draw) - - -class picodetONNX(picodetABC): - def __init__(self, model_path, *args, **kwargs): - import onnxruntime as ort - - super(picodetONNX, self).__init__(*args, **kwargs) - print("Using ONNX as inference backend") - print(f"Using weight: {model_path}") - - # load model - self.model_path = model_path - self.ort_session = ort.InferenceSession(self.model_path) - self.input_name = self.ort_session.get_inputs()[0].name - - def infer_image(self, img_input): - inference_results = self.ort_session.run(None, - {self.input_name: img_input}) - scores = [np.squeeze(x) for x in inference_results[:3]] - raw_boxes = [np.squeeze(x) for x in inference_results[3:]] - return scores, raw_boxes - - -class picodetTorch(picodetABC): - def __init__(self, model_path, cfg_path, *args, **kwargs): - import torch - - from picodet.model.arch import build_model - from picodet.util import Logger, cfg, load_config, load_model_weight - - super(picodetTorch, self).__init__(*args, **kwargs) - print("Using PyTorch as inference backend") - print(f"Using weight: {model_path}") - - # load model - self.model_path = model_path - self.cfg_path = cfg_path - load_config(cfg, cfg_path) - self.logger = Logger(-1, cfg.save_dir, False) - self.model = build_model(cfg.model) - checkpoint = torch.load( - model_path, map_location=lambda storage, loc: storage) - load_model_weight(self.model, checkpoint, self.logger) - - def infer_image(self, img_input): - import torch - - self.model.train(False) - with 
torch.no_grad(): - inference_results = self.model(torch.from_numpy(img_input)) - scores = [ - x.permute(0, 2, 3, 1).reshape((-1, 80)).sigmoid().detach().numpy() - for x in inference_results[0] - ] - raw_boxes = [ - x.permute(0, 2, 3, 1).reshape((-1, 32)).detach().numpy() - for x in inference_results[1] - ] - return scores, raw_boxes - - -class picodetNCNN(picodetABC): - def __init__(self, model_param, model_bin, *args, **kwargs): - import ncnn - - super(picodetNCNN, self).__init__(*args, **kwargs) - print("Using ncnn as inference backend") - print(f"Using param: {model_param}, bin: {model_bin}") - - # load model - self.model_param = model_param - self.model_bin = model_bin - - self.net = ncnn.Net() - self.net.load_param(model_param) - self.net.load_model(model_bin) - self.input_name = "input.1" - - def infer_image(self, img_input): - import ncnn - - mat_in = ncnn.Mat(img_input.squeeze()) - ex = self.net.create_extractor() - ex.input(self.input_name, mat_in) - - score_out_name = [ - "save_infer_model/scale_0.tmp_1", "save_infer_model/scale_1.tmp_1", - "save_infer_model/scale_2.tmp_1", "save_infer_model/scale_3.tmp_1" - ] - scores = [np.array(ex.extract(x)[1]) for x in score_out_name] - scores = [np.reshape(x, (-1, 80)) for x in scores] - - boxes_out_name = [ - "save_infer_model/scale_4.tmp_1", "save_infer_model/scale_5.tmp_1", - "save_infer_model/scale_6.tmp_1", "save_infer_model/scale_7.tmp_1" - ] - raw_boxes = [np.array(ex.extract(x)[1]) for x in boxes_out_name] - raw_boxes = [np.reshape(x, (-1, 32)) for x in raw_boxes] - - return scores, raw_boxes - - -def main(): - parser = argparse.ArgumentParser() - parser.add_argument( - "--model_path", - dest="model_path", - type=str, - default="../model/picodet.param") - parser.add_argument( - "--model_bin", - dest="model_bin", - type=str, - default="../model/picodet.bin") - parser.add_argument( - "--cfg_path", dest="cfg_path", type=str, default="config/picodet.yml") - parser.add_argument( - "--img_fold", dest="img_fold", 
type=str, default="../imgs") - parser.add_argument( - "--result_fold", dest="result_fold", type=str, default="../results") - parser.add_argument( - "--input_shape", - dest="input_shape", - nargs=2, - type=int, - default=[320, 320]) - parser.add_argument( - "--backend", choices=["ncnn", "ONNX", "torch"], default="ncnn") - args = parser.parse_args() - - print(f"Detecting {args.img_fold}") - - # load detector - if args.backend == "ncnn": - detector = picodetNCNN( - args.model_path, args.model_bin, input_shape=args.input_shape) - elif args.backend == "ONNX": - detector = picodetONNX(args.model_path, input_shape=args.input_shape) - elif args.backend == "torch": - detector = picodetTorch( - args.model_path, args.cfg_path, input_shape=args.input_shape) - else: - raise ValueError - - # detect folder - detector.detect_folder(args.img_fold, args.result_fold) - - -def test_one(): - detector = picodetNCNN("../weight/picodet_m_416.param", - "../weight/picodet_m_416.bin") - img = cv2.imread("../000000000102.jpg") - bbox, label, score = detector.detect(img) - img_draw = detector.draw_box(img, bbox, label, score) - img_out = img_draw[..., ::-1] - cv2.imwrite('python_version.jpg', img_out) - - -if __name__ == "__main__": - # main() - test_one() diff --git a/deploy/third_engine/demo_onnxruntime/README.md b/deploy/third_engine/demo_onnxruntime/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bdf7a9432f3e35499c616524b031a27cb2e99fc4 --- /dev/null +++ b/deploy/third_engine/demo_onnxruntime/README.md @@ -0,0 +1,43 @@ +# PicoDet ONNX Runtime Demo + +This folder provides a demo that uses [ONNX Runtime](https://onnxruntime.ai/docs/) to deploy PicoDet and run inference on images. + +## Install ONNX Runtime + +This demo uses ONNX Runtime 1.10.0, which can be installed with the following command: +```shell +pip install onnxruntime +``` + +For detailed installation steps, see [Install ONNX Runtime](https://onnxruntime.ai/docs/install/). + +## Inference images + +- 
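Prepare the test model: following the "Export and convert the model" step in [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet), export the model with post-processing included (`-o export.benchmark=False`) and generate the simplified ONNX model to be tested (it can also be downloaded directly from the links below). Then create an ```onnx_file``` folder in this directory and put the exported ONNX model in it. + +- Prepare the test images: put the images to be tested in the ```./imgs``` folder; this demo already provides two test images. + +- Run directly in this directory: + ```shell + python infer_demo.py --modelpath ./onnx_file/picodet_s_320_lcnet_postprocessed.onnx + ``` + This runs detection on all images in the ```./imgs``` folder and saves the results in the ```./results``` folder. + +- Results: +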
[result images] +
+## Model Download + +| Model | Input size | ONNX (w/ post-processing) | +| :-------- | :--------: | :---------------------: | +| PicoDet-XS | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postprocessed.onnx) | +| PicoDet-XS | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postprocessed.onnx) | +| PicoDet-S | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postprocessed.onnx) | +| PicoDet-S | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postprocessed.onnx) | +| PicoDet-M | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postprocessed.onnx) | +| PicoDet-M | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postprocessed.onnx) | +| PicoDet-L | 320*320 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postprocessed.onnx) | +| PicoDet-L | 416*416 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postprocessed.onnx) | +| PicoDet-L | 640*640 | [model](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postprocessed.onnx) | diff --git a/static/deploy/lite/coco_label_list.txt b/deploy/third_engine/demo_onnxruntime/coco_label.txt similarity index 88% rename from static/deploy/lite/coco_label_list.txt rename to deploy/third_engine/demo_onnxruntime/coco_label.txt index 1f42c8eb44628f95b2f4067de928a7f5c1e9c8dc..ca76c80b5b2cd0b25047f75736656cfebc9da7aa 100644 --- a/static/deploy/lite/coco_label_list.txt +++ b/deploy/third_engine/demo_onnxruntime/coco_label.txt @@ -1,8 +1,8 @@ person bicycle car -motorcycle -airplane +motorbike +aeroplane bus train truck @@ -55,12 +55,12 @@ pizza donut cake chair -couch -potted plant +sofa +pottedplant bed -dining table +diningtable toilet -tv +tvmonitor laptop mouse remote @@ -77,4 +77,4 @@ vase scissors teddy bear hair drier 
-toothbrush \ No newline at end of file +toothbrush diff --git a/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg b/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b43e311165c785f000eb7493ff8fb662d06a3f83 Binary files /dev/null and b/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg differ diff --git a/deploy/third_engine/demo_onnxruntime/imgs/dog.jpg b/deploy/third_engine/demo_onnxruntime/imgs/dog.jpg new file mode 100644 index 0000000000000000000000000000000000000000..77b0381222eaed50867643f4166092c781e56d5b Binary files /dev/null and b/deploy/third_engine/demo_onnxruntime/imgs/dog.jpg differ diff --git a/deploy/third_engine/demo_onnxruntime/infer_demo.py b/deploy/third_engine/demo_onnxruntime/infer_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..41f407097828fa099a43831d4200193ba91557be --- /dev/null +++ b/deploy/third_engine/demo_onnxruntime/infer_demo.py @@ -0,0 +1,208 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import cv2 +import numpy as np +import argparse +import onnxruntime as ort +from pathlib import Path +from tqdm import tqdm + + +class PicoDet(): + def __init__(self, + model_pb_path, + label_path, + prob_threshold=0.4, + iou_threshold=0.3): + self.classes = list( + map(lambda x: x.strip(), open(label_path, 'r').readlines())) + self.num_classes = len(self.classes) + self.prob_threshold = prob_threshold + self.iou_threshold = iou_threshold + self.mean = np.array( + [103.53, 116.28, 123.675], dtype=np.float32).reshape(1, 1, 3) + self.std = np.array( + [57.375, 57.12, 58.395], dtype=np.float32).reshape(1, 1, 3) + so = ort.SessionOptions() + so.log_severity_level = 3 + self.net = ort.InferenceSession(model_pb_path, so) + inputs_name = [a.name for a in self.net.get_inputs()] + inputs_shape = { + k: v.shape + for k, v in zip(inputs_name, self.net.get_inputs()) + } + self.input_shape = inputs_shape['image'][2:] + + def _normalize(self, img): + img = img.astype(np.float32) + img = (img / 255.0 - self.mean / 255.0) / (self.std / 255.0) + return img + + def resize_image(self, srcimg, keep_ratio=False): + top, left, newh, neww = 0, 0, self.input_shape[0], self.input_shape[1] + origin_shape = srcimg.shape[:2] + im_scale_y = newh / float(origin_shape[0]) + im_scale_x = neww / float(origin_shape[1]) + img_shape = np.array([ + [float(self.input_shape[0]), float(self.input_shape[1])] + ]).astype('float32') + scale_factor = np.array([[im_scale_y, im_scale_x]]).astype('float32') + + if keep_ratio and srcimg.shape[0] != srcimg.shape[1]: + hw_scale = srcimg.shape[0] / srcimg.shape[1] + if hw_scale > 1: + newh, neww = self.input_shape[0], int(self.input_shape[1] / + hw_scale) + img = cv2.resize( + srcimg, (neww, newh), interpolation=cv2.INTER_AREA) + left = int((self.input_shape[1] - neww) * 0.5) + img = cv2.copyMakeBorder( + img, + 0, + 0, + left, + self.input_shape[1] - neww - left, + cv2.BORDER_CONSTANT, + value=0) # add border + else: + newh, neww = int(self.input_shape[0] * + 
hw_scale), self.input_shape[1] + img = cv2.resize( + srcimg, (neww, newh), interpolation=cv2.INTER_AREA) + top = int((self.input_shape[0] - newh) * 0.5) + img = cv2.copyMakeBorder( + img, + top, + self.input_shape[0] - newh - top, + 0, + 0, + cv2.BORDER_CONSTANT, + value=0) + else: + img = cv2.resize( + srcimg, self.input_shape, interpolation=cv2.INTER_AREA) + + return img, img_shape, scale_factor + + def get_color_map_list(self, num_classes): + color_map = num_classes * [0, 0, 0] + for i in range(0, num_classes): + j = 0 + lab = i + while lab: + color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j)) + color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)) + color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)) + j += 1 + lab >>= 3 + color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)] + return color_map + + def detect(self, srcimg): + img, im_shape, scale_factor = self.resize_image(srcimg) + img = self._normalize(img) + + blob = np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0) + + inputs_dict = { + 'im_shape': im_shape, + 'image': blob, + 'scale_factor': scale_factor + } + inputs_name = [a.name for a in self.net.get_inputs()] + net_inputs = {k: inputs_dict[k] for k in inputs_name} + + outs = self.net.run(None, net_inputs) + + outs = np.array(outs[0]) + expect_boxes = (outs[:, 1] > 0.5) & (outs[:, 0] > -1) + np_boxes = outs[expect_boxes, :] + + color_list = self.get_color_map_list(self.num_classes) + clsid2color = {} + + for i in range(np_boxes.shape[0]): + classid, conf = int(np_boxes[i, 0]), np_boxes[i, 1] + xmin, ymin, xmax, ymax = int(np_boxes[i, 2]), int(np_boxes[ + i, 3]), int(np_boxes[i, 4]), int(np_boxes[i, 5]) + + if classid not in clsid2color: + clsid2color[classid] = color_list[classid] + color = tuple(clsid2color[classid]) + + cv2.rectangle( + srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2) + print(self.classes[classid] + ': ' + str(round(conf, 3))) + cv2.putText( + srcimg, + self.classes[classid] + ':' + str(round(conf, 3)), 
(xmin,
+                                                                    ymin - 10),
+                cv2.FONT_HERSHEY_SIMPLEX,
+                0.8, (0, 255, 0),
+                thickness=2)
+
+        return srcimg
+
+    def detect_folder(self, img_fold, result_path):
+        img_fold = Path(img_fold)
+        result_path = Path(result_path)
+        result_path.mkdir(parents=True, exist_ok=True)
+
+        img_name_list = filter(
+            lambda x: str(x).endswith(".png") or str(x).endswith(".jpg"),
+            img_fold.iterdir(), )
+        img_name_list = list(img_name_list)
+        print(f"find {len(img_name_list)} images")
+
+        for img_path in tqdm(img_name_list):
+            img = cv2.imread(str(img_path))
+
+            # Call the method on this instance rather than on the global `net`,
+            # so the class also works when imported from another module.
+            srcimg = self.detect(img)
+            save_path = str(result_path / img_path.name.replace(".png", ".jpg"))
+            cv2.imwrite(save_path, srcimg)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--modelpath',
+        type=str,
+        default='onnx_file/picodet_s_320_lcnet_postprocessed.onnx',
+        help="onnx filepath")
+    parser.add_argument(
+        '--classfile',
+        type=str,
+        default='coco_label.txt',
+        help="classname filepath")
+    parser.add_argument(
+        '--confThreshold', default=0.5, type=float, help='class confidence')
+    parser.add_argument(
+        '--nmsThreshold', default=0.6, type=float, help='nms iou thresh')
+    parser.add_argument(
+        "--img_fold", dest="img_fold", type=str, default="./imgs")
+    parser.add_argument(
+        "--result_fold", dest="result_fold", type=str, default="results")
+    args = parser.parse_args()
+
+    net = PicoDet(
+        args.modelpath,
+        args.classfile,
+        prob_threshold=args.confThreshold,
+        iou_threshold=args.nmsThreshold)
+
+    net.detect_folder(args.img_fold, args.result_fold)
+    print(
+        f'infer results in ./deploy/third_engine/demo_onnxruntime/{args.result_fold}'
+    )
diff --git a/deploy/third_engine/demo_openvino/picodet_openvino.h b/deploy/third_engine/demo_openvino/picodet_openvino.h
index 9871184dd7ab15cc6d758c4f105aab2152cba9ea..2a5bced16a3c3d57096adbdfa263b634c74377db 100644
--- a/deploy/third_engine/demo_openvino/picodet_openvino.h
+++ b/deploy/third_engine/demo_openvino/picodet_openvino.h
@@ -13,66
+13,63 @@ // limitations under the License. // reference from https://github.com/RangiLyu/nanodet/tree/main/demo_openvino - #ifndef _PICODET_OPENVINO_H_ #define _PICODET_OPENVINO_H_ -#include -#include #include +#include +#include #define image_size 416 - -typedef struct HeadInfo -{ - std::string cls_layer; - std::string dis_layer; - int stride; +typedef struct HeadInfo { + std::string cls_layer; + std::string dis_layer; + int stride; } HeadInfo; -typedef struct BoxInfo -{ - float x1; - float y1; - float x2; - float y2; - float score; - int label; +typedef struct BoxInfo { + float x1; + float y1; + float x2; + float y2; + float score; + int label; } BoxInfo; -class PicoDet -{ +class PicoDet { public: - PicoDet(const char* param); + PicoDet(const char *param); - ~PicoDet(); + ~PicoDet(); - InferenceEngine::ExecutableNetwork network_; - InferenceEngine::InferRequest infer_request_; - // static bool hasGPU; + InferenceEngine::ExecutableNetwork network_; + InferenceEngine::InferRequest infer_request_; + // static bool hasGPU; - std::vector heads_info_{ - // cls_pred|dis_pred|stride - {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8}, - {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16}, - {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32}, - {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64}, - }; + std::vector heads_info_{ + // cls_pred|dis_pred|stride + {"transpose_0.tmp_0", "transpose_1.tmp_0", 8}, + {"transpose_2.tmp_0", "transpose_3.tmp_0", 16}, + {"transpose_4.tmp_0", "transpose_5.tmp_0", 32}, + {"transpose_6.tmp_0", "transpose_7.tmp_0", 64}, + }; - std::vector detect(cv::Mat image, float score_threshold, float nms_threshold); + std::vector detect(cv::Mat image, float score_threshold, + float nms_threshold); private: - void preprocess(cv::Mat& image, InferenceEngine::Blob::Ptr& blob); - void decode_infer(const float*& cls_pred, const float*& dis_pred, int stride, float 
threshold, std::vector<std::vector<BoxInfo>>& results);
-    BoxInfo disPred2Bbox(const float*& dfl_det, int label, float score, int x, int y, int stride);
-    static void nms(std::vector<BoxInfo>& result, float nms_threshold);
-    std::string input_name_;
-    int input_size_ = image_size;
-    int num_class_ = 80;
-    int reg_max_ = 7;
-
+  void preprocess(cv::Mat &image, InferenceEngine::Blob::Ptr &blob);
+  void decode_infer(const float *&cls_pred, const float *&dis_pred, int stride,
+                    float threshold,
+                    std::vector<std::vector<BoxInfo>> &results);
+  BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
+                       int y, int stride);
+  static void nms(std::vector<BoxInfo> &result, float nms_threshold);
+  std::string input_name_;
+  int input_size_ = image_size;
+  int num_class_ = 80;
+  int reg_max_ = 7;
 };
-
 #endif
diff --git a/deploy/third_engine/demo_openvino/python/README.md b/deploy/third_engine/demo_openvino/python/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1862417db48882b02459dd3b2a425758473f09f2
--- /dev/null
+++ b/deploy/third_engine/demo_openvino/python/README.md
@@ -0,0 +1,75 @@
+# PicoDet OpenVINO Benchmark Demo
+
+This folder provides a benchmark demo that measures PicoDet speed with [Intel's OpenVINO Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html), and an inference demo for models exported with post-processing.
+
+## Install OpenVINO Toolkit
+
+Go to the [OpenVINO HomePage](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html), then download and install the corresponding version.
+
+This demo uses OpenVINO 2022.1.0, which can be installed directly with:
+```shell
+pip install openvino==2022.1.0
+```
+
+For detailed installation steps, see the [OpenVINO website](https://docs.openvinotoolkit.org/latest/get_started_guides.html).
+
+## Benchmark test
+
+- Prepare the test model: follow the "Export and convert the model" steps in [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet), export the model without post-processing (`-o export.benchmark=True`), and generate the simplified ONNX model to be tested (it can also be downloaded directly from the links below). Then create an ```out_onnxsim``` folder in this directory and put the exported ONNX model in it.
+
+- Prepare the test image: by default this demo uses PaddleDetection/demo/[000000014439.jpg](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/demo/000000014439.jpg)
+
+- Run directly in this directory:
+
+```shell
+# Linux
+python openvino_benchmark.py --img_path ../../../../demo/000000014439.jpg --onnx_path out_onnxsim/picodet_s_320_coco_lcnet.onnx --in_shape 320
+# Windows
+python openvino_benchmark.py --img_path ..\..\..\..\demo\000000014439.jpg --onnx_path out_onnxsim\picodet_s_320_coco_lcnet.onnx --in_shape 320
+```
+- Note: ```--in_shape``` is the input size of the corresponding model; the default is 320.
+
+## Real-image test (network includes post-processing but not NMS)
+
+- Prepare the test model: follow the "Export and convert the model" steps in [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet), export the model **with post-processing** but **without NMS** (`-o export.benchmark=False export.nms=False`), and generate the simplified ONNX model to be tested (it can also be downloaded directly from the links below). Then create an ```out_onnxsim_infer``` folder in this directory and put the exported ONNX model in it.
+
+- Prepare the test image: by default this uses ../../demo_onnxruntime/imgs/[bus.jpg](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/third_engine/demo_onnxruntime/imgs/bus.jpg)
+
+```shell
+# Linux
+python openvino_infer.py --img_path ../../demo_onnxruntime/imgs/bus.jpg --onnx_path out_onnxsim_infer/picodet_s_320_postproccesed_woNMS.onnx --in_shape 320
+# Windows
+python openvino_infer.py --img_path ..\..\demo_onnxruntime\imgs\bus.jpg --onnx_path out_onnxsim_infer\picodet_s_320_postproccesed_woNMS.onnx --in_shape 320
+```
+
+### Real-image test (network without post-processing)
+
+```shell
+# Linux
+python openvino_benchmark.py --benchmark 0 --img_path ../../../../demo/000000014439.jpg --onnx_path out_onnxsim/picodet_s_320_coco_lcnet.onnx --in_shape 320
+# Windows
+python openvino_benchmark.py --benchmark 0 --img_path ..\..\..\..\demo\000000014439.jpg --onnx_path out_onnxsim\picodet_s_320_coco_lcnet.onnx --in_shape 320
+```
+
+- Results:
+[Figure: detection result images]
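A model exported with post-processing but without NMS leaves the suppression step to the caller. The following is a minimal NumPy sketch of greedy (hard) NMS on corner-form boxes. It is an illustration only, not the code shipped in `openvino_infer.py`; the names `iou_one_to_many` and `greedy_nms` and the sample boxes are assumptions made for this example.

```python
import numpy as np


def iou_one_to_many(box, boxes, eps=1e-5):
    """IoU between one corner-form box (4,) and N corner-form boxes (N, 4)."""
    left_top = np.maximum(box[:2], boxes[:, :2])
    right_bottom = np.minimum(box[2:], boxes[:, 2:])
    # Negative widths or heights mean no overlap, so clip them to zero.
    wh = np.clip(right_bottom - left_top, 0.0, None)
    overlap = wh[:, 0] * wh[:, 1]
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return overlap / (area + areas - overlap + eps)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the kept box too much.
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]
    return keep


# Hypothetical sample data: two heavily overlapping boxes and one separate box.
boxes = np.array(
    [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
kept = greedy_nms(boxes, scores)
print(kept)
```

In a detector this would typically run once per class, on the boxes whose score for that class passes the score threshold.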
+
+## Benchmark results
+
+- Measured latency:
+
+| Model | Input size | ONNX (w/ post-processing, w/o NMS) | ONNX (w/o post-processing) | Latency [CPU](#latency) |
+| :-------- | :--------: | :---------------------: | :---------------------: | :----------------: |
+| PicoDet-XS | 320*320 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_320_coco_lcnet.onnx) | 3.9ms |
+| PicoDet-XS | 416*416 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_xs_416_coco_lcnet.onnx) | 6.1ms |
+| PicoDet-S | 320*320 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet.onnx) | 4.8ms |
+| PicoDet-S | 416*416 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_416_coco_lcnet.onnx) | 6.6ms |
+| PicoDet-M | 320*320 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_320_coco_lcnet.onnx) | 8.2ms |
+| PicoDet-M | 416*416 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_coco_lcnet.onnx) | 12.7ms |
+| PicoDet-L | 320*320 | [( w/ post-processing; w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_lcnet_postproccesed_woNMS.onnx) | [( w/o post-processing)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_320_coco_lcnet.onnx) | 11.5ms |
+| PicoDet-L | 416*416 | [( w/ post-processing; w/o
NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_416_coco_lcnet.onnx) | 20.7ms | +| PicoDet-L | 640*640 | [( w/ 后处理;w/o NMS)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_lcnet_postproccesed_woNMS.onnx) | [( w/o 后处理)](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_l_640_coco_lcnet.onnx) | 62.5ms | + +- 测试环境: 英特尔酷睿i7 10750H CPU。 diff --git a/static/deploy/android_demo/app/src/main/assets/labels/coco-labels-2014_2017.txt b/deploy/third_engine/demo_openvino/python/coco_label.txt similarity index 88% rename from static/deploy/android_demo/app/src/main/assets/labels/coco-labels-2014_2017.txt rename to deploy/third_engine/demo_openvino/python/coco_label.txt index 1f42c8eb44628f95b2f4067de928a7f5c1e9c8dc..ca76c80b5b2cd0b25047f75736656cfebc9da7aa 100644 --- a/static/deploy/android_demo/app/src/main/assets/labels/coco-labels-2014_2017.txt +++ b/deploy/third_engine/demo_openvino/python/coco_label.txt @@ -1,8 +1,8 @@ person bicycle car -motorcycle -airplane +motorbike +aeroplane bus train truck @@ -55,12 +55,12 @@ pizza donut cake chair -couch -potted plant +sofa +pottedplant bed -dining table +diningtable toilet -tv +tvmonitor laptop mouse remote @@ -77,4 +77,4 @@ vase scissors teddy bear hair drier -toothbrush \ No newline at end of file +toothbrush diff --git a/deploy/third_engine/demo_openvino/python/openvino_benchmark.py b/deploy/third_engine/demo_openvino/python/openvino_benchmark.py new file mode 100644 index 0000000000000000000000000000000000000000..f21a8d5d1ed83c159818d2b405d1b5c9e5daa927 --- /dev/null +++ b/deploy/third_engine/demo_openvino/python/openvino_benchmark.py @@ -0,0 +1,365 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import numpy as np +import time +import argparse +from scipy.special import softmax +from openvino.runtime import Core + + +def image_preprocess(img_path, re_shape): + img = cv2.imread(img_path) + img = cv2.resize( + img, (re_shape, re_shape), interpolation=cv2.INTER_LANCZOS4) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + img = np.transpose(img, [2, 0, 1]) / 255 + img = np.expand_dims(img, 0) + img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) + img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) + img -= img_mean + img /= img_std + return img.astype(np.float32) + + +def draw_box(img, results, class_label, scale_x, scale_y): + + label_list = list( + map(lambda x: x.strip(), open(class_label, 'r').readlines())) + + for i in range(len(results)): + print(label_list[int(results[i][0])], ':', results[i][1]) + bbox = results[i, 2:] + label_id = int(results[i, 0]) + score = results[i, 1] + if (score > 0.20): + xmin, ymin, xmax, ymax = [ + int(bbox[0] * scale_x), int(bbox[1] * scale_y), + int(bbox[2] * scale_x), int(bbox[3] * scale_y) + ] + cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 3) + font = cv2.FONT_HERSHEY_SIMPLEX + label_text = label_list[label_id] + cv2.rectangle(img, (xmin, ymin), (xmax, ymin - 60), (0, 255, 0), -1) + cv2.putText(img, "#" + label_text, (xmin, ymin - 10), font, 1, + (255, 255, 255), 2, cv2.LINE_AA) + cv2.putText(img, + str(round(score, 3)), (xmin, ymin - 40), font, 0.8, + (255, 255, 255), 2, cv2.LINE_AA) + return img + + +def hard_nms(box_scores, iou_threshold, top_k=-1, 
candidate_size=200): + """ + Args: + box_scores (N, 5): boxes in corner-form and probabilities. + iou_threshold: intersection over union threshold. + top_k: keep top_k results. If k <= 0, keep all the results. + candidate_size: only consider the candidates with the highest scores. + Returns: + picked: a list of indexes of the kept boxes + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. 
+ """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PicoDetPostProcess(object): + """ + Args: + input_shape (int): network input image size + ori_shape (int): ori image shape of before padding + scale_factor (float): scale factor of ori image + enable_mkldnn (bool): whether to open MKLDNN + """ + + def __init__(self, + input_shape, + ori_shape, + scale_factor, + strides=[8, 16, 32, 64], + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=1000, + keep_top_k=100): + self.ori_shape = ori_shape + self.input_shape = input_shape + self.scale_factor = scale_factor + self.strides = strides + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def warp_boxes(self, boxes, ori_shape): + """Apply transform to boxes + """ + width, height = ori_shape[1], ori_shape[0] + n = len(boxes) + if n: + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + # xy = xy @ M.T # transform + xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate( + (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + # clip boxes + xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) + xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) + return xy.astype(np.float32) + else: + return boxes + + def __call__(self, scores, raw_boxes): + batch_size = raw_boxes[0].shape[0] + reg_max = int(raw_boxes[0].shape[-1] / 4 - 1) + out_boxes_num = [] + out_boxes_list = [] + for batch_id in range(batch_size): + # generate centers + decode_boxes = [] + select_scores = [] + for stride, box_distribute, score in zip(self.strides, raw_boxes, + scores): + box_distribute = box_distribute[batch_id] + score = score[batch_id] + # centers + fm_h = self.input_shape[0] / stride + fm_w = self.input_shape[1] / stride + h_range = 
np.arange(fm_h) + w_range = np.arange(fm_w) + ww, hh = np.meshgrid(w_range, h_range) + ct_row = (hh.flatten() + 0.5) * stride + ct_col = (ww.flatten() + 0.5) * stride + center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1) + + # box distribution to distance + reg_range = np.arange(reg_max + 1) + box_distance = box_distribute.reshape((-1, reg_max + 1)) + box_distance = softmax(box_distance, axis=1) + box_distance = box_distance * np.expand_dims(reg_range, axis=0) + box_distance = np.sum(box_distance, axis=1).reshape((-1, 4)) + box_distance = box_distance * stride + + # top K candidate + topk_idx = np.argsort(score.max(axis=1))[::-1] + topk_idx = topk_idx[:self.nms_top_k] + center = center[topk_idx] + score = score[topk_idx] + box_distance = box_distance[topk_idx] + + # decode box + decode_box = center + [-1, -1, 1, 1] * box_distance + + select_scores.append(score) + decode_boxes.append(decode_box) + + # nms + bboxes = np.concatenate(decode_boxes, axis=0) + confidences = np.concatenate(select_scores, axis=0) + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.keep_top_k, ) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * box_probs.shape[0]) + + if len(picked_box_probs) == 0: + out_boxes_list.append(np.empty((0, 4))) + out_boxes_num.append(0) + + else: + picked_box_probs = np.concatenate(picked_box_probs) + + # resize output boxes + picked_box_probs[:, :4] = self.warp_boxes( + picked_box_probs[:, :4], self.ori_shape[batch_id]) + im_scale = np.concatenate([ + self.scale_factor[batch_id][::-1], + self.scale_factor[batch_id][::-1] + ]) + picked_box_probs[:, :4] /= 
im_scale + # clas score box + out_boxes_list.append( + np.concatenate( + [ + np.expand_dims( + np.array(picked_labels), + axis=-1), np.expand_dims( + picked_box_probs[:, 4], axis=-1), + picked_box_probs[:, :4] + ], + axis=1)) + out_boxes_num.append(len(picked_labels)) + + out_boxes_list = np.concatenate(out_boxes_list, axis=0) + out_boxes_num = np.asarray(out_boxes_num).astype(np.int32) + return out_boxes_list, out_boxes_num + + +def detect(img_file, compiled_model, re_shape, class_label): + output = compiled_model.infer_new_request({0: test_image}) + result_ie = list(output.values()) #[0] + + test_im_shape = np.array([[re_shape, re_shape]]).astype('float32') + test_scale_factor = np.array([[1, 1]]).astype('float32') + + np_score_list = [] + np_boxes_list = [] + + num_outs = int(len(result_ie) / 2) + for out_idx in range(num_outs): + np_score_list.append(result_ie[out_idx]) + np_boxes_list.append(result_ie[out_idx + num_outs]) + + postprocess = PicoDetPostProcess(test_image.shape[2:], test_im_shape, + test_scale_factor) + + np_boxes, np_boxes_num = postprocess(np_score_list, np_boxes_list) + + image = cv2.imread(img_file, 1) + scale_x = image.shape[1] / test_image.shape[3] + scale_y = image.shape[0] / test_image.shape[2] + res_image = draw_box(image, np_boxes, class_label, scale_x, scale_y) + + cv2.imwrite('res.jpg', res_image) + cv2.imshow("res", res_image) + cv2.waitKey() + + +def benchmark(test_image, compiled_model): + + # benchmark + loop_num = 100 + warm_up = 8 + timeall = 0 + time_min = float("inf") + time_max = float('-inf') + + for i in range(loop_num + warm_up): + time0 = time.time() + #perform the inference step + + output = compiled_model.infer_new_request({0: test_image}) + time1 = time.time() + timed = time1 - time0 + + if i >= warm_up: + timeall = timeall + timed + time_min = min(time_min, timed) + time_max = max(time_max, timed) + + time_avg = timeall / loop_num + + print('inference_time(ms): min={}, max={}, avg={}'.format( + round(time_min * 1000, 
2), + round(time_max * 1000, 1), round(time_avg * 1000, 1))) + + +if __name__ == '__main__': + + parser = argparse.ArgumentParser() + parser.add_argument( + '--benchmark', type=int, default=1, help="0:detect; 1:benchmark") + parser.add_argument( + '--img_path', + type=str, + default='../../../../demo/000000014439.jpg', + help="image path") + parser.add_argument( + '--onnx_path', + type=str, + default='out_onnxsim/picodet_s_320_processed.onnx', + help="onnx filepath") + parser.add_argument('--in_shape', type=int, default=320, help="input_size") + parser.add_argument( + '--class_label', + type=str, + default='coco_label.txt', + help="class label file") + args = parser.parse_args() + + ie = Core() + net = ie.read_model(args.onnx_path) + test_image = image_preprocess(args.img_path, args.in_shape) + compiled_model = ie.compile_model(net, 'CPU') + + if args.benchmark == 0: + detect(args.img_path, compiled_model, args.in_shape, args.class_label) + if args.benchmark == 1: + benchmark(test_image, compiled_model) diff --git a/deploy/third_engine/demo_openvino/python/openvino_infer.py b/deploy/third_engine/demo_openvino/python/openvino_infer.py new file mode 100644 index 0000000000000000000000000000000000000000..0ad51022b1793e7b6430025a7c71cc0de7658c8c --- /dev/null +++ b/deploy/third_engine/demo_openvino/python/openvino_infer.py @@ -0,0 +1,267 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import cv2 +import numpy as np +import argparse +from scipy.special import softmax +from openvino.runtime import Core + + +def image_preprocess(img_path, re_shape): + img = cv2.imread(img_path) + img = cv2.resize( + img, (re_shape, re_shape), interpolation=cv2.INTER_LANCZOS4) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + img = np.transpose(img, [2, 0, 1]) / 255 + img = np.expand_dims(img, 0) + img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) + img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) + img -= img_mean + img /= img_std + return img.astype(np.float32) + + +def get_color_map_list(num_classes): + color_map = num_classes * [0, 0, 0] + for i in range(0, num_classes): + j = 0 + lab = i + while lab: + color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j)) + color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)) + color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)) + j += 1 + lab >>= 3 + color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)] + return color_map + + +def draw_box(srcimg, results, class_label): + label_list = list( + map(lambda x: x.strip(), open(class_label, 'r').readlines())) + for i in range(len(results)): + color_list = get_color_map_list(len(label_list)) + clsid2color = {} + classid, conf = int(results[i, 0]), results[i, 1] + xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int( + results[i, 4]), int(results[i, 5]) + + if classid not in clsid2color: + clsid2color[classid] = color_list[classid] + color = tuple(clsid2color[classid]) + + cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2) + print(label_list[classid] + ': ' + str(round(conf, 3))) + cv2.putText( + srcimg, + label_list[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10), + cv2.FONT_HERSHEY_SIMPLEX, + 0.8, (0, 255, 0), + thickness=2) + return srcimg + + +def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200): + """ + Args: + box_scores (N, 5): boxes in corner-form and probabilities. 
+ iou_threshold: intersection over union threshold. + top_k: keep top_k results. If k <= 0, keep all the results. + candidate_size: only consider the candidates with the highest scores. + Returns: + picked: a list of indexes of the kept boxes + """ + scores = box_scores[:, -1] + boxes = box_scores[:, :-1] + picked = [] + indexes = np.argsort(scores) + indexes = indexes[-candidate_size:] + while len(indexes) > 0: + current = indexes[-1] + picked.append(current) + if 0 < top_k == len(picked) or len(indexes) == 1: + break + current_box = boxes[current, :] + indexes = indexes[:-1] + rest_boxes = boxes[indexes, :] + iou = iou_of( + rest_boxes, + np.expand_dims( + current_box, axis=0), ) + indexes = indexes[iou <= iou_threshold] + + return box_scores[picked, :] + + +def iou_of(boxes0, boxes1, eps=1e-5): + """Return intersection-over-union (Jaccard index) of boxes. + Args: + boxes0 (N, 4): ground truth boxes. + boxes1 (N or 1, 4): predicted boxes. + eps: a small number to avoid 0 as denominator. + Returns: + iou (N): IoU values. + """ + overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2]) + overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:]) + + overlap_area = area_of(overlap_left_top, overlap_right_bottom) + area0 = area_of(boxes0[..., :2], boxes0[..., 2:]) + area1 = area_of(boxes1[..., :2], boxes1[..., 2:]) + return overlap_area / (area0 + area1 - overlap_area + eps) + + +def area_of(left_top, right_bottom): + """Compute the areas of rectangles given two corners. + Args: + left_top (N, 2): left top corner. + right_bottom (N, 2): right bottom corner. + Returns: + area (N): return the area. 
+ """ + hw = np.clip(right_bottom - left_top, 0.0, None) + return hw[..., 0] * hw[..., 1] + + +class PicoDetNMS(object): + """ + Args: + input_shape (int): network input image size + scale_factor (float): scale factor of ori image + """ + + def __init__(self, + input_shape, + scale_x, + scale_y, + strides=[8, 16, 32, 64], + score_threshold=0.4, + nms_threshold=0.5, + nms_top_k=1000, + keep_top_k=100): + self.input_shape = input_shape + self.scale_x = scale_x + self.scale_y = scale_y + self.strides = strides + self.score_threshold = score_threshold + self.nms_threshold = nms_threshold + self.nms_top_k = nms_top_k + self.keep_top_k = keep_top_k + + def __call__(self, decode_boxes, select_scores): + batch_size = 1 + out_boxes_list = [] + for batch_id in range(batch_size): + # nms + bboxes = np.concatenate(decode_boxes, axis=0) + confidences = np.concatenate(select_scores, axis=0) + picked_box_probs = [] + picked_labels = [] + for class_index in range(0, confidences.shape[1]): + probs = confidences[:, class_index] + mask = probs > self.score_threshold + probs = probs[mask] + if probs.shape[0] == 0: + continue + subset_boxes = bboxes[mask, :] + box_probs = np.concatenate( + [subset_boxes, probs.reshape(-1, 1)], axis=1) + box_probs = hard_nms( + box_probs, + iou_threshold=self.nms_threshold, + top_k=self.keep_top_k, ) + picked_box_probs.append(box_probs) + picked_labels.extend([class_index] * box_probs.shape[0]) + + if len(picked_box_probs) == 0: + out_boxes_list.append(np.empty((0, 4))) + + else: + picked_box_probs = np.concatenate(picked_box_probs) + + # resize output boxes + picked_box_probs[:, 0] *= self.scale_x + picked_box_probs[:, 2] *= self.scale_x + picked_box_probs[:, 1] *= self.scale_y + picked_box_probs[:, 3] *= self.scale_y + + # clas score box + out_boxes_list.append( + np.concatenate( + [ + np.expand_dims( + np.array(picked_labels), + axis=-1), np.expand_dims( + picked_box_probs[:, 4], axis=-1), + picked_box_probs[:, :4] + ], + axis=1)) + + out_boxes_list 
= np.concatenate(out_boxes_list, axis=0) + return out_boxes_list + + +def detect(img_file, compiled_model, class_label): + output = compiled_model.infer_new_request({0: test_image}) + result_ie = list(output.values()) + + decode_boxes = [] + select_scores = [] + num_outs = int(len(result_ie) / 2) + for out_idx in range(num_outs): + decode_boxes.append(result_ie[out_idx]) + select_scores.append(result_ie[out_idx + num_outs]) + + image = cv2.imread(img_file, 1) + scale_x = image.shape[1] / test_image.shape[3] + scale_y = image.shape[0] / test_image.shape[2] + + nms = PicoDetNMS(test_image.shape[2:], scale_x, scale_y) + np_boxes = nms(decode_boxes, select_scores) + + res_image = draw_box(image, np_boxes, class_label) + + cv2.imwrite('res.jpg', res_image) + cv2.imshow("res", res_image) + cv2.waitKey() + + +if __name__ == '__main__': + + parser = argparse.ArgumentParser() + parser.add_argument( + '--img_path', + type=str, + default='../../demo_onnxruntime/imgs/bus.jpg', + help="image path") + parser.add_argument( + '--onnx_path', + type=str, + default='out_onnxsim_infer/picodet_s_320_postproccesed_woNMS.onnx', + help="onnx filepath") + parser.add_argument('--in_shape', type=int, default=320, help="input_size") + parser.add_argument( + '--class_label', + type=str, + default='coco_label.txt', + help="class label file") + args = parser.parse_args() + + ie = Core() + net = ie.read_model(args.onnx_path) + test_image = image_preprocess(args.img_path, args.in_shape) + compiled_model = ie.compile_model(net, 'CPU') + + detect(args.img_path, compiled_model, args.class_label) diff --git a/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h b/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h index 242423a3af3644c3f3ad495ab0c291015560e49f..7bd3d79c44a2f6ae62eaba82bcafcae45a84254f 100644 --- a/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h +++ b/deploy/third_engine/demo_openvino_kpts/picodet_openvino.h @@ -38,8 +38,8 @@ typedef struct BoxInfo { } BoxInfo; 
class PicoDet { - public: - PicoDet(const char* param); +public: + PicoDet(const char *param); ~PicoDet(); @@ -48,26 +48,23 @@ class PicoDet { std::vector heads_info_{ // cls_pred|dis_pred|stride - {"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8}, - {"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16}, - {"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32}, - {"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64}, + {"transpose_0.tmp_0", "transpose_1.tmp_0", 8}, + {"transpose_2.tmp_0", "transpose_3.tmp_0", 16}, + {"transpose_4.tmp_0", "transpose_5.tmp_0", 32}, + {"transpose_6.tmp_0", "transpose_7.tmp_0", 64}, }; - std::vector detect(cv::Mat image, - float score_threshold, + std::vector detect(cv::Mat image, float score_threshold, float nms_threshold); - private: - void preprocess(cv::Mat& image, InferenceEngine::Blob::Ptr& blob); - void decode_infer(const float*& cls_pred, - const float*& dis_pred, - int stride, +private: + void preprocess(cv::Mat &image, InferenceEngine::Blob::Ptr &blob); + void decode_infer(const float *&cls_pred, const float *&dis_pred, int stride, float threshold, - std::vector>& results); - BoxInfo disPred2Bbox( - const float*& dfl_det, int label, float score, int x, int y, int stride); - static void nms(std::vector& result, float nms_threshold); + std::vector> &results); + BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x, + int y, int stride); + static void nms(std::vector &result, float nms_threshold); std::string input_name_; int input_size_ = image_size; int num_class_ = 80; diff --git a/deploy/third_engine/onnx/infer.py b/deploy/third_engine/onnx/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..9dbe2bde9e0d9e90639f53331a91fdecbeaefb8b --- /dev/null +++ b/deploy/third_engine/onnx/infer.py @@ -0,0 +1,148 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import yaml +import argparse +import numpy as np +import glob +from onnxruntime import InferenceSession + +from preprocess import Compose + +# Global dictionary +SUPPORT_MODELS = { + 'YOLO', 'RCNN', 'SSD', 'Face', 'FCOS', 'SOLOv2', 'TTFNet', 'S2ANet', 'JDE', + 'FairMOT', 'DeepSORT', 'GFL', 'PicoDet', 'CenterNet', 'TOOD', 'RetinaNet', + 'StrongBaseline', 'STGCN', 'YOLOX', 'HRNet' +} + +parser = argparse.ArgumentParser(description=__doc__) +parser.add_argument("--infer_cfg", type=str, help="infer_cfg.yml") +parser.add_argument( + '--onnx_file', type=str, default="model.onnx", help="onnx model file path") +parser.add_argument("--image_dir", type=str) +parser.add_argument("--image_file", type=str) + + +def get_test_images(infer_dir, infer_img): + """ + Get image path list in TEST mode + """ + assert infer_img is not None or infer_dir is not None, \ + "--image_file or --image_dir should be set" + assert infer_img is None or os.path.isfile(infer_img), \ + "{} is not a file".format(infer_img) + assert infer_dir is None or os.path.isdir(infer_dir), \ + "{} is not a directory".format(infer_dir) + + # infer_img has a higher priority + if infer_img and os.path.isfile(infer_img): + return [infer_img] + + images = set() + infer_dir = os.path.abspath(infer_dir) + assert os.path.isdir(infer_dir), \ + "infer_dir {} is not a directory".format(infer_dir) + exts = ['jpg', 'jpeg', 'png', 'bmp'] + exts += [ext.upper() for ext in 
exts] + for ext in exts: + images.update(glob.glob('{}/*.{}'.format(infer_dir, ext))) + images = list(images) + + assert len(images) > 0, "no image found in {}".format(infer_dir) + print("Found {} inference images in total.".format(len(images))) + + return images + + +class PredictConfig(object): + """set config of preprocess, postprocess and visualize + Args: + infer_config (str): path of infer_cfg.yml + """ + + def __init__(self, infer_config): + # parsing Yaml config for Preprocess + with open(infer_config) as f: + yml_conf = yaml.safe_load(f) + self.check_model(yml_conf) + self.arch = yml_conf['arch'] + self.preprocess_infos = yml_conf['Preprocess'] + self.min_subgraph_size = yml_conf['min_subgraph_size'] + self.label_list = yml_conf['label_list'] + self.use_dynamic_shape = yml_conf['use_dynamic_shape'] + self.draw_threshold = yml_conf.get("draw_threshold", 0.5) + self.mask = yml_conf.get("mask", False) + self.tracker = yml_conf.get("tracker", None) + self.nms = yml_conf.get("NMS", None) + self.fpn_stride = yml_conf.get("fpn_stride", None) + if self.arch == 'RCNN' and yml_conf.get('export_onnx', False): + print( + 'The RCNN export model is used for ONNX and it only supports batch_size = 1' + ) + self.print_config() + + def check_model(self, yml_conf): + """ + Raises: + ValueError: loaded model not in supported model type + """ + for support_model in SUPPORT_MODELS: + if support_model in yml_conf['arch']: + return True + raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[ + 'arch'], SUPPORT_MODELS)) + + def print_config(self): + print('----------- Model Configuration -----------') + print('%s: %s' % ('Model Arch', self.arch)) + print('%s: ' % ('Transform Order')) + for op_info in self.preprocess_infos: + print('--%s: %s' % ('transform op', op_info['type'])) + print('--------------------------------------------') + + +def predict_image(infer_config, predictor, img_list): + # load preprocess transforms + transforms = 
Compose(infer_config.preprocess_infos) + # predict image + for img_path in img_list: + inputs = transforms(img_path) + inputs_name = [var.name for var in predictor.get_inputs()] + inputs = {k: inputs[k][None, ] for k in inputs_name} + + outputs = predictor.run(output_names=None, input_feed=inputs) + + print("ONNXRuntime predict: ") + if infer_config.arch in ["HRNet"]: + print(np.array(outputs[0])) + else: + bboxes = np.array(outputs[0]) + for bbox in bboxes: + if bbox[0] > -1 and bbox[1] > infer_config.draw_threshold: + print(f"{int(bbox[0])} {bbox[1]} " + f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}") + + +if __name__ == '__main__': + FLAGS = parser.parse_args() + # load image list + img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file) + # load predictor + predictor = InferenceSession(FLAGS.onnx_file) + # load infer config + infer_config = PredictConfig(FLAGS.infer_cfg) + + predict_image(infer_config, predictor, img_list) diff --git a/deploy/third_engine/onnx/preprocess.py b/deploy/third_engine/onnx/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..7ed422c53c2301a63fafd6f5c8019ed51d857865 --- /dev/null +++ b/deploy/third_engine/onnx/preprocess.py @@ -0,0 +1,491 @@ +import numpy as np +import cv2 +import copy + + +def decode_image(img_path): + with open(img_path, 'rb') as f: + im_read = f.read() + data = np.frombuffer(im_read, dtype='uint8') + im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + img_info = { + "im_shape": np.array( + im.shape[:2], dtype=np.float32), + "scale_factor": np.array( + [1., 1.], dtype=np.float32) + } + return im, img_info + + +class Resize(object): + """resize image by target_size and max_size + Args: + target_size (int): the target size of image + keep_ratio (bool): whether keep_ratio or not, default true + interp (int): method of resize + """ + + def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR): + if isinstance(target_size, 
int): + target_size = [target_size, target_size] + self.target_size = target_size + self.keep_ratio = keep_ratio + self.interp = interp + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + im_channel = im.shape[2] + im_scale_y, im_scale_x = self.generate_scale(im) + im = cv2.resize( + im, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=self.interp) + im_info['im_shape'] = np.array(im.shape[:2]).astype('float32') + im_info['scale_factor'] = np.array( + [im_scale_y, im_scale_x]).astype('float32') + return im, im_info + + def generate_scale(self, im): + """ + Args: + im (np.ndarray): image (np.ndarray) + Returns: + im_scale_y: the resize ratio of Y + im_scale_x: the resize ratio of X + """ + origin_shape = im.shape[:2] + im_c = im.shape[2] + if self.keep_ratio: + im_size_min = np.min(origin_shape) + im_size_max = np.max(origin_shape) + target_size_min = np.min(self.target_size) + target_size_max = np.max(self.target_size) + im_scale = float(target_size_min) / float(im_size_min) + if np.round(im_scale * im_size_max) > target_size_max: + im_scale = float(target_size_max) / float(im_size_max) + im_scale_x = im_scale + im_scale_y = im_scale + else: + resize_h, resize_w = self.target_size + im_scale_y = resize_h / float(origin_shape[0]) + im_scale_x = resize_w / float(origin_shape[1]) + return im_scale_y, im_scale_x + + +class NormalizeImage(object): + """normalize image + Args: + mean (list): im - mean + std (list): im / std + is_scale (bool): whether need im / 255 + Note: the image is expected in HWC layout (mean and std are broadcast over the last axis) + """ + + def __init__(self, mean, std, is_scale=True): + self.mean = mean + self.std = std + self.is_scale = is_scale + + def __call__(self, im, im_info): + 
""" + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.astype(np.float32, copy=False) + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + + if self.is_scale: + im = im / 255.0 + im -= mean + im /= std + return im, im_info + + +class Permute(object): + """permute image + Args: + to_bgr (bool): whether convert RGB to BGR + channel_first (bool): whether convert HWC to CHW + """ + + def __init__(self, ): + super(Permute, self).__init__() + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + im = im.transpose((2, 0, 1)).copy() + return im, im_info + + +class PadStride(object): + """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config + Args: + stride (bool): model with FPN need image shape % stride == 0 + """ + + def __init__(self, stride=0): + self.coarsest_stride = stride + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + coarsest_stride = self.coarsest_stride + if coarsest_stride <= 0: + return im, im_info + im_c, im_h, im_w = im.shape + pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) + pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) + padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) + padding_im[:, :im_h, :im_w] = im + return padding_im, im_info + + +class LetterBoxResize(object): + def __init__(self, target_size): + """ + Resize image to target size, convert normalized xywh to pixel xyxy + format ([x_center, y_center, width, 
height] -> [x0, y0, x1, y1]). + Args: + target_size (int|list): image target size. + """ + super(LetterBoxResize, self).__init__() + if isinstance(target_size, int): + target_size = [target_size, target_size] + self.target_size = target_size + + def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)): + # letterbox: resize a rectangular image to a padded rectangular + shape = img.shape[:2] # [height, width] + ratio_h = float(height) / shape[0] + ratio_w = float(width) / shape[1] + ratio = min(ratio_h, ratio_w) + new_shape = (round(shape[1] * ratio), + round(shape[0] * ratio)) # [width, height] + padw = (width - new_shape[0]) / 2 + padh = (height - new_shape[1]) / 2 + top, bottom = round(padh - 0.1), round(padh + 0.1) + left, right = round(padw - 0.1), round(padw + 0.1) + + img = cv2.resize( + img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder( + img, top, bottom, left, right, cv2.BORDER_CONSTANT, + value=color) # padded rectangular + return img, ratio, padw, padh + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + assert len(self.target_size) == 2 + assert self.target_size[0] > 0 and self.target_size[1] > 0 + height, width = self.target_size + h, w = im.shape[:2] + im, ratio, padw, padh = self.letterbox(im, height=height, width=width) + + new_shape = [round(h * ratio), round(w * ratio)] + im_info['im_shape'] = np.array(new_shape, dtype=np.float32) + im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32) + return im, im_info + + +class Pad(object): + def __init__(self, size, fill_value=[114.0, 114.0, 114.0]): + """ + Pad image to a specified size. 
+ Args: + size (list[int]): image target size + fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0) + """ + super(Pad, self).__init__() + if isinstance(size, int): + size = [size, size] + self.size = size + self.fill_value = fill_value + + def __call__(self, im, im_info): + im_h, im_w = im.shape[:2] + h, w = self.size + if h == im_h and w == im_w: + im = im.astype(np.float32) + return im, im_info + + canvas = np.ones((h, w, 3), dtype=np.float32) + canvas *= np.array(self.fill_value, dtype=np.float32) + canvas[0:im_h, 0:im_w, :] = im.astype(np.float32) + im = canvas + return im, im_info + + +def rotate_point(pt, angle_rad): + """Rotate a point by an angle. + + Args: + pt (list[float]): 2 dimensional point to be rotated + angle_rad (float): rotation angle by radian + + Returns: + list[float]: Rotated point. + """ + assert len(pt) == 2 + sn, cs = np.sin(angle_rad), np.cos(angle_rad) + new_x = pt[0] * cs - pt[1] * sn + new_y = pt[0] * sn + pt[1] * cs + rotated_pt = [new_x, new_y] + + return rotated_pt + + +def _get_3rd_point(a, b): + """To calculate the affine matrix, three pairs of points are required. This + function is used to get the 3rd point, given 2D points a & b. + + The 3rd point is defined by rotating vector `a - b` by 90 degrees + anticlockwise, using b as the rotation center. + + Args: + a (np.ndarray): point(x,y) + b (np.ndarray): point(x,y) + + Returns: + np.ndarray: The 3rd point. + """ + assert len(a) == 2 + assert len(b) == 2 + direction = a - b + third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32) + + return third_pt + + +def get_affine_transform(center, + input_size, + rot, + output_size, + shift=(0., 0.), + inv=False): + """Get the affine transform matrix, given the center/scale/rot/output_size. + + Args: + center (np.ndarray[2, ]): Center of the bounding box (x, y). + scale (np.ndarray[2, ]): Scale of the bounding box + wrt [width, height]. + rot (float): Rotation angle (degree). 
+ output_size (np.ndarray[2, ]): Size of the destination heatmaps. + shift (0-100%): Shift translation ratio wrt the width/height. + Default (0., 0.). + inv (bool): Option to inverse the affine transform direction. + (inv=False: src->dst or inv=True: dst->src) + + Returns: + np.ndarray: The transform matrix. + """ + assert len(center) == 2 + assert len(output_size) == 2 + assert len(shift) == 2 + if not isinstance(input_size, (np.ndarray, list)): + input_size = np.array([input_size, input_size], dtype=np.float32) + scale_tmp = input_size + + shift = np.array(shift) + src_w = scale_tmp[0] + dst_w = output_size[0] + dst_h = output_size[1] + + rot_rad = np.pi * rot / 180 + src_dir = rotate_point([0., src_w * -0.5], rot_rad) + dst_dir = np.array([0., dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + scale_tmp * shift + src[1, :] = center + src_dir + scale_tmp * shift + src[2, :] = _get_3rd_point(src[0, :], src[1, :]) + + dst = np.zeros((3, 2), dtype=np.float32) + dst[0, :] = [dst_w * 0.5, dst_h * 0.5] + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) + + if inv: + trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) + else: + trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) + + return trans + + +class WarpAffine(object): + """Warp affine the image + """ + + def __init__(self, + keep_res=False, + pad=31, + input_h=512, + input_w=512, + scale=0.4, + shift=0.1): + self.keep_res = keep_res + self.pad = pad + self.input_h = input_h + self.input_w = input_w + self.scale = scale + self.shift = shift + + def __call__(self, im, im_info): + """ + Args: + im (np.ndarray): image (np.ndarray) + im_info (dict): info of image + Returns: + im (np.ndarray): processed image (np.ndarray) + im_info (dict): info of processed image + """ + img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR) + + h, w = img.shape[:2] + + if self.keep_res: + input_h = (h | self.pad) + 1 + input_w = 
(w | self.pad) + 1 + s = np.array([input_w, input_h], dtype=np.float32) + c = np.array([w // 2, h // 2], dtype=np.float32) + + else: + s = max(h, w) * 1.0 + input_h, input_w = self.input_h, self.input_w + c = np.array([w / 2., h / 2.], dtype=np.float32) + + trans_input = get_affine_transform(c, s, 0, [input_w, input_h]) + img = cv2.resize(img, (w, h)) + inp = cv2.warpAffine( + img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR) + return inp, im_info + + +# keypoint preprocess +def get_warp_matrix(theta, size_input, size_dst, size_target): + """This code is based on + https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py + + Calculate the transformation matrix under the constraint of unbiased. + Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased + Data Processing for Human Pose Estimation (CVPR 2020). + + Args: + theta (float): Rotation angle in degrees. + size_input (np.ndarray): Size of input image [w, h]. + size_dst (np.ndarray): Size of output image [w, h]. + size_target (np.ndarray): Size of ROI in input plane [w, h]. + + Returns: + matrix (np.ndarray): A matrix for transformation. 
+ """ + theta = np.deg2rad(theta) + matrix = np.zeros((2, 3), dtype=np.float32) + scale_x = size_dst[0] / size_target[0] + scale_y = size_dst[1] / size_target[1] + matrix[0, 0] = np.cos(theta) * scale_x + matrix[0, 1] = -np.sin(theta) * scale_x + matrix[0, 2] = scale_x * ( + -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] * + np.sin(theta) + 0.5 * size_target[0]) + matrix[1, 0] = np.sin(theta) * scale_y + matrix[1, 1] = np.cos(theta) * scale_y + matrix[1, 2] = scale_y * ( + -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] * + np.cos(theta) + 0.5 * size_target[1]) + return matrix + + +class TopDownEvalAffine(object): + """apply affine transform to image and coords + + Args: + trainsize (list): [w, h], the standard size used to train + use_udp (bool): whether to use Unbiased Data Processing. + records(dict): the dict contained the image and coords + + Returns: + records (dict): contain the image and coords after tranformed + + """ + + def __init__(self, trainsize, use_udp=False): + self.trainsize = trainsize + self.use_udp = use_udp + + def __call__(self, image, im_info): + rot = 0 + imshape = im_info['im_shape'][::-1] + center = im_info['center'] if 'center' in im_info else imshape / 2. 
+ scale = im_info['scale'] if 'scale' in im_info else imshape + if self.use_udp: + trans = get_warp_matrix( + rot, center * 2.0, + [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + else: + trans = get_affine_transform(center, scale, rot, self.trainsize) + image = cv2.warpAffine( + image, + trans, (int(self.trainsize[0]), int(self.trainsize[1])), + flags=cv2.INTER_LINEAR) + + return image, im_info + + +class Compose: + def __init__(self, transforms): + self.transforms = [] + for op_info in transforms: + new_op_info = op_info.copy() + op_type = new_op_info.pop('type') + self.transforms.append(eval(op_type)(**new_op_info)) + + def __call__(self, img_path): + img, im_info = decode_image(img_path) + for t in self.transforms: + img, im_info = t(img, im_info) + inputs = copy.deepcopy(im_info) + inputs['image'] = img + return inputs diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md index b9915b5ab2bf55dd2a8d2a054fd8b550d4cc4faf..e19e6867a15ea18ddb8f85f46f6e020b79d4ebf6 100644 --- a/docs/CHANGELOG.md +++ b/docs/CHANGELOG.md @@ -7,7 +7,7 @@ ### 2.4(03.24/2022) - PP-YOLOE: - - Release the PP-YOLOE featured model: 51.4% accuracy on the COCO dataset and 78.1 FPS inference speed on V100, server-side SOTA in accuracy and speed + - Release the PP-YOLOE featured model: the l version reaches 51.6% accuracy on the COCO test2017 dataset and 78.1 FPS inference speed on V100, server-side SOTA in accuracy and speed - Release the s/m/l/x model series, with TensorRT and ONNX deployment supported - Support mixed-precision training, 33% faster than PP-YOLOv2 @@ -22,8 +22,10 @@ - ReID supports the Centroid model - Action recognition supports ST-GCN fall detection +- Model richness: + - Release YOLOX with nano/tiny/s/m/l/x versions; the x version reaches 51.8% accuracy on the COCO val2017 dataset + - Framework function optimization: - - Support mixed-precision training, enabled with `--amp` - Speed up EMA training by 20% and improve how EMA-trained models are saved - Support saving infer prediction results in COCO format diff --git a/docs/CHANGELOG_en.md b/docs/CHANGELOG_en.md index 7714b0c6787ef486623edd6494036f075651ef52..a7e6d422611eae7f0cfa66deb56f8e53e493d8c2 100644 --- a/docs/CHANGELOG_en.md +++ b/docs/CHANGELOG_en.md @@ -7,14 +7,14 @@ English | [简体中文](./CHANGELOG.md) ### 2.4(03.24/2022) - PP-YOLOE: - - Release PP-YOLOE object detection models, achieve mAP as 51.4% on COCO 
test dataset and 78.1 FPS on Nvidia V100, reach SOTA performance for object detection on GPU + - Release PP-YOLOE object detection models; PP-YOLOE-l achieves 51.6% mAP on the COCO test dataset at 78.1 FPS on Nvidia V100, reaching SOTA performance for object detection on GPU - Release series models: s/m/l/x, and support deployment based on TensorRT & ONNX - Support AMP training, with training 33% faster than PP-YOLOv2 - PP-PicoDet: - Release enhanced models of PP-PicoDet, with mAP improved by ~2% on COCO and CPU inference accelerated by 63% - Release PP-PicoDet-XS model with 0.7M parameters - - Post-processing integrated into the network to optimize deployment pipeline + - Post-processing integrated into the network to optimize deployment pipeline - PP-Human: - Release the PP-Human human analysis pipeline, including pedestrian detection, attribute recognition, human tracking, multi-camera tracking, human statistics, and action recognition, with TensorRT deployment supported @@ -22,8 +22,10 @@ English | [简体中文](./CHANGELOG.md) - Release Centroid model for ReID - Release ST-GCN model for falldown action recognition +- Model richness: + - Release the YOLOX object detection model series: nano/tiny/s/m/l/x; YOLOX-x achieves 51.8% mAP on the COCO val2017 dataset + - Function optimization: - - Support AMP training, enable with `--amp` - Speed up training with EMA by 20% and improve how EMA weights are saved - Support saving inference results in COCO format diff --git a/docs/MODEL_ZOO_cn.md b/docs/MODEL_ZOO_cn.md index dc8c38374efa36e378dc0a6d170570872efffcb1..b71d07d5c2ea652b16855967236fa9323e2af0ee 100644 --- a/docs/MODEL_ZOO_cn.md +++ b/docs/MODEL_ZOO_cn.md @@ -87,6 +87,14 @@ Paddle provides ImageNet-based pretrained backbone models. All pretrained models Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet) +### PP-YOLOE + +Please refer to [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe) + +### YOLOX + 
+Please refer to [YOLOX](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolox) + ## Rotated Box Detection @@ -94,6 +102,7 @@ Paddle provides ImageNet-based pretrained backbone models. All pretrained models Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/) + ## Keypoint Detection ### PP-TinyPose @@ -108,16 +117,21 @@ Paddle provides ImageNet-based pretrained backbone models. All pretrained models Please refer to [HigherHRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/higherhrnet) + ## Multi-Object Tracking -### DeepSort +### DeepSORT -Please refer to [DeepSort](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort) +Please refer to [DeepSORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort) ### JDE Please refer to [JDE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde) -### fairmot +### FairMOT + +Please refer to [FairMOT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot) + +### ByteTrack -Please refer to [FairMot](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot) +Please refer to [ByteTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/bytetrack) diff --git a/docs/MODEL_ZOO_en.md b/docs/MODEL_ZOO_en.md index b8f5403ad4a5c4ef619e9a4d2bf73375fd0b812f..fd23b895324f9544e9d007e25f6f607e4340c7b1 100644 --- a/docs/MODEL_ZOO_en.md +++ b/docs/MODEL_ZOO_en.md @@ -86,6 +86,14 @@ Please refer to [GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/develo Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet) +### PP-YOLOE + +Please refer to [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe) + +### YOLOX + +Please refer to [YOLOX](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolox) + ## Rotating frame detection @@ -93,6 +101,7 @@ Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/de Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/) + ## KeyPoint 
Detection ### PP-TinyPose @@ -107,16 +116,21 @@ Please refer to [HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/dev Please refer to [HigherHRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/higherhrnet) + ## Multi-Object Tracking -### DeepSort +### DeepSORT -Please refer to [DeepSort](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort) +Please refer to [DeepSORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort) ### JDE Please refer to [JDE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde) -### fairmot +### FairMOT + +Please refer to [FairMOT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot) + +### ByteTrack -Please refer to [FairMot](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot) \ No newline at end of file +Please refer to [ByteTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/bytetrack) diff --git a/docs/advanced_tutorials/READER.md b/docs/advanced_tutorials/READER.md index bc087f8959f008b8a9ac6b5613c1aca710f239a1..60c9fee67f2718a3de088eb52d899828924f9e34 100644 --- a/docs/advanced_tutorials/READER.md +++ b/docs/advanced_tutorials/READER.md @@ -90,7 +90,7 @@ The COCO dataset is currently divided into COCO2014 and COCO2017 and mainly consists of json files and image files │ │ ... 
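``` -The `COCODataSet` dataset class is defined and registered in `source/coco.py`. It inherits from `DetDataSet` and implements the parse_dataset method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse the COCO-format data source into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details. To convert other datasets to COCO format, refer to [Converting user data to COCO data](../tutorials/PrepareDataSet.md#用户数据转成COCO数据) +The `COCODataSet` dataset class is defined and registered in `source/coco.py`. It inherits from `DetDataSet` and implements the parse_dataset method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse the COCO-format data source into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details. To convert other datasets to COCO format, refer to [Converting user data to COCO data](../tutorials/data/PrepareDetDataSet.md#用户数据转成COCO数据) #### 2.2Pascal VOC dataset The dataset is currently divided into VOC2007 and VOC2012 and mainly consists of xml files and image files, organized as follows: @@ -118,7 +118,7 @@ The COCO dataset is currently divided into COCO2014 and COCO2017 and mainly consists of json files and image files │ ├── ImageSets │ │ ... ``` -The `VOCDataSet` dataset is defined and registered in `source/voc.py`. It inherits from the `DetDataSet` base class and overrides the `parse_dataset` method to parse the xml annotation files of the VOC dataset and update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [Converting user data to VOC data](../tutorials/PrepareDataSet.md#用户数据转成VOC数据) +The `VOCDataSet` dataset is defined and registered in `source/voc.py`. It inherits from the `DetDataSet` base class and overrides the `parse_dataset` method to parse the xml annotation files of the VOC dataset and update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [Converting user data to VOC data](../tutorials/data/PrepareDetDataSet.md#用户数据转成VOC数据) #### 2.3Custom dataset If COCODataSet and VOCDataSet do not meet your needs, you can load your data through a custom dataset. Only the following two steps are needed to implement one @@ -259,9 +259,11 @@ The Reader-related classes are defined in `reader.py`, which defines the `BaseDataLoader` class. `Ba ### 5.Configuration and running -#### 5.1Configuration +#### 5.1 Configuration +The configuration files of the modules related to data preprocessing include the Dataset configuration files shared by all models and the Reader configuration files specific to each model. -The configuration files of the modules related to data preprocessing include the Datas set configuration files shared by all models and the Reader configuration files specific to each model. The Dataset configuration files are located in the `configs/datasets` folder. For example, the configuration file of the COCO dataset is as follows: + +##### 5.1.1 Dataset configuration +The Dataset configuration files are located in the `configs/datasets` folder. For example, the configuration file of the COCO dataset is as follows: ``` metric: COCO # Evaluation standards currently supported include COCO, VOC, OID and WiderFace num_classes: 80 # num_classes: number of classes in the dataset, excluding the background class @@ -271,7 +273,7 @@ TrainDataset: image_dir: train2017 # Path of the training-set image folder, relative to dataset_dir anno_path: annotations/instances_train2017.json # Path of the training-set annotation file, relative to dataset_dir dataset_dir: dataset/coco # Path of the dataset, relative to the PaddleDetection path - data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the samples output by the dataset + 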
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the samples output by the dataset; note that this field is specific to TrainDataset and must be configured EvalDataset: !COCODataSet @@ -281,9 +283,16 @@ EvalDataset: TestDataset: !ImageFolder - anno_path: dataset/coco/annotations/instances_val2017.json # Path of the validation-set annotation file, relative to the PaddleDetection path + anno_path: annotations/instances_val2017.json # Path of the annotation file, used only to read the category information of the dataset; json and txt formats are supported + dataset_dir: dataset/coco # Path of the dataset; if this line is added, the annotation path becomes `dataset_dir/anno_path`; if it is not set or is removed, the annotation path is just `anno_path` ``` In PaddleDetection yml configuration files, `!` directly serializes module instances (functions, instances, etc.); the configuration files above are all serialized using Dataset. + +**Note:** +Please check the dataset paths in the configuration carefully before running. During training or evaluation, if the TrainDataset or EvalDataset path is misconfigured, you will be prompted that the dataset is being downloaded automatically. With a custom dataset, if the TestDataset path is misconfigured during inference, you will be prompted that the category information of the default COCO dataset is being used. + + +##### 5.1.2 Reader configuration The Reader specific to each model is defined in that model's folder; for example, the yolov3 Reader configuration file is defined in `configs/yolov3/_base_/yolov3_reader.yml`. An example Reader configuration is as follows: ``` worker_num: 2 diff --git a/docs/advanced_tutorials/READER_en.md b/docs/advanced_tutorials/READER_en.md index e3924641759c100ff9b16ebf82e2d9dc28666fae..07940a965dd4e48499a96def925679f9ff269ad8 100644 --- a/docs/advanced_tutorials/READER_en.md +++ b/docs/advanced_tutorials/READER_en.md @@ -91,7 +91,7 @@ COCO datasets are currently divided into COCO2014 and COCO2017, which are mainly │ │ ... ``` Class `COCODataSet` is defined and registered in `source/coco.py`. It implements the `parse_dataset` method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse the COCO-format data source into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details. To convert other datasets to COCO format, refer to [Converting User Data to COCO Data](../tutorials/PrepareDataSet_en.md#convert-user-data-to-coco-data) -It implements the `parse_dataset` method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse the COCO-format data source into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details.
Converting other datasets to COCO format can be done by referring to [converting User Data to COCO Data](../tutorials/PrepareDataSet_en.md#convert-user-data-to-coco-data) +It implements the `parse_dataset` method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse the COCO-format data source into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details. To convert other datasets to COCO format, refer to [Converting User Data to COCO Data](../tutorials/data/PrepareDetDataSet_en.md#convert-user-data-to-coco-data) #### 2.2Pascal VOC dataset @@ -120,7 +120,7 @@ The dataset is currently divided into VOC2007 and VOC2012, mainly composed of XM │ ├── ImageSets │ │ ... ``` -The `VOCDataSet` dataset is defined and registered in `source/voc.py`. It inherits from the `DetDataSet` base class and overrides the `parse_dataset` method to parse the XML annotations in the VOC dataset and update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [User Data to VOC Data](../tutorials/PrepareDataSet_en.md#convert-user-data-to-voc-data) +The `VOCDataSet` dataset is defined and registered in `source/voc.py`. It inherits from the `DetDataSet` base class and overrides the `parse_dataset` method to parse the XML annotations in the VOC dataset and update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [User Data to VOC Data](../tutorials/data/PrepareDetDataSet_en.md#convert-user-data-to-voc-data) #### 2.3Customize Dataset @@ -260,9 +260,11 @@ The Reader class is defined in `reader.py`, where the `BaseDataLoader` class is ### 5.Configuration and Operation -#### 5.1Configuration +#### 5.1 Configuration +The configuration files for modules related to data preprocessing include the Dataset configuration files common to all models and the Reader configuration files specific to each model.
-The configuration files for modules related to data preprocessing contain the configuration files for Datas sets common to all models and the configuration files for readers specific to different models. The configuration file for the Dataset exists in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows: +##### 5.1.1 Dataset Configuration +The configuration file for the Dataset exists in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows: ``` metric: COCO # Currently supports COCO, VOC, OID, Wider Face and other evaluation standards num_classes: 80 # num_classes: The number of classes in the dataset, excluding background classes @@ -272,7 +274,7 @@ TrainDataset: image_dir: train2017 # The path where the training set image resides relative to the dataset_dir anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir dataset_dir: dataset/coco #The path where the dataset is located relative to the PaddleDetection path - data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the sample output of the dataset + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the sample output of the dataset; note that data_fields is specific to TrainDataset and must be configured EvalDataset: !COCODataSet @@ -281,9 +283,16 @@ EvalDataset: dataset_dir: dataset/coco # The path where the dataset is located relative to the PaddleDetection path TestDataset: !ImageFolder - anno_path: dataset/coco/annotations/instances_val2017.json # The path of the annotation file of the verification set, relative to the path of PaddleDetection + anno_path: annotations/instances_val2017.json # The path of the annotation file, it is only used to read the category information of the dataset. 
JSON and TXT formats are supported + dataset_dir: dataset/coco # The path of the dataset, note if this row is added, `anno_path` will be 'dataset_dir/anno_path`, if not set or removed, `anno_path` is `anno_path` ``` In the YML profile for Paddle Detection, use `!`directly serializes module instances (functions, instances, etc.). The above configuration files are serialized using Dataset. + +**Note:** +Please carefully check the configuration path of the dataset before running. During training or verification, if the path of TrainDataset or EvalDataset is wrong, it will download the dataset automatically. When using a user-defined dataset, if the TestDataset path is incorrectly configured during inference, the category of the default COCO dataset will be used. + + +##### 5.1.2 Reader configuration The Reader configuration files for yolov3 are defined in `configs/yolov3/_base_/yolov3_reader.yml`. An example Reader configuration is as follows: ``` worker_num: 2 diff --git a/docs/advanced_tutorials/customization/action_recognotion/README.md b/docs/advanced_tutorials/customization/action_recognotion/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d9adf2184592e37c343d743e3ce93c8a4dccb493 --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/README.md @@ -0,0 +1,54 @@ +简体中文 | [English](./README_en.md) + +# 行为识别任务二次开发 + +在产业落地过程中应用行为识别算法,不可避免地会出现希望自定义类型的行为识别的需求,或是对已有行为识别模型的优化,以提升在特定场景下模型的效果。鉴于行为的多样性,PP-Human支持抽烟、打电话、摔倒、打架、人员闯入五种异常行为识别,并根据行为的不同,集成了基于视频分类、基于检测、基于图像分类、基于跟踪以及基于骨骼点的五种行为识别技术方案,可覆盖90%+动作类型的识别,满足各类开发需求。我们在本文档通过案例来介绍如何根据期望识别的行为来进行行为识别方案的选择,以及使用PaddleDetection进行行为识别算法二次开发工作,包括:方案选择、数据准备、模型优化思路和新增行为的开发流程。 + + +## 方案选择 + +在PaddleDetection的PP-Human中,我们为行为识别提供了多种方案:基于视频分类、基于图像分类、基于检测、基于跟踪以及基于骨骼点的行为识别方案,以期望满足不同场景、不同目标行为的需求。对于二次开发,首先我们需要确定要采用何种方案来实现行为识别的需求,其核心是要通过对场景和具体行为的分析、并考虑数据采集成本等因素,综合选择一个合适的识别方案。我们在这里简要列举了当前PaddleDetection中所支持的方案的优劣势和适用场景,供大家参考。 + +image + +下面以PaddleDetection目前已经支持的几个具体动作为例,介绍每个动作方案的选型依据: + +### 
吸烟 + +方案选择:基于人体id检测的行为识别 + +原因:吸烟动作中具有香烟这个明显特征目标,因此我们可以认为当在某个人物的对应图像中检测到香烟时,该人物即在吸烟动作中。相比于基于视频或基于骨骼点的识别方案,训练检测模型需要采集的是图片级别而非视频级别的数据,可以明显减轻数据收集与标注的难度。此外,目标检测任务具有丰富的预训练模型资源,整体模型的效果会更有保障, + +### 打电话 + +方案选择:基于人体id分类的行为识别 + +原因:打电话动作中虽然有手机这个特征目标,但为了区分看手机等动作,以及考虑到在安防场景下打电话动作中会出现较多对手机的遮挡(如手对手机的遮挡、人头对手机的遮挡等等),不利于检测模型正确检测到目标。同时打电话通常持续的时间较长,且人物本身的动作不会发生太大变化,因此可以因此采用帧级别图像分类的策略。 + 此外,打电话这个动作主要可以通过上半身判别,可以采用半身图片,去除冗余信息以降低模型训练的难度。 + +### 摔倒 + +方案选择:基于人体骨骼点的行为识别 + +原因:摔倒是一个明显的时序行为的动作,可由一个人物本身进行区分,具有场景无关的特性。由于PP-Human的场景定位偏向安防监控场景,背景变化较为复杂,且部署上需要考虑到实时性,因此采用了基于骨骼点的行为识别方案,以获得更好的泛化性及运行速度。 + +### 闯入 + +方案选择:基于人体id跟踪的行为识别 + +原因:闯入识别判断行人的路径或所在位置是否在某区域内即可,与人体自身动作无关,因此只需要跟踪人体跟踪结果分析是否存在闯入行为。 + +### 打架 + +方案选择:基于视频分类的行为识别 + +原因:与上面的动作不同,打架是一个典型的多人组成的行为。因此不再通过检测与跟踪模型来提取行人及其ID,而对整体视频片段进行处理。此外,打架场景下各个目标间的互相遮挡极为严重,关键点识别的准确性不高,采用基于骨骼点的方案难以保证精度。 + + +下面详细展开五大类方案的数据准备、模型优化和新增行为识别方法 + +1. [基于人体id检测的行为识别](./idbased_det.md) +2. [基于人体id分类的行为识别](./idbased_clas.md) +3. [基于人体骨骼点的行为识别](./skeletonbased_rec.md) +4. [基于人体id跟踪的行为识别](../pphuman_mot.md) +5. [基于视频分类的行为识别](./videobased_rec.md) diff --git a/docs/advanced_tutorials/customization/action_recognotion/README_en.md b/docs/advanced_tutorials/customization/action_recognotion/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..d04d426b7076abdd38a7317117f5daab6eeff0ad --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/README_en.md @@ -0,0 +1,55 @@ +[简体中文](./README.md) | English + +# Secondary Development for Action Recognition Task + +In the process of industrial implementation, the application of action recognition algorithms will inevitably lead to the need for customized types of action, or the optimization of existing action recognition models to improve the performance of the model in specific scenarios. In view of the diversity of behaviors, PP-Human supports the identification of five abnormal behavioras of smoking, making phone calls, falling, fighting, and people intrusion. 
At the same time, depending on the behavior, PP-Human integrates five action recognition solutions, based on video classification, detection, image classification, tracking, and skeleton points, which together cover more than 90% of action types and meet a wide range of development needs. In this document, we use concrete cases to show how to select an action recognition solution for the behavior you want to recognize, and how to use PaddleDetection for secondary development of the action recognition algorithm, including solution selection, data preparation, model optimization, and the development process for adding new actions. + + +## Solution Selection +In PaddleDetection's PP-Human, we provide several solutions for action recognition, based on video classification, image classification, detection, tracking, and skeleton points, in order to meet the needs of different scenes and different target behaviors. + +image + +The following uses several specific actions that PaddleDetection currently supports as examples to explain why each solution was selected: + +### Smoking + +Solution selection: action recognition based on detection with human id. + +Reason: The smoking action has an obvious target object, the cigarette. We can therefore assume that when a cigarette is detected in the image of a person, that person is smoking. Compared with video-based or skeleton-based solutions, training a detection model requires image-level rather than video-level data, which significantly reduces the difficulty of data collection and labeling. In addition, abundant pretrained models are available for the detection task, so the overall model performance is easier to guarantee. + +### Making Phone Calls + +Solution selection: action recognition based on classification with human id. 
+ +Reason: Although the calling action has a characteristic target object, the mobile phone, a detection model is not well suited here: it must be distinguished from actions such as looking at the phone, and in security scenes the phone is often occluded during a call (for example by the hand or the head), which makes it hard for a detection model to find the target. At the same time, calls usually last a long time and the person's pose does not change much, so a frame-level image classification strategy can be used. In addition, a phone call can mainly be judged from the upper body, so half-body images can be used to remove redundant information and reduce the difficulty of model training. + + +### Falling + +Solution selection: action recognition based on skeleton. + +Reason: Falling is an obviously temporal action that can be recognized from a single person alone and is scene-independent. Since PP-Human targets security monitoring scenes, where backgrounds vary widely and real-time inference must be considered at deployment, action recognition based on skeleton points is adopted to obtain better generalization and running speed. + + +### People Intrusion + +Solution selection: action recognition based on tracking with human id. + +Reason: Intrusion can be judged by whether the pedestrian's path or location falls inside a selected area, and it is unrelated to the pedestrian's body action. Therefore, it is only necessary to track the person and analyze the resulting coordinates to decide whether an intrusion has occurred. + +### Fighting + +Solution selection: action recognition based on video classification. + +Reason: Unlike the actions above, fighting is a typical multi-person action. Therefore, the detection and tracking models are no longer used to extract pedestrians and their IDs; instead, the entire video clip is processed. 
In addition, mutual occlusion between targets is extremely severe in fighting scenes, so keypoint recognition is not accurate enough and a skeleton-based solution cannot guarantee precision. + + + +The following documents describe the five categories of solutions in detail, including data preparation, model optimization, and adding new actions. + +1. [action recognition based on detection with human id.](./idbased_det_en.md) +2. [action recognition based on classification with human id.](./idbased_clas_en.md) +3. [action recognition based on skeleton.](./skeletonbased_rec_en.md) +4. [action recognition based on tracking with human id](../pphuman_mot_en.md) +5. [action recognition based on video classification](./videobased_rec_en.md) diff --git a/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md new file mode 100644 index 0000000000000000000000000000000000000000..51f281835ab0b842a0718d726ae73a533587e82a --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md @@ -0,0 +1,223 @@ +简体中文 | [English](./idbased_clas_en.md) + +# 基于人体id的分类模型开发 + +## 环境准备 + +基于人体id的分类方案是使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md)完成环境安装,以进行后续的模型训练及使用流程。 + +## 数据准备 + +基于图像分类的行为识别方案直接对视频中的图像帧结果进行识别,因此模型训练流程与通常的图像分类模型一致。 + +### 数据集下载 +打电话的行为识别是基于公开数据集[UAV-Human](https://github.com/SUTDCV/UAV-Human)进行训练的。请通过该链接填写相关数据集申请材料后获取下载链接。 + +在`UAVHuman/ActionRecognition/RGBVideos`路径下包含了该数据集中RGB视频数据集,每个视频的文件名即为其标注信息。 + +### 训练及测试图像处理 +根据视频文件名,其中与行为识别相关的为`A`相关的字段(即action),我们可以找到期望识别的动作类型数据。 +- 正样本视频:以打电话为例,我们只需找到包含`A024`的文件。 +- 负样本视频:除目标动作以外所有的视频。 + +鉴于视频数据转化为图像会有较多冗余,对于正样本视频,我们间隔8帧进行采样,并使用行人检测模型处理为半身图像(取检测框的上半部分,即`img = img[:H/2, :, :]`)。正样本视频中的采样得到的图像即视为正样本,负样本视频中采样得到的图像即为负样本。 + +**注意**: 
正样本视频中并不完全符合打电话这一动作,在视频开头结尾部分会出现部分冗余动作,需要移除。 + +### 标注文件准备 + 
+基于图像分类的行为识别方案是借助[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)进行模型训练的。使用该方案训练的模型,需要准备期望识别的图像数据及对应标注文件。根据[PaddleClas数据集格式说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/classification_dataset.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E)准备对应的数据即可。标注文件样例如下,其中`0`,`1`分别是图片对应所属的类别: +``` + # 每一行采用"空格"分隔图像路径与标注 + train/000001.jpg 0 + train/000002.jpg 0 + train/000003.jpg 1 + ... +``` + +此外,标签文件`phone_label_list.txt`,帮助将分类序号映射到具体的类型名称: +``` +0 make_a_phone_call # 类型0 +1 normal # 类型1 +``` + +完成上述内容后,放置于`dataset`目录下,文件结构如下: +``` +data/ +├── images # 放置所有图片 +├── phone_label_list.txt # 标签文件 +├── phone_train_list.txt # 训练列表,包含图片及其对应类型 +└── phone_val_list.txt # 测试列表,包含图片及其对应类型 +``` + +## 模型优化 + +### 检测-跟踪模型优化 +基于分类的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。 + + +### 半身图预测 +在打电话这一动作中,实际是通过上半身就能实现动作的区分的,因此在训练和预测过程中,将图像由行人全身图换为半身图 + +## 新增行为 + +### 数据准备 +参考前述介绍的内容,完成数据准备的部分,放置于`{root of PaddleClas}/dataset`下: +``` +data/ +├── images # 放置所有图片 +├── label_list.txt # 标签文件 +├── train_list.txt # 训练列表,包含图片及其对应类型 +└── val_list.txt # 测试列表,包含图片及其对应类型 +``` +其中,训练及测试列表如下: +``` + # 每一行采用"空格"分隔图像路径与标注 + train/000001.jpg 0 + train/000002.jpg 0 + train/000003.jpg 1 + train/000004.jpg 2 # 新增的类别直接填写对应类别号即可 + ... +``` +`label_list.txt`中需要同样对应扩展类型的名称: +``` +0 make_a_phone_call # 类型0 +1 Your New Action # 类型1 + ... +n normal # 类型n +``` + +### 配置文件设置 +在PaddleClas中已经集成了[训练配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml),需要重点关注的设置项如下: + +```yaml +# model architecture +Arch: + name: PPHGNet_tiny + class_num: 2 # 对应新增后的数量 + + ... 
+ +# 正确设置image_root与cls_label_path,保证image_root + cls_label_path中的图片路径能够正确访问图片路径 +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/phone_train_list_halfbody.txt + + ... + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 # 显示topk的数量,不要超过类别总数 + class_id_map_file: dataset/phone_label_list.txt # 修改后的label_list.txt路径 +``` + +### 模型训练及评估 +#### 模型训练 +通过如下命令启动训练: +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \ + -o Arch.pretrained=True +``` +其中 `Arch.pretrained` 为 `True`表示使用预训练权重帮助训练。 + +#### 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \ + -o Global.pretrained_model=output/PPHGNet_tiny/best_model +``` + +其中 `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 + +### 模型导出 +模型导出的详细介绍请参考[这里](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model) +可以参考以下步骤实现: +```python +python tools/export_model.py + -c ./PPHGNet_tiny_calling_halfbody.yaml \ + -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \ + -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody +``` +然后将导出的模型重命名,并加入配置文件,以适配PP-Human的使用。 +```bash +cd ./output_inference/PPHGNet_tiny_calling_halfbody + +mv inference.pdiparams model.pdiparams +mv inference.pdiparams.info model.pdiparams.info +mv inference.pdmodel model.pdmodel + +# 下载预测配置文件 +wget 
https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml +``` + +至此,即可使用PP-Human进行实际预测了。 + + +### 自定义行为输出 +基于人体id的分类的行为识别方案中,将任务转化为对应人物的图像进行图片级别的分类。对应分类的类型最终即视为当前阶段的行为。因此在完成自定义模型的训练及部署的基础上,还需要将分类模型结果转化为最终的行为识别结果作为输出,并修改可视化的显示结果。 + +#### 转换为行为识别结果 +请对应修改[后处理函数](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509)。 + +核心代码为: +```python +# 确定分类模型的最高分数输出结果 +cls_id_res = 1 +cls_score_res = -1.0 +for cls_id in range(len(cls_result[idx])): + score = cls_result[idx][cls_id] + if score > cls_score_res: + cls_id_res = cls_id + cls_score_res = score + +# Current now, class 0 is positive, class 1 is negative. +if cls_id_res == 1 or (cls_id_res == 0 and + cls_score_res < self.threshold): + # 如果分类结果不是目标行为或是置信度未达到阈值,则根据历史结果确定当前帧的行为 + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + cls_id_res = history_cls + cls_score_res = 1 - cls_score_res + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[ + tracker_id] = [cls_id_res, life_remain, cls_score_res] +else: + # 分类结果属于目标行为,则使用将该结果,并记录到历史结果中 + self.result_history[ + tracker_id] = [cls_id_res, self.frame_life, cls_score_res] + + ... 
+``` + +#### 修改可视化输出 +目前基于ID的行为识别,是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。 diff --git a/docs/advanced_tutorials/customization/action_recognotion/idbased_clas_en.md b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas_en.md new file mode 100644 index 0000000000000000000000000000000000000000..fc28ccc7029c7ea7f1e63d0ee4f97962747e7ad3 --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas_en.md @@ -0,0 +1,224 @@ +[简体中文](./idbased_clas.md) | English + +# Development for Action Recognition Based on Classification with Human ID + +## Environment Preparation +The model for action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please follow [Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md) to set up the environment for the subsequent model training and usage steps. + +## Data Preparation + +The model for action recognition based on classification with human id recognizes individual image frames of the video directly, so the training process is the same as for a standard image classification model. + +### Dataset Download + +The phone-call recognition model is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the dataset application materials through this link to obtain the download link. + +The RGB videos of this dataset are located under the `UAVHuman/ActionRecognition/RGBVideos` path, and the file name of each video is its annotation. + +### Image Processing for Training and Validation +In the video file name, the `A` field (i.e. action) is the one relevant to action recognition; we can use it to find the videos of the action type we want to recognize. 
+- Positive sample video: Taking phone calls as an example, we just need to find the files containing `A024`. +- Negative sample video: All videos except those of the target action. + +Since converting video data into images produces a lot of redundancy, for positive sample videos we sample one frame every 8 frames and use the pedestrian detection model to produce a half-body image (take the upper half of the detection box, that is, `img = img[:H // 2, :, :]`). Images sampled from positive sample videos are treated as positive samples, and images sampled from negative sample videos are treated as negative samples. + +**Note**: A positive sample video does not consist entirely of the phone-call action. There are some redundant actions at the beginning and end of the video, which need to be removed. + + +### Preparation for Annotation File +The model for action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). To train a model with this solution, you need to prepare the image data and the corresponding annotation files. Please refer to [Image Classification Datasets](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/data_preparation/classification_dataset_en.md) to prepare the data. An example of an annotation file is as follows, where `0` and `1` are the categories of the corresponding images: + +``` + # Each line uses "space" to separate the image path and label + train/000001.jpg 0 + train/000002.jpg 0 + train/000003.jpg 1 + ... 
+``` + +Additionally, the label file `phone_label_list.txt` maps category numbers to specific type names: +``` +0 make_a_phone_call # type 0 +1 normal # type 1 +``` + +After completing the steps above, place the files in the `dataset` directory. The file structure is as follows: +``` +data/ +├── images # All images +├── phone_label_list.txt # Label file +├── phone_train_list.txt # Training list, including pictures and their corresponding types +└── phone_val_list.txt # Validation list, including pictures and their corresponding types +``` + +## Model Optimization + +### Detection-Tracking Model Optimization +The performance of action recognition based on classification with human id depends on the upstream detection and tracking models. If pedestrians cannot be located accurately in the actual scene, or person IDs cannot be assigned correctly across frames, the performance of the action recognition stage will be limited. If you encounter these problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) to optimize the detection and tracking models. + + +### Half-Body Prediction +The phone-call action can be classified from the upper body alone. Therefore, during both training and prediction, the full-body pedestrian image is replaced with a half-body image. 
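
The frame sampling and half-body cropping described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the PP-Human implementation: the helper names (`sample_frame_indices`, `upper_half_box`, `crop_rows`) are hypothetical, the box layout `(x1, y1, x2, y2)` is an assumption, and starting the sampling at frame 0 is also an assumption. Note that the half height uses integer division, because slice bounds must be integers in Python 3.

```python
def sample_frame_indices(num_frames, interval=8):
    """Indices kept when sampling one frame every `interval` frames (starting at 0, an assumption)."""
    return list(range(0, num_frames, interval))


def upper_half_box(box):
    """Shrink a pedestrian box (x1, y1, x2, y2) to its upper half."""
    x1, y1, x2, y2 = box
    half_height = (y2 - y1) // 2  # integer division: slice bounds must be ints
    return (x1, y1, x2, y1 + half_height)


def crop_rows(image_rows, box):
    """Crop a row-major image (a list of pixel rows) to `box`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image_rows[y1:y2]]


# Toy 10x6 "image" whose pixels record their own (row, col) position.
frame = [[(r, c) for c in range(6)] for r in range(10)]
person_box = (1, 2, 5, 10)  # hypothetical detector output
half_body = crop_rows(frame, upper_half_box(person_box))
kept = sample_frame_indices(20)
```

With a NumPy image array of shape `(H, W, C)` that has already been cropped to the pedestrian box, the same half-body crop is `img[:H // 2, :, :]`.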
+ +## Add New Action + +### Data Preparation +Following the steps described above, prepare the data and place it under `{root of PaddleClas}/dataset`: + +``` +data/ +├── images # All images +├── label_list.txt # Label file +├── train_list.txt # Training list, including pictures and their corresponding types +└── val_list.txt # Validation list, including pictures and their corresponding types +``` +The training and validation list files look as follows: +``` + # Each line uses "space" to separate the image path and label + train/000001.jpg 0 + train/000002.jpg 0 + train/000003.jpg 1 + train/000004.jpg 2 # For newly added categories, simply fill in the corresponding category number. + ... +``` + +`label_list.txt` must also be extended with the names of the new categories: +``` +0 make_a_phone_call # class 0 +1 Your New Action # class 1 + ... +n normal # class n +``` + +### Configuration File Settings +The [training configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml) is already included in PaddleClas. The settings that need particular attention are as follows: + +```yaml +# model architecture +Arch: + name: PPHGNet_tiny + class_num: 2 # Set this to the number of action categories after adding new ones + + ... + +# Set image_root and cls_label_path correctly, so that image_root + the image path listed in cls_label_path points to the image +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/phone_train_list_halfbody.txt + + ... 
+ +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 # Number of top-k results to display; do not exceed the total number of categories + class_id_map_file: dataset/phone_label_list.txt # Path of the modified label_list.txt +``` + +### Model Training And Evaluation +#### Model Training +Start training with the following command: +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \ + -o Arch.pretrained=True +``` +Here `Arch.pretrained=True` means that pretrained weights are used to help with training. + +#### Model Evaluation +After training the model, use the following command to evaluate the model metrics. +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \ + -o Global.pretrained_model=output/PPHGNet_tiny/best_model +``` +Here `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path of the current best weights. To evaluate other weights, just replace the path. + +### Model Export +For a detailed introduction to model export, please refer to [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model). 
+You can follow these steps: + +```bash +python tools/export_model.py \ + -c ./PPHGNet_tiny_calling_halfbody.yaml \ + -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \ + -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody +``` + +Then rename the exported model files and add the inference configuration file so that the model can be used by PP-Human. 
+```bash +cd ./output_inference/PPHGNet_tiny_calling_halfbody + +mv inference.pdiparams model.pdiparams +mv inference.pdiparams.info model.pdiparams.info +mv inference.pdmodel model.pdmodel + +# Download configuration file for inference +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml +``` + +At this point, the model can be used in PP-Human. + +### Custom Action Output +In action recognition based on classification with human id, the task is turned into image-level classification of the image of each person, and the predicted class is taken as the person's action at that moment. Therefore, after training and deploying the custom model, you also need to convert the classification results into the final action recognition output and modify what the visualization displays. + +Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509) accordingly. + +The core code is: +```python +# Get the highest score output of the classification model +cls_id_res = 1 +cls_score_res = -1.0 +for cls_id in range(len(cls_result[idx])): + score = cls_result[idx][cls_id] + if score > cls_score_res: + cls_id_res = cls_id + cls_score_res = score + +# Currently, class 0 is positive and class 1 is negative. 
+if cls_id_res == 1 or (cls_id_res == 0 and + cls_score_res < self.threshold): + # If the classification result is not the target action or its confidence does not reach the threshold, + # determine the action type of the current frame according to the historical results + history_cls, life_remain, history_score = self.result_history.get( + tracker_id, [1, self.frame_life, -1.0]) + cls_id_res = history_cls + cls_score_res = 1 - cls_score_res + life_remain -= 1 + if life_remain <= 0 and tracker_id in self.result_history: + del (self.result_history[tracker_id]) + elif tracker_id in self.result_history: + self.result_history[tracker_id][1] = life_remain + else: + self.result_history[ + tracker_id] = [cls_id_res, life_remain, cls_score_res] +else: + # If the classification result belongs to the target action, use the result and record it in the historical result + self.result_history[ + tracker_id] = [cls_id_res, self.frame_life, cls_score_res] + + ... +``` + +#### Modify Visual Output +At present, ID-based action recognition is displayed using the action recognition results and predefined category names. For details, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If a custom action needs a different display name, modify that code accordingly so that the correct result is shown. 
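
To experiment with the history-based smoothing in the post-processing snippet above without running the full pipeline, the logic can be isolated into a small standalone class. This is only a sketch that mirrors the branches of that snippet: the class name `ActionHistory`, the method name `update`, and the default values of `threshold` and `frame_life` are hypothetical and do not come from PaddleDetection.

```python
class ActionHistory:
    """Standalone sketch of the per-track smoothing shown in the snippet above."""

    def __init__(self, threshold=0.8, frame_life=20):  # default values are assumptions
        self.threshold = threshold
        self.frame_life = frame_life
        self.result_history = {}  # tracker_id -> [class_id, life_remain, score]

    def update(self, tracker_id, scores):
        # Pick the highest-scoring class (class 0 is positive, class 1 is negative).
        cls_id = max(range(len(scores)), key=lambda i: scores[i])
        score = scores[cls_id]

        if cls_id == 1 or (cls_id == 0 and score < self.threshold):
            # Not a confident positive: fall back to the remembered result, if any.
            history_cls, life_remain, _ = self.result_history.get(
                tracker_id, [1, self.frame_life, -1.0])
            cls_id = history_cls
            score = 1 - score
            life_remain -= 1
            if life_remain <= 0 and tracker_id in self.result_history:
                del self.result_history[tracker_id]
            elif tracker_id in self.result_history:
                self.result_history[tracker_id][1] = life_remain
            else:
                self.result_history[tracker_id] = [cls_id, life_remain, score]
        else:
            # Confident positive: use it and remember it for the next frames.
            self.result_history[tracker_id] = [cls_id, self.frame_life, score]
        return cls_id, score


history = ActionHistory(threshold=0.8, frame_life=2)
first = history.update(7, [0.9, 0.1])   # confident positive
second = history.update(7, [0.3, 0.7])  # negative frame, positive result is reused
third = history.update(7, [0.3, 0.7])   # life runs out, entry is dropped
fourth = history.update(7, [0.3, 0.7])  # no history left, result is negative
```

The sketch makes the role of `frame_life` visible: a confident positive result for a track keeps being reported for a limited number of subsequent non-positive frames before the track falls back to the negative class.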
diff --git a/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md b/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md new file mode 100644 index 0000000000000000000000000000000000000000..02c7d9940c625074f24b8c651c2bcc9f9c138828 --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md @@ -0,0 +1,202 @@ +简体中文 | [English](./idbased_det_en.md) + +# 基于人体id的检测模型开发 + +## 环境准备 + +基于人体id的检测方案是直接使用[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL_cn.md)完成环境安装,以进行后续的模型训练及使用流程。 + +## 数据准备 +基于检测的行为识别方案中,数据准备的流程与一般的检测模型一致,详情可参考[目标检测数据准备](../../../tutorials/data/PrepareDetDataSet.md)。将图像和标注数据组织成PaddleDetection中支持的格式之一即可。 + +**注意** : 在实际使用的预测过程中,使用的是单人图像进行预测,因此在训练过程中建议将图像裁剪为单人图像,再进行烟头检测框的标注,以提升准确率。 + + +## 模型优化 + +### 检测-跟踪模型优化 +基于检测的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。 + + +### 更大的分辨率 +烟头的检测在监控视角下是一个典型的小目标检测问题,使用更大的分辨率有助于提升模型整体的识别率 + +### 预训练模型 +加入小目标场景数据集VisDrone下的预训练模型进行训练,模型mAP由38.1提升到39.7。 + +## 新增行为 +### 数据准备 +参考[目标检测数据准备](../../../tutorials/data/PrepareDetDataSet.md)完成训练数据准备。 + +准备完成后,数据路径为 +``` +dataset/smoking +├── smoking # 存放所有的图片 +│   ├── 1.jpg +│   ├── 2.jpg +├── smoking_test_cocoformat.json # 测试标注文件 +├── smoking_train_cocoformat.json # 训练标注文件 +``` + +以`COCO`格式为例,完成后的json标注文件内容如下: + +```json +# images字段下包含了图像的路径,id及对应宽高信息 + "images": [ + { + "file_name": "smoking/1.jpg", + "id": 0, # 此处id为图片id序号,不要重复 + "height": 437, + "width": 212 + }, + { + "file_name": "smoking/2.jpg", + "id": 1, + "height": 655, + "width": 365 + }, + + ... 
+ +# categories 字段下包含所有类别信息,如果希望新增更多的检测类别,请在这里增加, 示例如下。 + "categories": [ + { + "supercategory": "cigarette", + "id": 1, + "name": "cigarette" + }, + { + "supercategory": "Class_Defined_by_Yourself", + "id": 2, + "name": "Class_Defined_by_Yourself" + }, + + ... + +# annotations 字段下包含了所有目标实例的信息,包括类别,检测框坐标, id, 所属图像id等信息 + "annotations": [ + { + "category_id": 1, # 对应定义的类别,在这里1代表cigarette + "bbox": [ + 97.0181345931, + 332.7033243081, + 7.5943999555, + 16.4545332369 + ], + "id": 0, # 此处id为实例的id序号,不要重复 + "image_id": 0, # 此处为实例所在图片的id序号,可能重复,此时即一张图片上有多个实例对象 + "iscrowd": 0, + "area": 124.96230648208665 + }, + { + "category_id": 2, # 对应定义的类别,在这里2代表Class_Defined_by_Yourself + "bbox": [ + 114.3895698372, + 221.9131122343, + 25.9530363697, + 50.5401234568 + ], + "id": 1, + "image_id": 1, + "iscrowd": 0, + "area": 1311.6696622034585 +``` + +### 配置文件设置 +参考[配置文件](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml), 其中需要关注重点如下: + +```yaml +metric: COCO +num_classes: 1 # 如果新增了更多的类别,请对应修改此处 + +# 正确设置image_dir,anno_path,dataset_dir +# 保证dataset_dir + anno_path 能正确对应标注文件的路径 +# 保证dataset_dir + image_dir + 标注文件中的图片路径可以正确对应到图片路径 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + +TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking +``` + +### 模型训练及评估 +#### 模型训练 + +参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),执行下列步骤实现 +```bash +# At Root of PaddleDetection + +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval +``` + +#### 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估 +```bash +# At Root of PaddleDetection + +python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml +``` + +### 
模型导出 +注意:如果在Tensor-RT环境下预测, 请开启`-o trt=True`以获得更好的性能 +```bash +# At Root of PaddleDetection + +python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True +``` + +导出模型后,可以得到: +``` +ppyoloe_crn_s_80e_smoking_visdrone/ +├── infer_cfg.yml +├── model.pdiparams +├── model.pdiparams.info +└── model.pdmodel +``` + +至此,即可使用PP-Human进行实际预测了。 + + +### 自定义行为输出 +基于人体id的检测的行为识别方案中,将任务转化为在对应人物的图像中检测目标特征对象。当目标特征对象被检测到时,则视为行为正在发生。因此在完成自定义模型的训练及部署的基础上,还需要将检测模型结果转化为最终的行为识别结果作为输出,并修改可视化的显示结果。 + +#### 转换为行为识别结果 +请对应修改[后处理函数](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L338)。 +核心代码为: +```python +# 解析检测模型输出,并筛选出置信度高于阈值的有效检测框。 +# Current now, class 0 is positive, class 1 is negative. +action_ret = {'class': 1.0, 'score': -1.0} +box_num = np_boxes_num[idx] +boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num] +cur_box_idx += box_num +isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0) +valid_boxes = boxes[isvalid, :] + +if valid_boxes.shape[0] >= 1: + # 存在有效检测框时,行为识别结果的类别和分数对应修改 + action_ret['class'] = valid_boxes[0, 0] + action_ret['score'] = valid_boxes[0, 1] + # 由于动作的持续性,有效检测结果可复用一定帧数 + self.result_history[ + tracker_id] = [0, self.frame_life, valid_boxes[0, 1]] +else: + # 不存在有效检测框,则根据历史检测数据确定当前帧的结果 + ... 
+``` + +#### 修改可视化输出 +目前基于ID的行为识别,是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。 diff --git a/docs/advanced_tutorials/customization/action_recognotion/idbased_det_en.md b/docs/advanced_tutorials/customization/action_recognotion/idbased_det_en.md new file mode 100644 index 0000000000000000000000000000000000000000..49cac4343ae2296d9d97cff16f38002eedbb2dcf --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_det_en.md @@ -0,0 +1,199 @@ +[简体中文](./idbased_det.md) | English + +# Development for Action Recognition Based on Detection with Human ID + +## Environmental Preparation +The model for action recognition based on detection with human ID is trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please refer to [Installation](../../../tutorials/INSTALL.md) to set up the environment for the subsequent model training and usage. + +## Data Preparation + +The model for action recognition based on detection with human ID performs recognition directly on the image frames of a video, so its data preparation process is the same as that of a general detection model. For details, please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md). Please convert the images and annotations into one of the formats that PaddleDetection supports. + +**Note**: In actual prediction, a single-person image is used as the input. It is therefore recommended to crop the training images into single-person images first and then label the cigarette bounding boxes, which improves accuracy. + + +## Model Optimization +### Detection-Tracking Model Optimization +The performance of action recognition based on detection with human ID depends on the upstream detection and tracking models. 
If the pedestrian location cannot be accurately detected in the actual scene, or the person ID is difficult to assign correctly across frames, the performance of the action recognition part will be limited. If you encounter these problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) to optimize the detection and tracking models. + + +### Larger Resolution +From a surveillance viewpoint, cigarette detection is a typical small-object detection problem. Using a larger input resolution can help improve the overall performance of the model. + +### Pretrained Model +Training from a model pretrained on the small-object dataset VisDrone increases the mAP of the model from 38.1 to 39.7. + +## Add New Action +### Data Preparation +Please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md) to complete the data preparation part. + +After finishing this step, the directory structure will look like: +``` +dataset/smoking +├── smoking # all images +│   ├── 1.jpg +│   ├── 2.jpg +├── smoking_test_cocoformat.json # Validation file +├── smoking_train_cocoformat.json # Training file +``` + +Taking the `COCO` format as an example, the content of the completed json annotation file is as follows: + +```json +# The "images" field contains the path, id and corresponding width and height information of the images. + "images": [ + { + "file_name": "smoking/1.jpg", + "id": 0, # Image id; it must be unique + "height": 437, + "width": 212 + }, + { + "file_name": "smoking/2.jpg", + "id": 1, + "height": 655, + "width": 365 + }, + + ... + +# The "categories" field contains all category information. If you want to add more detection categories, add them here, as in the following example. 
+ "categories": [ + { + "supercategory": "cigarette", + "id": 1, + "name": "cigarette" + }, + { + "supercategory": "Class_Defined_by_Yourself", + "id": 2, + "name": "Class_Defined_by_Yourself" + }, + + ... + +# The "annotations" field contains information about all instances, including category, bounding box coordinates, id, image id and other information + "annotations": [ + { + "category_id": 1, # Corresponding to the defined category, where 1 represents cigarette + "bbox": [ + 97.0181345931, + 332.7033243081, + 7.5943999555, + 16.4545332369 + ], + "id": 0, # Instance id; it must be unique + "image_id": 0, # Id of the image that contains this instance; it may repeat when one image contains multiple instances + "iscrowd": 0, + "area": 124.96230648208665 + }, + { + "category_id": 2, # Corresponding to the defined category, where 2 represents Class_Defined_by_Yourself + "bbox": [ + 114.3895698372, + 221.9131122343, + 25.9530363697, + 50.5401234568 + ], + "id": 1, + "image_id": 1, + "iscrowd": 0, + "area": 1311.6696622034585 +``` + +### Configuration File Settings +Refer to the [Configuration File](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml). The key points to pay attention to are as follows: +```yaml +metric: COCO +num_classes: 1 # If more categories are added, please modify here accordingly + +# Set image_dir, anno_path and dataset_dir correctly +# Ensure that dataset_dir + anno_path resolves to the path of the annotation file +# Ensure that dataset_dir + image_dir + the image path in the annotation file resolves to the path of the image +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + 
+TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking +``` + +### Model Training And Evaluation +#### Model Training +Following [PP-YOLOE](../../../../configs/ppyoloe/README.md), start training with the following command: +```bash +# At Root of PaddleDetection + +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval +``` + +#### Model Evaluation +After training the model, use the following command to evaluate the model metrics. + +```bash +# At Root of PaddleDetection + +python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml +``` + +### Model Export +Note: If you predict in a TensorRT environment, please enable `-o trt=True` for better performance. +```bash +# At Root of PaddleDetection + +python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True +``` + +After exporting the model, you will get: +``` +ppyoloe_crn_s_80e_smoking_visdrone/ +├── infer_cfg.yml +├── model.pdiparams +├── model.pdiparams.info +└── model.pdmodel +``` + +At this point, this model can be used in PP-Human. + +### Custom Action Output +In action recognition based on detection with human ID, the task is converted into detecting a target object in the image of the corresponding person. When the target object is detected, the action is considered to be occurring. Therefore, after training and deploying the custom model, you also need to convert the detection results into the final action recognition results for output, and modify the displayed visualization results. 
+ +#### Convert to Action Recognition Result +Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L338). + +The core code is: +```python +# Parse the detection model output and keep the valid detection boxes whose confidence is higher than the threshold. +# Currently, class 0 is positive and class 1 is negative. +action_ret = {'class': 1.0, 'score': -1.0} +box_num = np_boxes_num[idx] +boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num] +cur_box_idx += box_num +isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0) +valid_boxes = boxes[isvalid, :] + +if valid_boxes.shape[0] >= 1: + # When there is a valid detection box, update the class and score of the action recognition result accordingly. + action_ret['class'] = valid_boxes[0, 0] + action_ret['score'] = valid_boxes[0, 1] + # Because actions are continuous, a valid detection result can be reused for a certain number of frames. + self.result_history[ + tracker_id] = [0, self.frame_life, valid_boxes[0, 1]] +else: + # If there is no valid detection box, the result of the current frame is determined from the historical detection results. + ... +``` + +#### Modify Visual Output +At present, ID-based action recognition is displayed based on the action recognition results and predefined category names. For details, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs a different display name, please modify that code accordingly so that the corresponding result is output correctly. 
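The conversion from detection boxes to an action result described above can be illustrated with a small standalone sketch. This is not the code in `action_infer.py`: the function name `detections_to_action`, its parameters, and the plain-list box layout `[class_id, score, x1, y1, x2, y2]` are assumptions made for illustration, and the frame-history reuse is left out.

```python
# Hypothetical, simplified stand-in for the detection-to-action conversion.
# Assumption: each detection is a list [class_id, score, x1, y1, x2, y2].


def detections_to_action(boxes, threshold=0.5, positive_class=0):
    """Turn the detections of one person crop into an action result."""
    # Default result: negative class (1.0) with a sentinel score of -1.0.
    action = {"class": 1.0, "score": -1.0}
    # Keep boxes of the positive class whose score exceeds the threshold.
    valid = [b for b in boxes if b[1] > threshold and b[0] == positive_class]
    if valid:
        # Like the excerpt above, use the first valid box.
        action["class"] = float(valid[0][0])
        action["score"] = float(valid[0][1])
    return action


if __name__ == "__main__":
    person_boxes = [
        [0, 0.82, 10, 20, 18, 36],  # positive class, above the threshold
        [0, 0.30, 40, 22, 47, 35],  # positive class, below the threshold
        [1, 0.95, 5, 5, 50, 90],    # another class, ignored
    ]
    print(detections_to_action(person_boxes))
```

If a custom model adds further classes, the same filter can be pointed at a different `positive_class`; how each class maps to an action name is a design choice of the custom pipeline.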
diff --git a/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md new file mode 100644 index 0000000000000000000000000000000000000000..21cca224ba8d794e00d25937e14b6f10aeaf324f --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md @@ -0,0 +1,205 @@ +简体中文 | [English](./skeletonbased_rec_en.md) + +# 基于人体骨骼点的行为识别 + +## 环境准备 + +基于骨骼点的行为识别方案是借助[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/install.md)完成PaddleVideo的环境安装,以进行后续的模型训练及使用流程。 + +## 数据准备 +使用该方案训练的模型,可以参考[此文档](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE)准备训练数据,以适配PaddleVideo进行训练,其主要流程包含以下步骤: + + +### 数据格式说明 +STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo中,训练数据为采用`.npy`格式存储的`Numpy`数据,标签则可以是`.npy`或`.pkl`格式存储的文件。对于序列数据的维度要求为`(N,C,T,V,M)`,当前方案仅支持单人构成的行为(但视频中可以存在多人,每个人独自进行行为识别判断),即`M=1`。 + +| 维度 | 大小 | 说明 | +| ---- | ---- | ---------- | +| N | 不定 | 数据集序列个数 | +| C | 2 | 关键点坐标维度,即(x, y) | +| T | 50 | 动作序列的时序维度(即持续帧数)| +| V | 17 | 每个人物关键点的个数,这里我们使用了`COCO`数据集的定义,具体可见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_cn.md#COCO%E6%95%B0%E6%8D%AE%E9%9B%86) | +| M | 1 | 人物个数,这里我们每个动作序列只针对单人预测 | + +### 获取序列的骨骼点坐标 +对于一个待标注的序列(这里序列指一个动作片段,可以是视频或有顺序的图片集合)。可以通过模型预测或人工标注的方式获取骨骼点(也称为关键点)坐标。 +- 模型预测:可以直接选用[PaddleDetection KeyPoint模型系列](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint) 模型库中的模型,并根据`3、训练与测试 - 部署预测 - 检测+keypoint top-down模型联合部署`中的步骤获取目标序列的17个关键点坐标。 +- 人工标注:若对关键点的数量或是定义有其他需求,也可以直接人工标注各个关键点的坐标位置,注意对于被遮挡或较难标注的点,仍需要标注一个大致坐标,否则后续网络学习过程会受到影响。 + + +当使用模型预测获取时,可以参考如下步骤进行,请注意此时在PaddleDetection中进行操作。 + +```bash +# current path is under root of PaddleDetection + +# Step 1: download pretrained inference models. 
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip +unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip +unzip -d output_inference/ dark_hrnet_w32_256x192.zip + +# Step 2: Get the keypoint coordinarys + +# if your data is image sequence +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True + +# if your data is video +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True +``` +这样我们会得到一个`det_keypoint_unite_image_results.json`的检测结果文件。内容的具体含义请见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108)。 + + +### 统一序列的时序长度 +由于实际数据中每个动作的长度不一,首先需要根据您的数据和实际场景预定时序长度(在PP-Human中我们采用50帧为一个动作序列),并对数据做以下处理: +- 实际长度超过预定长度的数据,随机截取一个50帧的片段 +- 实际长度不足预定长度的数据:补0,直到满足50帧 +- 恰好等于预定长度的数据: 无需处理 + +注意:在这一步完成后,请严格确认处理后的数据仍然包含了一个完整的行为动作,不会产生预测上的歧义,建议通过可视化数据的方式进行确认。 + +### 保存为PaddleVideo可用的文件格式 +在经过前两步处理后,我们得到了每个人物动作片段的标注,此时我们已有一个列表`all_kpts`,这个列表中包含多个关键点序列片段,其中每一个片段形状为(T, V, C) (在我们的例子中即(50, 17, 2)), 下面进一步将其转化为PaddleVideo可用的格式。 +- 调整维度顺序: 可通过`np.transpose`和`np.expand_dims`将每一个片段的维度转化为(C, T, V, M)的格式。 +- 将所有片段组合并保存为一个文件 + +注意:这里的`class_id`是`int`类型,与其他分类任务类似。例如`0:摔倒, 1:其他`。 + + +我们提供了执行该步骤的[脚本文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py),可以直接处理生成的`det_keypoint_unite_image_results.json`文件,该脚本执行的内容包括解析json文件内容、前述步骤中介绍的整理训练数据及保存数据文件。 + +```bash +mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations + +mv det_keypoint_unite_image_results.json {root of 
PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json + +cd {root of PaddleVideo}/applications/PPHuman/datasets/ + +python prepare_dataset.py +``` + +至此,我们得到了可用的训练数据(`.npy`)和对应的标注文件(`.pkl`)。 + + +## 模型优化 + +### 检测-跟踪模型优化 +基于骨骼点的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。 + +### 关键点模型优化 +骨骼点作为该方案的核心特征,对行人的骨骼点定位效果也决定了行为识别的整体效果。若发现在实际场景中对关键点坐标的识别结果有明显错误,从关键点组成的骨架图像看,已经难以辨别具体动作,可以参考[关键点检测任务二次开发](../keypoint_detection.md)对关键点模型进行优化。 + +### 坐标归一化处理 +在完成骨骼点坐标的获取后,建议根据各人物的检测框进行归一化处理,以消除人物位置、尺度的差异给网络带来的收敛难度。 + + +## 新增行为 + +基于关键点的行为识别方案中,行为识别模型使用的是[ST-GCN](https://arxiv.org/abs/1801.07455),并在[PaddleVideo训练流程](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)的基础上修改适配,完成模型训练及导出使用流程。 + + +### 数据准备与配置文件修改 +- 按照`数据准备`, 准备训练数据(`.npy`)和对应的标注文件(`.pkl`)。对应放置在`{root of PaddleVideo}/applications/PPHuman/datasets/`下。 + +- 参考[配置文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml), 需要重点关注的内容如下: + +```yaml +MODEL: #MODEL field + framework: + backbone: + name: "STGCN" + in_channels: 2 # 此处对应数据说明中的C维,表示二维坐标。 + dropout: 0.5 + layout: 'coco_keypoint' + data_bn: True + head: + name: "STGCNHead" + num_classes: 2 # 如果数据中有多种行为类型,需要修改此处使其与预测类型数目一致。 + if_top5: False # 行为类型数量不足5时请设置为False,否则会报错 + +... 
+ + +# 请根据数据路径正确设置train/valid/test部分的数据及label路径 +DATASET: #DATASET field + batch_size: 64 + num_workers: 4 + test_batch_size: 1 + test_num_workers: 0 + train: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle + file_path: "./applications/PPHuman/datasets/train_data.npy" #mandatory, train data index file path + label_path: "./applications/PPHuman/datasets/train_label.pkl" + + valid: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset' + file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path + label_path: "./applications/PPHuman/datasets/val_label.pkl" + + test_mode: True + test: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset' + file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path + label_path: "./applications/PPHuman/datasets/val_label.pkl" + + test_mode: True +``` + +### 模型训练与测试 +- 在PaddleVideo中,使用以下命令即可开始训练: + +```bash +# current path is under root of PaddleVideo +python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml + +# 由于整个任务可能过拟合,建议同时开启验证以保存最佳模型 +python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml +``` + +- 在训练完成后,采用以下命令进行预测: +```bash +python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams +``` + +### 模型导出 +- 在PaddleVideo中,通过以下命令实现模型的导出,得到模型结构文件`STGCN.pdmodel`和模型权重文件`STGCN.pdiparams`,并增加配置文件: +```bash +# current path is under root of PaddleVideo +python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \ + -p output/STGCN/STGCN_best.pdparams \ + -o output_inference/STGCN + +cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN + +# 重命名模型文件,适配PP-Human的调用 +cd output_inference/STGCN +mv STGCN.pdiparams model.pdiparams +mv STGCN.pdiparams.info model.pdiparams.info +mv 
STGCN.pdmodel model.pdmodel +``` +完成后的导出模型目录结构如下: +``` +STGCN +├── infer_cfg.yml +├── model.pdiparams +├── model.pdiparams.info +├── model.pdmodel +``` + +至此,就可以使用PP-Human进行行为识别的推理了。 + +**注意**:如果在训练时调整了视频序列的长度或关键点的数量,在此处需要对应修改配置文件中`INFERENCE`字段内容,以实现正确预测。 +```yaml +# 序列数据的维度为(N,C,T,V,M) +INFERENCE: + name: 'STGCN_Inference_helper' + num_channels: 2 # 对应C维 + window_size: 50 # 对应T维,请对应调整为数据长度 + vertex_nums: 17 # 对应V维,请对应调整为关键点数目 + person_nums: 1 # 对应M维 +``` + +### 自定义行为输出 +基于人体骨骼点的行为识别方案中,模型输出的分类结果即代表了该人物在一定时间段内行为类型。对应分类的类型最终即视为当前阶段的行为。因此在完成自定义模型的训练及部署的基础上,使用模型输出作为最终结果,修改可视化的显示结果即可。 + +#### 修改可视化输出 +目前基于ID的行为识别,是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。 diff --git a/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec_en.md b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec_en.md new file mode 100644 index 0000000000000000000000000000000000000000..a1e8d1f3ca8096c03cab9bb735306ba0db742474 --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec_en.md @@ -0,0 +1,200 @@ +[简体中文](./skeletonbased_rec.md) | English + +# Skeleton-based action recognition + +## Environmental Preparation +The skeleton-based action recognition model is trained with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). Please refer to [Installation](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/install.md) to set up the environment for the subsequent model training and usage. + +## Data Preparation +To train a model with this solution, you can refer to [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) to prepare training data adapted to PaddleVideo. 
The main process includes the following steps: + + +### Data Format Description +STGCN is a model that makes predictions from a sequence of skeleton point coordinates. In PaddleVideo, the training data is `Numpy` data stored in `.npy` format, and the labels can be stored in `.npy` or `.pkl` files. The sequence data must have the dimensions `(N,C,T,V,M)`. The current solution only supports actions performed by a single person (the video may contain multiple people, but action recognition is performed for each person separately), that is, `M=1`. + +| Dim | Size | Description | +| ---- | ---- | ---------- | +| N | Not Fixed | The number of sequences in the dataset | +| C | 2 | Keypoint coordinate dimension, i.e. (x, y) | +| T | 50 | The temporal dimension of the action sequence (i.e. the number of consecutive frames)| +| V | 17 | The number of keypoints of each person. Here we use the definition of the `COCO` dataset, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_en.md#description-for-coco-datasetkeypoint) | +| M | 1 | The number of persons. Here we only predict a single person for each action sequence | + +### Get The Skeleton Point Coordinates of The Sequence +For a sequence to be labeled (here a sequence refers to an action clip, which can be a video or an ordered collection of images), the coordinates of the skeleton points (also known as keypoints) can be obtained through model prediction or manual annotation. +- Model prediction: You can directly select a model from the [PaddleDetection KeyPoint Models](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/README_en.md) and follow the steps in `3, training and testing - Deployment Prediction - Detect + keypoint top-down model joint deployment` to get the 17 keypoint coordinates of the target sequence. +- Manual annotation: If you have other requirements for the number or definition of keypoints, you can also annotate the coordinates of each keypoint manually. Note that occluded or hard-to-label points still need an approximate coordinate; otherwise the subsequent network training will be affected. 
+ +When using model prediction to obtain the coordinates, you can refer to the following steps. Note that these operations are performed in PaddleDetection. + +```bash +# current path is under root of PaddleDetection + +# Step 1: download pretrained inference models. +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip +unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip +unzip -d output_inference/ dark_hrnet_w32_256x192.zip + +# Step 2: Get the keypoint coordinates + +# if your data is an image sequence +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True + +# if your data is a video +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True +``` +This produces a detection result file named `det_keypoint_unite_image_results.json`. For details of its content, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108). 
+ + +### Uniform Sequence Length +Since the length of each action in real data varies, first choose a fixed sequence length according to your data and actual scenario (in PP-Human, we use 50 frames as one action sequence), then process the data as follows: +- Data longer than the predetermined length: randomly crop a 50-frame segment +- Data shorter than the predetermined length: pad with 0 until it reaches 50 frames +- Data exactly equal to the predetermined length: no processing required + +Note: After this step, please carefully confirm that the processed data still contains a complete action and will not cause ambiguity in prediction. It is recommended to confirm this by visualizing the data. + +### Save to PaddleVideo usable formats +After the first two steps, we have the annotation of each person's action clip. At this point we have a list `all_kpts` containing multiple keypoint sequence clips, each with a shape of (T, V, C) (in our case (50, 17, 2)), which is then converted into a format usable by PaddleVideo. +- Adjust dimension order: `np.transpose` and `np.expand_dims` can be used to convert each clip into the (C, T, V, M) format. +- Combine and save all clips as one file + +Note: `class_id` is of type `int`, as in other classification tasks. For example `0: falling, 1: other`. + +We provide a [script file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) for this step, which can directly process the generated `det_keypoint_unite_image_results.json` file. The script parses the content of the json file, organizes the training data sequences, and saves the data files as described in the preceding steps. 
+ +```bash +mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations + +mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json + +cd {root of PaddleVideo}/applications/PPHuman/datasets/ + +python prepare_dataset.py +``` + +Now, we have usable training data (`.npy`) and the corresponding annotation files (`.pkl`). + +## Model Optimization +### Detection-Tracking Model Optimization +The performance of skeleton-based action recognition depends on the upstream detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or the person ID is difficult to assign correctly across frames, the performance of the action recognition part will be limited. If you encounter these problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) to optimize the detection and tracking models. + +### Keypoint Model Optimization +As the core feature of this solution, the localization quality of the skeleton points also determines the overall performance of action recognition. If the keypoint coordinates recognized in the actual scene contain obvious errors, so that the specific action can hardly be distinguished from the skeleton image formed by the keypoints, +you can refer to [Secondary Development of Keypoint Detection Task](../keypoint_detection_en.md) to optimize the keypoint model. + +### Coordinate Normalization +After obtaining the coordinates of the skeleton points, it is recommended to normalize them according to the detection bounding box of each person, to reduce the convergence difficulty caused by differences in the position and scale of persons. 
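The length unification described earlier and the bounding-box normalization recommended above can be sketched together as follows. This is a minimal pure-Python illustration, not the logic of `prepare_dataset.py`: the function names `unify_length` and `normalize_by_bbox` are hypothetical, and the box is assumed to be given in corner format `(x_min, y_min, x_max, y_max)`.

```python
import random

# Hypothetical helpers; a sequence is a list of frames, and each frame is a
# list of [x, y] keypoints (17 per frame under the COCO definition).


def unify_length(seq, target_len=50, rng=random):
    """Randomly crop or zero-pad a keypoint sequence to target_len frames."""
    n = len(seq)
    if n > target_len:
        # Longer than needed: take a random contiguous window.
        start = rng.randint(0, n - target_len)
        return seq[start:start + target_len]
    if n < target_len:
        # Shorter than needed: append all-zero frames.
        num_kpts = len(seq[0]) if seq else 0
        padding = [[[0.0, 0.0] for _ in range(num_kpts)]
                   for _ in range(target_len - n)]
        return seq + padding
    return seq


def normalize_by_bbox(frame, bbox):
    """Map keypoints to box-relative coordinates.

    Assumption: bbox is (x_min, y_min, x_max, y_max) of the person.
    """
    x_min, y_min, x_max, y_max = bbox
    width = max(x_max - x_min, 1e-6)   # guard against a degenerate box
    height = max(y_max - y_min, 1e-6)
    return [[(x - x_min) / width, (y - y_min) / height] for x, y in frame]


if __name__ == "__main__":
    clip = [[[10.0, 20.0], [30.0, 60.0]] for _ in range(3)]
    clip = [normalize_by_bbox(f, (10.0, 20.0, 30.0, 60.0)) for f in clip]
    clip = unify_length(clip, target_len=5)
    print(len(clip), clip[0], clip[-1])
```

Normalizing the real frames before padding keeps the padded frames at exactly zero; whether zero padding or another scheme suits a given dataset is a choice to validate by visualizing the processed clips.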
+ +## Add New Action + +In skeleton-based action recognition, the model used is [ST-GCN](https://arxiv.org/abs/1801.07455). The workflow is adapted from the [PaddleVideo training process](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/model_zoo/recognition/stgcn.md) to complete model training and export. + +### Data Preparation And Configuration File Settings +- Prepare the training data (`.npy`) and the corresponding annotation file (`.pkl`) according to `Data Preparation`, and place them under `{root of PaddleVideo}/applications/PPHuman/datasets/`. + +- Refer to the [Configuration File](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml). The things to focus on are as follows: + +```yaml +MODEL: #MODEL field + framework: + backbone: + name: "STGCN" + in_channels: 2 # This corresponds to the C dimension in the data format description, representing two-dimensional coordinates. + dropout: 0.5 + layout: 'coco_keypoint' + data_bn: True + head: + name: "STGCNHead" + num_classes: 2 # If there are multiple action types in the data, this needs to be modified to match the number of types. + if_top5: False # When the number of action types is less than 5, please set it to False, otherwise an error will be raised. + +... 
+ + +# Please set the data and label paths of the train/valid/test parts correctly according to your data path +DATASET: #DATASET field + batch_size: 64 + num_workers: 4 + test_batch_size: 1 + test_num_workers: 0 + train: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle + file_path: "./applications/PPHuman/datasets/train_data.npy" #Mandatory, train data index file path + label_path: "./applications/PPHuman/datasets/train_label.pkl" + + valid: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset' + file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path + label_path: "./applications/PPHuman/datasets/val_label.pkl" + + test_mode: True + test: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset' + file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path + label_path: "./applications/PPHuman/datasets/val_label.pkl" + + test_mode: True +``` + +### Model Training And Evaluation + +- In PaddleVideo, start training with the following command: +```bash +# current path is under root of PaddleVideo +python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml + +# Since the task may overfit, it is recommended to enable validation during training so that the best model is saved. +python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml +``` + +- After training the model, use the following command to test it: +```bash +python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams +``` + +### Model Export + +In PaddleVideo, use the following commands to export the model, which produces the structure file `STGCN.pdmodel` and the weight file `STGCN.pdiparams`, and then add the configuration file: 
+```bash +# current path is under root of PaddleVideo +python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \ + -p output/STGCN/STGCN_best.pdparams \ + -o output_inference/STGCN + +cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN + +# Rename the model files to the names PP-Human expects +cd output_inference/STGCN +mv STGCN.pdiparams model.pdiparams +mv STGCN.pdiparams.info model.pdiparams.info +mv STGCN.pdmodel model.pdmodel +``` + +The directory structure will look like: +``` +STGCN +├── infer_cfg.yml +├── model.pdiparams +├── model.pdiparams.info +├── model.pdmodel +``` +At this point, this model can be used in PP-Human. + +**Note**: If the length of the video sequence or the number of keypoints was changed during training, the content of the `INFERENCE` field in the configuration file needs to be modified accordingly to ensure correct prediction. + +```yaml +# The dimension of the sequence data is (N,C,T,V,M) +INFERENCE: + name: 'STGCN_Inference_helper' + num_channels: 2 # Corresponding to C dimension + window_size: 50 # Corresponding to T dimension, please set it to the sequence length. + vertex_nums: 17 # Corresponding to V dimension, please set it to the number of keypoints + person_nums: 1 # Corresponding to M dimension +``` + +### Custom Action Output +In skeleton-based action recognition, the classification result of the model represents the action type of the person over a certain period of time, and the predicted class is regarded as the action of the current period. Therefore, after training and deploying the custom model, the model output is used directly as the final result, and only the displayed visualization result needs to be modified. + +#### Modify Visual Output +At present, ID-based action recognition is displayed based on the action recognition results and predefined category names. 
For the detail, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be modified to another display name, please modify it accordingly to output the corresponding result. diff --git a/docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md b/docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md new file mode 100644 index 0000000000000000000000000000000000000000..522eebe3d135a789a7843676ca477ec5e4b23a2c --- /dev/null +++ b/docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md @@ -0,0 +1,159 @@ +# 基于视频分类的行为识别 + +## 数据准备 + +视频分类任务输入的视频格式一般为`.mp4`、`.avi`等格式视频或者是抽帧后的视频帧序列,标签则可以是`.txt`格式存储的文件。 + +对于打架识别任务,具体数据准备流程如下: + +### 数据集下载 + +打架识别基于6个公开的打架、暴力行为相关数据集合并后的数据进行模型训练。公开数据集具体信息如下: + +| 数据集 | 下载连接 | 简介 | 标注 | 数量 | 时长 | +| ---- | ---- | ---------- | ---- | ---- | ---------- | +| Surveillance Camera Fight Dataset| https://github.com/sayibet/fight-detection-surv-dataset | 裁剪视频,监控视角 | 视频级别 | 打架:150;非打架:150 | 2s | +| A Dataset for Automatic Violence Detection in Videos | https://github.com/airtlab/A-Dataset-for-Automatic-Violence-Detection-in-Videos | 裁剪视频,室内自行录制 | 视频级别 | 暴力行为:115个场景,2个机位,共230 ;非暴力行为:60个场景,2个机位,共120 | 几秒钟 | +| Hockey Fight Detection Dataset | https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes?resource=download | 裁剪视频,非真实场景 | 视频级别 | 打架:500;非打架:500 | 2s | +| Video Fight Detection Dataset | https://www.kaggle.com/datasets/naveenk903/movies-fight-detection-dataset | 裁剪视频,非真实场景 | 视频级别 | 打架:100;非打架:101 | 2s | +| Real Life Violence Situations Dataset | https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset | 裁剪视频,非真实场景 | 视频级别 | 暴力行为:1000;非暴力行为:1000 | 几秒钟 | +| UBI Abnormal Event Detection Dataset| http://socia-lab.di.ubi.pt/EventDetection/ | 未裁剪视频,监控视角 | 帧级别 | 打架:216;非打架:784;裁剪后二次标注:打架1976,非打架1630 | 原视频几秒到几分钟不等,裁剪后2s | + 
+打架(暴力行为)视频3956个,非打架(非暴力行为)视频3501个,共7457个视频,每个视频几秒钟。 + +本项目为大家整理了前5个数据集,下载链接:[https://aistudio.baidu.com/aistudio/datasetdetail/149085](https://aistudio.baidu.com/aistudio/datasetdetail/149085)。 + +### 视频抽帧 + +首先下载PaddleVideo代码: +```bash +git clone https://github.com/PaddlePaddle/PaddleVideo.git +``` + +假设PaddleVideo源码路径为PaddleVideo_root。 + +为了加快训练速度,将视频进行抽帧。下面命令会根据视频的帧率FPS进行抽帧,如FPS=30,则每秒视频会抽取30帧图像。 + +```bash +cd ${PaddleVideo_root} +python data/ucf101/extract_rawframes.py dataset/ rawframes/ --level 2 --ext mp4 +``` +其中,假设视频已经存放在了`dataset`目录下,如果是其他路径请对应修改。打架(暴力)视频存放在`dataset/fight`中;非打架(非暴力)视频存放在`dataset/nofight`中。`rawframes`目录存放抽取的视频帧。 + +### 训练集和验证集划分 + +打架识别验证集1500条,来自Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、UBI Abnormal Event Detection Dataset三个数据集。 + +也可根据下面的命令将数据按照8:2的比例划分成训练集和测试集: + +```bash +python split_fight_train_test_dataset.py "rawframes" 2 0.8 +``` + +参数说明:“rawframes”为视频帧存放的文件夹;2表示目录结构为两级,第二级表示每个行为对应的子文件夹;0.8表示训练集比例。 + +其中`split_fight_train_test_dataset.py`文件在PaddleDetection中的`deploy/pipeline/tools`路径下。 + +执行完命令后会最终生成fight_train_list.txt和fight_val_list.txt两个文件。打架的标签为1,非打架的标签为0。 + +### 视频裁剪 +对于未裁剪的视频,如UBI Abnormal Event Detection Dataset数据集,需要先进行裁剪才能用于模型训练,`deploy/pipeline/tools/clip_video.py`中给出了视频裁剪的函数`cut_video`,输入为视频路径,裁剪的起始帧和结束帧以及裁剪后的视频保存路径。 + + +## 模型优化 + +### VideoMix +[VideoMix](https://arxiv.org/abs/2012.03457)是视频数据增强的方法之一,是对图像数据增强CutMix的扩展,可以缓解模型的过拟合问题。 + +与Mixup将两个视频片段的每个像素点按照一定比例融合不同的是,VideoMix是每个像素点要么属于片段A要么属于片段B。输出结果是两个片段原始标签的加权和,权重是两个片段各自的比例。 + +在baseline的基础上加入VideoMix数据增强后,精度由87.53%提升至88.01%。 + +### 更大的分辨率 +由于监控摄像头角度、距离等问题,存在监控画面下人比较小的情况,小目标行为的识别较困难,尝试增大输入图像的分辨率,模型精度由88.01%提升至89.06%。 + +## 新增行为 + +目前打架识别模型使用的是[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)套件中[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md),并在PP-TSM视频分类模型训练流程的基础上修改适配,完成模型训练。 + 
+请先参考[使用说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/usage.md)了解PaddleVideo模型库的使用。 + + +| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 | +| ---- | ---- | ---------- | ---- | ---- | ---------- | +| 打架识别 | PP-TSM | 准确率:89.06% | T4, 2s视频128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) | + +#### 模型训练 +下载预训练模型: +```bash +wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams +``` + +执行训练: +```bash +# 单卡训练 +cd ${PaddleVideo_root} +python main.py --validate -c pptsm_fight_frames_dense.yaml +``` + +本方案针对的是视频的二分类问题,如果不是二分类,需要修改配置文件中`MODEL-->head-->num_classes`为具体的类别数目。 + + +```bash +cd ${PaddleVideo_root} +# 多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python -B -m paddle.distributed.launch --gpus=“0,1,2,3” \ + --log_dir=log_pptsm_dense main.py --validate \ + -c pptsm_fight_frames_dense.yaml +``` + +#### 模型评估 +训练好的模型下载:[https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) + +模型评估: +```bash +cd ${PaddleVideo_root} +python main.py --test -c pptsm_fight_frames_dense.yaml \ + -w ppTSM_fight_best.pdparams +``` + +其中`ppTSM_fight_best.pdparams`为训练好的模型。 + +#### 模型导出 + +导出inference模型: + +```bash +cd ${PaddleVideo_root} +python tools/export_model.py -c pptsm_fight_frames_dense.yaml \ + -p ppTSM_fight_best.pdparams \ + -o inference/ppTSM +``` + + +#### 推理可视化 + +利用上步骤导出的模型,基于PaddleDetection中推理pipeline可完成自定义行为识别及可视化。 + +新增行为后,需要对现有的可视化代码进行修改,目前代码支持打架二分类可视化,新增类别后需要根据识别结果自适应可视化推理结果。 + +具体修改PaddleDetection中develop/deploy/pipeline/pipeline.py路径下PipePredictor类中visualize_video成员函数。当结果中存在'video_action'数据时,会对行为进行可视化。目前的逻辑是如果推理的类别为1,则为打架行为,进行可视化;否则不进行显示,即"video_action_score"为None。用户新增行为后,可根据类别index和对应的行为设置"video_action_text"字段,目前index=1对应"Fight"。相关代码块如下: + +``` +video_action_res = result.get('video_action') +if 
video_action_res is not None: + video_action_score = None + if video_action_res and video_action_res["class"] == 1: + video_action_score = video_action_res["score"] + mot_boxes = None + if mot_res: + mot_boxes = mot_res['boxes'] + image = visualize_action( + image, + mot_boxes, + action_visual_collector=None, + action_text="SkeletonAction", + video_action_score=video_action_score, + video_action_text="Fight") +``` diff --git a/docs/advanced_tutorials/customization/detection.md b/docs/advanced_tutorials/customization/detection.md new file mode 100644 index 0000000000000000000000000000000000000000..4f20cf3c58e8908136bd336abc413536a06a3467 --- /dev/null +++ b/docs/advanced_tutorials/customization/detection.md @@ -0,0 +1,84 @@ +简体中文 | [English](./detection_en.md) + +# 目标检测任务二次开发 + +在目标检测算法产业落地过程中,常常会出现需要额外训练以满足实际使用的要求,项目迭代过程中也会出先需要修改类别的情况。本文档详细介绍如何使用PaddleDetection进行目标检测算法二次开发,流程包括:数据准备、模型优化思路和修改类别开发流程。 + +## 数据准备 + +二次开发首先需要进行数据集的准备,针对场景特点采集合适的数据从而提升模型效果和泛化性能。然后使用Labeme,LabelImg等标注工具标注目标检测框,并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md) + +## 模型优化 + +### 1. 
使用自定义数据集训练 + +基于准备的数据在数据配置文件中修改对应路径,例如`configs/dataset/coco_detection.yml`: + +``` +metric: COCO +num_classes: 80 + +TrainDataset: + !COCODataSet + image_dir: train2017 # 训练集的图片所在文件相对于dataset_dir的路径 + anno_path: annotations/instances_train2017.json # 训练集的标注文件相对于dataset_dir的路径 + dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径 + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 # 验证集的图片所在文件相对于dataset_dir的路径 + anno_path: annotations/instances_val2017.json # 验证集的标注文件相对于dataset_dir的路径 + dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径 + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # 标注文件所在文件 相对于dataset_dir的路径 + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # 数据集所在路径,相对于PaddleDetection路径 +``` + +配置修改完成后,即可以启动训练评估,命令如下 + +``` +export CUDA_VISIBLE_DEVICES=0 +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval +``` + +更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) + + +### 2. 加载COCO模型作为预训练 + +目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重,加载到检测算法的骨干网络中,实际使用时,建议加载COCO数据集训练好的权重,通常能够对模型精度有较大提升,使用方法如下: + +#### 1) 设置预训练权重路径 + +COCO数据集训练好的模型权重均在各算法配置文件夹下,例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重:[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams` + +#### 2) 修改超参数 + +加载COCO预训练权重后,需要修改学习率超参数,例如`configs/ppyoloe/_base_/optimizer_300e.yml`中: + +``` +epoch: 120 # 原始配置为300epoch,加载COCO权重后可以适当减少迭代轮数 + +LearningRate: + base_lr: 0.005 # 原始配置为0.025,加载COCO权重后需要降低学习率 + schedulers: + - !CosineDecay + max_epochs: 144 # 依据epoch数进行修改 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 +``` + +## 修改类别 + +当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中: + +``` +metric: COCO +num_classes: 10 # 原始类别80 +``` + +配置修改完成后,同样可以加载COCO预训练权重,PaddleDetection支持自动加载shape匹配的权重,对于shape不匹配的权重会自动忽略,因此无需其他修改。 diff --git a/docs/advanced_tutorials/customization/detection_en.md b/docs/advanced_tutorials/customization/detection_en.md new file mode 100644 index 0000000000000000000000000000000000000000..003ea152906b947473643b93cf1585b7f32d2155 --- /dev/null +++ b/docs/advanced_tutorials/customization/detection_en.md @@ -0,0 +1,89 @@ +[简体中文](./detection.md) | English + +# Customize Object Detection task + +In the practical application of object detection algorithms in a specific industry, additional training is often required for practical use. The project iteration will also need to modify categories. This document details how to use PaddleDetection for a customized object detection algorithm. The process includes data preparation, model optimization roadmap, and modifying the category development process. + +## Data Preparation + +Customization starts with the preparation of the dataset. We need to collect suitable data for the scenario features, so as to improve the model effect and generalization performance. Then Labeme, LabelImg and other labeling tools will be used to label the object detection bouding boxes and convert the labeling results into COCO or VOC data format. Details please refer to [Data Preparation](../../tutorials/data/PrepareDetDataSet_en.md) + +## Model Optimization + +### 1. 
Use customized dataset for training + +Modify the corresponding path in the data configuration file based on the prepared data, for example: + +configs/dataset/coco_detection.yml`: + +``` +metric: COCO +num_classes: 80 + +TrainDataset: + !COCODataSet + image_dir: train2017 # Path to the images of the training set relative to the dataset_dir + anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir + dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: val2017 # Path to the images of the evaldataset set relative to the dataset_dir + anno_path: annotations/instances_val2017.json # Path to the annotation file of the evaldataset relative to the dataset_dir + dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path + +TestDataset: + !ImageFolder + anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # Path to the annotation files relative to dataset_di. + dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # Path to the dataset relative to the PaddleDetection path +``` + +Once the configuration changes are completed, the training evaluation can be started with the following command + +``` +export CUDA_VISIBLE_DEVICES=0 +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval +``` + +More details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) + +### + +### 2. Load the COCO model as pre-training + +The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. 
For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows. + +#### 1) Set pre-training weight path + +The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm, for example, PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) The configuration file sets`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams` + +#### 2) Modify hyperparameters + +After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example + +In `configs/ppyoloe/_base_/optimizer_300e.yml`: + +``` +epoch: 120 # The original configuration is 300 epoch, after loading COCO weights, the iteration number can be reduced appropriately + +LearningRate: + base_lr: 0.005 # The original configuration is 0.025, after loading COCO weights, the learning rate should be reduced. + schedulers: + - !CosineDecay + max_epochs: 144 # Modify based on the number of epochs + - LinearWarmup + start_factor: 0. + epochs: 5 +``` + +## Modify categories + +When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`: + +``` +metric: COCO +num_classes: 10 # original class 80 +``` + +After the configuration changes are completed, the COCO pre-training weights can also be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed. 
diff --git a/docs/advanced_tutorials/customization/keypoint_detection.md b/docs/advanced_tutorials/customization/keypoint_detection.md new file mode 100644 index 0000000000000000000000000000000000000000..c54ff231e07a5540d2d93ecda4507803fd81ac21 --- /dev/null +++ b/docs/advanced_tutorials/customization/keypoint_detection.md @@ -0,0 +1,261 @@ +简体中文 | [English](./keypoint_detection_en.md) + +# 关键点检测任务二次开发 + +在实际场景中应用关键点检测算法,不可避免地会出现需要二次开发的需求。包括对目前的预训练模型效果不满意,希望优化模型效果;或是目前的关键点点位定义不能满足实际场景需求,希望新增或是替换关键点点位的定义,训练新的关键点模型。本文档将介绍如何在PaddleDetection中,对关键点检测算法进行二次开发。 + +## 数据准备 + +### 基本流程说明 +在PaddleDetection中,目前支持的标注数据格式为`COCO`和`MPII`。这两个数据格式的详细说明,可以参考文档[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)。在这一步中,通过使用Labeme等标注工具,依照特征点序号标注对应坐标。并转化成对应可训练的标注格式。建议使用`COCO`格式进行。 + +### 合并数据集 +为了扩展使用的训练数据,合并多个不同的数据集一起训练是一个很直观的解决手段,但不同的数据集往往对关键点的定义并不一致。合并数据集的第一步是需要统一不同数据集的点位定义,确定标杆点位,即最终模型学习的特征点类型,然后根据各个数据集的点位定义与标杆点位定义之间的关系进行调整。 +- 在标杆点位中的点:调整点位序号,使其与标杆点位一致 +- 未在标杆点位中的点:舍去 +- 数据集缺少标杆点位中的点:对应将标注的标志位记为“未标注” + +在[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)中,提供了如何合并`COCO`数据集和`AI Challenger`数据集,并统一为以`COCO`为标杆点位定义的案例说明,供参考。 + + +## 模型优化 + +### 检测-跟踪模型优化 +在PaddleDetection中,关键点检测能力支持Top-Down、Bottom-Up两套方案,Top-Down先检测主体,再检测局部关键点,优点是精度较高,缺点是速度会随着检测对象的个数增加,Bottom-Up先检测关键点再组合到对应的部位上,优点是速度快,与检测对象个数无关,缺点是精度较低。关于两种方案的详情及对应模型,可参考[关键点检测系列模型](../../../configs/keypoint/README.md) + +当使用Top-Down方案时,模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,会使关键点检测部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](./detection.md)以及[多目标跟踪任务二次开发](./pphuman_mot.md)对检测/跟踪模型进行优化。 + +### 使用符合场景的数据迭代 +目前发布的关键点检测算法模型主要在`COCO`/ `AI Challenger`等开源数据集上迭代,这部分数据集中可能缺少与实际任务较为相似的监控场景(视角、光照等因素)、体育场景(存在较多非常规的姿态)。使用更符合实际任务场景的数据进行训练,有助于提升模型效果。 + +### 使用预训练模型迭代 +关键点模型的数据的标注复杂度较大,直接使用模型从零开始在业务数据集上训练,效果往往难以满足需求。在实际工程中使用时,建议加载已经训练好的权重,通常能够对模型精度有较大提升,以`HRNet`为例,使用方法如下: +```bash +python tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams +``` 
+在加载预训练模型后,可以适当减小初始学习率和最终迭代轮数, 建议初始学习率取默认配置值的1/2至1/5,并可开启`--eval`观察迭代过程中AP值的变化。 + + +### 遮挡数据增强 +关键点任务中有较多遮挡问题,包括自身遮挡与不同目标之间的遮挡。 + +1. 检测模型优化(仅针对Top-Down方案) + +参考[目标检测任务二次开发](./detection.md),提升检测模型在复杂场景下的效果。 + +2. 关键点数据增强 + +在关键点模型训练中增加遮挡的数据增强,参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/tinypose_256x192.yml#L100)。有助于模型提升这类场景下的表现。 + +### 对视频预测进行平滑处理 +关键点模型是在图片级别的基础上进行训练和预测的,对于视频类型的输入也是将视频拆分为帧进行预测。帧与帧之间虽然内容大多相似,但微小的差异仍然可能导致模型的输出发生较大的变化,表现为虽然预测的坐标大体正确,但视觉效果上有较大的抖动问题。通过添加滤波平滑处理,将每一帧预测的结果与历史结果综合考虑,得到最终的输出结果,可以有效提升视频上的表现。该部分内容可参考[滤波平滑处理](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206)。 + + +## 新增或修改关键点点位定义 + +### 数据准备 +根据前述说明,完成数据的准备,放置于`{root of PaddleDetection}/dataset`下。 + +
    + 标注文件示例 + +一个标注文件示例如下: + +``` +self_dataset/ +├── train_coco_joint.json # 训练集标注文件 +├── val_coco_joint.json # 验证集标注文件 +├── images/ # 存放图片文件 +    ├── 0.jpg +    ├── 1.jpg +    ├── 2.jpg +``` +其中标注文件中需要注意的改动如下: +```json +{ + "images": [ + { + "file_name": "images/0.jpg", + "id": 0, # 图片id,注意不可重复 + "height": 1080, + "width": 1920 + }, + { + "file_name": "images/1.jpg", + "id": 1, + "height": 1080, + "width": 1920 + }, + { + "file_name": "images/2.jpg", + "id": 2, + "height": 1080, + "width": 1920 + }, + ... + + "categories": [ + { + "supercategory": "person", + "id": 1, + "name": "person", + "keypoints": [ # 点位序号的名称 + "point1", + "point2", + "point3", + "point4", + "point5", + ], + "skeleton": [ # 点位构成的骨骼, 训练中非必要 + [ + 1, + 2 + ], + [ + 1, + 3 + ], + [ + 2, + 4 + ], + [ + 3, + 5 + ] + ] + ... + + "annotations": [ + { + { + "category_id": 1, # 实例所属类别 + "num_keypoints": 3, # 该实例已标注点数量 + "bbox": [ # 检测框位置,格式为x, y, w, h + 799, + 575, + 55, + 185 + ], + # N*3 的列表,内容为x, y, v。 + "keypoints": [ + 807.5899658203125, + 597.5455322265625, + 2, + 0, + 0, + 0, # 未标注的点记为0,0,0 + 805.8563232421875, + 592.3446655273438, + 2, + 816.258056640625, + 594.0783081054688, + 2, + 0, + 0, + 0 + ] + "id": 1, # 实例id,不可重复 + "image_id": 8, # 实例所在图像的id,可重复。此时代表一张图像上存在多个目标 + "iscrowd": 0, # 是否遮挡,为0时参与训练 + "area": 10175 # 实例所占面积,可简单取为w * h。注意为0时会跳过,过小时在eval时会被忽略 + + ... +``` + +
    + + +### 配置文件设置 + +在配置文件中,完整的含义参考[config yaml配置项说明](../../tutorials/KeyPointConfigGuide_cn.md)。以[HRNet模型配置](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml)为例,重点需要关注的内容如下: + +
    + 配置文件示例 + +一个配置文件的示例如下 + +```yaml +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_256x192/model_final +epoch: 210 +num_joints: &num_joints 5 # 预测的点数与定义点数量一致 +pixel_std: &pixel_std 200 +metric: KeyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height] +hmsize: &hmsize [48, 64] +flip_perm: &flip_perm [[1, 2], [3, 4]] # 注意只有含义上镜像对称的点才写到这里 + +... + +# 保证dataset_dir + anno_path 能正确定位到标注文件位置 +# 保证dataset_dir + image_dir + 标注文件中的图片路径能正确定位到图片 +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: images + anno_path: train_coco_joint.json + dataset_dir: dataset/self_dataset + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + + +EvalDataset: + !KeypointTopDownCocoDataset + image_dir: images + anno_path: val_coco_joint.json + dataset_dir: dataset/self_dataset + bbox_file: bbox.json + num_joints: *num_joints + trainsize: *trainsize + pixel_std: *pixel_std + use_gt_bbox: True + image_thre: 0.0 +``` +
    + +### 模型训练及评估 +#### 模型训练 +通过如下命令启动训练: +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml +``` + +#### 模型评估 +训练好模型之后,可以通过以下命令实现对模型指标的评估: +```bash +python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml +``` + +注意:由于测试依赖pycocotools工具,其默认为`COCO`数据集的17点,如果修改后的模型并非预测17点,直接使用评估命令会报错。 +需要修改以下内容以获得正确的评估结果: +- [sigma列表](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/keypoint_utils.py#L219),表示每个关键点的范围方差,越大则容忍度越高。其长度与预测点数一致。根据实际关键点可信区域设置,区域精确的一般0.25-0.5,例如眼睛。区域范围大的一般0.5-1.0,例如肩膀。若不确定建议0.75。 +- [pycocotools sigma列表](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523),含义及内容同上,取值与sigma列表一致。 + +### 模型导出及预测 +#### Top-Down模型联合部署 +```shell +#导出关键点模型 +python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights} + +#detector 检测 + keypoint top-down模型联合部署(联合推理只支持top-down方式) +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu +``` +- 注意目前PP-Human中使用的为该方案 + +#### Bottom-Up模型独立部署 +```shell +#导出模型 +python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams + +#部署推理 +python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5 + +``` diff --git a/docs/advanced_tutorials/customization/keypoint_detection_en.md b/docs/advanced_tutorials/customization/keypoint_detection_en.md new file mode 100644 index 0000000000000000000000000000000000000000..5f96ddffe3726530108b9887256f1e1ea2360c60 --- /dev/null +++ b/docs/advanced_tutorials/customization/keypoint_detection_en.md @@ -0,0 +1,258 @@ +[简体中文](./keypoint_detection.md) | 
English + +# Customized Keypoint Detection + +When applying keypoint detection algorithms in real practice, inevitably, we may need customization as we may dissatisfy with the current pre-trained model results, or the current keypoint detection cannot meet the actual demand, or we may want to add or replace the definition of keypoints and train a new keypoint detection model. This document will introduce how to customize the keypoint detection algorithm in PaddleDetection. + +## Data Preparation + +### Basic Process Description + +PaddleDetection currently supports `COCO` and `MPII` annotation data formats. For detailed descriptions of these two data formats, please refer to the document [Keypoint Data Preparation](./../tutorials/data/PrepareKeypointDataSet.md). In this step, by using annotation tools such as Labeme, the corresponding coordinates are annotated according to the feature point serial numbers and then converted into the corresponding trainable annotation format. And we recommend `COCO` format. + +### Merging datasets + +To extend the training data, we can merge several different datasets together. But different datasets often have different definitions of key points. Therefore, the first step in merging datasets is to unify the point definitions of different datasets, and determine the benchmark points, i.e., the types of feature points finally learned by the model, and then adjust them according to the relationship between the point definitions of each dataset and the benchmark point definitions. + +- Points in the benchmark point location: adjust the point number to make it consistent with the benchmark point location +- Points that are not in the benchmark points: discard +- Points in the dataset that are missing from the benchmark: annotate the marked points as "unannotated". + +In [Key point data preparation](... /... 
/tutorials/data/PrepareKeypointDataSet.md), we provide a case illustration of how to merge the `COCO` dataset and the `AI Challenger` dataset and unify them as a benchmark point definition with `COCO` for your reference. + +## Model Optimization + +### Detection and tracking model optimization + +In PaddleDetection, the keypoint detection supports Top-Down and Bottom-Up solutions. Top-Down first detects the main body and then detects the local key points. It has higher accuracy but will take a longer time as the number of detected objects increases.The Bottom-Up plan first detects the keypoints and then combines them with the corresponding parts. It is fast and its speed is independent of the number of detected objects. Its disadvantage is that the accuracy is relatively low. For details of the two solutions and the corresponding models, please refer to [Keypoint Detection Series Models](../../../configs/keypoint/README.md) + +When using the Top-Down solution, the model's effects depend on the previous detection or tracking effect. If the pedestrian position cannot be accurately detected in the actual practice, the performance of the keypoint detection will be limited. If you encounter the above problem in actual application, please refer to [Customized Object Detection](./detection_en.md) and [Customized Multi-target tracking](./pphuman_mot_en.md) for optimization of the detection and tracking model. + +### Iterate with scenario-compatible data + +The currently released keypoint detection algorithm models are mainly iterated on open source datasets such as `COCO`/ `AI Challenger`, which may lack surveillance scenarios (angles, lighting and other factors), sports scenarios (more unconventional poses) that are more similar to the actual task. Training with data that more closely matches the actual task scenario can help improve the model's results. 
+ +### Iteration via pre-trained models + +The data annotation of the keypoint model is complex, and using the model directly to train on the business dataset from scratch is often difficult to meet the demand. When used in practical projects, it is recommended to load the pre-trained weights, which usually improve the model accuracy significantly. Let's take `HRNet` as an example with the following method: + +``` +python tools/train.py \ + -c configs/keypoint/hrnet/hrnet_w32_256x192.yml \ + -o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams +``` + +After loading the pre-trained model, the initial learning rate and the rounds of iterations can be reduced appropriately. It is recommended that the initial learning rate be 1/2 to 1/5 of the default configuration, and you can enable`--eval` to observe the change of AP values during the iterations. + +## Data augmentation with occlusion + +There are a lot of data in occlusion in keypoint tasks, including self-covered objects and occlusion between different objects. + +1. Detection model optimization (only for Top-Down solutions) + +Refer to [Target Detection Task Secondary Development](. /detection.md) to improve the detection model in complex scenarios. + +2. Keypoint data augmentation + +Augmentation of covered data in keypoint model training to improve model performance in such scenarios, please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/) + +### Smooth video prediction + +The keypoint model is trained and predicted on the basis of image, and video input is also predicted by splitting the video into frames. Although the content is mostly similar between frames, small differences may still lead to large changes in the output of the model. As a result of that, although the predicted coordinates are roughly correct, there may be jitters in the visual effect. 
+ +By adding a smoothing filter process, the performance of the video output can be effectively improved by combining the predicted results of each frame and the historical results. For this part, please see [Filter Smoothing](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206). + +## Add or modify keypoint definition + +### Data Preparation + +Complete the data preparation according to the previous instructions and place it under `{root of PaddleDetection}/dataset`. + +
    + Examples of annotation file + +``` +self_dataset/ +├── train_coco_joint.json # training set annotation file +├── val_coco_joint.json # Validation set annotation file +├── images/ # Store the image files +    ├── 0.jpg +    ├── 1.jpg +    ├── 2.jpg +``` + +Notable changes as follows: + +``` +{ + "images": [ + { + "file_name": "images/0.jpg", + "id": 0, # image id, id cannotdo not repeat + "height": 1080, + "width": 1920 + }, + { + "file_name": "images/1.jpg", + "id": 1, + "height": 1080, + "width": 1920 + }, + { + "file_name": "images/2.jpg", + "id": 2, + "height": 1080, + "width": 1920 + }, + ... + + "categories": [ + { + "supercategory": "person", + "id": 1, + "name": "person", + "keypoints": [ # the name of the point serial number + "point1", + "point2", + "point3", + "point4", + "point5", + ], + "skeleton": [ # Skeleton composed of points, not necessary for training + [ + 1, + 2 + ], + [ + 1, + 3 + ], + [ + 2, + 4 + ], + [ + 3, + 5 + ] + ] + ... + + "annotations": [ + { + { + "category_id": 1, # The category to which the instance belongs + "num_keypoints": 3, # the number of marked points of the instance + "bbox": [ # location of detection box,format is x, y, w, h + 799, + 575, + 55, + 185 + ], + # N*3 list of x, y, v. + "keypoints": [ + 807.5899658203125, + 597.5455322265625, + 2, + 0, + 0, + 0, # unlabeled points noted as 0, 0, 0 + 805.8563232421875, + 592.3446655273438, + 2, + 816.258056640625, + 594.0783081054688, + 2, + 0, + 0, + 0 + ] + "id": 1, # the id of the instance, id cannot repeat + "image_id": 8, # The id of the image where the instance is located, repeatable. This represents the presence of multiple objects on a single image +"iscrowd": 0, # covered or not, when the value is 0, it will participate in training + "area": 10175 # the area occupied by the instance, can be simply taken as w * h. Note that when the value is 0, it will be skipped, and if it is too small, it will be ignored in eval + + ... 
+``` + +### Settings of configuration file + +In the configuration file, refer to [config yaml configuration](... /... /tutorials/KeyPointConfigGuide_cn.md) for more details . Take [HRNet model configuration](... /... /... /configs/keypoint/hrnet/hrnet_w32_256x192.yml) as an example, we need to focus on following contents: + +
    + Example of configuration + +``` +use_gpu: true +log_iter: 5 +save_dir: output +snapshot_epoch: 10 +weights: output/hrnet_w32_256x192/model_final +epoch: 210 +num_joints: &num_joints 5 # The number of predicted points matches the number of defined points +pixel_std: &pixel_std 200 +Metric. keyPointTopDownCOCOEval +num_classes: 1 +train_height: &train_height 256 +train_width: &train_width 192 +trainsize: &trainsize [*train_width, *train_height]. +hmsize: &hmsize [48, 64]. +flip_perm: &flip_perm [[1, 2], [3, 4]]. # Note that only points that are mirror-symmetric are recorded here. + +... + +# Ensure that dataset_dir + anno_path can correctly locate the annotation file +# Ensure that dataset_dir + image_dir + image path in annotation file can correctly locate the image. +TrainDataset: + !KeypointTopDownCocoDataset + image_dir: images + anno_path: train_coco_joint.json + dataset_dir: dataset/self_dataset + num_joints: *num_joints + trainsize. *trainsize + pixel_std: *pixel_std + use_gt_box: true + + +Evaluate the dataset. + !KeypointTopDownCocoDataset + image_dir: images + anno_path: val_coco_joint.json + dataset_dir: dataset/self_dataset + bbox_file: bbox.json + num_joints: *num_joints + trainsize. 
*trainsize + pixel_std: *pixel_std + use_gt_box: true + image_thre: 0.0 +``` + +### Model Training and Evaluation + +#### Model Training + +Run the following command to start training: + +``` +CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml +``` + +#### Model Evaluation + +After training the model, you can evaluate the model metrics by running the following commands: + +``` +python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml +``` + +### Model Export and Inference + +#### Top-Down model deployment + +``` +#Export keypoint model +python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights} + +#detector detection + keypoint top-down model co-deployment(for top-down solutions only) +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu +``` diff --git a/docs/advanced_tutorials/customization/pphuman_attribute.md b/docs/advanced_tutorials/customization/pphuman_attribute.md new file mode 100644 index 0000000000000000000000000000000000000000..692a9dba57dba058db970995f2874732bab8e07d --- /dev/null +++ b/docs/advanced_tutorials/customization/pphuman_attribute.md @@ -0,0 +1,295 @@ +简体中文 | [English](./pphuman_attribute_en.md) + +# 属性识别任务二次开发 + +## 数据准备 + +### 数据格式 + +格式采用PA100K的属性标注格式,共有26位属性。 + +这26位属性的名称、位置、种类数量见下表。 + +| Attribute | index | length | +|:----------|:----------|:----------| +| 'Hat','Glasses' | [0, 1] | 2 | +| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 | +| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 | +| 'boots' | [14, ] | 1 | +| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 | +| 'AgeOver60', 'Age18-60', 
'AgeLess18' | [19, 20, 21] | 3 | +| 'Female' | [22, ] | 1 | +| 'Front','Side','Back' | [23, 24, 25] | 3 | + + +举例: + +[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0] + +第一组,位置[0, 1]数值分别是[0, 1],表示'no hat'、'has glasses'。 + +第二组,位置[22, ]数值分别是[0, ], 表示gender属性是'male', 否则是'female'。 + +第三组,位置[23, 24, 25]数值分别是[0, 1, 0], 表示方向属性是侧面'side'。 + +其他组依次类推 + +### 数据标注 + +理解了上面`属性标注`格式的含义后,就可以进行数据标注的工作。其本质是:每张单人图建立一组26个长度的标注项,分别与26个位置的属性值对应。 + +举例: + +对于一张原始图片, + +1) 使用检测框,标注图片中每一个人的位置。 + +2) 每一个检测框(对应每一个人),包含一组26位的属性值数组,数组的每一位以0或1表示。对应上述26个属性。例如,如果图片是'Female',则数组第22位为0,如果满足'Age18-60',则位置[19, 20, 21]对应的数值是[0, 1, 0], 或者满足'AgeOver60',则相应数值为[1, 0, 0]. + +标注完成后利用检测框将每一个人截取成单人图,其图片与26位属性标注建立对应关系。也可先截成单人图再进行标注,效果相同。 + + +## 模型训练 + +数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。 + +其主要有两步工作需要完成:1)将数据与标注数据整理成训练格式。2)修改配置文件开始训练。 + +### 训练数据格式 + +训练数据包括训练使用的图片和一个训练列表train.txt,其具体位置在训练配置中指定,其放置方式示例如下: +``` +Attribute/ +|-- data 训练图片文件夹 +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- train.txt 训练数据列表 + +``` + +train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 26个标注值 + +其每一行表示一个人的图片和标注结果。其格式为: + +``` +00001.jpg 0,0,1,0,.... +``` + +注意:1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。 + +### 修改配置开始训练 + +首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +```shell +git clone https://github.com/PaddlePaddle/PaddleClas +``` + +需要在配置文件`PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`中,修改的配置项如下: + +``` +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" #指定训练图片所在根路径 + cls_label_path: "dataset/pa100k/train_list.txt" #指定训练列表文件位置 + label_ratio: True + transform_ops: + + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" #指定评估图片所在根路径 + cls_label_path: "dataset/pa100k/val_list.txt" #指定评估列表文件位置 + label_ratio: True + transform_ops: +``` +注意: +1. 这里image_root路径+train.txt中图片相对路径,对应图片的完整路径位置。 +2. 
如果有修改属性数量,则还需修改内容配置项中属性种类数量: +``` +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 26 #属性种类数量 +``` + +然后运行以下命令开始训练。 + +``` +#多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml + +#单卡训练 +python3 tools/train.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml +``` + +训练完成后可以执行以下命令进行性能评估: +``` +#多卡评估 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model + +#单卡评估 +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model +``` + +### 模型导出 + +使用下述命令将训练好的模型导出为预测部署模型。 + +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer +``` + +导出模型后,需要下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml)文件,并放置到导出的模型文件夹`PPLCNet_x1_0_person_attribute_infer`中。 + +使用时在PP-Human中的配置文件`./deploy/pipeline/config/infer_cfg_pphuman.yml`中修改新的模型路径`model_dir`项,并开启功能`enable: True`。 +``` +ATTR: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #新导出的模型路径位置 + enable: True #开启功能 +``` +然后可以使用-->至此即完成新增属性类别识别任务。 + +## 属性增减 + +上述是以26个属性为例的标注、训练过程。 + +如果需要增加、减少属性数量,则需要: + +1)标注时需增加新属性类别信息或删减属性类别信息; + +2)对应修改训练中train.txt所使用的属性数量和名称; + +3)修改训练配置,例如``PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml``文件中的属性数量,详细见上述`修改配置开始训练`部分。 + +增加属性示例: + +1. 在标注数据时在26位后继续增加新的属性标注数值; +2. 在train.txt文件的标注数值中也增加新的属性数值。 +3. 
注意属性类型在train.txt中属性数值列表中的位置的对应关系需要时固定的,例如第[19, 20, 21]位表示年龄,所有图片都要使用[19, 20, 21]位置表示年龄,不再赘述。 + +
    + +
    + +删减属性同理。 +例如,如果不需要年龄属性,则位置[19, 20, 21]的数值可以去掉。只需在train.txt中标注的26个数字中全部删除第19-21位数值即可,同时标注数据时也不再需要标注这3位属性值。 + +## 修改后处理代码 + +修改了属性定义后,pipeline后处理部分也需要做相应修改,主要影响结果可视化时的显示结果。 + +相应代码在路径`deploy/pipeline/pphuman/attr_infer.py`文件中`postprocess`函数。 + +其函数实现说明如下: + +``` +# 函数入口 + def postprocess(self, inputs, result): + # postprocess output of predictor + im_results = result['output'] + +# 1) 定义各组属性实际意义,其数量及位置与输出结果中占用位数一一对应。 + labels = self.pred_config.labels + age_list = ['AgeLess18', 'Age18-60', 'AgeOver60'] + direct_list = ['Front', 'Side', 'Back'] + bag_list = ['HandBag', 'ShoulderBag', 'Backpack'] + upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice'] + lower_list = [ + 'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts', + 'Skirt&Dress' + ] +# 2) 部分属性所用阈值与通用值有明显区别,单独设置 + glasses_threshold = 0.3 + hold_threshold = 0.6 + + batch_res = [] + for res in im_results: + res = res.tolist() + label_res = [] + # gender +# 3) 单个位置属性类别,判断该位置是否大于阈值,来分配二分类结果 + gender = 'Female' if res[22] > self.threshold else 'Male' + label_res.append(gender) + # age +# 4)多个位置属性类别,N选一形式,选择得分最高的属性 + age = age_list[np.argmax(res[19:22])] + label_res.append(age) + # direction + direction = direct_list[np.argmax(res[23:])] + label_res.append(direction) + # glasses + glasses = 'Glasses: ' + if res[1] > glasses_threshold: + glasses += 'True' + else: + glasses += 'False' + label_res.append(glasses) + # hat + hat = 'Hat: ' + if res[0] > self.threshold: + hat += 'True' + else: + hat += 'False' + label_res.append(hat) + # hold obj + hold_obj = 'HoldObjectsInFront: ' + if res[18] > hold_threshold: + hold_obj += 'True' + else: + hold_obj += 'False' + label_res.append(hold_obj) + # bag + bag = bag_list[np.argmax(res[15:18])] + bag_score = res[15 + np.argmax(res[15:18])] + bag_label = bag if bag_score > self.threshold else 'No bag' + label_res.append(bag_label) + # upper +# 5)同一类属性,分为两组(这里是款式和花色),每小组内单独选择,相当于两组不同属性。 + upper_label = 'Upper:' + sleeve = 'LongSleeve' if res[3] > 
res[2] else 'ShortSleeve' + upper_label += ' {}'.format(sleeve) + upper_res = res[4:8] + if np.max(upper_res) > self.threshold: + upper_label += ' {}'.format(upper_list[np.argmax(upper_res)]) + label_res.append(upper_label) + # lower + lower_res = res[8:14] + lower_label = 'Lower: ' + has_lower = False + for i, l in enumerate(lower_res): + if l > self.threshold: + lower_label += ' {}'.format(lower_list[i]) + has_lower = True + if not has_lower: + lower_label += ' {}'.format(lower_list[np.argmax(lower_res)]) + + label_res.append(lower_label) + # shoe + shoe = 'Boots' if res[14] > self.threshold else 'No boots' + label_res.append(shoe) + + batch_res.append(label_res) + result = {'output': batch_res} + return result +``` diff --git a/docs/advanced_tutorials/customization/pphuman_attribute_en.md b/docs/advanced_tutorials/customization/pphuman_attribute_en.md new file mode 100644 index 0000000000000000000000000000000000000000..fdfe74eb2b3c531a0e03cc6e2c7e8ad78114f0b3 --- /dev/null +++ b/docs/advanced_tutorials/customization/pphuman_attribute_en.md @@ -0,0 +1,223 @@ +[简体中文](pphuman_attribute.md) | English + +# Customized attribute recognition + +## Data Preparation + +### Data format + +We use the PA100K attribute annotation format, with a total of 26 attributes. + +The names, locations, and the number of these 26 attributes are shown in the table below. 
+ +| Attribute | index | length | +|:------------------------------------------------------------------------------- |:---------------------- |:------ | +| 'Hat','Glasses' | [0, 1] | 2 | +| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 | +| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 | +| 'boots' | [14, ] | 1 | +| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 | +| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 | +| 'Female' | [22, ] | 1 | +| 'Front','Side','Back' | [23, 24, 25] | 3 | + +Examples: + +[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0] + +The first group: the values at positions [0, 1] are [0, 1], which means 'no hat' and 'has glasses'. + +The second group: the value at position [22, ] is [0, ], indicating that the gender attribute is 'male'; a value of 1 would indicate 'female'. + +The third group: the values at positions [23, 24, 25] are [0, 1, 0], indicating that the direction attribute is 'side'. + +The other groups follow the same pattern. + + + +### Data Annotation + +Once the `attribute annotation` format above is understood, we can start to annotate data. In essence, each single-person image gets a set of 26 annotation values, one for each of the 26 attribute positions. + +Examples: + +For an original image: + +1) Use bounding boxes to annotate the position of each person in the picture. + +2) Each bounding box (corresponding to one person) carries an array of 26 attribute values, each of which is 0 or 1, corresponding to the 26 attributes above. For example, if the person is 'Female', then the value at position 22 of the array is 1. If the person matches 'Age18-60', then the values at positions [19, 20, 21] are [0, 1, 0]; if the person matches 'AgeOver60', then the values are [1, 0, 0]. 
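The annotation layout described above can be sketched in a few lines of Python. This is only an illustration, not code from PaddleDetection or PaddleClas: the names `ATTRIBUTE_NAMES` and `encode_attributes` are hypothetical, and the attribute order is simply copied from the table above.

```python
# Hypothetical helper, for illustration only: attribute names in the
# order given by the table above (positions 0 to 25).
ATTRIBUTE_NAMES = [
    'Hat', 'Glasses',
    'ShortSleeve', 'LongSleeve', 'UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice',
    'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts', 'Skirt&Dress',
    'boots',
    'HandBag', 'ShoulderBag', 'Backpack', 'HoldObjectsInFront',
    'AgeOver60', 'Age18-60', 'AgeLess18',
    'Female',
    'Front', 'Side', 'Back',
]


def encode_attributes(present):
    """Return a 26-element 0/1 list with 1 at every attribute named in `present`."""
    unknown = set(present) - set(ATTRIBUTE_NAMES)
    if unknown:
        raise ValueError('unknown attributes: {}'.format(sorted(unknown)))
    return [1 if name in present else 0 for name in ATTRIBUTE_NAMES]


# A male person wearing glasses and short sleeves, seen from the side.
label = encode_attributes({'Glasses', 'ShortSleeve', 'Side'})
print(label)
```

Building the vector from attribute names rather than typing 26 digits by hand makes it harder to shift a value into the wrong position, which is the most likely annotation mistake with this format.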
+ +After the annotation is completed, use the detection boxes to crop each person into a single-person image, so that each image is associated with its 26 attribute values. It is also possible to crop the single-person images first and then annotate them. The results are the same. + + + +## Model Training + +Once the data is annotated, it can be used for model training to complete the optimization of the customized model. + +There are two main steps: 1) Organize the images and annotations into the training format. 2) Modify the configuration file to start training. + +### Training data format + +The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration. An example layout is as follows: + +``` +Attribute/ +|-- data Training images folder +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- train.txt List of training data +``` + +The train.txt file contains the names of all training images (file paths relative to the root path) + 26 annotation values. + +Each line represents one person's image and its annotation result. The format is as follows: + +``` +00001.jpg 0,0,1,0,.... +``` + +Note: 1) The image name and the annotation values are separated by a Tab [\t]. 2) The annotation values are separated from each other by commas [,]. If the format is wrong, the parsing will fail. 
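The train.txt line format above can be written and checked with a short script. This is a minimal sketch under the assumptions stated in the text (one Tab between the image path and the values, commas between the values); `format_line` and `parse_line` are hypothetical helper names and are not part of PaddleClas.

```python
def format_line(image_path, labels):
    """Build one train.txt line: image path, a Tab, then comma-separated 0/1 values."""
    return '{}\t{}'.format(image_path, ','.join(str(v) for v in labels))


def parse_line(line, num_attrs=26):
    """Split one train.txt line back into (image_path, labels) and check its length."""
    image_path, values = line.rstrip('\n').split('\t')
    labels = [int(v) for v in values.split(',')]
    if len(labels) != num_attrs:
        raise ValueError('expected {} values, got {}'.format(num_attrs, len(labels)))
    return image_path, labels


# 26 values: glasses, short sleeves, side view; everything else is 0.
labels = [0, 1, 1] + [0] * 21 + [1, 0]
line = format_line('00001.jpg', labels)
print(repr(line))
```

Running `parse_line` over every line of a finished train.txt before training is a cheap way to catch a line that uses spaces instead of a Tab or that has the wrong number of values.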
+ + + +### Modify the configuration to start training + +First run the following command to download the training code (for environment setup issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +``` +git clone https://github.com/PaddlePaddle/PaddleClas +``` + +You need to modify the following configuration items in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`: + +``` +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" #Specify the root path of training image + cls_label_path: "dataset/pa100k/train_list.txt" #Specify the location of the training list file + label_ratio: True + transform_ops: + + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/pa100k/" #Specify the root path of evaluated image + cls_label_path: "dataset/pa100k/val_list.txt" #Specify the location of the evaluation list file + label_ratio: True + transform_ops: +``` + +Note: + +1. The image_root path joined with the relative image path in train.txt must give the full path of the image. +2. If you change the number of attributes, the number of attribute classes (`class_num`) in the model configuration must also be changed accordingly: 
+ +``` +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 26 #Number of attribute classes +``` + +Then run the following command to start training: + +``` +#Multi-card training +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml + +#Single card training +python3 tools/train.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml +``` + +You can run the following commands for performance evaluation after the training is completed: + +``` +#Multi-card evaluation +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model + +#Single card evaluation +python3 tools/eval.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=./output/PPLCNet_x1_0/best_model +``` + +### Model Export + +Use the following command to export the trained model as an inference deployment model. + +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \ + -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \ + -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer +``` + +After exporting the model, you need to download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml) file and put it into the exported model folder `PPLCNet_x1_0_person_attribute_infer`. + +To use the model, set the `model_dir` entry to the new model path and set `enable: True` in the PP-Human configuration file `./deploy/pipeline/config/infer_cfg_pphuman.yml`. 
+ +``` +ATTR: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #The exported model location + enable: True #Whether to enable the function +``` + + + +Now the model is ready to use. At this point, the new attribute recognition task is complete. + + + +## Adding or deleting attributes + +The above is the annotation and training process with 26 attributes. + +To add or delete attributes, you need to: + +1) Add or delete the corresponding attribute category information when annotating the data. + +2) Update the number and names of the attributes used in train.txt accordingly. + +3) Modify the training configuration, for example, the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`` file. For details, please see the ``Modify the configuration to start training`` section above. + +Example of adding attributes: + +1. When annotating the data, append the new attribute annotation values after the existing 26 values. +2. Add the new attribute values to the annotation values in the train.txt file as well. +3. 
Note that the position of each attribute type in the train.txt value list must stay the same across all images. For example, if positions [19, 20, 21] indicate age, then every image must use positions [19, 20, 21] to indicate age. + + + +The same applies to the deletion of attributes. +For example, if the age attribute is not needed, the values at positions [19, 20, 21] can be removed. Simply delete the values at positions 19-21 from the 26 numbers on every line of train.txt, and you no longer need to annotate these 3 attribute values. diff --git a/docs/advanced_tutorials/customization/pphuman_mot.md b/docs/advanced_tutorials/customization/pphuman_mot.md new file mode 100644 index 0000000000000000000000000000000000000000..209c603267c6799d2ed3b8e096d977fa2ff5f7ab --- /dev/null +++ b/docs/advanced_tutorials/customization/pphuman_mot.md @@ -0,0 +1,63 @@ +简体中文 | [English](./pphuman_mot_en.md) + +# 多目标跟踪任务二次开发 + +在产业落地过程中应用多目标跟踪算法,不可避免地会出现希望自定义类型的多目标跟踪的需求,或是对已有多目标跟踪模型的优化,以提升在特定场景下模型的效果。我们在本文档通过案例来介绍如何根据期望识别的行为来进行多目标跟踪方案的选择,以及使用PaddleDetection进行多目标跟踪算法二次开发工作,包括:数据准备、模型优化思路和跟踪类别修改的开发流程。 + +## 数据准备 + +多目标跟踪模型方案采用[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf),其中使用PP-YOLOE替换原文的YOLOX作为检测器,使用BYTETracker作为跟踪器,详细文档参考[ByteTrack](../../../configs/mot/bytetrack)。原文的ByteTrack只支持行人单类别,PaddleDetection中也支持多类别同时进行跟踪。训练ByteTrack也就是训练检测器的过程,只需要准备好检测标注即可,不需要ReID标注信息,即当成纯检测来做即可。数据集最好是连续视频中抽取出来的而不是无关联的图片集合。 +二次开发首先需要进行数据集的准备,针对场景特点采集合适的数据从而提升模型效果和泛化性能。然后使用Labelme,LabelImg等标注工具标注目标检测框,并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md) + +## 模型优化 + +### 1. 
使用自定义数据集训练 + +ByteTrack跟踪方案采用的数据集只需要有检测标注即可。参照[MOT数据集准备](../../../configs/mot)和[MOT数据集教程](docs/tutorials/data/PrepareMOTDataSet.md)。 + +``` +# 单卡训练 +CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp + +# 多卡训练 +python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp +``` + +更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)和[ByteTrack](../../../configs/mot/bytetrack/detector) + + +### 2. 加载COCO模型作为预训练 + +目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重,加载到检测算法的骨干网络中,实际使用时,建议加载COCO数据集训练好的权重,通常能够对模型精度有较大提升,使用方法如下: + +#### 1) 设置预训练权重路径 + +COCO数据集训练好的模型权重均在各算法配置文件夹下,例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重:[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams` + +#### 2) 修改超参数 + +加载COCO预训练权重后,需要修改学习率超参数,例如`configs/ppyoloe/_base_/optimizer_300e.yml`中: + +``` +epoch: 120 # 原始配置为300epoch,加载COCO权重后可以适当减少迭代轮数 + +LearningRate: + base_lr: 0.005 # 原始配置为0.025,加载COCO权重后需要降低学习率 + schedulers: + - !CosineDecay + max_epochs: 144 # 依据epoch数进行修改,一般为epoch数的1.2倍 + - !LinearWarmup + start_factor: 0. 
+ epochs: 5 +``` + +## 跟踪类别修改 + +当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中: + +``` +metric: COCO +num_classes: 10 # 原始类别1 +``` + +配置修改完成后,同样可以加载COCO预训练权重,PaddleDetection支持自动加载shape匹配的权重,对于shape不匹配的权重会自动忽略,因此无需其他修改。 diff --git a/docs/advanced_tutorials/customization/pphuman_mot_en.md b/docs/advanced_tutorials/customization/pphuman_mot_en.md new file mode 100644 index 0000000000000000000000000000000000000000..0aaea495663666782a318ccbc945f11169598eff --- /dev/null +++ b/docs/advanced_tutorials/customization/pphuman_mot_en.md @@ -0,0 +1,65 @@ +[简体中文](./pphuman_mot.md) | English + +# Customized multi-object tracking task + +When multi-object tracking algorithms are applied in industrial settings, there is inevitably a need to track customized object types, or to optimize existing multi-object tracking models so that they perform better in specific scenarios. In this document, we use examples to show how to choose a multi-object tracking solution based on the behavior to be recognized, and how to use PaddleDetection for further development of multi-object tracking algorithms, including data preparation, model optimization ideas, and the development process for modifying the tracked categories. + +## Data Preparation + +The multi-object tracking solution uses [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf), with PP-YOLOE replacing the original YOLOX as the detector and BYTETracker as the tracker. For details, please refer to [ByteTrack](../../../configs/mot/bytetrack). The original ByteTrack only supports a single pedestrian category, while PaddleDetection also supports tracking multiple categories simultaneously. Training ByteTrack amounts to training the detector, so only detection annotations need to be prepared and no ReID annotation is required, i.e., it can be done as pure detection. 
The dataset should preferably be extracted from continuous video rather than a collection of unrelated images. + +Customization starts with preparing the dataset. Collect data that suits the characteristics of the target scenario in order to improve model accuracy and generalization. Then use annotation tools such as Labelme or LabelImg to annotate the object detection boxes, and convert the annotations into COCO or VOC data format. For details, please refer to [Data Preparation](../../tutorials/data/README.md). + +## Model Optimization + +### 1. Use a customized dataset for training + +The dataset used by the ByteTrack tracking solution only needs detection annotations. Refer to [MOT dataset preparation](../../../configs/mot) and [MOT dataset tutorial](docs/tutorials/data/PrepareMOTDataSet.md). + +``` +# Single card training +CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp + +# Multi-card training +python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp +``` + +For more details, please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) and [ByteTrack](../../../configs/mot/bytetrack/detector). + +### 2. Load the COCO model as the pre-trained model + +The pre-trained weights loaded by the configuration files currently provided in PaddleDetection are ImageNet weights, which are loaded into the backbone network of the detection algorithm. In practice, it is recommended to load weights trained on the COCO dataset instead, which can usually provide a large improvement to the model accuracy. The method is as follows. 
+ +#### 1) Set pre-training weight path + +The model weights trained on the COCO dataset are provided in the configuration folder of each algorithm. For example, the PP-YOLOE-l COCO weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams). In the configuration file, set `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams` + +#### 2) Modify hyperparameters + +After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example in `configs/ppyoloe/_base_/optimizer_300e.yml`: + +``` +epoch: 120 # The original configuration is 300 epochs; after loading COCO weights, the number of epochs can be reduced appropriately + +LearningRate: + base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights, the learning rate should be reduced. + schedulers: + - !CosineDecay + max_epochs: 144 # Modified according to the number of epochs, usually 1.2 times the number of epochs + - !LinearWarmup + start_factor: 0. + epochs: 5 +``` + +## Modify categories + +When the categories in the actual application scenario change, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`: + +``` +metric: COCO +num_classes: 10 # original number of classes: 80 +``` + +After the configuration changes are completed, the COCO pre-training weights can still be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed. 
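The idea of loading only shape-matching weights can be illustrated with a small, framework-free sketch. This is not PaddleDetection's actual loader; `select_matching` and the parameter names and shapes below are hypothetical, and each parameter is represented only by its shape tuple.

```python
def select_matching(model_shapes, pretrained_shapes):
    """Split pretrained parameter names into those that can be loaded and those skipped.

    A parameter is loadable only if the model has a parameter with the same
    name and the same shape; everything else is ignored.
    """
    loaded, skipped = [], []
    for name, shape in pretrained_shapes.items():
        if model_shapes.get(name) == shape:
            loaded.append(name)
        else:
            skipped.append(name)
    return loaded, skipped


# Hypothetical names and shapes: the backbone is unchanged, while the
# classification head shrinks from 80 COCO classes to 10 custom classes.
pretrained = {
    'backbone.conv1.weight': (64, 3, 3, 3),
    'head.cls.weight': (80, 256, 1, 1),
}
model = {
    'backbone.conv1.weight': (64, 3, 3, 3),
    'head.cls.weight': (10, 256, 1, 1),
}

loaded, skipped = select_matching(model, pretrained)
print('loaded:', loaded)
print('skipped:', skipped)
```

In this sketch the backbone weights are reused and only the class-dependent head is left at its initial values, which is why changing `num_classes` does not prevent the COCO weights from being used.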
diff --git a/docs/advanced_tutorials/customization/pphuman_mtmct.md b/docs/advanced_tutorials/customization/pphuman_mtmct.md new file mode 100644 index 0000000000000000000000000000000000000000..0d784e243cb782c2a5152dff3bd3652138391344 --- /dev/null +++ b/docs/advanced_tutorials/customization/pphuman_mtmct.md @@ -0,0 +1,159 @@ +简体中文 | [English](./pphuman_mtmct_en.md) + +# 跨镜跟踪任务二次开发 + +## 数据准备 + +### 数据格式 + +跨镜跟踪使用行人REID技术实现,其训练方式采用多分类模型训练,使用时取分类softmax头部前的特征作为检索特征向量。 + +因此其格式与多分类任务相同。每一个行人分配一个专属id,不同行人id不同,同一行人在不同图片中的id相同。 + +例如图片0001.jpg、0003.jpg是同一个人,0002.jpg、0004.jpg是不同的其他行人。则标注id为: + +``` +0001.jpg 00001 +0002.jpg 00002 +0003.jpg 00001 +0004.jpg 00003 +... +``` + +依次类推。 + +### 数据标注 + +理解了上面`标注`格式的含义后,就可以进行数据标注的工作。其本质是:每张单人图建立一个标注项,对应该行人分配的id。 + +举例: + +对于一张原始图片, + +1) 使用检测框,标注图片中每一个人的位置。 + +2) 每一个检测框(对应每一个人),包含一个int类型的id属性。例如,上述举例中的0001.jpg中的人,对应id:1. + +标注完成后利用检测框将每一个人截取成单人图,其图片与id属性标注建立对应关系。也可先截成单人图再进行标注,效果相同。 + +## 模型训练 + + +数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。 + +其主要有两步工作需要完成:1)将数据与标注数据整理成训练格式。2)修改配置文件开始训练。 + +### 训练数据格式 + +训练数据包括训练使用的图片和一个训练列表bounding_box_train.txt,其具体位置在训练配置中指定,其放置方式示例如下: + +``` +REID/ +|-- data 训练图片文件夹 +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- bounding_box_train.txt 训练数据列表 + +``` + +bounding_box_train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 1个id标注值 + +其每一行表示一个人的图片和id标注结果。其格式为: + +``` +0001.jpg 00001 +0002.jpg 00002 +0003.jpg 00001 +0004.jpg 00003 +``` +注意:图片与标注值之间是以Tab[\t]符号隔开。该格式不能错,否则解析失败。 + +### 修改配置开始训练 + +首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +```shell +git clone https://github.com/PaddlePaddle/PaddleClas +``` + + +需要在配置文件[softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml)中,修改的配置项如下: + +``` + Head: + name: "FC" + embedding_size: *feat_dim + class_num: &class_num 751 #行人id总数量 + +DataLoader: + Train: + dataset: 
+ name: "Market1501" + image_root: "./dataset/" #训练图片根路径 + cls_label_path: "bounding_box_train" #训练文件列表 + + + Eval: + Query: + dataset: + name: "Market1501" + image_root: "./dataset/" #评估图片根路径 + cls_label_path: "query" #评估文件列表 + +``` +注意: + +1. 这里image_root路径+bounding_box_train.txt中图片相对路径,对应图片存放的完整路径。 + +然后运行以下命令开始训练。 + +``` +#多卡训练 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml + +#单卡训练 +python3 tools/train.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml +``` + +训练完成后可以执行以下命令进行性能评估: +``` +#多卡评估 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model + +#单卡评估 +python3 tools/eval.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model +``` + +### 模型导出 + +使用下述命令将训练好的模型导出为预测部署模型。 + +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model \ + -o Global.save_inference_dir=deploy/models/strong_baseline_inference +``` + +导出模型后,下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml)文件到新导出的模型文件夹'strong_baseline_inference'中。 + +使用时在PP-Human中的配置文件infer_cfg_pphuman.yml中修改模型路径`model_dir`并开启功能`enable`。 +``` +REID: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/ + enable: True +``` +然后可以使用。至此完成模型开发。 diff --git a/docs/advanced_tutorials/customization/pphuman_mtmct_en.md b/docs/advanced_tutorials/customization/pphuman_mtmct_en.md new file mode 100644 index 0000000000000000000000000000000000000000..e3638c2700a4cd20d5bcb6f85c6c6d2b610a56ee --- /dev/null 
+++ b/docs/advanced_tutorials/customization/pphuman_mtmct_en.md @@ -0,0 +1,165 @@ +[简体中文](./pphuman_mtmct.md) | English + +# Customized Multi-Target Multi-Camera Tracking Module of PP-Human + +## Data Preparation + +### Data Format + + + +Multi-target multi-camera tracking (MTMCT) is implemented with pedestrian REID. The model is trained as a multi-class classification model, and at inference time the features before the softmax classification head are used as the retrieval feature vector. + +Therefore its data format is the same as for a multi-class classification task. Each pedestrian is assigned a unique id: different pedestrians have different ids, and the same pedestrian has the same id in different images. + +For example, if images 0001.jpg and 0003.jpg show the same person, while 0002.jpg and 0004.jpg show other, different pedestrians, then the annotated ids are: + +``` +0001.jpg 00001 +0002.jpg 00002 +0003.jpg 00001 +0004.jpg 00003 +... +``` + +### Data Annotation + +After understanding the meaning of the `annotation` format above, we can work on the data annotation. In essence, each single-person image gets one annotation item, which is the id assigned to that pedestrian. + +For example: + +For an original picture: + +1) Use bounding boxes to annotate the position of each person in the picture. + +2) Each bounding box (corresponding to one person) carries an int id attribute. For example, the person in 0001.jpg in the above example corresponds to id: 1. + +After the annotation is completed, use the detection boxes to crop each person into a single-person image, so that each image is associated with its id annotation. You can also crop the single-person images first and then annotate them. The result is the same. + + + +## Model Training + +Once the data is annotated, it can be used for model training to complete the optimization of the customized model. 
+ +There are two main steps: 1) Organize the images and annotations into the training format. 2) Modify the configuration file to start training. + +### Training data format + +The training data consists of the images used for training and a training list bounding_box_train.txt, the location of which is specified in the training configuration. An example layout is as follows: + + +``` +REID/ +|-- data Training image folder +| |-- 00001.jpg +| |-- 00002.jpg +| `-- 0000x.jpg +`-- bounding_box_train.txt List of training data +``` + +The bounding_box_train.txt file contains the names of all training images (file paths relative to the root path) + 1 id annotation value. + +Each line represents one person's image and its id annotation. The format is as follows: + +``` +0001.jpg 00001 +0002.jpg 00002 +0003.jpg 00001 +0004.jpg 00003 +``` + +Note: The image name and the annotation value are separated by a Tab [\t]. This format must be followed exactly, otherwise the parsing will fail. + + + +### Modify the configuration to start training + +First, execute the following command to download the training code (for environment setup issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)): + +``` +git clone https://github.com/PaddlePaddle/PaddleClas +``` + +You need to change the following configuration items in the configuration file [softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml): + +``` + Head: + name: "FC" + embedding_size: *feat_dim + class_num: &class_num 751 #Total number of pedestrian ids + +DataLoader: + Train: + dataset: + name: "Market1501" + image_root: "./dataset/" #Training image root path + cls_label_path: "bounding_box_train" #Training file list + + + Eval: + Query: + dataset: + name: "Market1501" + image_root: "./dataset/" #Evaluation image root path + cls_label_path: "query" #Evaluation file list +``` + +Note: + +1. The image_root path joined with the relative image path in bounding_box_train.txt must give the full path where the image is stored. + +Then run the following command to start the training. + +``` +#Multi-card training +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml + +#Single card training +python3 tools/train.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml +``` + +After the training is completed, you may run the following commands for performance evaluation: + +``` +#Multi-card evaluation +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model + +#Single card evaluation +python3 tools/eval.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model +``` + +### Model Export + +Use the following command to export the trained model as an inference deployment model. + +``` +python3 tools/export_model.py \ + -c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \ + -o Global.pretrained_model=./output/strong_baseline/best_model \ + -o Global.save_inference_dir=deploy/models/strong_baseline_inference +``` + +After exporting the model, download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml) file to the newly exported model folder 'strong_baseline_inference'. + +In the PP-Human configuration file `infer_cfg_pphuman.yml`, change the model path `model_dir` and turn the feature on with `enable`. 
+ +``` +REID: + model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/ + enable: True +``` + +Now, the model is ready. diff --git a/docs/advanced_tutorials/openvino_inference/README.md b/docs/advanced_tutorials/openvino_inference/README.md index 0a67651d9baadf6708adcda21628f9a1dcaaf31c..ac372eaa30d1f02a5016b42b928e9c56bfea547a 100644 --- a/docs/advanced_tutorials/openvino_inference/README.md +++ b/docs/advanced_tutorials/openvino_inference/README.md @@ -3,9 +3,9 @@ ## Introduction PaddleDetection has been a vibrant open-source project and has a large amout of contributors and maintainers around it. It is an AI framework which enables developers to quickly integrate AI capacities into their own projects and applications. -Intel OpenVINO is a widely used free toolkit. It facilitates the optimization of a deep learning model from a framework and deployment using an inference engine onto Intel hardware. +Intel OpenVINO is a widely used free toolkit. It facilitates the optimization of a deep learning model from a framework and deployment using an inference engine onto Intel hardware. -Apparently, the upstream(Paddle) and the downstream(Intel OpenVINO) can work together to streamline and simplify the process of developing an AI model and deploying the model onto hardware, which, in turn, makes our lives easier. +Apparently, the upstream(Paddle) and the downstream(Intel OpenVINO) can work together to streamline and simplify the process of developing an AI model and deploying the model onto hardware, which, in turn, makes our lives easier. This article will show you how to use a PaddleDetection model [FairMOT](../../../configs/mot/fairmot/README.md) from the Model Zoo in PaddleDetection and use it with OpenVINO to do the inference. @@ -50,7 +50,7 @@ Once the Paddle model has been converted to ONNX format, we can then use it with 1. ### Get the execution network -So the 1st thing to do here is to get an execution network which can be used later to do the inference. 
+So the 1st thing to do here is to get an execution network which can be used later to do the inference. Here is the code. @@ -70,7 +70,7 @@ Every AI model has its own steps of preprocessing, let's have a look how to do i ``` def prepare_input(): transforms = [ - T.Resize(target_size=(target_width, target_height)), + T.Resize(target_size=(target_width, target_height)), T.Normalize(mean=(0,0,0), std=(1,1,1)) ] img_file = root_path / "images/street.jpeg" @@ -87,7 +87,7 @@ def prepare_input(): 3. ### Prediction -After we have done all the load network and preprocessing, it finally comes to the stage of prediction. +After we have done all the load network and preprocessing, it finally comes to the stage of prediction. ``` @@ -100,7 +100,7 @@ You might be surprised to see the very exciting stage this small. Hang on there, 4. ### Post-processing -MOT(Multi-Object Tracking) is special, not like other AI models which require a few steps of post-processing. Instead, FairMOT requires a special object called tracker, to handle the prediction results. The prediction results are prediction detections and prediction embeddings. +MOT(Multi-Object Tracking) is special, not like other AI models which require a few steps of post-processing. Instead, FairMOT requires a special object called tracker, to handle the prediction results. The prediction results are prediction detections and prediction embeddings. Luckily, PaddleDetection has made this procesure easy for us, it has exported the JDETracker from `ppdet`, so that we do not need to write much code to handle it. @@ -156,4 +156,4 @@ So these are the all steps which you need to follow in order to run FairMOT on y A companion article which explains in details of this procedure will be released soon and a link to that article will be updated here soon. -To see the full code, please take a look at [Paddle OpenVINO Prediction](docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py). 
\ No newline at end of file +To see the full code, please take a look at [Paddle OpenVINO Prediction](./fairmot_onnx_openvino.py). diff --git a/docs/advanced_tutorials/openvino_inference/README_cn.md b/docs/advanced_tutorials/openvino_inference/README_cn.md index 2fc001757157501bccaeffc37d900cbc6d31e7eb..aaaf84eb05c26359fcc48cb14a3f6104bd834d5d 100644 --- a/docs/advanced_tutorials/openvino_inference/README_cn.md +++ b/docs/advanced_tutorials/openvino_inference/README_cn.md @@ -68,7 +68,7 @@ def get_net(): ``` def prepare_input(): transforms = [ - T.Resize(target_size=(target_width, target_height)), + T.Resize(target_size=(target_width, target_height)), T.Normalize(mean=(0,0,0), std=(1,1,1)) ] img_file = root_path / "images/street.jpeg" @@ -93,7 +93,7 @@ def predict(exec_net, input): return result ``` -您可能会惊讶地看到, 最激动人心的步骤居然如此简单。 不过下一个阶段会加复杂。 +您可能会惊讶地看到, 最激动人心的步骤居然如此简单。 不过下一个阶段会更加复杂。 4. ### 后处理 @@ -138,7 +138,7 @@ def postprocess(pred_dets, pred_embs, threshold = 0.5): 5. ### 画出检测框(可选) -这一步是可选的。出于演示目的,我只使用 `plot_tracking_dict()` 方法在图像上绘制所有边界框。 但是,如果您没有相同的要求,则不需要这样做。 +这一步是可选的。出于演示目的,我只使用 `plot_tracking_dict()` 方法在图像上绘制所有边界框。 但是,如果您没有相同的要求,则不需要这样做。 ``` online_im = plot_tracking_dict( @@ -154,4 +154,4 @@ online_im = plot_tracking_dict( 之后会有一篇详细解释此过程的配套文章将会发布,并且该文章的链接将很快在此处更新。 -完整代码请查看 [Paddle OpenVINO 预测](docs/advanced_tutorials/openvino_inference/fairmot_onnx_openvino.py). \ No newline at end of file +完整代码请查看 [Paddle OpenVINO 预测](./fairmot_onnx_openvino.py). 
diff --git a/docs/images/add_attribute.png b/docs/images/add_attribute.png new file mode 100644 index 0000000000000000000000000000000000000000..1d6092c4a3f778f08b0636875bdcb30a51d0655d Binary files /dev/null and b/docs/images/add_attribute.png differ diff --git a/docs/images/bus.jpg b/docs/images/bus.jpg new file mode 100644 index 0000000000000000000000000000000000000000..cdbbf8c9ba9990fb228360db590e37f078160767 Binary files /dev/null and b/docs/images/bus.jpg differ diff --git a/docs/images/dog.jpg b/docs/images/dog.jpg new file mode 100644 index 0000000000000000000000000000000000000000..237c084d9b0dd5cf32e9ec5463ab027ebd148df8 Binary files /dev/null and b/docs/images/dog.jpg differ diff --git a/docs/images/fitness_demo.gif b/docs/images/fitness_demo.gif index b56ab6563fb02cc77bc29235b3090f46f5597859..d96a3720158add7fec7eeaeae92b5d2d243bca3d 100644 Binary files a/docs/images/fitness_demo.gif and b/docs/images/fitness_demo.gif differ diff --git a/docs/images/fps_map.png b/docs/images/fps_map.png index d73877729c0775709e5954c008a88776bf48606a..5c22d725b01d374de8f394096f140b3d33cebfa7 100644 Binary files a/docs/images/fps_map.png and b/docs/images/fps_map.png differ diff --git a/docs/images/pphumanv2.png b/docs/images/pphumanv2.png new file mode 100644 index 0000000000000000000000000000000000000000..829dd60865b18b211e97e6a1c405dc2ee3d24c4b Binary files /dev/null and b/docs/images/pphumanv2.png differ diff --git a/docs/images/res.jpg b/docs/images/res.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6f281fa3be0053d5a919da4ee36c6005e0664daa Binary files /dev/null and b/docs/images/res.jpg differ diff --git a/docs/images/tinypose_app.png b/docs/images/tinypose_app.png index 750a532fa13f51c33b0a7a24372a6fbeb28b9111..fd43ebcdcaec7bda1c57378e7b82b9d103ee3cb2 100644 Binary files a/docs/images/tinypose_app.png and b/docs/images/tinypose_app.png differ diff --git a/docs/tutorials/DistributedTraining_cn.md b/docs/tutorials/DistributedTraining_cn.md new 
file mode 100644 index 0000000000000000000000000000000000000000..80ff32ecb0f3bf86b30ce59db55e0fc1d4197cbe --- /dev/null +++ b/docs/tutorials/DistributedTraining_cn.md @@ -0,0 +1,50 @@ +[English](DistributedTraining_en.md) | 简体中文 + + +# 分布式训练 + +## 1. 简介 + +* 分布式训练指的是将训练任务按照一定方法拆分到多个计算节点进行计算,再按照一定的方法对拆分后计算得到的梯度等信息进行聚合与更新。飞桨分布式训练技术源自百度的业务实践,在自然语言处理、计算机视觉、搜索和推荐等领域经过超大规模业务检验。分布式训练的高性能,是飞桨的核心优势技术之一,PaddleDetection同时支持单机训练与多机训练。更多关于分布式训练的方法与文档可以参考:[分布式训练快速开始教程](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html)。 + +## 2. 使用方法 + +### 2.1 单机训练 + +* 以PP-YOLOE-s为例,本地准备好数据之后,使用`paddle.distributed.launch`或者`fleetrun`的接口启动训练任务即可。下面为运行脚本示例。 + +```bash +fleetrun \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +### 2.2 多机训练 + +* 相比单机训练,多机训练时,只需要添加`--ips`的参数,该参数表示需要参与分布式训练的机器的ip列表,不同机器的ip用逗号隔开。下面为运行代码示例。 + +```shell +ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" +fleetrun \ +--ips=${ip_list} \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +**注:** +* 不同机器的ip信息需要用逗号隔开,可以通过`ifconfig`或者`ipconfig`查看。 +* 不同机器之间需要做免密设置,且可以直接ping通,否则无法完成通信。 +* 不同机器之间的代码、数据与运行命令或脚本需要保持一致,且所有的机器上都需要运行设置好的训练命令或者脚本。最终`ip_list`中的第一台机器的第一块设备是trainer0,以此类推。 +* 不同机器的起始端口可能不同,建议在启动多机任务前,在不同的机器中设置相同的多机运行起始端口,命令为`export FLAGS_START_PORT=17000`,端口值建议在`10000~20000`之间。 + + +## 3. 性能效果测试 + +* 在单机和4机8卡V100的机器上,基于[PP-YOLOE-s](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml)进行模型训练,模型的训练耗时情况如下所示。 + +机器 | 精度 | 耗时 +-|-|- +单机8卡 | 42.7% | 39h +4机8卡 | 42.1% | 13h diff --git a/docs/tutorials/DistributedTraining_en.md b/docs/tutorials/DistributedTraining_en.md new file mode 100644 index 0000000000000000000000000000000000000000..9fea8637340dd76fa9d416d9a40ccd88d17d86db --- /dev/null +++ b/docs/tutorials/DistributedTraining_en.md @@ -0,0 +1,44 @@ +English | [简体中文](DistributedTraining_cn.md) + + +## 1. 
Usage + +### 1.1 Single-machine + +* Take PP-YOLOE-s as an example, after preparing the data locally, use the interface of `paddle.distributed.launch` or `fleetrun` to start the training task. Below is an example of running the script. + +```bash +fleetrun \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +### 1.2 Multi-machine + +* Compared with single-machine training, when training on multiple machines, you only need to add the `--ips` parameter, which indicates the ip list of machines that need to participate in distributed training. The ips of different machines are separated by commas. Below is an example of running code. + +```shell +ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" +fleetrun \ +--ips=${ip_list} \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \ +--eval &>logs.txt 2>&1 & +``` + +**Note:** +* The ip information of different machines needs to be separated by commas, which can be viewed through `ifconfig` or `ipconfig`. +* Password-free settings are required between different machines, and they can be pinged directly, otherwise the communication cannot be completed. +* The code, data, and running commands or scripts between different machines need to be consistent, and the set training commands or scripts need to be run on all machines. The first device of the first machine in the final `ip_list` is trainer0, and so on. +* The starting port of different machines may be different. It is recommended to set the same starting port for multi-machine running in different machines before starting the multi-machine task. The command is `export FLAGS_START_PORT=17000`, and the port value is recommended to be `10000~20000`. + + +## 2. Performance + +* On single-machine and 4-machine 8-card V100 machines, model training is performed based on [PP-YOLOE-s](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml). 
The model training time is as follows. + +Machine | mAP | Time cost +-|-|- +single machine | 42.7% | 39h +4 machines | 42.1% | 13h diff --git "a/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" "b/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" index 5d45d14049ea6e929d1d0df50ea72b013f296f65..4478495bff8e52ed1377ad8e09ee63a49ce606da 100644 --- "a/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" +++ "b/docs/tutorials/FAQ/FAQ\347\254\254\351\233\266\346\234\237.md" @@ -59,7 +59,7 @@ TrainReader: - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} # 训练时batch_size batch_size: 24 - # 读取数据是是否乱序 + # 读取数据是否乱序 shuffle: true # 是否丢弃最后不能完整组成batch的数据 drop_last: true diff --git a/docs/tutorials/GETTING_STARTED.md b/docs/tutorials/GETTING_STARTED.md index cdca9fbb8ec7022b11ab2f039acf6b6915957221..6ed4043a2f69fcbba19e757c6abdaa6b8507fc7b 100644 --- a/docs/tutorials/GETTING_STARTED.md +++ b/docs/tutorials/GETTING_STARTED.md @@ -11,13 +11,12 @@ instructions](INSTALL_cn.md). ## Data preparation -- Please refer to [PrepareDataSet](PrepareDataSet.md) for data preparation +- Please refer to [PrepareDetDataSet](./data/PrepareDetDataSet_en.md) for data preparation - Please set the data path for data configuration file in ```configs/datasets``` - ## Training & Evaluation & Inference -PaddleDetection provides scripts for training, evalution and inference with various features according to different configure. +PaddleDetection provides scripts for training, evalution and inference with various features according to different configure. 
For more details on distributed training, see [DistributedTraining](./DistributedTraining_en.md). ```bash # training on single-GPU @@ -26,6 +25,9 @@ python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml # training on multi-GPU export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml +# training on multi-machines and multi-GPUs +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +fleetrun --ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" --selected_gpu 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml # GPU evaluation export CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams @@ -53,7 +55,7 @@ list below can be viewed by `--help` | --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | such as `--draw_threshold 0.7` | | --infer_dir | infer | Directory for images to perform inference on | None | One of `infer_dir` and `infer_img` is requied | | --infer_img | infer | Image path | None | One of `infer_dir` and `infer_img` is requied, `infer_img` has higher priority over `infer_dir` | - +| --save_results | infer | Whether to save detection results to file | False | Optional | @@ -128,7 +130,7 @@ list below can be viewed by `--help` --output_dir=infer_output/ \ --draw_threshold=0.5 \ -o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \ - --use_vdl=Ture + --use_vdl=True ``` `--draw_threshold` is an optional argument. Default is 0.5. 
diff --git a/docs/tutorials/GETTING_STARTED_cn.md b/docs/tutorials/GETTING_STARTED_cn.md index ea94a5e09b4ef04edd4c415808faea81bdc6fba1..c0230f514746344b0d2ad1f1c36f8c68c5c4e45d 100644 --- a/docs/tutorials/GETTING_STARTED_cn.md +++ b/docs/tutorials/GETTING_STARTED_cn.md @@ -12,7 +12,7 @@ PaddleDetection作为成熟的目标检测开发套件,提供了从数据准 ## 2 准备数据 目前PaddleDetection支持:COCO VOC WiderFace, MOT四种数据格式。 -- 首先按照[准备数据文档](PrepareDataSet.md) 准备数据。 +- 首先按照[准备数据文档](./data/PrepareDetDataSet.md) 准备数据。 - 然后设置`configs/datasets`中相应的coco或voc等数据配置文件中的数据路径。 - 在本项目中,我们使用路标识别数据集 ```bash @@ -83,7 +83,7 @@ ppyolov2_reader.yml 主要说明数据读取器配置,如batch size,并发 * 关于数据的路径修改说明 在修改配置文件中,用户如何实现自定义数据集是非常关键的一步,如何定义数据集请参考[如何自定义数据集](https://aistudio.baidu.com/aistudio/projectdetail/1917140) * 默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整学习率(例如,除以8) -* 更多使用问题,请参考[FAQ](FAQ.md) +* 更多使用问题,请参考[FAQ](FAQ) ## 4 训练 @@ -99,6 +99,15 @@ python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 #windows和Mac下不需要执行该命令 python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml ``` + +* [GPU多机多卡训练](./DistributedTraining_cn.md) +```bash +fleetrun \ +--ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" \ +--selected_gpu 0,1,2,3,4,5,6,7 \ +tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml +``` + * Fine-tune其他任务 使用预训练模型fine-tune其他任务时,可以直接加载预训练模型,形状不匹配的参数将自动忽略,例如: @@ -161,7 +170,7 @@ python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \ --output_dir=infer_output/ \ --draw_threshold=0.5 \ -o weights=output/yolov3_mobilenet_v1_roadsign/model_final \ - --use_vdl=Ture + --use_vdl=True ``` `--draw_threshold` 是个可选参数. 
根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果 @@ -215,7 +224,7 @@ visualdl --logdir vdl_dir/scalar/ | --draw_threshold | infer | 可视化时分数阈值 | 0.5 | 例如`--draw_threshold=0.7` | | --infer_dir | infer | 用于预测的图片文件夹路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个 | | --infer_img | infer | 用于预测的图片路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个,`infer_img`具有更高优先级 | -| --save_txt | infer | 是否在文件夹下将图片的预测结果保存到文本文件中 | False | 可选 | +| --save_results | infer | 是否在文件夹下将图片的预测结果保存到文件中 | False | 可选 | ## 8 模型导出 @@ -245,7 +254,7 @@ PaddleDetection提供了PaddleInference、PaddleServing、PaddleLite多种部署 ```bash python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign --image_file=demo/road554.png --device=GPU ``` -* 同时`infer.py`提供了丰富的接口,用户进行接入视频文件、摄像头进行预测,更多内容请参考[Python端预测部署](../../deploy/python.md) +* 同时`infer.py`提供了丰富的接口,用户进行接入视频文件、摄像头进行预测,更多内容请参考[Python端预测部署](../../deploy/python) ### PaddleDetection支持的部署形式说明 |形式|语言|教程|设备/平台| |-|-|-|-| diff --git a/docs/tutorials/INSTALL.md b/docs/tutorials/INSTALL.md index 01cc10b5fb92cad6dbafa84f3371f6ee5cde6cfb..418289b0d4b34c3e4b3df05adb52c0dc277e33dd 100644 --- a/docs/tutorials/INSTALL.md +++ b/docs/tutorials/INSTALL.md @@ -22,7 +22,8 @@ Dependency of PaddleDetection and PaddlePaddle: | PaddleDetection version | PaddlePaddle version | tips | | :----------------: | :---------------: | :-------: | -| develop | >= 2.2.0rc | Dygraph mode is set as default | +| develop | >= 2.2.2 | Dygraph mode is set as default | +| release/2.4 | >= 2.2.2 | Dygraph mode is set as default | | release/2.3 | >= 2.2.0rc | Dygraph mode is set as default | | release/2.2 | >= 2.1.2 | Dygraph mode is set as default | | release/2.1 | >= 2.1.0 | Dygraph mode is set as default | @@ -109,6 +110,16 @@ Ran 7 tests in 12.816s OK ``` +## Use built Docker images + +> If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/). 
+ +We provide docker images containing the latest PaddleDetection code, and all environment and package dependencies are pre-installed. All you have to do is to **pull and run the docker image**. Then you can enjoy PaddleDetection without any extra steps. + +Get these images and guidance in [docker hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), including CPU, GPU, ROCm environment versions. + +If you have some customized requirements about automatic building docker images, you can get it in github repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton). + ## Inference demo **Congratulation!** Now you have installed PaddleDetection successfully and try our inference demo: diff --git a/docs/tutorials/INSTALL_cn.md b/docs/tutorials/INSTALL_cn.md index ee8672235cb7edde98b1f49f52f24e4cd2f04fad..4afa7600f3dcd7ec1b3feb9d1cf0285b4de3bde5 100644 --- a/docs/tutorials/INSTALL_cn.md +++ b/docs/tutorials/INSTALL_cn.md @@ -18,8 +18,9 @@ PaddleDetection 依赖 PaddlePaddle 版本关系: | PaddleDetection版本 | PaddlePaddle版本 | 备注 | | :------------------: | :---------------: | :-------: | -| develop | >= 2.2.0rc | 默认使用动态图模式 | -| release/2.3 | >= 2.2.0rc | 默认使用动态图模式 | +| develop | >= 2.2.2 | 默认使用动态图模式 | +| release/2.4 | >= 2.2.2 | 默认使用动态图模式 | +| release/2.3 | >= 2.2.0rc | 默认使用动态图模式 | | release/2.2 | >= 2.1.2 | 默认使用动态图模式 | | release/2.1 | >= 2.1.0 | 默认使用动态图模式 | | release/2.0 | >= 2.0.1 | 默认使用动态图模式 | @@ -102,6 +103,14 @@ Ran 7 tests in 12.816s OK ``` +## 使用Docker镜像 +> 如果您没有Docker运行环境,请参考[Docker官网](https://www.docker.com/)进行安装。 + +我们提供了包含最新 PaddleDetection 代码的docker镜像,并预先安装好了所有的环境和库依赖,您只需要**拉取docker镜像**,然后**运行docker镜像**,无需其他任何额外操作,即可开始使用PaddleDetection的所有功能。 + +在[Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection)中获取这些镜像及相应的使用指南,包括CPU、GPU、ROCm版本。 +如果您对自动化制作docker镜像感兴趣,或有自定义需求,请访问[PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton)做进一步了解。 + ## 快速体验 **恭喜!** 
您已经成功安装了PaddleDetection,接下来快速体验目标检测效果 diff --git a/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md b/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md index 460af362bff54f708cc49f2806434a77582c9aee..32b9024bfa025f008ff236214c68ffe8c1d7b5ec 100644 --- a/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md +++ b/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md @@ -90,7 +90,7 @@ TrainReader: - PadBatch: {pad_to_stride: 32} # 训练时batch_size batch_size: 1 - # 读取数据是是否乱序 + # 读取数据是否乱序 shuffle: true # 是否丢弃最后不能完整组成batch的数据 drop_last: true @@ -110,7 +110,7 @@ EvalReader: - PadBatch: {pad_to_stride: 32} # 评估时batch_size batch_size: 1 - # 读取数据是是否乱序 + # 读取数据是否乱序 shuffle: false # 是否丢弃最后不能完整组成batch的数据 drop_last: false @@ -130,7 +130,7 @@ TestReader: - PadBatch: {pad_to_stride: 32} # 测试时batch_size batch_size: 1 - # 读取数据是是否乱序 + # 读取数据是否乱序 shuffle: false # 是否丢弃最后不能完整组成batch的数据 drop_last: false diff --git a/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md index 9c7985fd26971b5e412d7a920a0d46eed62204e5..2cbc188dc345c84ca619284baaf610d757cc3414 100644 --- a/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md +++ b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md @@ -102,7 +102,7 @@ TrainReader: - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} # 训练时batch_size batch_size: 24 - # 读取数据是是否乱序 + # 读取数据是否乱序 shuffle: true # 是否丢弃最后不能完整组成batch的数据 drop_last: true diff --git a/docs/tutorials/data/DetAnnoTools.md b/docs/tutorials/data/DetAnnoTools.md new file mode 100644 index 0000000000000000000000000000000000000000..fd7c8fee9124cddc2146cda252a11a9c95bf679f --- /dev/null +++ b/docs/tutorials/data/DetAnnoTools.md 
@@ -0,0 +1,279 @@ +简体中文 | [English](DetAnnoTools_en.md) + + + +# 目标检测标注工具 + +## 目录 + +[LabelMe](#LabelMe) + +* [使用说明](#使用说明) + * [安装](#LabelMe安装) + * [图片标注过程](#LabelMe图片标注过程) +* [标注格式](#LabelMe标注格式) + * [导出数据格式](#LabelMe导出数据格式) + * [格式转化总结](#格式转化总结) + * [标注文件(json)-->VOC](#标注文件(json)-->VOC数据集) + * [标注文件(json)-->COCO](#标注文件(json)-->COCO数据集) + +[LabelImg](#LabelImg) + +* [使用说明](#使用说明) + * [LabelImg安装](#LabelImg安装) + * [安装注意事项](#安装注意事项) + * [图片标注过程](#LabelImg图片标注过程) +* [标注格式](#LabelImg标注格式) + * [导出数据格式](#LabelImg导出数据格式) + * [格式转换注意事项](#格式转换注意事项) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### 使用说明 + +#### LabelMe安装 + +具体安装操作请参考[LabelMe官方教程](https://github.com/wkentaro/labelme)中的Installation + +
    + Ubuntu + +``` +sudo apt-get install labelme + +# or +sudo pip3 install labelme + +# or install standalone executable from: +# https://github.com/wkentaro/labelme/releases +``` + +
    + +
    + macOS + +``` +brew install pyqt # maybe pyqt5 +pip install labelme + +# or +brew install wkentaro/labelme/labelme # command line interface +# brew install --cask wkentaro/labelme/labelme # app + +# or install standalone executable/app from: +# https://github.com/wkentaro/labelme/releases +``` + +
+ + + +推荐使用Anaconda的安装方式 + +``` +conda create --name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + + + +#### LabelMe图片标注过程 + +启动labelme后,选择图片文件或者图片所在文件夹 + +左侧编辑栏选择`create polygons` 绘制标注区域如下图所示(右击图像区域可以选择不同的标注形状),绘制好区域后按下回车,弹出新的框填入标注区域对应的标签,如:people + +左侧菜单栏点击保存,生成`json`形式的**标注文件** + +![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g) + + + +### LabelMe标注格式 + +#### LabelMe导出数据格式 + +``` +#生成标注文件 +png/jpeg/jpg-->labelme标注-->json +``` + + + + + +#### 格式转化总结 + +``` +#标注文件转化为VOC数据集格式 +json-->labelme2voc.py-->VOC数据集 + +#标注文件转化为COCO数据集格式 +json-->labelme2coco.py-->COCO数据集 +``` + + + + + +#### 标注文件(json)-->VOC数据集 + +使用[官方给出的labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py)这份脚本 + +下载该脚本,在命令行中使用 + +``` +python labelme2voc.py data_annotated(标注文件所在文件夹) data_dataset_voc(输出文件夹) --labels labels.txt +``` + +运行后,在指定的输出文件夹中会生成如下的目录 + +``` +# It generates: +# - data_dataset_voc/JPEGImages +# - data_dataset_voc/Annotations +# - data_dataset_voc/AnnotationsVisualization + +``` + + + + + +#### 标注文件(json)-->COCO数据集 + +使用[PaddleDetection提供的x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) 将labelme标注的数据转换为COCO数据集形式 + +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # coco数据的标注文件 +│ ├── valid.json # coco数据的标注文件 +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + + + + + +## [LabelImg](https://github.com/tzutalin/labelImg) + +### 使用说明 + +#### LabelImg安装 + +安装操作请参考[LabelImg官方教程](https://github.com/tzutalin/labelImg) + +
    + Ubuntu + +``` +sudo apt-get install pyqt5-dev-tools +sudo pip3 install -r requirements/requirements-linux-python3.txt +make qt5py3 +python3 labelImg.py +python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE] +``` + +
    + +
+macOS + +``` +brew install qt # Install qt-5.x.x by Homebrew +brew install libxml2 + +# or using pip + +pip3 install pyqt5 lxml # Install qt and lxml by pip + +make qt5py3 +python3 labelImg.py +python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE] +``` + +
+ + + +推荐使用Anaconda的安装方式 + + 首先下载并进入 [labelImg](https://github.com/tzutalin/labelImg#labelimg) 的目录 + +``` +conda install pyqt=5 +conda install -c anaconda lxml +pyrcc5 -o libs/resources.py resources.qrc +python labelImg.py +python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE] +``` + + + + + +#### 安装注意事项 + +以Anaconda安装方式为例,比Labelme配置要麻烦一些 + +启动方式是通过python运行脚本`python labelImg.py <图片路径>` + + + +#### LabelImg图片标注过程 + +启动labelImg后,选择图片文件或者图片所在文件夹 + +左侧编辑栏选择`创建区块` 绘制标注区,在弹出新的框选择对应的标签 + +左侧菜单栏点击保存,可以选择VOC/YOLO/CreateML三种类型的标注文件 + + + +![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif) + + + + + +### LabelImg标注格式 + +#### LabelImg导出数据格式 + +``` +#生成标注文件 +png/jpeg/jpg-->labelImg标注-->xml/txt/json +``` + + + +#### 格式转换注意事项 + +**PaddleDetection支持VOC或COCO格式的数据**,经LabelImg标注导出后的标注文件,需要修改为**VOC或COCO格式**,调整说明可以参考[准备训练数据](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) + diff --git a/docs/tutorials/data/DetAnnoTools_en.md b/docs/tutorials/data/DetAnnoTools_en.md new file mode 100644 index 0000000000000000000000000000000000000000..7b948d213fc0f1f49c6ee21276220ee94f3496c9 --- /dev/null +++ b/docs/tutorials/data/DetAnnoTools_en.md @@ -0,0 +1,271 @@ +[简体中文](DetAnnoTools.md) | English + + + +# Object Detection Annotation Tools + +## Contents + +[LabelMe](#LabelMe) + +* [Instruction](#Instruction-of-LabelMe) + * [Installation](#Installation) + * [Annotation of Images](#Annotation-of-images-in-LabelMe) +* [Annotation Format](#Annotation-Format-of-LabelMe) + * [Export Format](#Export-Format-of-LabelMe) + * [Summary of Format Conversion](#Summary-of-Format-Conversion) + * [Annotation file(json)—>VOC Dataset](#annotation-filejsonvoc-dataset) + * [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset) + +[LabelImg](#LabelImg) + +* [Instruction](#Instruction-of-LabelImg) + * [Installation](#Installation-of-LabelImg) + * [Installation Notes](#Installation-Notes) + * [Annotation of 
images](#Annotation-of-images-in-LabelImg) +* [Annotation Format](#Annotation-Format-of-LabelImg) + * [Export Format](#Export-Format-of-LabelImg) + * [Notes of Format Conversion](#Notes-of-Format-Conversion) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### Instruction of LabelMe + +#### Installation + +Please refer to [The github of LabelMe](https://github.com/wkentaro/labelme) for installation details. + +
    + Ubuntu + +``` +sudo apt-get install labelme + +# or +sudo pip3 install labelme + +# or install standalone executable from: +# https://github.com/wkentaro/labelme/releases +``` + +
    + +
    + macOS + +``` +brew install pyqt # maybe pyqt5 +pip install labelme + +# or +brew install wkentaro/labelme/labelme # command line interface +# brew install --cask wkentaro/labelme/labelme # app + +# or install standalone executable/app from: +# https://github.com/wkentaro/labelme/releases +``` + +
+ + + +We recommend installing with Anaconda. + +``` +conda create --name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + + + +#### Annotation of Images in LabelMe + +After starting labelme, select an image or a folder with images. + +Select `create polygons` in the edit bar on the left and draw an annotation area as shown in the following GIF. You can right-click on the image to select a different shape. When finished, press the Enter/Return key, then fill in the corresponding label in the popup box, for example, people. + +Click the save button in the menu bar on the left to generate an annotation file in json format. + +![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g) + + + +### Annotation Format of LabelMe + +#### Export Format of LabelMe + +``` +#generate an annotation file +png/jpeg/jpg-->labelme-->json +``` + + + + + +#### Summary of Format Conversion + +``` +#convert annotation file to VOC dataset format +json-->labelme2voc.py-->VOC dataset + +#convert annotation file to COCO dataset format +json-->labelme2coco.py-->COCO dataset +``` + + + + + +#### Annotation file(json)—>VOC Dataset + +Download the script [labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py) and run it from the command line. + +``` +python labelme2voc.py data_annotated(annotation folder) data_dataset_voc(output folder) --labels labels.txt +``` + +The script then generates the following directories in the output folder: + +``` +# It generates: +# - data_dataset_voc/JPEGImages +# - data_dataset_voc/Annotations +# - data_dataset_voc/AnnotationsVisualization + +``` + + + + + +#### Annotation file(json)—>COCO Dataset + +Use the script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) provided by PaddleDetection to convert the data annotated by LabelMe into a COCO dataset. 
+ +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +After the user dataset is converted to COCO format, the directory structure is as follows (avoid Chinese characters in path and file names, since encoding problems can cause errors): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # Annotation file of coco data +│ ├── valid.json # Annotation file of coco data +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + + + + + +## [LabelImg](https://github.com/tzutalin/labelImg) + +### Instruction of LabelImg + +#### Installation of LabelImg + +Please refer to [The github of LabelImg](https://github.com/tzutalin/labelImg) for installation details. + +
    + Ubuntu + +``` +sudo apt-get install pyqt5-dev-tools +sudo pip3 install -r requirements/requirements-linux-python3.txt +make qt5py3 +python3 labelImg.py +python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE] +``` + +
    + +
+macOS
+
+```
+brew install qt  # Install qt-5.x.x by Homebrew
+brew install libxml2
+
+# or using pip
+
+pip3 install pyqt5 lxml # Install qt and lxml by pip
+
+make qt5py3
+python3 labelImg.py
+python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
+```
+
+
+
+
+
+We recommend installing with Anaconda.
+
+Download [labelImg](https://github.com/tzutalin/labelImg#labelimg) and go to its folder.
+
+```
+conda install pyqt=5
+conda install -c anaconda lxml
+pyrcc5 -o libs/resources.py resources.qrc
+python labelImg.py
+python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
+```
+
+
+
+
+
+#### Installation Notes
+
+Start LabelImg from its Python script: `python labelImg.py `
+
+#### Annotation of images in LabelImg
+
+After starting LabelImg, select an image or a folder of images.
+
+Select `Create RectBox` in the toolbar and draw an annotation area as shown in the following GIF. When finished, select the corresponding label in the popup box. Then save the annotation file in one of three formats: VOC/YOLO/CreateML.
+
+
+
+![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif)
+
+
+
+
+
+### Annotation Format of LabelImg
+
+#### Export Format of LabelImg
+
+```
+#generate annotation files
+png/jpeg/jpg-->labelImg-->xml/txt/json
+```
+
+
+
+#### Notes of Format Conversion
+
+**PaddleDetection supports the VOC and COCO formats.** The annotation file generated by LabelImg needs to be converted to VOC or COCO format. You can refer to [PrepareDataSet](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE).
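As a rough illustration of the conversion note above, the sketch below reads one Pascal VOC style XML annotation, the `xml` export that LabelImg produces, and turns each box into the COCO `[x, y, width, height]` convention. It is a minimal sketch, not the conversion code PaddleDetection ships: `SAMPLE_XML`, `read_voc_boxes` and `to_coco_bbox` are hypothetical names made up for this example, and the sample assumes integer pixel coordinates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the Pascal VOC XML layout (illustration only).
SAMPLE_XML = """<annotation>
  <filename>road1.png</filename>
  <size><width>400</width><height>300</height><depth>3</depth></size>
  <object>
    <name>stop</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from VOC XML text."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bnd = obj.find("bndbox")
        # Coordinates are stored as text; go through float() to tolerate "10.0".
        coords = tuple(
            int(float(bnd.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label,) + coords)
    return filename, boxes


def to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to the COCO [x, y, width, height] convention."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


if __name__ == "__main__":
    name, boxes = read_voc_boxes(SAMPLE_XML)
    for label, *corners in boxes:
        print(name, label, to_coco_bbox(*corners))
```

For a real dataset, the conversion script referenced in the data preparation document is the safer route; this sketch only shows which fields of the XML file carry the label and box information.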
+ diff --git a/docs/tutorials/data/KeyPointAnnoTools.md b/docs/tutorials/data/KeyPointAnnoTools.md new file mode 100755 index 0000000000000000000000000000000000000000..678b94ac7571a76048dc9a232bd288e579a0483a --- /dev/null +++ b/docs/tutorials/data/KeyPointAnnoTools.md @@ -0,0 +1,165 @@ +简体中文 | [English](KeyPointAnnoTools_en.md) + +# 关键点检测标注工具 + +## 目录 + +[LabelMe](#LabelMe) + +- [使用说明](#使用说明) + - [安装](#安装) + - [关键点数据说明](#关键点数据说明) + - [图片标注过程](#图片标注过程) +- [标注格式](#标注格式) + - [导出数据格式](#导出数据格式) + - [格式转化总结](#格式转化总结) + - [标注文件(json)-->COCO](#标注文件(json)-->COCO数据集) + + + +## [LabelMe](https://github.com/wkentaro/labelme) + +### 使用说明 + +#### 安装 + +具体安装操作请参考[LabelMe官方教程](https://github.com/wkentaro/labelme)中的Installation + +
    + Ubuntu + +``` +sudo apt-get install labelme + +# or +sudo pip3 install labelme + +# or install standalone executable from: +# https://github.com/wkentaro/labelme/releases +``` + +
    + +
    + macOS + +``` +brew install pyqt # maybe pyqt5 +pip install labelme + +# or +brew install wkentaro/labelme/labelme # command line interface +# brew install --cask wkentaro/labelme/labelme # app + +# or install standalone executable/app from: +# https://github.com/wkentaro/labelme/releases +``` + +
    + + + +推荐使用Anaconda的安装方式 + +``` +conda create –name=labelme python=3 +conda activate labelme +pip install pyqt5 +pip install labelme +``` + + + +#### 关键点数据说明 + +以COCO数据集为例,共需采集17个关键点 + +``` +keypoint indexes: + 0: 'nose', + 1: 'left_eye', + 2: 'right_eye', + 3: 'left_ear', + 4: 'right_ear', + 5: 'left_shoulder', + 6: 'right_shoulder', + 7: 'left_elbow', + 8: 'right_elbow', + 9: 'left_wrist', + 10: 'right_wrist', + 11: 'left_hip', + 12: 'right_hip', + 13: 'left_knee', + 14: 'right_knee', + 15: 'left_ankle', + 16: 'right_ankle' +``` + + + + + +#### 图片标注过程 + +启动labelme后,选择图片文件或者图片所在文件夹 + +左侧编辑栏选择`create polygons` ,右击图像区域选择标注形状,绘制好关键点后按下回车,弹出新的框填入标注关键点对应的标签 + +左侧菜单栏点击保存,生成`json`形式的**标注文件** + +![操作说明](https://user-images.githubusercontent.com/34162360/178250648-29ee781a-676b-419c-83b1-de1e4e490526.gif) + + + +### 标注格式 + +#### 导出数据格式 + +``` +#生成标注文件 +png/jpeg/jpg-->labelme标注-->json +``` + + + +#### 格式转化总结 + +``` +#标注文件转化为COCO数据集格式 +json-->labelme2coco.py-->COCO数据集 +``` + + + + + +#### 标注文件(json)-->COCO数据集 + +使用[PaddleDetection提供的x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) 将labelme标注的数据转换为COCO数据集形式 + +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +用户数据集转成COCO数据后目录结构如下(注意数据集中路径名、文件名尽量不要使用中文,避免中文编码问题导致出错): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # coco数据的标注文件 +│ ├── valid.json # coco数据的标注文件 +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... 
+```
+
diff --git a/docs/tutorials/data/KeyPointAnnoTools_en.md b/docs/tutorials/data/KeyPointAnnoTools_en.md
new file mode 100755
index 0000000000000000000000000000000000000000..3ef0548426d79cfd89267cbf6e8087e5dfa407dd
--- /dev/null
+++ b/docs/tutorials/data/KeyPointAnnoTools_en.md
@@ -0,0 +1,165 @@
+[简体中文](KeyPointAnnoTools.md) | English
+
+# Key Points Detection Annotation Tool
+
+## Contents
+
+[LabelMe](#LabelMe)
+
+- [Instruction](#Instruction)
+  - [Installation](#Installation)
+  - [Notes of Key Points Data](#Notes-of-Key-Points-Data)
+  - [Annotation of LabelMe](#Annotation-of-LabelMe)
+- [Annotation Format](#Annotation-Format)
+  - [Data Export Format](#Data-Export-Format)
+  - [Summary of Format Conversion](#Summary-of-Format-Conversion)
+  - [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset)
+
+
+
+## [LabelMe](https://github.com/wkentaro/labelme)
+
+### Instruction
+
+#### Installation
+
+Please refer to [the LabelMe GitHub repository](https://github.com/wkentaro/labelme) for installation details.
+
+
+ Ubuntu
+
+```
+sudo apt-get install labelme
+
+# or
+sudo pip3 install labelme
+
+# or install standalone executable from:
+# https://github.com/wkentaro/labelme/releases
+```
+
+
    + +
+ macOS
+
+```
+brew install pyqt  # maybe pyqt5
+pip install labelme
+
+# or
+brew install wkentaro/labelme/labelme  # command line interface
+# brew install --cask wkentaro/labelme/labelme  # app
+
+# or install standalone executable/app from:
+# https://github.com/wkentaro/labelme/releases
+```
+
+
+
+
+
+We recommend installing with Anaconda.
+
+```
+conda create --name=labelme python=3
+conda activate labelme
+pip install pyqt5
+pip install labelme
+```
+
+
+
+#### Notes of Key Points Data
+
+Taking the COCO dataset as an example, 17 key points need to be annotated.
+
+```
+keypoint indexes:
+    0: 'nose',
+    1: 'left_eye',
+    2: 'right_eye',
+    3: 'left_ear',
+    4: 'right_ear',
+    5: 'left_shoulder',
+    6: 'right_shoulder',
+    7: 'left_elbow',
+    8: 'right_elbow',
+    9: 'left_wrist',
+    10: 'right_wrist',
+    11: 'left_hip',
+    12: 'right_hip',
+    13: 'left_knee',
+    14: 'right_knee',
+    15: 'left_ankle',
+    16: 'right_ankle'
+```
+
+
+
+
+
+#### Annotation of LabelMe
+
+After starting labelme, select an image or a folder of images.
+
+Select `create polygons` in the left toolbar, then right-click on the image to choose the annotation shape. After drawing a key point, press the Enter/Return key and fill in the label of that key point in the popup box, as shown in the following GIF.
+
+Click the save button in the toolbar to generate an annotation file in JSON format.
+
+![Operation guide](https://user-images.githubusercontent.com/34162360/178250648-29ee781a-676b-419c-83b1-de1e4e490526.gif)
+
+
+
+### Annotation Format
+
+#### Data Export Format
+
+```
+#generate an annotation file
+png/jpeg/jpg-->labelme-->json
+```
+
+
+
+#### Summary of Format Conversion
+
+```
+#convert annotation file to COCO dataset format
+json-->labelme2coco.py-->COCO dataset
+```
+
+
+
+
+
+#### Annotation file(json)—>COCO Dataset
+
+Convert the data annotated by LabelMe to a COCO dataset with the script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py).
+ +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` + +After the user dataset is converted to COCO data, the directory structure is as follows (note that the path name and file name in the dataset should not use Chinese as far as possible to avoid errors caused by Chinese coding problems): + +``` +dataset/xxx/ +├── annotations +│ ├── train.json # Annotation file of coco data +│ ├── valid.json # Annotation file of coco data +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + diff --git a/docs/tutorials/data/MOTAnnoTools.md b/docs/tutorials/data/MOTAnnoTools.md new file mode 100644 index 0000000000000000000000000000000000000000..433a1a2808cf05cecbffe80e5d8297f2224d3bfb --- /dev/null +++ b/docs/tutorials/data/MOTAnnoTools.md @@ -0,0 +1,75 @@ +# 多目标跟踪标注工具 + + + +## 目录 + +* [前期准备](#前期准备) +* [SDE数据集](#SDE数据集) + * [LabelMe](#LabelMe) + * [LabelImg](#LabelImg) +* [JDE数据集](#JDE数据集) + * [DarkLabel](#DarkLabel) + * [标注格式](#标注格式) + + +### 前期准备 + +请先查看[多目标跟踪数据集准备](PrepareMOTDataSet.md)确定MOT模型选型和MOT数据集的类型。 +通常综合数据标注成本和模型精度速度平衡考虑,更推荐使用SDE系列数据集,和SDE系列模型的ByteTrack或OC-SORT。SDE系列数据集的标注工具与目标检测任务是一致的。 + +### SDE数据集 +SDE数据集是纯检测标注的数据集,用户自定义数据集可以参照[DET数据准备文档](./PrepareDetDataSet.md)准备。 + +#### LabelMe +LabelMe的使用可以参考[DetAnnoTools](DetAnnoTools.md) + +#### LabelImg +LabelImg的使用可以参考[DetAnnoTools](DetAnnoTools.md) + + +### JDE数据集 +JDE数据集是同时有检测和ReID标注的数据集,标注成本比SDE数据集更高。 + +#### [DarkLabel](https://github.com/darkpgmr/DarkLabel) + +#### 使用说明 + +##### 安装 + +从官方给出的下载[链接](https://github.com/darkpgmr/DarkLabel/releases)中下载想要的版本,Windows环境解压后能够直接使用 + +**视频/图片标注过程** + +1. 启动应用程序后,能看到左侧的工具栏 +2. 选择视频/图像文件后,按需选择标注形式: + * Box仅绘制标注框 + * Box+Label绘制标注框&标签 + * Box+Label+AutoID绘制标注框&标签&ID号 + * Popup LabelSelect可以自行定义标签 +3. 在视频帧/图像上进行拖动鼠标,进行标注框的绘制 +4. 
绘制完成后,在上数第六行里选择保存标注文件的形式,默认.txt + +![1](https://user-images.githubusercontent.com/34162360/179673519-511b4167-97ed-4228-8869-db9c69a68b6b.mov) + + + +##### 注意事项 + +1. 如果标注的是视频文件,需要在工具栏上数第五行的下拉框里选择`[fn,cname,id,x1,y1,w,h]` (DarkLabel2.4版本) +2. 鼠标移动到标注框所在区域,右键可以删除标注框 +3. 按下shift,可以选中标注框,进行框的移动和对某条边的编辑 +4. 按住enter回车,可以自动跟踪标注目标 +5. 自动跟踪标注目标过程中可以暂停(松开enter),按需修改标注框 + + + +##### 其他使用参考视频 + +* [DarkLabel (Video/Image Annotation Tool) - Ver.2.0](https://www.youtube.com/watch?v=lok30aIZgUw) +* [DarkLabel (Image/Video Annotation Tool)](https://www.youtube.com/watch?v=vbydG78Al8s&t=11s) + + + +#### 标注格式 +标注文件需要转化为MOT JDE数据集格式,包含`images`和`labels_with_ids`文件夹,具体参照[用户自定义数据集准备](PrepareMOTDataSet.md#用户自定义数据集准备)。 diff --git a/docs/tutorials/PrepareDataSet.md b/docs/tutorials/data/PrepareDetDataSet.md similarity index 83% rename from docs/tutorials/PrepareDataSet.md rename to docs/tutorials/data/PrepareDetDataSet.md index ce829db69865888b9877d90a5cce59792dcefdc8..c6f65fc0ee4161e36af2f467f2c9d43af77344eb 100644 --- a/docs/tutorials/PrepareDataSet.md +++ b/docs/tutorials/data/PrepareDetDataSet.md @@ -1,4 +1,4 @@ -# 如何准备训练数据 +# 目标检测数据准备 ## 目录 - [目标检测数据说明](#目标检测数据说明) - [准备训练数据](#准备训练数据) @@ -8,11 +8,13 @@ - [COCO数据数据](#COCO数据数据) - [COCO数据集下载](#COCO数据下载) - [COCO数据标注文件介绍](#COCO数据标注文件介绍) - - [用户数据](#用户数据) + - [用户数据准备](#用户数据准备) - [用户数据转成VOC数据](#用户数据转成VOC数据) - [用户数据转成COCO数据](#用户数据转成COCO数据) - [用户数据自定义reader](#用户数据自定义reader) - - [用户数据数据转换示例](#用户数据数据转换示例) + - [用户数据使用示例](#用户数据使用示例) + - [数据格式转换](#数据格式转换) + - [自定义数据训练](#自定义数据训练) - [(可选)生成Anchor](#(可选)生成Anchor) ### 目标检测数据说明 @@ -236,15 +238,7 @@ json文件中包含以下key: print('\n查看一条目标物体标注信息:', coco_anno['annotations'][0]) ``` - COCO数据准备如下。 - `dataset/coco/`最初文件组织结构 - ``` - >>cd dataset/coco/ - >>tree - ├── download_coco.py - ``` - -#### 用户数据 +#### 用户数据准备 对于用户数据有3种处理方法: (1) 将用户数据转成VOC数据(根据需要仅包含物体检测所必须的标签即可) (2) 将用户数据转成COCO数据(根据需要仅包含物体检测所必须的标签即可) @@ -328,11 +322,11 @@ dataset/xxx/ ... 
``` -##### 用户数据自定义reader -如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#2.3自定义数据集)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md) +##### 用户数据自定义reader +如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#2.3自定义数据集)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md)。 -#### 用户数据数据转换示例 +#### 用户数据使用示例 以[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据为例,说明如何准备自定义数据。 Kaggle上的 [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据包含877张图像,数据类别4类:crosswalk,speedlimit,stop,trafficlight。 @@ -357,6 +351,8 @@ Kaggle上的 [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-de │ | ... ``` +#### 数据格式转换 + 将数据划分为训练集和测试集 ``` # 生成 label_list.txt 文件 @@ -423,6 +419,67 @@ roadsign数据集统计: (1)用户数据,建议在训练前仔细检查数据,避免因数据标注格式错误或图像数据不完整造成训练过程中的crash (2)如果图像尺寸太大的话,在不限制读入数据尺寸情况下,占用内存较多,会造成内存/显存溢出,请合理设置batch_size,可从小到大尝试 +#### 自定义数据训练 + +数据准备完成后,需要修改PaddleDetection中关于Dataset的配置文件,在`configs/datasets`文件夹下。比如roadsign数据集的配置文件如下: +``` +metric: VOC # 目前支持COCO, VOC, WiderFace等评估标准 +num_classes: 4 # 数据集的类别数,不包含背景类,roadsign数据集为4类,其他数据需要修改为自己的数据类别 + +TrainDataset: + !VOCDataSet + dataset_dir: dataset/roadsign_voc # 训练集的图片所在文件相对于dataset_dir的路径 + anno_path: train.txt # 训练集的标注文件相对于dataset_dir的路径 + label_list: label_list.txt # 数据集所在路径,相对于PaddleDetection路径 + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] # 控制dataset输出的sample所包含的字段,注意此为训练集Reader独有的且必须配置的字段 + +EvalDataset: + !VOCDataSet + dataset_dir: dataset/roadsign_voc # 数据集所在路径,相对于PaddleDetection路径 + anno_path: valid.txt # 验证集的标注文件相对于dataset_dir的路径 + label_list: label_list.txt # 标签文件,相对于dataset_dir的路径 + data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] + +TestDataset: + !ImageFolder + anno_path: label_list.txt # 标注文件所在路径,仅用于读取数据集的类别信息,支持json和txt格式 + dataset_dir: dataset/roadsign_voc # 
数据集所在路径,若添加了此行,则`anno_path`路径为相对于`dataset_dir`路径,若此行不设置或去掉此行,则为相对于PaddleDetection路径 +``` + +然后在对应模型配置文件中将自定义数据文件路径替换为新路径,以`configs/yolov3/yolov3_mobilenet_v1_roadsign.yml`为例 + +``` +_BASE_: [ + '../datasets/roadsign_voc.yml', # 指定为自定义数据集配置路径 + '../runtime.yml', + '_base_/optimizer_40e.yml', + '_base_/yolov3_mobilenet_v1.yml', + '_base_/yolov3_reader.yml', +] +pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams +weights: output/yolov3_mobilenet_v1_roadsign/model_final + +YOLOv3Loss: + ignore_thresh: 0.7 + label_smooth: true +``` + + +在PaddleDetection的yml配置文件中,使用`!`直接序列化模块实例(可以是函数,实例等),上述的配置文件均使用Dataset进行了序列化。 + +配置修改完成后,即可以启动训练评估,命令如下 + +``` +export CUDA_VISIBLE_DEVICES=0 +python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval +``` + +更详细的命令参考[30分钟快速上手PaddleDetection](../GETTING_STARTED_cn.md) + +**注意:** +请运行前自行仔细检查数据集的配置路径,在训练或验证时如果TrainDataset和EvalDataset的路径配置有误,会提示自动下载数据集。若使用自定义数据集,在推理时如果TestDataset路径配置有误,会提示使用默认COCO数据集的类别信息。 + + ### (可选)生成Anchor 在yolo系列模型中,大多数情况下使用默认的anchor设置即可, 你也可以运行`tools/anchor_cluster.py`来得到适用于你的数据集Anchor,使用方法如下: diff --git a/docs/tutorials/PrepareDataSet_en.md b/docs/tutorials/data/PrepareDetDataSet_en.md similarity index 88% rename from docs/tutorials/PrepareDataSet_en.md rename to docs/tutorials/data/PrepareDetDataSet_en.md index 77206402b4686b5698d2df11fa6c8529051d05ea..aa8c5f3e183146c078c57e1328008a7fb007001e 100644 --- a/docs/tutorials/PrepareDataSet_en.md +++ b/docs/tutorials/data/PrepareDetDataSet_en.md @@ -250,7 +250,7 @@ There are three processing methods for user data: (1) Convert user data into VOC data (only include labels necessary for object detection as required) (2) Convert user data into coco data (only include labels necessary for object detection as required) (3) Customize a reader for user data (for complex data, you need to customize the reader) - + ##### Convert User Data to VOC Data After the user dataset is converted to VOC data, the directory 
structure is as follows (note that the path name and file name in the dataset should not use Chinese as far as possible to avoid errors caused by Chinese coding problems): @@ -332,6 +332,33 @@ dataset/xxx/ ##### Reader of User Define Data If new data in the dataset needs to be added to paddedetection, you can refer to the [add new data source] (../advanced_tutorials/READER.md#2.3_Customizing_Dataset) document section in the data processing document to develop corresponding code to complete the new data source support. At the same time, you can read the [data processing document] (../advanced_tutorials/READER.md) for specific code analysis of data processing +The configuration file for the Dataset exists in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows: +``` +metric: COCO # Currently supports COCO, VOC, OID, Wider Face and other evaluation standards +num_classes: 80 # num_classes: The number of classes in the dataset, excluding background classes + +TrainDataset: + !COCODataSet + image_dir: train2017 # The path where the training set image resides relative to the dataset_dir + anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir + dataset_dir: dataset/coco #The path where the dataset is located relative to the PaddleDetection path + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the sample output of the dataset, note data_fields are unique to the trainreader and must be configured + +EvalDataset: + !COCODataSet + image_dir: val2017 # The path where the images of the validation set reside relative to the dataset_dir + anno_path: annotations/instances_val2017.json # The path to the annotation file of the validation set relative to the dataset_dir + dataset_dir: dataset/coco # The path where the dataset is located relative to the PaddleDetection path +TestDataset: + !ImageFolder + anno_path: 
dataset/coco/annotations/instances_val2017.json # Path of the annotation file; it is only used to read the category information of the dataset. JSON and TXT formats are supported
+    dataset_dir: dataset/coco # Path of the dataset. If this line is set, `anno_path` is resolved as `dataset_dir/anno_path`; if it is unset or removed, `anno_path` is used as given
+```
+In PaddleDetection YAML configuration files, `!` directly serializes a module instance (a function, an instance, etc.). The configuration above uses it to serialize the Dataset classes.
+
+**Note:**
+Please check the dataset paths in the configuration carefully before running. During training or evaluation, if the path of TrainDataset or EvalDataset is wrong, the dataset will be downloaded automatically. When using a user-defined dataset, if the TestDataset path is configured incorrectly during inference, the category information of the default COCO dataset will be used.
+
 #### Example of User Data Conversion
 Take [Kaggle Dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example to illustrate how to prepare custom data.
 The dataset of Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition contains 877 images, four categories:crosswalk,speedlimit,stop,trafficlight. Available for download from kaggle, also available from [link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
diff --git a/docs/tutorials/PrepareKeypointDataSet_cn.md b/docs/tutorials/data/PrepareKeypointDataSet.md similarity index 67% rename from docs/tutorials/PrepareKeypointDataSet_cn.md rename to docs/tutorials/data/PrepareKeypointDataSet.md index 791fd1e49e8367815967654de24aed8eb2485635..4efa90b8d2b2a70430c13feccffe0342ce94e5fd 100644 --- a/docs/tutorials/PrepareKeypointDataSet_cn.md +++ b/docs/tutorials/data/PrepareKeypointDataSet.md @@ -1,14 +1,16 @@ 简体中文 | [English](PrepareKeypointDataSet_en.md) -# 如何准备关键点数据集 +# 关键点数据准备 ## 目录 - [COCO数据集](#COCO数据集) - [MPII数据集](#MPII数据集) -- [训练其他数据集](#训练其他数据集) +- [用户数据准备](#用户数据准备) + - [数据格式转换](#数据格式转换) + - [自定义数据训练](#自定义数据训练) ## COCO数据集 ### COCO数据集的准备 -我们提供了一键脚本来自动完成COCO2017数据集的下载及准备工作,请参考[COCO数据集下载](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDataSet.md#COCO%E6%95%B0%E6%8D%AE)。 +我们提供了一键脚本来自动完成COCO2017数据集的下载及准备工作,请参考[COCO数据集下载](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDetDataSet.md#COCO%E6%95%B0%E6%8D%AE)。 ### COCO数据集(KeyPoint)说明 在COCO中,关键点序号与部位的对应关系为: @@ -110,7 +112,10 @@ MPII keypoint indexes: - `scale`:表示人物的比例,对应200px。 -## 训练其他数据集 +## 用户数据准备 + +### 数据格式转换 + 这里我们以`AIChallenger`数据集为例,展示如何将其他数据集对齐到COCO格式并加入关键点模型训练中。 @@ -139,3 +144,33 @@ AI Challenger Description: 5. 
整理图像路径`file_name`,使其能够被正确访问到。 我们提供了整合`COCO`训练集和`AI Challenger`数据集的[标注文件](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json),供您参考调整后的效果。 + +### 自定义数据训练 + +以[tinypose_256x192](../../../configs/keypoint/tiny_pose/README.md)为例来说明对于自定义数据如何修改: + +#### 1、配置文件[tinypose_256x192.yml](../../../configs/keypoint/tiny_pose/tinypose_256x192.yml) + +基本的修改内容及其含义如下: + +``` +num_joints: &num_joints 17 #自定义数据的关键点数量 +train_height: &train_height 256 #训练图片尺寸-高度h +train_width: &train_width 192 #训练图片尺寸-宽度w +hmsize: &hmsize [48, 64] #对应训练尺寸的输出尺寸,这里是输入[w,h]的1/4 +flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #关键点定义中左右对称的关键点,用于flip增强。若没有对称结构在 TrainReader 的 RandomFlipHalfBodyTransform 一栏中 flip_pairs 后面加一行 "flip: False"(注意缩紧对齐) +num_joints_half_body: 8 #半身关键点数量,用于半身增强 +prob_half_body: 0.3 #半身增强实现概率,若不需要则修改为0 +upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #上半身对应关键点id,用于半身增强中获取上半身对应的关键点。 +``` + +上述是自定义数据时所需要的修改部分,完整的配置及含义说明可参考文件:[关键点配置文件说明](../KeyPointConfigGuide_cn.md)。 + +#### 2、其他代码修改(影响测试、可视化) +- keypoint_utils.py中的sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07,.87, .87, .89, .89]) / 10.0,表示每个关键点的确定范围方差,根据实际关键点可信区域设置,区域精确的一般0.25-0.5,例如眼睛。区域范围大的一般0.5-1.0,例如肩膀。若不确定建议0.75。 +- visualizer.py中的draw_pose函数中的EDGES,表示可视化时关键点之间的连接线关系。 +- pycocotools工具中的sigmas,同第一个keypoint_utils.py中的设置。用于coco指标评估时计算。 + +#### 3、数据准备注意 +- 训练数据请按coco数据格式处理。需要包括关键点[Nx3]、检测框[N]标注。 +- 请注意area>0,area=0时数据在训练时会被过滤掉。此外,由于COCO的评估机制,area较小的数据在评估时也会被过滤掉,我们建议在自定义数据时取`area = bbox_w * bbox_h`。 diff --git a/docs/tutorials/PrepareKeypointDataSet_en.md b/docs/tutorials/data/PrepareKeypointDataSet_en.md similarity index 98% rename from docs/tutorials/PrepareKeypointDataSet_en.md rename to docs/tutorials/data/PrepareKeypointDataSet_en.md index 4656922ab79d93720538bd13ef4bc3f188819862..80272910cee355e28d6aa219e30bc98de599bbd0 100644 --- a/docs/tutorials/PrepareKeypointDataSet_en.md +++ 
b/docs/tutorials/data/PrepareKeypointDataSet_en.md @@ -1,4 +1,4 @@ -[简体中文](PrepareKeypointDataSet_cn.md) | English +[简体中文](PrepareKeypointDataSet.md) | English # How to prepare dataset? ## Table of Contents @@ -8,7 +8,7 @@ ## COCO ### Preperation for COCO dataset -We provide a one-click script to automatically complete the download and preparation of the COCO2017 dataset. Please refer to [COCO Download](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDataSet.md#COCO%E6%95%B0%E6%8D%AE). +We provide a one-click script to automatically complete the download and preparation of the COCO2017 dataset. Please refer to [COCO Download](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDetDataSet_en.md#COCO%E6%95%B0%E6%8D%AE). ### Description for COCO dataset(Keypoint): In COCO, the indexes and corresponding keypoint name are: diff --git a/docs/tutorials/PrepareMOTDataSet_cn.md b/docs/tutorials/data/PrepareMOTDataSet.md similarity index 40% rename from docs/tutorials/PrepareMOTDataSet_cn.md rename to docs/tutorials/data/PrepareMOTDataSet.md index 7c97e073f3edb81a8cf122476f022a1a05bfd647..633efa95e510f242f6f9ff3cfadd1def1d2d0356 100644 --- a/docs/tutorials/PrepareMOTDataSet_cn.md +++ b/docs/tutorials/data/PrepareMOTDataSet.md @@ -1,30 +1,99 @@ -简体中文 | [English](GETTING_STARTED.md) - -# 目录 -## 多目标跟踪数据集准备 -- [MOT数据集](#MOT数据集) -- [数据集目录](#数据集目录) -- [数据格式](#数据格式) -- [用户数据准备](#用户数据准备) +简体中文 | [English](PrepareMOTDataSet_en.md) + +# 多目标跟踪数据集准备 +## 目录 +- [简介和模型选型](#简介和模型选型) +- [MOT数据集准备](#MOT数据集准备) + - [SDE数据集](#SDE数据集) + - [JDE数据集](#JDE数据集) +- [用户自定义数据集准备](#用户自定义数据集准备) + - [SDE数据集](#SDE数据集) + - [JDE数据集](#JDE数据集) - [引用](#引用) -### MOT数据集 -PaddleDetection复现[JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) 和[FairMOT](https://github.com/ifzhang/FairMOT),是使用的和他们相同的MIX数据集,包括**Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, 
MOT17和MOT16**。使用前6者作为联合数据集参与训练,MOT16作为评测数据集。如果您想使用这些数据集,请**遵循他们的License**。 +## 简介和模型选型 +PaddleDetection中提供了SDE和JDE两个系列的多种算法实现: +- SDE(Separate Detection and Embedding) + - [ByteTrack](../../../configs/mot/bytetrack) + - [DeepSORT](../../../configs/mot/deepsort) + +- JDE(Joint Detection and Embedding) + - [JDE](../../../configs/mot/jde) + - [FairMOT](../../../configs/mot/fairmot) + - [MCFairMOT](../../../configs/mot/mcfairmot) **注意:** -- 多目标跟踪数据集一般是用于单类别的多目标跟踪,DeepSORT、JDE和FairMOT均为单类别跟踪模型,MIX数据集以及其子数据集也都是单类别的行人跟踪数据集,可认为相比于行人检测数据集多了id号的标注。 -- 为了训练更多场景的垂类模型例如车辆等,垂类数据集也需要处理成与MIX数据集相同的格式,PaddleDetection也提供了[车辆跟踪](../../configs/mot/vehicle/README_cn.md)、[人头跟踪](../../configs/mot/headtracking21/README_cn.md)以及更通用的[行人跟踪](../../configs/mot/pedestrian/README_cn.md)的垂类数据集和模型。用户自定义数据集也可参照本文档准备。 -- 多类别跟踪模型是[MCFairMOT](../../configs/mot/mcfairmot/README_cn.md),多类别数据集是VisDrone数据集的整合版,可参照[MCFairMOT](../../configs/mot/mcfairmot/README_cn.md)的文档说明。 -- 跨镜头跟踪模型,是选用的[AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow)车辆跨镜头跟踪数据集,数据集和模型可参照[跨境头跟踪](../../configs/mot/mtmct/README_cn.md)的文档说明。 + - 以上算法原论文均为单类别的多目标跟踪,PaddleDetection团队同时也支持了[ByteTrack](./bytetrack)和FairMOT([MCFairMOT](./mcfairmot))的多类别的多目标跟踪; + - [DeepSORT](../../../configs/mot/deepsort)和[JDE](../../../configs/mot/jde)均只支持单类别的多目标跟踪; + - [DeepSORT](../../../configs/mot/deepsort)需要额外添加ReID权重一起执行,[ByteTrack](../../../configs/mot/bytetrack)可加可不加ReID权重,默认不加; + + +关于模型选型,PaddleDetection团队提供的总结建议如下: + +| MOT方式 | 经典算法 | 算法流程 | 数据集要求 | 其他特点 | +| :--------------| :--------------| :------- | :----: | :----: | +| SDE系列 | DeepSORT,ByteTrack | 分离式,两个独立模型权重先检测后ReID,也可不加ReID | 检测和ReID数据相对独立,不加ReID时即纯检测数据集 |检测和ReID可分别调优,鲁棒性较高,AI竞赛常用| +| JDE系列 | FairMOT | 联合式,一个模型权重端到端同时检测和ReID | 必须同时具有检测和ReID标注 | 检测和ReID联合训练,不易调优,泛化性不强| + +**注意:** + - 由于数据标注的成本较大,建议选型前优先考虑**数据集要求**,如果数据集只有检测框标注而没有ReID标注,是无法使用JDE系列算法训练的,更推荐使用SDE系列; + - SDE系列算法在检测器精度足够高时,也可以不使用ReID权重进行物体间的长时序关联,可以参照[ByteTrack](bytetrack); + - 耗时速度和模型权重参数量计算量有一定关系,耗时从理论上看`不使用ReID的SDE系列 < 
JDE系列 < 使用ReID的SDE系列`; + + +## MOT数据集准备 +PaddleDetection团队提供了众多公开数据集或整理后数据集的下载链接,参考[数据集下载汇总](../../../configs/mot/DataDownload.md),用户可以自行下载使用。 -### 数据集目录 -首先按照以下命令下载image_lists.zip并解压放在`PaddleDetection/dataset/mot`目录下: +根据模型选型总结,MOT数据集可以分为两类:一类纯检测框标注的数据集,仅SDE系列可以使用;另一类是同时有检测和ReID标注的数据集,SDE系列和JDE系列都可以使用。 + +### SDE数据集 +SDE数据集是纯检测标注的数据集,用户自定义数据集可以参照[DET数据准备文档](./PrepareDetDataSet.md)准备。 + +以MOT17数据集为例,下载并解压放在`PaddleDetection/dataset/mot`目录下: +``` +wget https://dataset.bj.bcebos.com/mot/MOT17.zip + +``` +并修改数据集部分的配置文件如下: +``` +num_classes: 1 + +TrainDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/train_half.json + image_dir: images/train + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json + image_dir: images/train + +TestDataset: + !ImageFolder + dataset_dir: dataset/mot/MOT17 + anno_path: annotations/val_half.json +``` + +数据集目录为: +``` +dataset/mot + |——————MOT17 + |——————annotations + |——————images +``` + +### JDE数据集 +JDE数据集是同时有检测和ReID标注的数据集,首先按照以下命令`image_lists.zip`并解压放在`PaddleDetection/dataset/mot`目录下: ``` wget https://dataset.bj.bcebos.com/mot/image_lists.zip ``` -然后按照以下命令可以快速下载MIX数据集的各个子数据集,并解压放在`PaddleDetection/dataset/mot`目录下: +然后按照以下命令可以快速下载各个公开数据集,也解压放在`PaddleDetection/dataset/mot`目录下: ``` +# MIX数据,同JDE,FairMOT论文使用的数据集 wget https://dataset.bj.bcebos.com/mot/MOT17.zip wget https://dataset.bj.bcebos.com/mot/Caltech.zip wget https://dataset.bj.bcebos.com/mot/CUHKSYSU.zip @@ -33,24 +102,17 @@ wget https://dataset.bj.bcebos.com/mot/Cityscapes.zip wget https://dataset.bj.bcebos.com/mot/ETHZ.zip wget https://dataset.bj.bcebos.com/mot/MOT16.zip ``` - -最终目录为: +数据集目录为: ``` dataset/mot |——————image_lists - |——————caltech.10k.val |——————caltech.all - |——————caltech.train - |——————caltech.val |——————citypersons.train - |——————citypersons.val |——————cuhksysu.train - |——————cuhksysu.val |——————eth.train |——————mot16.train 
|——————mot17.train |——————prw.train - |——————prw.val |——————Caltech |——————Cityscapes |——————CUHKSYSU @@ -60,7 +122,7 @@ dataset/mot |——————PRW ``` -### 数据格式 +#### JDE数据集的格式 这几个相关数据集都遵循以下结构: ``` MOT17 @@ -74,16 +136,26 @@ MOT17 ``` [class] [identity] [x_center] [y_center] [width] [height] ``` -**注意**: -- `class`为类别id,支持单类别和多类别,从`0`开始计,单类别即为`0`。 -- `identity`是从`1`到`num_identities`的整数(`num_identities`是数据集中所有视频或图片序列的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 -- `[x_center] [y_center] [width] [height]`是中心点坐标和宽高,注意他们的值是由图片的宽度/高度标准化的,因此它们是从0到1的浮点数。 + - `class`为类别id,支持单类别和多类别,从`0`开始计,单类别即为`0`。 + - `identity`是从`1`到`num_identities`的整数(`num_identities`是数据集中所有视频或图片序列的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 + - `[x_center] [y_center] [width] [height]`是中心点坐标和宽高,注意他们的值是由图片的宽度/高度标准化的,因此它们是从0到1的浮点数。 + + +**注意:** + - MIX数据集是[JDE](https://github.com/Zhongdao/Towards-Realtime-MOT)和[FairMOT](https://github.com/ifzhang/FairMOT)原论文使用的数据集,包括**Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17和MOT16**。使用前6者作为联合数据集参与训练,MOT16作为评测数据集。如果您想使用这些数据集,请**遵循他们的License**。 + - MIX数据集以及其子数据集都是单类别的行人跟踪数据集,可认为相比于行人检测数据集多了id号的标注。 + - 更多场景的垂类模型例如车辆行人人头跟踪等,垂类数据集也需要处理成与MIX数据集相同的格式,参照[数据集下载汇总](DataDownload.md)、[车辆跟踪](vehicle/README_cn.md)、[人头跟踪](headtracking21/README_cn.md)以及更通用的[行人跟踪](pedestrian/README_cn.md)。 + - 用户自定义数据集可参照[MOT数据集准备教程](../../docs/tutorials/PrepareMOTDataSet_cn.md)去准备。 +## 用户自定义数据集准备 -### 用户数据准备 +### SDE数据集 +如果用户选择SDE系列方案,是准备准检测标注的自定义数据集,则可以参照[DET数据准备文档](./PrepareDetDataSet.md)准备。 -为了规范地进行训练和评测,用户数据需要转成和MOT-16数据集相同的目录和格式: +### JDE数据集 +如果用户选择JDE系列方案,则需要同时具有检测和ReID标注,且符合MOT-17数据集的格式。 +为了规范地进行训练和评测,用户数据需要转成和MOT-17数据集相同的目录和格式: ``` custom_data |——————images @@ -109,45 +181,49 @@ custom_data └—————— ... 
``` -#### images文件夹 -- `gt.txt`是原始标注文件,而训练所用标注是`labels_with_ids`文件夹。 -- `img1`文件夹里是按照一定帧率抽好的图片。 -- `seqinfo.ini`文件是视频信息描述文件,需要如下格式的信息: -``` -[Sequence] -name=MOT16-02 -imDir=img1 -frameRate=30 -seqLength=600 -imWidth=1920 -imHeight=1080 -imExt=.jpg -``` +##### images文件夹 + - `gt.txt`是原始标注文件,而训练所用标注是`labels_with_ids`文件夹。 + - `gt.txt`里是当前视频中所有图片的原始标注文件,每行都描述一个边界框,格式如下: + ``` + [frame_id],[identity],[bb_left],[bb_top],[width],[height],[score],[label],[vis_ratio] + ``` + - `img1`文件夹里是按照一定帧率抽好的图片。 + - `seqinfo.ini`文件是视频信息描述文件,需要如下格式的信息: + ``` + [Sequence] + name=MOT17-02 + imDir=img1 + frameRate=30 + seqLength=600 + imWidth=1920 + imHeight=1080 + imExt=.jpg + ``` -`gt.txt`里是当前视频中所有图片的原始标注文件,每行都描述一个边界框,格式如下: +其中`gt.txt`里是当前视频中所有图片的原始标注文件,每行都描述一个边界框,格式如下: ``` [frame_id],[identity],[bb_left],[bb_top],[width],[height],[score],[label],[vis_ratio] ``` **注意**: -- `frame_id`为当前图片帧序号 -- `identity`是从`1`到`num_identities`的整数(`num_identities`是**当前视频或图片序列**的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 -- `bb_left`是目标框的左边界的x坐标 -- `bb_top`是目标框的上边界的y坐标 -- `width,height`是真实的像素宽高 -- `score`是当前目标是否进入考虑范围内的标志(值为0表示此目标在计算中被忽略,而值为1则用于将其标记为活动实例),默认为`1` -- `label`是当前目标的种类标签,由于目前仅支持单类别跟踪,默认为`1`,MOT-16数据集中会有其他类别标签,但都是当作ignore类别计算 -- `vis_ratio`是当前目标被其他目标包含或覆挡后的可见率,是从0到1的浮点数,默认为`1` + - `frame_id`为当前图片帧序号 + - `identity`是从`1`到`num_identities`的整数(`num_identities`是**当前视频或图片序列**的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 + - `bb_left`是目标框的左边界的x坐标 + - `bb_top`是目标框的上边界的y坐标 + - `width,height`是真实的像素宽高 + - `score`是当前目标是否进入考虑范围内的标志(值为0表示此目标在计算中被忽略,而值为1则用于将其标记为活动实例),默认为`1` + - `label`是当前目标的种类标签,由于目前仅支持单类别跟踪,默认为`1`,MOT-16数据集中会有其他类别标签,但都是当作ignore类别计算 + - `vis_ratio`是当前目标被其他目标包含或覆挡后的可见率,是从0到1的浮点数,默认为`1` -#### labels_with_ids文件夹 +##### labels_with_ids文件夹 所有数据集的标注是以统一数据格式提供的。各个数据集中每张图片都有相应的标注文本。给定一个图像路径,可以通过将字符串`images`替换为`labels_with_ids`并将`.jpg`替换为`.txt`来生成标注文本路径。在标注文本中,每行都描述一个边界框,格式如下: ``` [class] [identity] [x_center] [y_center] [width] [height] ``` **注意**: -- `class`为类别id,支持单类别和多类别,从`0`开始计,单类别即为`0`。 -- 
`identity`是从`1`到`num_identities`的整数(`num_identities`是数据集中所有视频或图片序列的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 -- `[x_center] [y_center] [width] [height]`是中心点坐标和宽高,注意是由图片的宽度/高度标准化的,因此它们是从0到1的浮点数。 + - `class`为类别id,支持单类别和多类别,从`0`开始计,单类别即为`0`。 + - `identity`是从`1`到`num_identities`的整数(`num_identities`是数据集中所有视频或图片序列的不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 + - `[x_center] [y_center] [width] [height]`是中心点坐标和宽高,注意是由图片的宽度/高度标准化的,因此它们是从0到1的浮点数。 可采用如下脚本生成相应的`labels_with_ids`: ``` @@ -155,6 +231,7 @@ cd dataset/mot python gen_labels_MOT.py ``` + ### 引用 Caltech: ``` diff --git a/docs/tutorials/PrepareMOTDataSet.md b/docs/tutorials/data/PrepareMOTDataSet_en.md similarity index 99% rename from docs/tutorials/PrepareMOTDataSet.md rename to docs/tutorials/data/PrepareMOTDataSet_en.md index 44f5a4713a3b461e84ab0dfa2a01af1045b7dc4c..f1bd170ccad18457c7faf6c0f0a57f6211af7784 100644 --- a/docs/tutorials/PrepareMOTDataSet.md +++ b/docs/tutorials/data/PrepareMOTDataSet_en.md @@ -1,4 +1,4 @@ -English | [简体中文](PrepareMOTDataSet_cn.md) +English | [简体中文](PrepareMOTDataSet.md) # Contents ## Multi-Object Tracking Dataset Preparation diff --git a/docs/tutorials/data/README.md b/docs/tutorials/data/README.md new file mode 100644 index 0000000000000000000000000000000000000000..947b650e18cbc9cf9bb57c8b6600588ed0a6501f --- /dev/null +++ b/docs/tutorials/data/README.md @@ -0,0 +1,27 @@ +# 数据准备 + +数据对于深度学习开发起到了至关重要的作用,数据采集和标注的质量是提升业务模型效果的重要因素。本文档主要介绍PaddleDetection中如何进行数据准备,包括采集高质量数据方法,覆盖多场景类型,提升模型泛化能力;以及各类任务数据标注工具和方法,并在PaddleDetection下使用 + +## 数据采集 +在深度学习任务的实际落地中,数据采集往往决定了最终模型的效果,对于数据采集的几点建议如下: + +### 确定方向 +任务类型、数据的类别和目标场景这些因素决定了要收集什么数据,首先需要根据这些因素来确定整体数据收集的工作方向。 + +### 开源数据集 +在实际场景中数据采集成本其实十分高昂,完全靠自己收集在时间和金钱上都有很高的成本,开源数据集是帮助增加训练数据量的重要手段,所以很多时候会考虑加入一些相似任务的开源数据。在使用中请遵守各个开源数据集的license规定的使用条件。 + +### 增加场景数据 +开源数据一般不会覆盖实际使用的的目标场景,用户需要评估开源数据集中已包含的场景和目标场景间的差异,有针对性地补充目标场景数据,尽量让训练和部署数据的场景一致。 + +### 类别均衡 +在采集阶段,也需要尽量保持类别均衡,帮助模型正确学习到目标特征。 + + +## 数据标注及格式说明 + +| 任务类型 | 数据标注 | 数据格式说明 | +|:--------:| :--------:|:--------:| 
+| Object Detection | [Documentation](DetAnnoTools.md) | [Documentation](PrepareDetDataSet.md) | +| Keypoint Detection | [Documentation](KeyPointAnnoTools.md) | [Documentation](PrepareKeypointDataSet.md) | +| Multi-Object Tracking | [Documentation](MOTAnnoTools.md) | [Documentation](PrepareMOTDataSet.md) | diff --git a/docs/tutorials/logging_en.md b/docs/tutorials/logging_en.md new file mode 100644 index 0000000000000000000000000000000000000000..b45ceba69d39098f70d0b8825d372529ce40cd0b --- /dev/null +++ b/docs/tutorials/logging_en.md @@ -0,0 +1,46 @@ +# Logging + +This document describes how to track metrics and visualize model performance during training. The library currently supports [VisualDL](https://www.paddlepaddle.org.cn/documentation/docs/en/guides/03_VisualDL/visualdl_usage_en.html) and [Weights & Biases](https://docs.wandb.ai). + +## VisualDL +Logging to VisualDL is supported only in Python >= 3.5. To install VisualDL: + +``` +pip install visualdl +``` + +PaddleDetection uses a callback to log the training metrics at the end of every step and the validation metrics at the end of every epoch. To use VisualDL for visualization, add the `--use_vdl` flag to the training command and use `--vdl_log_dir` to set the directory that stores the records. + +For example: + +``` +python tools/train.py -c config.yml --use_vdl --vdl_log_dir ./logs +``` + +Another way to do this is to set the corresponding options in the `config.yml` file. + +## Weights & Biases +W&B is an MLOps tool that can be used for experiment tracking, dataset/model versioning, visualizing results, and collaborating with colleagues. A W&B logger is integrated directly into PaddleDetection. To use it, first install the wandb SDK and log in to your wandb account. 
+ +``` +pip install wandb +wandb login +``` + +To log metrics to wandb during training, add the `--use_wandb` flag to the training command. Any other arguments for the W&B logger can be provided like this: + +``` +python tools/train.py -c config.yml --use_wandb -o wandb-project=MyDetector wandb-entity=MyTeam wandb-save_dir=./logs +``` + +The arguments to the W&B logger must be preceded by `-o`, and each individual argument must have the prefix "wandb-". + +If this is too tedious, an alternative is to add the arguments to the `config.yml` file under the `wandb` header. For example: + +``` +use_wandb: True +wandb: + project: MyProject + entity: MyTeam + save_dir: ./logs +``` diff --git a/industrial_tutorial/README.md b/industrial_tutorial/README.md new file mode 100644 index 0000000000000000000000000000000000000000..94441e6c740d47b3869d1fa3eebd99729436baea --- /dev/null +++ b/industrial_tutorial/README.md @@ -0,0 +1,37 @@ +# Industrial Practice Examples + +PaddleDetection's scenario applications cover the main vertical detection applications in the general, manufacturing, urban, and transportation sectors. Building on the capabilities of PP-YOLOE, PP-PicoDet, PP-Human, and PP-Vehicle, they use notebooks to demonstrate fine-tuning on scenario data, model optimization methods, data augmentation, and more, giving developers examples and inspiration for quickly deploying object detection applications. + + +
    + +
    + + +Scan the QR code to join the user discussion and Q&A group +
    + +
    + +## Example List + +- [Intelligent fitness action recognition based on enhanced PP-TinyPose](https://aistudio.baidu.com/aistudio/projectdetail/4385813) + +- [Fight recognition based on PP-Human](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) + +- [Communication tower recognition based on PP-PicoDet and Android deployment](https://aistudio.baidu.com/aistudio/projectdetail/3561097) + +- [Tile surface defect detection based on Faster-RCNN](https://aistudio.baidu.com/aistudio/projectdetail/2571419) + +- [PCB defect detection based on PaddleDetection](https://aistudio.baidu.com/aistudio/projectdetail/2367089) + +- [Pedestrian flow counting based on FairMOT](https://aistudio.baidu.com/aistudio/projectdetail/2421822) + +- [Fall detection based on YOLOv3](https://aistudio.baidu.com/aistudio/projectdetail/2500639) + +- [Road litter detection based on PP-PicoDetv2](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) + +- [Compliance detection based on human keypoint detection](https://aistudio.baidu.com/aistudio/projectdetail/4061642?contributionType=1) + + + *More examples will be added over time diff --git a/ppdet/core/workspace.py b/ppdet/core/workspace.py index e633746ed804e29ca9cc53c9b6cf39c1a8a168a6..e56feb31be4d02e81abcdfb6a33fbfc111abb1cc 100644 --- a/ppdet/core/workspace.py +++ b/ppdet/core/workspace.py @@ -210,9 +210,17 @@ def create(cls_or_name, **kwargs): assert type(cls_or_name) in [type, str ], "should be a class or name of a class" name = type(cls_or_name) == str and cls_or_name or cls_or_name.__name__ - assert name in global_config and \ - isinstance(global_config[name], SchemaDict), \ - "the module {} is not registered".format(name) + if name in global_config: + if isinstance(global_config[name], SchemaDict): + pass + elif hasattr(global_config[name], "__dict__"): + # support instance return directly + return global_config[name] + else: + raise ValueError("The module {} is not registered".format(name)) + else: + raise ValueError("The module {} is not registered".format(name)) + config = global_config[name] cls = getattr(config.pymodule, name) cls_kwargs = {} diff --git a/ppdet/data/reader.py b/ppdet/data/reader.py index 
c9ea09af2f7250d67cb005345a48d59107ab7eab..f04fd6b3380c915abaf1e8104d8901268d12775f 100644 --- a/ppdet/data/reader.py +++ b/ppdet/data/reader.py @@ -23,7 +23,7 @@ else: import numpy as np from paddle.io import DataLoader, DistributedBatchSampler -from paddle.fluid.dataloader.collate import default_collate_fn +from .utils import default_collate_fn from ppdet.core.workspace import register from . import transform diff --git a/ppdet/data/shm_utils.py b/ppdet/data/shm_utils.py index 38d8ba66cd71baa169c27a44e59a1d4d908b8d7c..a929a809cec9bc1e6b1dd335faa0ba4f2e44ff87 100644 --- a/ppdet/data/shm_utils.py +++ b/ppdet/data/shm_utils.py @@ -34,7 +34,10 @@ SHM_DEFAULT_MOUNT = '/dev/shm' def _parse_size_in_M(size_str): - num, unit = size_str[:-1], size_str[-1] + if size_str[-1] == 'B': + num, unit = size_str[:-2], size_str[-2] + else: + num, unit = size_str[:-1], size_str[-1] assert unit in SIZE_UNIT, \ "unknown shm size unit {}".format(unit) return float(num) * \ diff --git a/ppdet/data/source/__init__.py b/ppdet/data/source/__init__.py index 3854d3d2530b032b3c84d1ab5f2e01ea963c5c70..e3abb16b606de5501886f1a615fd25a7cd114e61 100644 --- a/ppdet/data/source/__init__.py +++ b/ppdet/data/source/__init__.py @@ -27,3 +27,4 @@ from .category import * from .keypoint_coco import * from .mot import * from .sniper_coco import SniperCOCODataSet +from .dataset import ImageFolder diff --git a/ppdet/data/source/category.py b/ppdet/data/source/category.py index 9390e54c4ce5dacce4674363689b629261c787c6..de447161710d32ef623bab5692c40d39efb7e9c7 100644 --- a/ppdet/data/source/category.py +++ b/ppdet/data/source/category.py @@ -39,24 +39,49 @@ def get_categories(metric_type, anno_file=None, arch=None): if arch == 'keypoint_arch': return (None, {'id': 'keypoint'}) + if anno_file == None or (not os.path.isfile(anno_file)): + logger.warning( + "anno_file '{}' is None or not set or not exist, " + "please recheck TrainDataset/EvalDataset/TestDataset.anno_path, " + "otherwise the default categories will 
be used by metric_type.". + format(anno_file)) + if metric_type.lower() == 'coco' or metric_type.lower( ) == 'rbox' or metric_type.lower() == 'snipercoco': if anno_file and os.path.isfile(anno_file): - # lazy import pycocotools here - from pycocotools.coco import COCO - - coco = COCO(anno_file) - cats = coco.loadCats(coco.getCatIds()) - - clsid2catid = {i: cat['id'] for i, cat in enumerate(cats)} - catid2name = {cat['id']: cat['name'] for cat in cats} + if anno_file.endswith('json'): + # lazy import pycocotools here + from pycocotools.coco import COCO + coco = COCO(anno_file) + cats = coco.loadCats(coco.getCatIds()) + + clsid2catid = {i: cat['id'] for i, cat in enumerate(cats)} + catid2name = {cat['id']: cat['name'] for cat in cats} + + elif anno_file.endswith('txt'): + cats = [] + with open(anno_file) as f: + for line in f.readlines(): + cats.append(line.strip()) + if cats[0] == 'background': cats = cats[1:] + + clsid2catid = {i: i for i in range(len(cats))} + catid2name = {i: name for i, name in enumerate(cats)} + + else: + raise ValueError("anno_file {} should be json or txt.".format( + anno_file)) return clsid2catid, catid2name # anno file not exist, load default categories of COCO17 else: if metric_type.lower() == 'rbox': + logger.warning( + "metric_type: {}, load default categories of DOTA.".format( + metric_type)) return _dota_category() - + logger.warning("metric_type: {}, load default categories of COCO.". + format(metric_type)) return _coco17_category() elif metric_type.lower() == 'voc': @@ -77,6 +102,8 @@ def get_categories(metric_type, anno_file=None, arch=None): # anno file not exist, load default categories of # VOC all 20 categories else: + logger.warning("metric_type: {}, load default categories of VOC.". 
+ format(metric_type)) return _vocall_category() elif metric_type.lower() == 'oid': @@ -104,6 +131,9 @@ def get_categories(metric_type, anno_file=None, arch=None): return clsid2catid, catid2name # anno file not exist, load default category 'pedestrian'. else: + logger.warning( + "metric_type: {}, load default categories of pedestrian MOT.". + format(metric_type)) return _mot_category(category='pedestrian') elif metric_type.lower() in ['kitti', 'bdd100kmot']: @@ -122,6 +152,9 @@ def get_categories(metric_type, anno_file=None, arch=None): return clsid2catid, catid2name # anno file not exist, load default categories of visdrone all 10 categories else: + logger.warning( + "metric_type: {}, load default categories of VisDrone.".format( + metric_type)) return _visdrone_category() else: diff --git a/ppdet/data/source/coco.py b/ppdet/data/source/coco.py index 5c401de14398f72084d8acbd56c1a52325edbe54..1f7c9b7bb528ac26ee405947bdafd55909fd1d84 100644 --- a/ppdet/data/source/coco.py +++ b/ppdet/data/source/coco.py @@ -39,6 +39,7 @@ class COCODataSet(DetDataset): empty_ratio (float): the ratio of empty record number to total record's, if empty_ratio is out of [0. ,1.), do not sample the records and use all the empty entries. 1. as default + repeat (int): repeat times for dataset, use in benchmark. 
""" def __init__(self, @@ -49,9 +50,15 @@ class COCODataSet(DetDataset): sample_num=-1, load_crowd=False, allow_empty=False, - empty_ratio=1.): - super(COCODataSet, self).__init__(dataset_dir, image_dir, anno_path, - data_fields, sample_num) + empty_ratio=1., + repeat=1): + super(COCODataSet, self).__init__( + dataset_dir, + image_dir, + anno_path, + data_fields, + sample_num, + repeat=repeat) self.load_image_only = False self.load_semantic = False self.load_crowd = load_crowd diff --git a/ppdet/data/source/dataset.py b/ppdet/data/source/dataset.py index 1bef548e696764964608ade67b373a1c19c84a96..d735cfc4a2ac2b709e74cb797a61832d70bd9a51 100644 --- a/ppdet/data/source/dataset.py +++ b/ppdet/data/source/dataset.py @@ -5,7 +5,7 @@ # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -23,6 +23,7 @@ from paddle.io import Dataset from ppdet.core.workspace import register, serializable from ppdet.utils.download import get_dataset_path import copy +from ppdet.data import source @serializable @@ -37,6 +38,7 @@ class DetDataset(Dataset): data_fields (list): key name of data dictionary, at least have 'image'. sample_num (int): number of samples to load, -1 means all. use_default_label (bool): whether to load default label list. + repeat (int): repeat times for dataset, use in benchmark. 
""" def __init__(self, @@ -46,6 +48,7 @@ class DetDataset(Dataset): data_fields=['image'], sample_num=-1, use_default_label=None, + repeat=1, **kwargs): super(DetDataset, self).__init__() self.dataset_dir = dataset_dir if dataset_dir is not None else '' @@ -54,28 +57,32 @@ class DetDataset(Dataset): self.data_fields = data_fields self.sample_num = sample_num self.use_default_label = use_default_label + self.repeat = repeat self._epoch = 0 self._curr_iter = 0 def __len__(self, ): - return len(self.roidbs) + return len(self.roidbs) * self.repeat + + def __call__(self, *args, **kwargs): + return self def __getitem__(self, idx): + n = len(self.roidbs) + if self.repeat > 1: + idx %= n # data batch roidb = copy.deepcopy(self.roidbs[idx]) if self.mixup_epoch == 0 or self._epoch < self.mixup_epoch: - n = len(self.roidbs) idx = np.random.randint(n) roidb = [roidb, copy.deepcopy(self.roidbs[idx])] elif self.cutmix_epoch == 0 or self._epoch < self.cutmix_epoch: - n = len(self.roidbs) idx = np.random.randint(n) roidb = [roidb, copy.deepcopy(self.roidbs[idx])] elif self.mosaic_epoch == 0 or self._epoch < self.mosaic_epoch: - n = len(self.roidbs) roidb = [roidb, ] + [ copy.deepcopy(self.roidbs[np.random.randint(n)]) - for _ in range(3) + for _ in range(4) ] if isinstance(roidb, Sequence): for r in roidb: @@ -149,12 +156,15 @@ class ImageFolder(DetDataset): self.sample_num = sample_num def check_or_download_dataset(self): + return + + def get_anno(self): + if self.anno_path is None: + return if self.dataset_dir: - # NOTE: ImageFolder is only used for prediction, in - # infer mode, image_dir is set by set_images - # so we only check anno_path here - self.dataset_dir = get_dataset_path(self.dataset_dir, - self.anno_path, None) + return os.path.join(self.dataset_dir, self.anno_path) + else: + return self.anno_path def parse_dataset(self, ): if not self.roidbs: @@ -195,3 +205,44 @@ class ImageFolder(DetDataset): def set_images(self, images): self.image_dir = images self.roidbs = 
self._load_images() + + def get_label_list(self): + # Only VOC dataset needs label list in ImageFold + return self.anno_path + + +@register +class CommonDataset(object): + def __init__(self, **dataset_args): + super(CommonDataset, self).__init__() + dataset_args = copy.deepcopy(dataset_args) + type = dataset_args.pop("name") + self.dataset = getattr(source, type)(**dataset_args) + + def __call__(self): + return self.dataset + + +@register +class TrainDataset(CommonDataset): + pass + + +@register +class EvalMOTDataset(CommonDataset): + pass + + +@register +class TestMOTDataset(CommonDataset): + pass + + +@register +class EvalDataset(CommonDataset): + pass + + +@register +class TestDataset(CommonDataset): + pass diff --git a/ppdet/data/source/mot.py b/ppdet/data/source/mot.py index 1baadf570d13afe6f9c648fdff755ac314d1aa35..90a8a1fe88d70e1627623c1cc721f2c6eb9781e4 100644 --- a/ppdet/data/source/mot.py +++ b/ppdet/data/source/mot.py @@ -39,6 +39,7 @@ class MOTDataSet(DetDataset): image_lists (str|list): mot data image lists, muiti-source mot dataset. data_fields (list): key name of data dictionary, at least have 'image'. sample_num (int): number of samples to load, -1 means all. + repeat (int): repeat times for dataset, use in benchmark. Notes: MOT datasets root directory following this: @@ -77,11 +78,13 @@ class MOTDataSet(DetDataset): dataset_dir=None, image_lists=[], data_fields=['image'], - sample_num=-1): + sample_num=-1, + repeat=1): super(MOTDataSet, self).__init__( dataset_dir=dataset_dir, data_fields=data_fields, - sample_num=sample_num) + sample_num=sample_num, + repeat=repeat) self.dataset_dir = dataset_dir self.image_lists = image_lists if isinstance(self.image_lists, str): @@ -95,7 +98,8 @@ class MOTDataSet(DetDataset): # only used to get categories and metric # only check first data, but the label_list of all data should be same. 
first_mot_data = self.image_lists[0].split('.')[0] - anno_file = os.path.join(self.dataset_dir, first_mot_data, 'label_list.txt') + anno_file = os.path.join(self.dataset_dir, first_mot_data, + 'label_list.txt') return anno_file def parse_dataset(self): @@ -276,7 +280,8 @@ class MCMOTDataSet(DetDataset): # only used to get categories and metric # only check first data, but the label_list of all data should be same. first_mot_data = self.image_lists[0].split('.')[0] - anno_file = os.path.join(self.dataset_dir, first_mot_data, 'label_list.txt') + anno_file = os.path.join(self.dataset_dir, first_mot_data, + 'label_list.txt') return anno_file def parse_dataset(self): @@ -472,7 +477,7 @@ class MOTImageFolder(DetDataset): image_dir=None, sample_num=-1, keep_ori_im=False, - anno_path=None, + anno_path=None, **kwargs): super(MOTImageFolder, self).__init__( dataset_dir, image_dir, sample_num=sample_num) @@ -576,6 +581,7 @@ class MOTImageFolder(DetDataset): def get_anno(self): return self.anno_path + def _is_valid_video(f, extensions=('.mp4', '.avi', '.mov', '.rmvb', 'flv')): return f.lower().endswith(extensions) diff --git a/ppdet/data/source/voc.py b/ppdet/data/source/voc.py index 1c2a7ef98ccbac760430befc375a79cdebc51a7c..2f103588537c5499ef83133fe3f8d4ba7303e685 100644 --- a/ppdet/data/source/voc.py +++ b/ppdet/data/source/voc.py @@ -46,6 +46,7 @@ class VOCDataSet(DetDataset): empty_ratio (float): the ratio of empty record number to total record's, if empty_ratio is out of [0. ,1.), do not sample the records and use all the empty entries. 1. as default + repeat (int): repeat times for dataset, use in benchmark. 
""" def __init__(self, @@ -56,13 +57,15 @@ class VOCDataSet(DetDataset): sample_num=-1, label_list=None, allow_empty=False, - empty_ratio=1.): + empty_ratio=1., + repeat=1): super(VOCDataSet, self).__init__( dataset_dir=dataset_dir, image_dir=image_dir, anno_path=anno_path, data_fields=data_fields, - sample_num=sample_num) + sample_num=sample_num, + repeat=repeat) self.label_list = label_list self.allow_empty = allow_empty self.empty_ratio = empty_ratio diff --git a/ppdet/data/transform/mot_operators.py b/ppdet/data/transform/mot_operators.py index ef7d7be4514bf015b74852d2978e5c68ef67753d..e533ea3dc186a1b5cae4ee221920839848f387b6 100644 --- a/ppdet/data/transform/mot_operators.py +++ b/ppdet/data/transform/mot_operators.py @@ -529,7 +529,7 @@ class Gt2FairMOTTarget(Gt2TTFTarget): Generate FairMOT targets by ground truth data. Difference between Gt2FairMOTTarget and Gt2TTFTarget are: 1. the gaussian kernal radius to generate a heatmap. - 2. the targets needed during traing. + 2. the targets needed during training. Args: num_classes(int): the number of classes. diff --git a/ppdet/data/transform/operators.py b/ppdet/data/transform/operators.py index 7eae8db0ba67a87a1337bd4a289d505aee365aaa..09a87b128fccd214b9a5d86f5abb063e601a82d4 100644 --- a/ppdet/data/transform/operators.py +++ b/ppdet/data/transform/operators.py @@ -824,7 +824,7 @@ class Resize(BaseOperator): im_scale_x = resize_w / im_shape[1] im = self.apply_image(sample['image'], [im_scale_x, im_scale_y]) - sample['image'] = im + sample['image'] = im.astype(np.float32) sample['im_shape'] = np.asarray([resize_h, resize_w], dtype=np.float32) if 'scale_factor' in sample: scale_factor = sample['scale_factor'] @@ -1054,7 +1054,7 @@ class CropWithSampling(BaseOperator): [max sample, max trial, min scale, max scale, min aspect ratio, max aspect ratio, min overlap, max overlap] - avoid_no_bbox (bool): whether to to avoid the + avoid_no_bbox (bool): whether to avoid the situation where the box does not appear. 
""" super(CropWithSampling, self).__init__() @@ -1145,7 +1145,7 @@ class CropWithDataAchorSampling(BaseOperator): das_anchor_scales (list[float]): a list of anchor scales in data anchor smapling. min_size (float): minimum size of sampled bbox. - avoid_no_bbox (bool): whether to to avoid the + avoid_no_bbox (bool): whether to avoid the situation where the box does not appear. """ super(CropWithDataAchorSampling, self).__init__() @@ -2034,13 +2034,14 @@ class Pad(BaseOperator): if self.size: h, w = self.size assert ( - im_h < h and im_w < w + im_h <= h and im_w <= w ), '(h, w) of target size should be greater than (im_h, im_w)' else: h = int(np.ceil(im_h / self.size_divisor) * self.size_divisor) w = int(np.ceil(im_w / self.size_divisor) * self.size_divisor) if h == im_h and w == im_w: + sample['image'] = im.astype(np.float32) return sample if self.pad_mode == -1: @@ -2139,16 +2140,29 @@ class Rbox2Poly(BaseOperator): @register_op class AugmentHSV(BaseOperator): - def __init__(self, fraction=0.50, is_bgr=True): - """ - Augment the SV channel of image data. - Args: - fraction (float): the fraction for augment. Default: 0.5. - is_bgr (bool): whether the image is BGR mode. Default: True. - """ + """ + Augment the SV channel of image data. + Args: + fraction (float): the fraction for augment. Default: 0.5. + is_bgr (bool): whether the image is BGR mode. Default: True. 
+ hgain (float): H channel gains + sgain (float): S channel gains + vgain (float): V channel gains + """ + + def __init__(self, + fraction=0.50, + is_bgr=True, + hgain=None, + sgain=None, + vgain=None): super(AugmentHSV, self).__init__() self.fraction = fraction self.is_bgr = is_bgr + self.hgain = hgain + self.sgain = sgain + self.vgain = vgain + self.use_hsvgain = False if hgain is None else True def apply(self, sample, context=None): img = sample['image'] @@ -2156,27 +2170,39 @@ class AugmentHSV(BaseOperator): img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) else: img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV) - S = img_hsv[:, :, 1].astype(np.float32) - V = img_hsv[:, :, 2].astype(np.float32) - a = (random.random() * 2 - 1) * self.fraction + 1 - S *= a - if a > 1: - np.clip(S, a_min=0, a_max=255, out=S) + if self.use_hsvgain: + hsv_augs = np.random.uniform( + -1, 1, 3) * [self.hgain, self.sgain, self.vgain] + # random selection of h, s, v + hsv_augs *= np.random.randint(0, 2, 3) + img_hsv[..., 0] = (img_hsv[..., 0] + hsv_augs[0]) % 180 + img_hsv[..., 1] = np.clip(img_hsv[..., 1] + hsv_augs[1], 0, 255) + img_hsv[..., 2] = np.clip(img_hsv[..., 2] + hsv_augs[2], 0, 255) - a = (random.random() * 2 - 1) * self.fraction + 1 - V *= a - if a > 1: - np.clip(V, a_min=0, a_max=255, out=V) + else: + S = img_hsv[:, :, 1].astype(np.float32) + V = img_hsv[:, :, 2].astype(np.float32) + + a = (random.random() * 2 - 1) * self.fraction + 1 + S *= a + if a > 1: + np.clip(S, a_min=0, a_max=255, out=S) + + a = (random.random() * 2 - 1) * self.fraction + 1 + V *= a + if a > 1: + np.clip(V, a_min=0, a_max=255, out=V) + + img_hsv[:, :, 1] = S.astype(np.uint8) + img_hsv[:, :, 2] = V.astype(np.uint8) - img_hsv[:, :, 1] = S.astype(np.uint8) - img_hsv[:, :, 2] = V.astype(np.uint8) if self.is_bgr: cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img) else: cv2.cvtColor(img_hsv, cv2.COLOR_HSV2RGB, dst=img) - sample['image'] = img + sample['image'] = img.astype(np.float32) return sample @@ -3018,3 
+3044,409 @@ class CenterRandColor(BaseOperator): img = func(img, img_gray) sample['image'] = img return sample + + +@register_op +class Mosaic(BaseOperator): + """ Mosaic operator for image and gt_bboxes + The code is based on https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/datasets/mosaicdetection.py + + 1. get mosaic coords + 2. clip bbox and get mosaic_labels + 3. random_affine augment + 4. Mixup augment as copypaste (optional), not used in tiny/nano + + Args: + prob (float): probability of using Mosaic, 1.0 as default + input_dim (list[int]): input shape + degrees (list[2]): the rotate range to apply, transform range is [min, max] + translate (list[2]): the translate range to apply, transform range is [min, max] + scale (list[2]): the scale range to apply, transform range is [min, max] + shear (list[2]): the shear range to apply, transform range is [min, max] + enable_mixup (bool): whether to enable Mixup or not + mixup_prob (float): probability of using Mixup, 1.0 as default + mixup_scale (list[int]): scale range of Mixup + remove_outside_box (bool): whether to remove outside boxes, False as + default in COCO dataset, True in MOT dataset + """ + + def __init__(self, + prob=1.0, + input_dim=[640, 640], + degrees=[-10, 10], + translate=[-0.1, 0.1], + scale=[0.1, 2], + shear=[-2, 2], + enable_mixup=True, + mixup_prob=1.0, + mixup_scale=[0.5, 1.5], + remove_outside_box=False): + super(Mosaic, self).__init__() + self.prob = prob + if isinstance(input_dim, Integral): + input_dim = [input_dim, input_dim] + self.input_dim = input_dim + self.degrees = degrees + self.translate = translate + self.scale = scale + self.shear = shear + self.enable_mixup = enable_mixup + self.mixup_prob = mixup_prob + self.mixup_scale = mixup_scale + self.remove_outside_box = remove_outside_box + + def get_mosaic_coords(self, mosaic_idx, xc, yc, w, h, input_h, input_w): + # (x1, y1, x2, y2) means coords in large image, + # small_coords means coords in small image in mosaic 
aug. + if mosaic_idx == 0: + # top left + x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc + small_coords = w - (x2 - x1), h - (y2 - y1), w, h + elif mosaic_idx == 1: + # top right + x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc + small_coords = 0, h - (y2 - y1), min(w, x2 - x1), h + elif mosaic_idx == 2: + # bottom left + x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h) + small_coords = w - (x2 - x1), 0, w, min(y2 - y1, h) + elif mosaic_idx == 3: + # bottom right + x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, + yc + h) + small_coords = 0, 0, min(w, x2 - x1), min(y2 - y1, h) + + return (x1, y1, x2, y2), small_coords + + def random_affine_augment(self, + img, + labels=[], + input_dim=[640, 640], + degrees=[-10, 10], + scales=[0.1, 2], + shears=[-2, 2], + translates=[-0.1, 0.1]): + # random rotation and scale + degree = random.uniform(degrees[0], degrees[1]) + scale = random.uniform(scales[0], scales[1]) + assert scale > 0, "Argument scale should be positive." 
+ R = cv2.getRotationMatrix2D(angle=degree, center=(0, 0), scale=scale) + M = np.ones([2, 3]) + + # random shear + shear = random.uniform(shears[0], shears[1]) + shear_x = math.tan(shear * math.pi / 180) + shear_y = math.tan(shear * math.pi / 180) + M[0] = R[0] + shear_y * R[1] + M[1] = R[1] + shear_x * R[0] + + # random translation + translate = random.uniform(translates[0], translates[1]) + translation_x = translate * input_dim[0] + translation_y = translate * input_dim[1] + M[0, 2] = translation_x + M[1, 2] = translation_y + + # warpAffine + img = cv2.warpAffine( + img, M, dsize=tuple(input_dim), borderValue=(114, 114, 114)) + + num_gts = len(labels) + if num_gts > 0: + # warp corner points + corner_points = np.ones((4 * num_gts, 3)) + corner_points[:, :2] = labels[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + 4 * num_gts, 2) # x1y1, x2y2, x1y2, x2y1 + # apply affine transform + corner_points = corner_points @M.T + corner_points = corner_points.reshape(num_gts, 8) + + # create new boxes + corner_xs = corner_points[:, 0::2] + corner_ys = corner_points[:, 1::2] + new_bboxes = np.concatenate((corner_xs.min(1), corner_ys.min(1), + corner_xs.max(1), corner_ys.max(1))) + new_bboxes = new_bboxes.reshape(4, num_gts).T + + # clip boxes + new_bboxes[:, 0::2] = np.clip(new_bboxes[:, 0::2], 0, input_dim[0]) + new_bboxes[:, 1::2] = np.clip(new_bboxes[:, 1::2], 0, input_dim[1]) + labels[:, :4] = new_bboxes + + return img, labels + + def __call__(self, sample, context=None): + if not isinstance(sample, Sequence): + return sample + + assert len( + sample) == 5, "Mosaic needs 5 samples, 4 for mosaic and 1 for mixup." + if np.random.uniform(0., 1.) 
> self.prob: + return sample[0] + + mosaic_gt_bbox, mosaic_gt_class, mosaic_is_crowd, mosaic_difficult = [], [], [], [] + input_h, input_w = self.input_dim + yc = int(random.uniform(0.5 * input_h, 1.5 * input_h)) + xc = int(random.uniform(0.5 * input_w, 1.5 * input_w)) + mosaic_img = np.full((input_h * 2, input_w * 2, 3), 114, dtype=np.uint8) + + # 1. get mosaic coords + for mosaic_idx, sp in enumerate(sample[:4]): + img = sp['image'] + gt_bbox = sp['gt_bbox'] + h0, w0 = img.shape[:2] + scale = min(1. * input_h / h0, 1. * input_w / w0) + img = cv2.resize( + img, (int(w0 * scale), int(h0 * scale)), + interpolation=cv2.INTER_LINEAR) + (h, w, c) = img.shape[:3] + + # suffix l means large image, while s means small image in mosaic aug. + (l_x1, l_y1, l_x2, l_y2), ( + s_x1, s_y1, s_x2, s_y2) = self.get_mosaic_coords( + mosaic_idx, xc, yc, w, h, input_h, input_w) + + mosaic_img[l_y1:l_y2, l_x1:l_x2] = img[s_y1:s_y2, s_x1:s_x2] + padw, padh = l_x1 - s_x1, l_y1 - s_y1 + + # Normalized xywh to pixel xyxy format + _gt_bbox = gt_bbox.copy() + if len(gt_bbox) > 0: + _gt_bbox[:, 0] = scale * gt_bbox[:, 0] + padw + _gt_bbox[:, 1] = scale * gt_bbox[:, 1] + padh + _gt_bbox[:, 2] = scale * gt_bbox[:, 2] + padw + _gt_bbox[:, 3] = scale * gt_bbox[:, 3] + padh + + mosaic_gt_bbox.append(_gt_bbox) + mosaic_gt_class.append(sp['gt_class']) + if 'is_crowd' in sp: + mosaic_is_crowd.append(sp['is_crowd']) + if 'difficult' in sp: + mosaic_difficult.append(sp['difficult']) + + # 2. 
clip bbox and get mosaic_labels([gt_bbox, gt_class, is_crowd]) + if len(mosaic_gt_bbox): + mosaic_gt_bbox = np.concatenate(mosaic_gt_bbox, 0) + mosaic_gt_class = np.concatenate(mosaic_gt_class, 0) + if mosaic_is_crowd: + mosaic_is_crowd = np.concatenate(mosaic_is_crowd, 0) + mosaic_labels = np.concatenate([ + mosaic_gt_bbox, + mosaic_gt_class.astype(mosaic_gt_bbox.dtype), + mosaic_is_crowd.astype(mosaic_gt_bbox.dtype) + ], 1) + elif mosaic_difficult: + mosaic_difficult = np.concatenate(mosaic_difficult, 0) + mosaic_labels = np.concatenate([ + mosaic_gt_bbox, + mosaic_gt_class.astype(mosaic_gt_bbox.dtype), + mosaic_difficult.astype(mosaic_gt_bbox.dtype) + ], 1) + else: + mosaic_labels = np.concatenate([ + mosaic_gt_bbox, mosaic_gt_class.astype(mosaic_gt_bbox.dtype) + ], 1) + if self.remove_outside_box: + # for MOT dataset + flag1 = mosaic_gt_bbox[:, 0] < 2 * input_w + flag2 = mosaic_gt_bbox[:, 2] > 0 + flag3 = mosaic_gt_bbox[:, 1] < 2 * input_h + flag4 = mosaic_gt_bbox[:, 3] > 0 + flag_all = flag1 * flag2 * flag3 * flag4 + mosaic_labels = mosaic_labels[flag_all] + else: + mosaic_labels[:, 0] = np.clip(mosaic_labels[:, 0], 0, + 2 * input_w) + mosaic_labels[:, 1] = np.clip(mosaic_labels[:, 1], 0, + 2 * input_h) + mosaic_labels[:, 2] = np.clip(mosaic_labels[:, 2], 0, + 2 * input_w) + mosaic_labels[:, 3] = np.clip(mosaic_labels[:, 3], 0, + 2 * input_h) + else: + mosaic_labels = np.zeros((1, 6)) + + # 3. random_affine augment + mosaic_img, mosaic_labels = self.random_affine_augment( + mosaic_img, + mosaic_labels, + input_dim=self.input_dim, + degrees=self.degrees, + translates=self.translate, + scales=self.scale, + shears=self.shear) + + # 4. 
Mixup augment as copypaste, https://arxiv.org/abs/2012.07177 + # optional, not used (enable_mixup=False) in tiny/nano + if (self.enable_mixup and not len(mosaic_labels) == 0 and + random.random() < self.mixup_prob): + sample_mixup = sample[4] + mixup_img = sample_mixup['image'] + if 'is_crowd' in sample_mixup: + cp_labels = np.concatenate([ + sample_mixup['gt_bbox'], + sample_mixup['gt_class'].astype(mosaic_labels.dtype), + sample_mixup['is_crowd'].astype(mosaic_labels.dtype) + ], 1) + elif 'difficult' in sample_mixup: + cp_labels = np.concatenate([ + sample_mixup['gt_bbox'], + sample_mixup['gt_class'].astype(mosaic_labels.dtype), + sample_mixup['difficult'].astype(mosaic_labels.dtype) + ], 1) + else: + cp_labels = np.concatenate([ + sample_mixup['gt_bbox'], + sample_mixup['gt_class'].astype(mosaic_labels.dtype) + ], 1) + mosaic_img, mosaic_labels = self.mixup_augment( + mosaic_img, mosaic_labels, self.input_dim, cp_labels, mixup_img) + + sample0 = sample[0] + sample0['image'] = mosaic_img.astype(np.uint8) # can not be float32 + sample0['h'] = float(mosaic_img.shape[0]) + sample0['w'] = float(mosaic_img.shape[1]) + sample0['im_shape'][0] = sample0['h'] + sample0['im_shape'][1] = sample0['w'] + sample0['gt_bbox'] = mosaic_labels[:, :4].astype(np.float32) + sample0['gt_class'] = mosaic_labels[:, 4:5].astype(np.float32) + if 'is_crowd' in sample[0]: + sample0['is_crowd'] = mosaic_labels[:, 5:6].astype(np.float32) + if 'difficult' in sample[0]: + sample0['difficult'] = mosaic_labels[:, 5:6].astype(np.float32) + return sample0 + + def mixup_augment(self, origin_img, origin_labels, input_dim, cp_labels, + img): + jit_factor = random.uniform(*self.mixup_scale) + FLIP = random.uniform(0, 1) > 0.5 + if len(img.shape) == 3: + cp_img = np.ones( + (input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114 + else: + cp_img = np.ones(input_dim, dtype=np.uint8) * 114 + + cp_scale_ratio = min(input_dim[0] / img.shape[0], + input_dim[1] / img.shape[1]) + resized_img = cv2.resize( + img, 
(int(img.shape[1] * cp_scale_ratio), + int(img.shape[0] * cp_scale_ratio)), + interpolation=cv2.INTER_LINEAR) + + cp_img[:int(img.shape[0] * cp_scale_ratio), :int(img.shape[ + 1] * cp_scale_ratio)] = resized_img + + cp_img = cv2.resize(cp_img, (int(cp_img.shape[1] * jit_factor), + int(cp_img.shape[0] * jit_factor))) + cp_scale_ratio *= jit_factor + + if FLIP: + cp_img = cp_img[:, ::-1, :] + + origin_h, origin_w = cp_img.shape[:2] + target_h, target_w = origin_img.shape[:2] + padded_img = np.zeros( + (max(origin_h, target_h), max(origin_w, target_w), 3), + dtype=np.uint8) + padded_img[:origin_h, :origin_w] = cp_img + + x_offset, y_offset = 0, 0 + if padded_img.shape[0] > target_h: + y_offset = random.randint(0, padded_img.shape[0] - target_h - 1) + if padded_img.shape[1] > target_w: + x_offset = random.randint(0, padded_img.shape[1] - target_w - 1) + padded_cropped_img = padded_img[y_offset:y_offset + target_h, x_offset: + x_offset + target_w] + + # adjust boxes + cp_bboxes_origin_np = cp_labels[:, :4].copy() + cp_bboxes_origin_np[:, 0::2] = np.clip(cp_bboxes_origin_np[:, 0::2] * + cp_scale_ratio, 0, origin_w) + cp_bboxes_origin_np[:, 1::2] = np.clip(cp_bboxes_origin_np[:, 1::2] * + cp_scale_ratio, 0, origin_h) + + if FLIP: + cp_bboxes_origin_np[:, 0::2] = ( + origin_w - cp_bboxes_origin_np[:, 0::2][:, ::-1]) + cp_bboxes_transformed_np = cp_bboxes_origin_np.copy() + if self.remove_outside_box: + # for MOT dataset + cp_bboxes_transformed_np[:, 0::2] -= x_offset + cp_bboxes_transformed_np[:, 1::2] -= y_offset + else: + cp_bboxes_transformed_np[:, 0::2] = np.clip( + cp_bboxes_transformed_np[:, 0::2] - x_offset, 0, target_w) + cp_bboxes_transformed_np[:, 1::2] = np.clip( + cp_bboxes_transformed_np[:, 1::2] - y_offset, 0, target_h) + + cls_labels = cp_labels[:, 4:5].copy() + box_labels = cp_bboxes_transformed_np + if cp_labels.shape[-1] == 6: + crd_labels = cp_labels[:, 5:6].copy() + labels = np.hstack((box_labels, cls_labels, crd_labels)) + else: + labels = 
np.hstack((box_labels, cls_labels)) + if self.remove_outside_box: + labels = labels[labels[:, 0] < target_w] + labels = labels[labels[:, 2] > 0] + labels = labels[labels[:, 1] < target_h] + labels = labels[labels[:, 3] > 0] + + origin_labels = np.vstack((origin_labels, labels)) + origin_img = origin_img.astype(np.float32) + origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype( + np.float32) + + return origin_img.astype(np.uint8), origin_labels + + +@register_op +class PadResize(BaseOperator): + """ PadResize for image and gt_bbox + + Args: + target_size (list[int]): input shape + fill_value (float): pixel value of padded image + """ + + def __init__(self, target_size, fill_value=114): + super(PadResize, self).__init__() + if isinstance(target_size, Integral): + target_size = [target_size, target_size] + self.target_size = target_size + self.fill_value = fill_value + + def _resize(self, img, bboxes, labels): + ratio = min(self.target_size[0] / img.shape[0], + self.target_size[1] / img.shape[1]) + w, h = int(img.shape[1] * ratio), int(img.shape[0] * ratio) + resized_img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR) + + if len(bboxes) > 0: + bboxes *= ratio + mask = np.minimum(bboxes[:, 2] - bboxes[:, 0], + bboxes[:, 3] - bboxes[:, 1]) > 1 + bboxes = bboxes[mask] + labels = labels[mask] + return resized_img, bboxes, labels + + def _pad(self, img): + h, w, _ = img.shape + if h == self.target_size[0] and w == self.target_size[1]: + return img + padded_img = np.full( + (self.target_size[0], self.target_size[1], 3), + self.fill_value, + dtype=np.uint8) + padded_img[:h, :w] = img + return padded_img + + def apply(self, sample, context=None): + image = sample['image'] + bboxes = sample['gt_bbox'] + labels = sample['gt_class'] + image, bboxes, labels = self._resize(image, bboxes, labels) + sample['image'] = self._pad(image).astype(np.float32) + sample['gt_bbox'] = bboxes + sample['gt_class'] = labels + return sample diff --git a/ppdet/data/utils.py 
b/ppdet/data/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..02573e61484bc5ef07353dbef124c8afa54ccc64 --- /dev/null +++ b/ppdet/data/utils.py @@ -0,0 +1,72 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import numbers +import numpy as np + +try: + from collections.abc import Sequence, Mapping +except: + from collections import Sequence, Mapping + + +def default_collate_fn(batch): + """ + Default batch collating function for :code:`paddle.io.DataLoader`. + It takes input data as a list of samples, where each element in the list + is the data of one sample, and sample data should be composed of list, + dictionary, string, number or numpy array. This + function parses input data recursively and stacks number, + numpy array and paddle.Tensor data into batch data, e.g. for the + following input data: + [{'image': np.array(shape=[3, 224, 224]), 'label': 1}, + {'image': np.array(shape=[3, 224, 224]), 'label': 3}, + {'image': np.array(shape=[3, 224, 224]), 'label': 4}, + {'image': np.array(shape=[3, 224, 224]), 'label': 5},] + + + This default collate function zips each number and numpy array + field together and stacks each field as the batch field as follows: + {'image': np.array(shape=[4, 3, 224, 224]), 'label': np.array([1, 3, 4, 5])} + Args: + batch(list of sample data): batch should be a list of sample data. 
+ + Returns: + Batched data: batched each number, numpy array and paddle.Tensor + in input data. + """ + sample = batch[0] + if isinstance(sample, np.ndarray): + batch = np.stack(batch, axis=0) + return batch + elif isinstance(sample, numbers.Number): + batch = np.array(batch) + return batch + elif isinstance(sample, (str, bytes)): + return batch + elif isinstance(sample, Mapping): + return { + key: default_collate_fn([d[key] for d in batch]) + for key in sample + } + elif isinstance(sample, Sequence): + sample_fields_num = len(sample) + if not all(len(sample) == sample_fields_num for sample in iter(batch)): + raise RuntimeError( + "fields number not same among samples in a batch") + return [default_collate_fn(fields) for fields in zip(*batch)] + + raise TypeError("batch data can only contain: tensor, numpy.ndarray, " + "dict, list, number, but got {}".format(type(sample))) diff --git a/ppdet/engine/callbacks.py b/ppdet/engine/callbacks.py index 77ca94602c518411f040fa9e55f6c2ee656d726f..09683d18b4ef16ff7a5221114937133a91974d5a 100644 --- a/ppdet/engine/callbacks.py +++ b/ppdet/engine/callbacks.py @@ -198,7 +198,7 @@ class Checkpointer(Callback): "training iterations being too few or not " \ "loading the correct weights.") return - if map_res[key][0] > self.best_ap: + if map_res[key][0] >= self.best_ap: self.best_ap = map_res[key][0] save_name = 'best_model' weight = self.weight.state_dict() @@ -288,6 +288,151 @@ class VisualDLWriter(Callback): self.vdl_mAP_step) self.vdl_mAP_step += 1 +class WandbCallback(Callback): + def __init__(self, model): + super(WandbCallback, self).__init__(model) + + try: + import wandb + self.wandb = wandb + except Exception as e: + logger.error('wandb not found, please install wandb. 
' + 'Use: `pip install wandb`.') + raise e + + self.wandb_params = model.cfg.get('wandb', None) + self.save_dir = os.path.join(self.model.cfg.save_dir, + self.model.cfg.filename) + if self.wandb_params is None: + self.wandb_params = {} + for k, v in model.cfg.items(): + if k.startswith("wandb_"): + self.wandb_params.update({ + k[len("wandb_"):]: v + }) + + self._run = None + if dist.get_world_size() < 2 or dist.get_rank() == 0: + _ = self.run + self.run.config.update(self.model.cfg) + self.run.define_metric("epoch") + self.run.define_metric("eval/*", step_metric="epoch") + + self.best_ap = 0 + + @property + def run(self): + if self._run is None: + if self.wandb.run is not None: + logger.info("There is an ongoing wandb run which will be used " + "for logging. Please use `wandb.finish()` to end that " + "if the behaviour is not intended") + self._run = self.wandb.run + else: + self._run = self.wandb.init(**self.wandb_params) + return self._run + + def save_model(self, + optimizer, + save_dir, + save_name, + last_epoch, + ema_model=None, + ap=None, + tags=None): + if dist.get_world_size() < 2 or dist.get_rank() == 0: + model_path = os.path.join(save_dir, save_name) + metadata = {} + metadata["last_epoch"] = last_epoch + if ap: + metadata["ap"] = ap + if ema_model is None: + ema_artifact = self.wandb.Artifact(name="ema_model-{}".format(self.run.id), type="model", metadata=metadata) + model_artifact = self.wandb.Artifact(name="model-{}".format(self.run.id), type="model", metadata=metadata) + + ema_artifact.add_file(model_path + ".pdema", name="model_ema") + model_artifact.add_file(model_path + ".pdparams", name="model") + + self.run.log_artifact(ema_artifact, aliases=tags) + self.run.log_artifact(model_artifact, aliases=tags) + else: + model_artifact = self.wandb.Artifact(name="model-{}".format(self.run.id), type="model", metadata=metadata) + model_artifact.add_file(model_path + ".pdparams", name="model") + self.run.log_artifact(model_artifact, aliases=tags) + + def 
on_step_end(self, status): + + mode = status['mode'] + if dist.get_world_size() < 2 or dist.get_rank() == 0: + if mode == 'train': + training_status = status['training_staus'].get() + for k, v in training_status.items(): + training_status[k] = float(v) + metrics = { + "train/" + k: v for k,v in training_status.items() + } + self.run.log(metrics) + + def on_epoch_end(self, status): + mode = status['mode'] + epoch_id = status['epoch_id'] + save_name = None + if dist.get_world_size() < 2 or dist.get_rank() == 0: + if mode == 'train': + end_epoch = self.model.cfg.epoch + if ( + epoch_id + 1 + ) % self.model.cfg.snapshot_epoch == 0 or epoch_id == end_epoch - 1: + save_name = str(epoch_id) if epoch_id != end_epoch - 1 else "model_final" + tags = ["latest", "epoch_{}".format(epoch_id)] + self.save_model( + self.model.optimizer, + self.save_dir, + save_name, + epoch_id + 1, + self.model.use_ema, + tags=tags + ) + if mode == 'eval': + merged_dict = {} + for metric in self.model._metrics: + for key, map_value in metric.get_results().items(): + merged_dict["eval/{}-mAP".format(key)] = map_value[0] + merged_dict["epoch"] = status["epoch_id"] + self.run.log(merged_dict) + + if 'save_best_model' in status and status['save_best_model']: + for metric in self.model._metrics: + map_res = metric.get_results() + if 'bbox' in map_res: + key = 'bbox' + elif 'keypoint' in map_res: + key = 'keypoint' + else: + key = 'mask' + if key not in map_res: + logger.warning("Evaluation results empty, this may be due to " \ + "training iterations being too few or not " \ + "loading the correct weights.") + return + if map_res[key][0] >= self.best_ap: + self.best_ap = map_res[key][0] + save_name = 'best_model' + tags = ["best", "epoch_{}".format(epoch_id)] + + self.save_model( + self.model.optimizer, + self.save_dir, + save_name, + last_epoch=epoch_id + 1, + ema_model=self.model.use_ema, + ap=self.best_ap, + tags=tags + ) + + def on_train_end(self, status): + self.run.finish() + class 
SniperProposalsGenerator(Callback): def __init__(self, model): diff --git a/ppdet/engine/export_utils.py b/ppdet/engine/export_utils.py index bddb4af07ca91f5b82389fdeba39ba77c1a1344e..6af8b0f4757dca4d9b0e0ba76cffc1ff3308b9de 100644 --- a/ppdet/engine/export_utils.py +++ b/ppdet/engine/export_utils.py @@ -48,6 +48,7 @@ TRT_MIN_SUBGRAPH = { 'PicoDet': 3, 'CenterNet': 5, 'TOOD': 5, + 'YOLOX': 8, } KEYPOINT_ARCH = ['HigherHRNet', 'TopDownHRNet'] @@ -57,7 +58,9 @@ MOT_ARCH = ['DeepSORT', 'JDE', 'FairMOT', 'ByteTrack'] def _prune_input_spec(input_spec, program, targets): # try to prune static program to figure out pruned input spec # so we perform following operations in static mode + device = paddle.get_device() paddle.enable_static() + paddle.set_device(device) pruned_input_spec = [{}] program = program.clone() program = program._prune(targets=targets) @@ -68,7 +71,7 @@ def _prune_input_spec(input_spec, program, targets): pruned_input_spec[0][name] = spec except Exception: pass - paddle.disable_static() + paddle.disable_static(place=device) return pruned_input_spec @@ -147,6 +150,12 @@ def _dump_infer_config(config, path, image_shape, model): infer_cfg['min_subgraph_size'] = min_subgraph_size arch_state = True break + + if infer_arch == 'YOLOX': + infer_cfg['arch'] = infer_arch + infer_cfg['min_subgraph_size'] = TRT_MIN_SUBGRAPH[infer_arch] + arch_state = True + if not arch_state: logger.error( 'Architecture: {} is not supported for exporting model now.\n'. 
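Note on the `_dump_infer_config` hunk above: the added block checks `infer_arch == 'YOLOX'` after the generic lookup loop has already run. One plausible reason, sketched below under stated assumptions, is that the loop matches by substring, so a shorter key could claim the name first. The `'YOLO'` entry, the loop body, and the helper name `resolve_min_subgraph` are assumptions made for illustration; only the `'PicoDet'`, `'CenterNet'`, `'TOOD'` and `'YOLOX'` values and the exact-name override are taken from the diff.

```python
# Illustrative table. 'PicoDet', 'CenterNet', 'TOOD' and 'YOLOX' come from the
# hunk above; the 'YOLO' entry is an ASSUMPTION added to show the collision.
TRT_MIN_SUBGRAPH = {
    'YOLO': 3,
    'PicoDet': 3,
    'CenterNet': 5,
    'TOOD': 5,
    'YOLOX': 8,
}


def resolve_min_subgraph(infer_arch):
    """Hypothetical helper: return (arch, min_subgraph_size), or None."""
    result = None
    # Assumed generic lookup: the first key that is a substring of the
    # architecture name wins, so 'YOLO' would also match 'YOLOX'.
    for arch, min_subgraph_size in TRT_MIN_SUBGRAPH.items():
        if arch in infer_arch:
            result = (arch, min_subgraph_size)
            break
    # Exact-name override, mirroring the block added in the diff.
    if infer_arch == 'YOLOX':
        result = (infer_arch, TRT_MIN_SUBGRAPH[infer_arch])
    return result


if __name__ == '__main__':
    for name in ('YOLOX', 'YOLOv3', 'PicoDet', 'Unknown'):
        print(name, resolve_min_subgraph(name))
```

With this table, the override is what keeps `YOLOX` on its own entry rather than the shorter key's value; an unknown name still falls through to `None`, which corresponds to the `arch_state` error path in the original function.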
diff --git a/ppdet/engine/tracker.py b/ppdet/engine/tracker.py index 691a42a8ccb6a12a4054016bbbea33ead7971402..090c72c7eb86562beee9257cf9c32c966af093bc 100644 --- a/ppdet/engine/tracker.py +++ b/ppdet/engine/tracker.py @@ -17,22 +17,21 @@ from __future__ import division from __future__ import print_function import os -import cv2 import glob import re import paddle +import paddle.nn as nn import numpy as np -import os.path as osp +from tqdm import tqdm from collections import defaultdict from ppdet.core.workspace import create from ppdet.utils.checkpoint import load_weight, load_pretrain_weight from ppdet.modeling.mot.utils import Detection, get_crops, scale_coords, clip_box from ppdet.modeling.mot.utils import MOTTimer, load_det_results, write_mot_results, save_vis_results -from ppdet.modeling.mot.tracker import JDETracker, DeepSORTTracker - -from ppdet.metrics import Metric, MOTMetric, KITTIMOTMetric -from ppdet.metrics import MCMOTMetric +from ppdet.modeling.mot.tracker import JDETracker, DeepSORTTracker, OCSORTTracker +from ppdet.modeling.architectures import YOLOX +from ppdet.metrics import Metric, MOTMetric, KITTIMOTMetric, MCMOTMetric import ppdet.utils.stats as stats from .callbacks import Callback, ComposeCallback @@ -62,6 +61,12 @@ class Tracker(object): # build model self.model = create(cfg.architecture) + if isinstance(self.model.detector, YOLOX): + for k, m in self.model.named_sublayers(): + if isinstance(m, nn.BatchNorm2D): + m._epsilon = 1e-3 # for amp(fp16) + m._momentum = 0.97 # 0.03 in pytorch + self.status = {} self.start_epoch = 0 @@ -142,11 +147,8 @@ class Tracker(object): self.model.eval() results = defaultdict(list) # support single class and multi classes - for step_id, data in enumerate(dataloader): + for step_id, data in enumerate(tqdm(dataloader)): self.status['step_id'] = step_id - if frame_id % 40 == 0: - logger.info('Processing frame {} ({:.2f} fps)'.format( - frame_id, 1. 
/ max(1e-5, timer.average_time))) # forward timer.tic() pred_dets, pred_embs = self.model(data) @@ -210,12 +212,8 @@ class Tracker(object): det_file)) tracker = self.model.tracker - for step_id, data in enumerate(dataloader): + for step_id, data in enumerate(tqdm(dataloader)): self.status['step_id'] = step_id - if frame_id % 40 == 0: - logger.info('Processing frame {} ({:.2f} fps)'.format( - frame_id, 1. / max(1e-5, timer.average_time))) - ori_image = data['ori_image'] # [bs, H, W, 3] ori_image_shape = data['ori_image'].shape[1:3] # ori_image_shape: [H, W] @@ -339,8 +337,8 @@ class Tracker(object): results[0].append( (frame_id + 1, online_tlwhs, online_scores, online_ids)) save_vis_results(data, frame_id, online_ids, online_tlwhs, - online_scores, timer.average_time, show_image, - save_dir, self.cfg.num_classes) + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes) elif isinstance(tracker, JDETracker): # trick hyperparams only used for MOTChallenge (MOT17, MOT20) Test-set @@ -366,13 +364,35 @@ class Tracker(object): online_scores[cls_id].append(tscore) # save results results[cls_id].append( - (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id], - online_ids[cls_id])) + (frame_id + 1, online_tlwhs[cls_id], + online_scores[cls_id], online_ids[cls_id])) timer.toc() save_vis_results(data, frame_id, online_ids, online_tlwhs, - online_scores, timer.average_time, show_image, - save_dir, self.cfg.num_classes) - + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes) + elif isinstance(tracker, OCSORTTracker): + # OC_SORT Tracker + online_targets = tracker.update(pred_dets_old, pred_embs) + online_tlwhs = [] + online_ids = [] + online_scores = [] + for t in online_targets: + tlwh = [t[0], t[1], t[2] - t[0], t[3] - t[1]] + tscore = float(t[4]) + tid = int(t[5]) + if tlwh[2] * tlwh[3] > 0: + online_tlwhs.append(tlwh) + online_ids.append(tid) + online_scores.append(tscore) + timer.toc() + # save results + 
results[0].append( + (frame_id + 1, online_tlwhs, online_scores, online_ids)) + save_vis_results(data, frame_id, online_ids, online_tlwhs, + online_scores, timer.average_time, show_image, + save_dir, self.cfg.num_classes) + else: + raise ValueError(tracker) frame_id += 1 return results, frame_id, timer.average_time, timer.calls @@ -417,7 +437,7 @@ class Tracker(object): save_dir = os.path.join(output_dir, 'mot_outputs', seq) if save_images or save_videos else None - logger.info('start seq: {}'.format(seq)) + logger.info('Evaluate seq: {}'.format(seq)) self.dataset.set_images(self.get_infer_images(infer_dir)) dataloader = create('EvalMOTReader')(self.dataset, 0) @@ -458,7 +478,6 @@ class Tracker(object): os.system(cmd_str) logger.info('Save video in {}.'.format(output_video_path)) - logger.info('Evaluate seq: {}'.format(seq)) # update metrics for metric in self._metrics: metric.update(data_root, seq, data_type, result_root, @@ -582,6 +601,7 @@ class Tracker(object): write_mot_results(result_filename, results, data_type, self.cfg.num_classes) + def get_trick_hyperparams(video_name, ori_buffer, ori_thresh): if video_name[:3] != 'MOT': # only used for MOTChallenge (MOT17, MOT20) Test-set @@ -610,5 +630,5 @@ def get_trick_hyperparams(video_name, ori_buffer, ori_thresh): track_thresh = 0.3 else: track_thresh = ori_thresh - + return track_buffer, ori_thresh diff --git a/ppdet/engine/trainer.py b/ppdet/engine/trainer.py index fa9167f05b0f8554cd2650a337a51bd31c355b6c..c253b40aa58a32f2b8cbe89d902480a3a537e1e9 100644 --- a/ppdet/engine/trainer.py +++ b/ppdet/engine/trainer.py @@ -20,16 +20,18 @@ import os import sys import copy import time +from tqdm import tqdm import numpy as np import typing from PIL import Image, ImageOps, ImageFile + ImageFile.LOAD_TRUNCATED_IMAGES = True import paddle +import paddle.nn as nn import paddle.distributed as dist from paddle.distributed import fleet -from paddle import amp from paddle.static import InputSpec from ppdet.optimizer import 
ModelEMA @@ -41,11 +43,14 @@ from ppdet.metrics import RBoxMetric, JDEDetMetric, SNIPERCOCOMetric from ppdet.data.source.sniper_coco import SniperCOCODataSet from ppdet.data.source.category import get_categories import ppdet.utils.stats as stats +from ppdet.utils.fuse_utils import fuse_conv_bn from ppdet.utils import profiler -from .callbacks import Callback, ComposeCallback, LogPrinter, Checkpointer, WiferFaceEval, VisualDLWriter, SniperProposalsGenerator +from .callbacks import Callback, ComposeCallback, LogPrinter, Checkpointer, WiferFaceEval, VisualDLWriter, SniperProposalsGenerator, WandbCallback from .export_utils import _dump_infer_config, _prune_input_spec +from paddle.distributed.fleet.utils.hybrid_parallel_util import fused_allreduce_gradients + from ppdet.utils.logger import setup_logger logger = setup_logger('ppdet.engine') @@ -62,12 +67,19 @@ class Trainer(object): self.mode = mode.lower() self.optimizer = None self.is_loaded_weights = False + self.use_amp = self.cfg.get('amp', False) + self.amp_level = self.cfg.get('amp_level', 'O1') + self.custom_white_list = self.cfg.get('custom_white_list', None) + self.custom_black_list = self.cfg.get('custom_black_list', None) # build data loader + capital_mode = self.mode.capitalize() if cfg.architecture in MOT_ARCH and self.mode in ['eval', 'test']: - self.dataset = cfg['{}MOTDataset'.format(self.mode.capitalize())] + self.dataset = self.cfg['{}MOTDataset'.format( + capital_mode)] = create('{}MOTDataset'.format(capital_mode))() else: - self.dataset = cfg['{}Dataset'.format(self.mode.capitalize())] + self.dataset = self.cfg['{}Dataset'.format(capital_mode)] = create( + '{}Dataset'.format(capital_mode))() if cfg.architecture == 'DeepSORT' and self.mode == 'train': logger.error('DeepSORT has no need of training on mot dataset.') @@ -78,7 +90,7 @@ class Trainer(object): self.dataset.set_images(images) if self.mode == 'train': - self.loader = create('{}Reader'.format(self.mode.capitalize()))( + self.loader = 
create('{}Reader'.format(capital_mode))( self.dataset, cfg.worker_num) if cfg.architecture == 'JDE' and self.mode == 'train': @@ -98,23 +110,26 @@ class Trainer(object): self.model = self.cfg.model self.is_loaded_weights = True + if cfg.architecture == 'YOLOX': + for k, m in self.model.named_sublayers(): + if isinstance(m, nn.BatchNorm2D): + m._epsilon = 1e-3 # for amp(fp16) + m._momentum = 0.97 # 0.03 in pytorch + #normalize params for deploy if 'slim' in cfg and cfg['slim_type'] == 'OFA': self.model.model.load_meanstd(cfg['TestReader'][ 'sample_transforms']) + elif 'slim' in cfg and cfg['slim_type'] == 'Distill': + self.model.student_model.load_meanstd(cfg['TestReader'][ + 'sample_transforms']) + elif 'slim' in cfg and cfg[ + 'slim_type'] == 'DistillPrune' and self.mode == 'train': + self.model.student_model.load_meanstd(cfg['TestReader'][ + 'sample_transforms']) else: self.model.load_meanstd(cfg['TestReader']['sample_transforms']) - self.use_ema = ('use_ema' in cfg and cfg['use_ema']) - if self.use_ema: - ema_decay = self.cfg.get('ema_decay', 0.9998) - cycle_epoch = self.cfg.get('cycle_epoch', -1) - self.ema = ModelEMA( - self.model, - decay=ema_decay, - use_thres_step=True, - cycle_epoch=cycle_epoch) - # EvalDataset build with BatchSampler to evaluate in single device # TODO: multi-device evaluate if self.mode == 'eval': @@ -141,6 +156,21 @@ class Trainer(object): if self.cfg.get('unstructured_prune'): self.pruner = create('UnstructuredPruner')(self.model, steps_per_epoch) + if self.use_amp and self.amp_level == 'O2': + self.model, self.optimizer = paddle.amp.decorate( + models=self.model, + optimizers=self.optimizer, + level=self.amp_level) + self.use_ema = ('use_ema' in cfg and cfg['use_ema']) + if self.use_ema: + ema_decay = self.cfg.get('ema_decay', 0.9998) + cycle_epoch = self.cfg.get('cycle_epoch', -1) + ema_decay_type = self.cfg.get('ema_decay_type', 'threshold') + self.ema = ModelEMA( + self.model, + decay=ema_decay, + ema_decay_type=ema_decay_type, + 
cycle_epoch=cycle_epoch) self._nranks = dist.get_world_size() self._local_rank = dist.get_rank() @@ -164,6 +194,8 @@ class Trainer(object): self._callbacks.append(VisualDLWriter(self)) if self.cfg.get('save_proposals', False): self._callbacks.append(SniperProposalsGenerator(self)) + if self.cfg.get('use_wandb', False) or 'wandb' in self.cfg: + self._callbacks.append(WandbCallback(self)) self._compose_callback = ComposeCallback(self._callbacks) elif self.mode == 'eval': self._callbacks = [LogPrinter(self)] @@ -184,7 +216,7 @@ class Trainer(object): classwise = self.cfg['classwise'] if 'classwise' in self.cfg else False if self.cfg.metric == 'COCO' or self.cfg.metric == "SNIPERCOCO": # TODO: bias should be unified - bias = self.cfg['bias'] if 'bias' in self.cfg else 0 + bias = 1 if self.cfg.get('bias', False) else 0 output_eval = self.cfg['output_eval'] \ if 'output_eval' in self.cfg else None save_prediction_only = self.cfg.get('save_prediction_only', False) @@ -196,13 +228,14 @@ class Trainer(object): # when do validation in train, annotation file should be get from # EvalReader instead of self.dataset(which is TrainReader) - anno_file = self.dataset.get_anno() - dataset = self.dataset if self.mode == 'train' and validate: eval_dataset = self.cfg['EvalDataset'] eval_dataset.check_or_download_dataset() anno_file = eval_dataset.get_anno() dataset = eval_dataset + else: + dataset = self.dataset + anno_file = dataset.get_anno() IouType = self.cfg['IouType'] if 'IouType' in self.cfg else 'bbox' if self.cfg.metric == "COCO": @@ -258,12 +291,18 @@ class Trainer(object): save_prediction_only=save_prediction_only) ] elif self.cfg.metric == 'VOC': + output_eval = self.cfg['output_eval'] \ + if 'output_eval' in self.cfg else None + save_prediction_only = self.cfg.get('save_prediction_only', False) + self._metrics = [ VOCMetric( label_list=self.dataset.get_label_list(), class_num=self.cfg.num_classes, map_type=self.cfg.map_type, - classwise=classwise) + classwise=classwise, + 
output_eval=output_eval, + save_prediction_only=save_prediction_only) ] elif self.cfg.metric == 'WiderFace': multi_scale = self.cfg.multi_scale_eval if 'multi_scale_eval' in self.cfg else True @@ -353,14 +392,22 @@ class Trainer(object): def train(self, validate=False): assert self.mode == 'train', "Model not in 'train' mode" Init_mark = False + if validate: + self.cfg['EvalDataset'] = self.cfg.EvalDataset = create( + "EvalDataset")() + model = self.model sync_bn = (getattr(self.cfg, 'norm_type', None) == 'sync_bn' and self.cfg.use_gpu and self._nranks > 1) if sync_bn: - self.model = paddle.nn.SyncBatchNorm.convert_sync_batchnorm( - self.model) + model = paddle.nn.SyncBatchNorm.convert_sync_batchnorm(model) - model = self.model + # enable auto mixed precision mode + if self.use_amp: + scaler = paddle.amp.GradScaler( + enable=self.cfg.use_gpu or self.cfg.use_npu, + init_loss_scaling=self.cfg.get('init_loss_scaling', 1024)) + # get distributed model if self.cfg.get('fleet', False): model = fleet.distributed_model(model) self.optimizer = fleet.distributed_optimizer(self.optimizer) @@ -368,12 +415,7 @@ class Trainer(object): find_unused_parameters = self.cfg[ 'find_unused_parameters'] if 'find_unused_parameters' in self.cfg else False model = paddle.DataParallel( - self.model, find_unused_parameters=find_unused_parameters) - - # enabel auto mixed precision mode - if self.cfg.get('amp', False): - scaler = amp.GradScaler( - enable=self.cfg.use_gpu or self.cfg.use_npu, init_loss_scaling=1024) + model, find_unused_parameters=find_unused_parameters) self.status.update({ 'epoch_id': self.start_epoch, @@ -395,6 +437,9 @@ class Trainer(object): self._compose_callback.on_train_begin(self.status) + use_fused_allreduce_gradients = self.cfg[ + 'use_fused_allreduce_gradients'] if 'use_fused_allreduce_gradients' in self.cfg else False + for epoch_id in range(self.start_epoch, self.cfg.epoch): self.status['mode'] = 'train' self.status['epoch_id'] = epoch_id @@ -409,23 +454,56 @@ 
class Trainer(object): self._compose_callback.on_step_begin(self.status) data['epoch_id'] = epoch_id - if self.cfg.get('amp', False): - with amp.auto_cast(enable=self.cfg.use_gpu): - # model forward - outputs = model(data) - loss = outputs['loss'] - - # model backward - scaled_loss = scaler.scale(loss) - scaled_loss.backward() + if self.use_amp: + if isinstance( + model, paddle. + DataParallel) and use_fused_allreduce_gradients: + with model.no_sync(): + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + scaled_loss = scaler.scale(loss) + scaled_loss.backward() + fused_allreduce_gradients( + list(model.parameters()), None) + else: + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + scaled_loss = scaler.scale(loss) + scaled_loss.backward() # in dygraph mode, optimizer.minimize is equal to optimizer.step scaler.minimize(self.optimizer, scaled_loss) else: - # model forward - outputs = model(data) - loss = outputs['loss'] - # model backward - loss.backward() + if isinstance( + model, paddle. 
+ DataParallel) and use_fused_allreduce_gradients: + with model.no_sync(): + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + loss.backward() + fused_allreduce_gradients( + list(model.parameters()), None) + else: + # model forward + outputs = model(data) + loss = outputs['loss'] + # model backward + loss.backward() self.optimizer.step() curr_lr = self.optimizer.get_lr() self.lr.step() @@ -503,7 +581,15 @@ class Trainer(object): self.status['step_id'] = step_id self._compose_callback.on_step_begin(self.status) # forward - outs = self.model(data) + if self.use_amp: + with paddle.amp.auto_cast( + enable=self.cfg.use_gpu, + custom_white_list=self.custom_white_list, + custom_black_list=self.custom_black_list, + level=self.amp_level): + outs = self.model(data) + else: + outs = self.model(data) # update metrics for metric in self._metrics: @@ -535,10 +621,45 @@ class Trainer(object): images, draw_threshold=0.5, output_dir='output', - save_txt=False): + save_results=False): self.dataset.set_images(images) loader = create('TestReader')(self.dataset, 0) + def setup_metrics_for_loader(): + # mem + metrics = copy.deepcopy(self._metrics) + mode = self.mode + save_prediction_only = self.cfg[ + 'save_prediction_only'] if 'save_prediction_only' in self.cfg else None + output_eval = self.cfg[ + 'output_eval'] if 'output_eval' in self.cfg else None + + # modify + self.mode = '_test' + self.cfg['save_prediction_only'] = True + self.cfg['output_eval'] = output_dir + self._init_metrics() + + # restore + self.mode = mode + self.cfg.pop('save_prediction_only') + if save_prediction_only is not None: + self.cfg['save_prediction_only'] = save_prediction_only + + self.cfg.pop('output_eval') + if output_eval is not None: + self.cfg['output_eval'] = output_eval + + _metrics = copy.deepcopy(self._metrics) + self._metrics = metrics + + return _metrics + + if save_results: + metrics = setup_metrics_for_loader() + else: + metrics = [] + imid2path = 
self.dataset.get_imid2path() anno_file = self.dataset.get_anno() @@ -552,11 +673,14 @@ class Trainer(object): flops_loader = create('TestReader')(self.dataset, 0) self._flops(flops_loader) results = [] - for step_id, data in enumerate(loader): + for step_id, data in enumerate(tqdm(loader)): self.status['step_id'] = step_id # forward outs = self.model(data) + for _m in metrics: + _m.update(data, outs) + for key in ['im_shape', 'scale_factor', 'im_id']: if isinstance(data, typing.Sequence): outs[key] = data[0][key] @@ -566,11 +690,16 @@ class Trainer(object): if hasattr(value, 'numpy'): outs[key] = value.numpy() results.append(outs) + # sniper if type(self.dataset) == SniperCOCODataSet: results = self.dataset.anno_cropper.aggregate_chips_detections( results) + for _m in metrics: + _m.accumulate() + _m.reset() + for outs in results: batch_res = get_infer_results(outs, clsid2catid) bbox_num = outs['bbox_num'] @@ -602,15 +731,7 @@ class Trainer(object): logger.info("Detection bbox results save in {}".format( save_name)) image.save(save_name, quality=95) - if save_txt: - save_path = os.path.splitext(save_name)[0] + '.txt' - results = {} - results["im_id"] = im_id - if bbox_res: - results["bbox_res"] = bbox_res - if keypoint_res: - results["keypoint_res"] = keypoint_res - save_result(save_path, results, catid2name, draw_threshold) + start = end def _get_save_image_name(self, output_dir, image_path): @@ -623,7 +744,10 @@ class Trainer(object): name, ext = os.path.splitext(image_name) return os.path.join(output_dir, "{}".format(name)) + ext - def _get_infer_cfg_and_input_spec(self, save_dir, prune_input=True): + def _get_infer_cfg_and_input_spec(self, + save_dir, + prune_input=True, + kl_quant=False): image_shape = None im_shape = [None, 2] scale_factor = [None, 2] @@ -647,9 +771,10 @@ class Trainer(object): if hasattr(self.model, 'deploy'): self.model.deploy = True - for layer in self.model.sublayers(): - if hasattr(layer, 'convert_to_deploy'): - layer.convert_to_deploy() 
+ if 'slim' not in self.cfg: + for layer in self.model.sublayers(): + if hasattr(layer, 'convert_to_deploy'): + layer.convert_to_deploy() export_post_process = self.cfg['export'].get( 'post_process', False) if hasattr(self.cfg, 'export') else True @@ -703,11 +828,29 @@ class Trainer(object): "image": InputSpec( shape=image_shape, name='image') }] + if kl_quant: + if self.cfg.architecture == 'PicoDet' or 'ppyoloe' in self.cfg.weights: + pruned_input_spec = [{ + "image": InputSpec( + shape=image_shape, name='image'), + "scale_factor": InputSpec( + shape=scale_factor, name='scale_factor') + }] + elif 'tinypose' in self.cfg.weights: + pruned_input_spec = [{ + "image": InputSpec( + shape=image_shape, name='image') + }] return static_model, pruned_input_spec def export(self, output_dir='output_inference'): self.model.eval() + + if hasattr(self.cfg, 'export') and 'fuse_conv_bn' in self.cfg[ + 'export'] and self.cfg['export']['fuse_conv_bn']: + self.model = fuse_conv_bn(self.model) + model_name = os.path.splitext(os.path.split(self.cfg.filename)[-1])[0] save_dir = os.path.join(output_dir, model_name) if not os.path.exists(save_dir): @@ -717,7 +860,7 @@ class Trainer(object): save_dir) # dy2st and save model - if 'slim' not in self.cfg or self.cfg['slim_type'] != 'QAT': + if 'slim' not in self.cfg or 'QAT' not in self.cfg['slim_type']: paddle.jit.save( static_model, os.path.join(save_dir, 'model'), @@ -741,8 +884,9 @@ class Trainer(object): break # TODO: support prune input_spec + kl_quant = True if hasattr(self.cfg.slim, 'ptq') else False _, pruned_input_spec = self._get_infer_cfg_and_input_spec( - save_dir, prune_input=False) + save_dir, prune_input=False, kl_quant=kl_quant) self.cfg.slim.save_quantized_model( self.model, diff --git a/ppdet/ext_op/README.md b/ppdet/ext_op/README.md index 7ada0acf7fd75266fed6c66a9a010debc645bee8..0d67062ade859b0ca025d6ad35d9a630cf4ec523 100644 --- a/ppdet/ext_op/README.md +++ b/ppdet/ext_op/README.md @@ -1,5 +1,5 @@ # Custom OP Compilation 
-The rotated-box IoU OP is implemented with reference to [Custom External Operators](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html). +The rotated-box IoU OP is implemented with reference to [Custom External Operators](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/new_cpp_op_cn.html). ## 1. Requirements - Paddle >= 2.0.1 @@ -7,13 +7,13 @@ ## 2. Installation ``` -python3.7 setup.py install +python setup.py install ``` -Use it as follows +Once compiled, the OP is ready to use. The following is a usage example of `rbox_iou` ``` # import the custom op -from rbox_iou_ops import rbox_iou +from ext_op import rbox_iou paddle.set_device('gpu:0') paddle.disable_static() @@ -29,10 +29,7 @@ print('iou', iou) ``` ## 3. Unit tests -The unit test in `test.py` compares the result of a Python implementation with the result of the custom OP. - -Because the Python and C++ computations differ slightly in detail, the error tolerance is set to 0.02. +You can run the unit tests to confirm that the custom operator works correctly. An example of running a unit test: ``` -python3.7 test.py +python unittest/test_matched_rbox_iou.py ``` -The message `rbox_iou OP compute right!` indicates that the OP test passed. diff --git a/ppdet/ext_op/csrc/rbox_iou/matched_rbox_iou_op.cc b/ppdet/ext_op/csrc/rbox_iou/matched_rbox_iou_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..2c3c58b606c22607272d6d37877d11399d7542d9 --- /dev/null +++ b/ppdet/ext_op/csrc/rbox_iou/matched_rbox_iou_op.cc @@ -0,0 +1,90 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// +// The code is based on +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated + +#include "paddle/extension.h" +#include "rbox_iou_op.h" + +template <typename T> +void matched_rbox_iou_cpu_kernel(const int rbox_num, const T *rbox1_data_ptr, + const T *rbox2_data_ptr, T *output_data_ptr) { + + int i; + for (i = 0; i < rbox_num; i++) { + output_data_ptr[i] = + rbox_iou_single<T>(rbox1_data_ptr + i * 5, rbox2_data_ptr + i * 5); + } +} + +#define CHECK_INPUT_CPU(x) \ + PD_CHECK(x.place() == paddle::PlaceType::kCPU, #x " must be a CPU Tensor.") + +std::vector<paddle::Tensor> MatchedRboxIouCPUForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_CPU(rbox1); + CHECK_INPUT_CPU(rbox2); + PD_CHECK(rbox1.shape()[0] == rbox2.shape()[0], "inputs must be same dim"); + + auto rbox_num = rbox1.shape()[0]; + auto output = paddle::Tensor(paddle::PlaceType::kCPU, {rbox_num}); + + PD_DISPATCH_FLOATING_TYPES(rbox1.type(), "rotated_iou_cpu_kernel", ([&] { + matched_rbox_iou_cpu_kernel<data_t>( + rbox_num, rbox1.data<data_t>(), + rbox2.data<data_t>(), + output.mutable_data<data_t>()); + })); + + return {output}; +} + +#ifdef PADDLE_WITH_CUDA +std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2); +#endif + +#define CHECK_INPUT_SAME(x1, x2) \ + PD_CHECK(x1.place() == x2.place(), "input must be smae pacle.") + +std::vector<paddle::Tensor> MatchedRboxIouForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_SAME(rbox1, rbox2); + if (rbox1.place() == paddle::PlaceType::kCPU) { + return MatchedRboxIouCPUForward(rbox1, rbox2); +#ifdef PADDLE_WITH_CUDA + } else if (rbox1.place() == paddle::PlaceType::kGPU) { + return MatchedRboxIouCUDAForward(rbox1, rbox2); +#endif + } +} + +std::vector<std::vector<int64_t>> +MatchedRboxIouInferShape(std::vector<int64_t> rbox1_shape, + std::vector<int64_t> rbox2_shape) { + return {{rbox1_shape[0]}}; +} + +std::vector<paddle::DataType> MatchedRboxIouInferDtype(paddle::DataType t1, + paddle::DataType t2) { + return {t1}; +} + +PD_BUILD_OP(matched_rbox_iou) + 
.Inputs({"RBOX1", "RBOX2"}) + .Outputs({"Output"}) + .SetKernelFn(PD_KERNEL(MatchedRboxIouForward)) + .SetInferShapeFn(PD_INFER_SHAPE(MatchedRboxIouInferShape)) + .SetInferDtypeFn(PD_INFER_DTYPE(MatchedRboxIouInferDtype)); diff --git a/ppdet/ext_op/csrc/rbox_iou/matched_rbox_iou_op.cu b/ppdet/ext_op/csrc/rbox_iou/matched_rbox_iou_op.cu new file mode 100644 index 0000000000000000000000000000000000000000..8d03ecce6a775162980746adf727738a6beb102b --- /dev/null +++ b/ppdet/ext_op/csrc/rbox_iou/matched_rbox_iou_op.cu @@ -0,0 +1,63 @@ +// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// +// The code is based on +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated + +#include "paddle/extension.h" +#include "rbox_iou_op.h" + +/** + Computes ceil(a / b) +*/ + +static inline int CeilDiv(const int a, const int b) { return (a + b - 1) / b; } + +template <typename T> +__global__ void +matched_rbox_iou_cuda_kernel(const int rbox_num, const T *rbox1_data_ptr, + const T *rbox2_data_ptr, T *output_data_ptr) { + for (int tid = blockIdx.x * blockDim.x + threadIdx.x; tid < rbox_num; + tid += blockDim.x * gridDim.x) { + output_data_ptr[tid] = + rbox_iou_single<T>(rbox1_data_ptr + tid * 5, rbox2_data_ptr + tid * 5); + } +} + +#define CHECK_INPUT_GPU(x) \ + PD_CHECK(x.place() == paddle::PlaceType::kGPU, #x " must be a GPU Tensor.") + +std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_GPU(rbox1); + CHECK_INPUT_GPU(rbox2); + PD_CHECK(rbox1.shape()[0] == rbox2.shape()[0], "inputs must be same dim"); + + auto rbox_num = rbox1.shape()[0]; + + auto output = paddle::Tensor(paddle::PlaceType::kGPU, {rbox_num}); + + const int thread_per_block = 512; + const int block_per_grid = CeilDiv(rbox_num, thread_per_block); + + PD_DISPATCH_FLOATING_TYPES( + rbox1.type(), "matched_rbox_iou_cuda_kernel", ([&] { + matched_rbox_iou_cuda_kernel< + data_t><<<block_per_grid, thread_per_block, 0, rbox1.stream()>>>( + rbox_num, rbox1.data<data_t>(), rbox2.data<data_t>(), + output.mutable_data<data_t>()); + })); + + return {output}; +} diff --git a/ppdet/ext_op/rbox_iou_op.cc b/ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.cc similarity index 100% rename from ppdet/ext_op/rbox_iou_op.cc rename to ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.cc diff --git a/ppdet/ext_op/rbox_iou_op.cu b/ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.cu similarity index 63% rename from ppdet/ext_op/rbox_iou_op.cu rename to ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.cu index 8ec43e54b4a813ef5829ba3120cc4a2be4d5d9b9..16d1d36f1002832d01db826743ce5c57ac557463 100644 --- a/ppdet/ext_op/rbox_iou_op.cu +++ 
b/ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.cu @@ -12,10 +12,11 @@ // See the License for the specific language governing permissions and // limitations under the License. // -// The code is based on https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated +// The code is based on +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated -#include "rbox_iou_op.h" #include "paddle/extension.h" +#include "rbox_iou_op.h" // 2D block with 32 * 16 = 512 threads per block const int BLOCK_DIM_X = 32; @@ -25,17 +26,13 @@ const int BLOCK_DIM_Y = 16; Computes ceil(a / b) */ -static inline int CeilDiv(const int a, const int b) { - return (a + b - 1) / b; -} +static inline int CeilDiv(const int a, const int b) { return (a + b - 1) / b; } template <typename T> -__global__ void rbox_iou_cuda_kernel( - const int rbox1_num, - const int rbox2_num, - const T* rbox1_data_ptr, - const T* rbox2_data_ptr, - T* output_data_ptr) { +__global__ void rbox_iou_cuda_kernel(const int rbox1_num, const int rbox2_num, + const T *rbox1_data_ptr, + const T *rbox2_data_ptr, + T *output_data_ptr) { // get row_start and col_start const int rbox1_block_idx = blockIdx.x * blockDim.x; @@ -47,7 +44,6 @@ __global__ void rbox_iou_cuda_kernel( __shared__ T block_boxes1[BLOCK_DIM_X * 5]; __shared__ T block_boxes2[BLOCK_DIM_Y * 5]; - // It's safe to copy using threadIdx.x since BLOCK_DIM_X >= BLOCK_DIM_Y if (threadIdx.x < rbox1_thread_num && threadIdx.y == 0) { block_boxes1[threadIdx.x * 5 + 0] = @@ -62,7 +58,8 @@ __global__ void rbox_iou_cuda_kernel( rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 4]; } - // threadIdx.x < BLOCK_DIM_Y=rbox2_thread_num, just use same condition as above: threadIdx.y == 0 + // threadIdx.x < BLOCK_DIM_Y=rbox2_thread_num, just use same condition as + // above: threadIdx.y == 0 if (threadIdx.x < rbox2_thread_num && threadIdx.y == 0) { block_boxes2[threadIdx.x * 5 + 0] = rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 0]; @@ -80,41 +77,38 @@ __global__ void 
rbox_iou_cuda_kernel( __syncthreads(); if (threadIdx.x < rbox1_thread_num && threadIdx.y < rbox2_thread_num) { - int offset = (rbox1_block_idx + threadIdx.x) * rbox2_num + rbox2_block_idx + threadIdx.y; - output_data_ptr[offset] = rbox_iou_single<T>(block_boxes1 + threadIdx.x * 5, block_boxes2 + threadIdx.y * 5); + int offset = (rbox1_block_idx + threadIdx.x) * rbox2_num + rbox2_block_idx + + threadIdx.y; + output_data_ptr[offset] = rbox_iou_single<T>( + block_boxes1 + threadIdx.x * 5, block_boxes2 + threadIdx.y * 5); } } -#define CHECK_INPUT_GPU(x) PD_CHECK(x.place() == paddle::PlaceType::kGPU, #x " must be a GPU Tensor.") +#define CHECK_INPUT_GPU(x) \ + PD_CHECK(x.place() == paddle::PlaceType::kGPU, #x " must be a GPU Tensor.") -std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2) { - CHECK_INPUT_GPU(rbox1); - CHECK_INPUT_GPU(rbox2); +std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor &rbox1, + const paddle::Tensor &rbox2) { + CHECK_INPUT_GPU(rbox1); + CHECK_INPUT_GPU(rbox2); - auto rbox1_num = rbox1.shape()[0]; - auto rbox2_num = rbox2.shape()[0]; + auto rbox1_num = rbox1.shape()[0]; + auto rbox2_num = rbox2.shape()[0]; - auto output = paddle::Tensor(paddle::PlaceType::kGPU, {rbox1_num, rbox2_num}); + auto output = paddle::Tensor(paddle::PlaceType::kGPU, {rbox1_num, rbox2_num}); - const int blocks_x = CeilDiv(rbox1_num, BLOCK_DIM_X); - const int blocks_y = CeilDiv(rbox2_num, BLOCK_DIM_Y); + const int blocks_x = CeilDiv(rbox1_num, BLOCK_DIM_X); + const int blocks_y = CeilDiv(rbox2_num, BLOCK_DIM_Y); - dim3 blocks(blocks_x, blocks_y); - dim3 threads(BLOCK_DIM_X, BLOCK_DIM_Y); + dim3 blocks(blocks_x, blocks_y); + dim3 threads(BLOCK_DIM_X, BLOCK_DIM_Y); - PD_DISPATCH_FLOATING_TYPES( - rbox1.type(), - "rbox_iou_cuda_kernel", - ([&] { - rbox_iou_cuda_kernel<data_t><<<blocks, threads, 0, rbox1.stream()>>>( - rbox1_num, - rbox2_num, - rbox1.data<data_t>(), - rbox2.data<data_t>(), - output.mutable_data<data_t>()); - })); + PD_DISPATCH_FLOATING_TYPES( + rbox1.type(), "rbox_iou_cuda_kernel", ([&] { + 
rbox_iou_cuda_kernel<data_t><<<blocks, threads, 0, rbox1.stream()>>>( + rbox1_num, rbox2_num, rbox1.data<data_t>(), rbox2.data<data_t>(), + output.mutable_data<data_t>()); + })); - return {output}; + return {output}; } - - diff --git a/ppdet/ext_op/rbox_iou_op.h b/ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.h similarity index 81% rename from ppdet/ext_op/rbox_iou_op.h rename to ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.h index 77fb62e394a17a2e41379a40b3379c4eacf4e80d..fce66dea00e829215ffdb3a38f8db6182a068609 100644 --- a/ppdet/ext_op/rbox_iou_op.h +++ b/ppdet/ext_op/csrc/rbox_iou/rbox_iou_op.h @@ -12,7 +12,8 @@ // See the License for the specific language governing permissions and // limitations under the License. // -// The code is based on https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated +// The code is based on +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated #pragma once @@ -32,24 +33,20 @@ namespace { -template <typename T> -struct RotatedBox { - T x_ctr, y_ctr, w, h, a; -}; +template <typename T> struct RotatedBox { T x_ctr, y_ctr, w, h, a; }; -template <typename T> -struct Point { +template <typename T> struct Point { T x, y; - HOST_DEVICE_INLINE Point(const T& px = 0, const T& py = 0) : x(px), y(py) {} - HOST_DEVICE_INLINE Point operator+(const Point& p) const { + HOST_DEVICE_INLINE Point(const T &px = 0, const T &py = 0) : x(px), y(py) {} + HOST_DEVICE_INLINE Point operator+(const Point &p) const { return Point(x + p.x, y + p.y); } - HOST_DEVICE_INLINE Point& operator+=(const Point& p) { + HOST_DEVICE_INLINE Point &operator+=(const Point &p) { x += p.x; y += p.y; return *this; } - HOST_DEVICE_INLINE Point operator-(const Point& p) const { + HOST_DEVICE_INLINE Point operator-(const Point &p) const { return Point(x - p.x, y - p.y); } HOST_DEVICE_INLINE Point operator*(const T coeff) const { @@ -58,22 +55,21 @@ struct Point { }; template <typename T> -HOST_DEVICE_INLINE T dot_2d(const Point<T>& A, const Point<T>& B) { +HOST_DEVICE_INLINE T dot_2d(const Point<T> &A, const Point<T> &B) { return A.x * B.x + A.y * B.y; } template <typename T> -HOST_DEVICE_INLINE T cross_2d(const 
Point<T>& A, const Point<T>& B) { +HOST_DEVICE_INLINE T cross_2d(const Point<T> &A, const Point<T> &B) { return A.x * B.y - B.x * A.y; } template <typename T> -HOST_DEVICE_INLINE void get_rotated_vertices( - const RotatedBox<T>& box, - Point<T> (&pts)[4]) { +HOST_DEVICE_INLINE void get_rotated_vertices(const RotatedBox<T> &box, + Point<T> (&pts)[4]) { // M_PI / 180. == 0.01745329251 - //double theta = box.a * 0.01745329251; - //MODIFIED + // double theta = box.a * 0.01745329251; + // MODIFIED double theta = box.a; T cosTheta2 = (T)cos(theta) * 0.5f; T sinTheta2 = (T)sin(theta) * 0.5f; @@ -90,10 +86,9 @@ HOST_DEVICE_INLINE void get_rotated_vertices( } template <typename T> -HOST_DEVICE_INLINE int get_intersection_points( - const Point<T> (&pts1)[4], - const Point<T> (&pts2)[4], - Point<T> (&intersections)[24]) { +HOST_DEVICE_INLINE int get_intersection_points(const Point<T> (&pts1)[4], + const Point<T> (&pts2)[4], + Point<T> (&intersections)[24]) { // Line vector // A line from p1 to p2 is: p1 + (p2-p1)*t, t=[0,1] Point<T> vec1[4], vec2[4]; @@ -127,8 +122,8 @@ HOST_DEVICE_INLINE int get_intersection_points( // Check for vertices of rect1 inside rect2 { - const auto& AB = vec2[0]; - const auto& DA = vec2[3]; + const auto &AB = vec2[0]; + const auto &DA = vec2[3]; auto ABdotAB = dot_2d<T>(AB, AB); auto ADdotAD = dot_2d<T>(DA, DA); for (int i = 0; i < 4; i++) { @@ -150,8 +145,8 @@ HOST_DEVICE_INLINE int get_intersection_points( // Reverse the check - check for vertices of rect2 inside rect1 { - const auto& AB = vec1[0]; - const auto& DA = vec1[3]; + const auto &AB = vec1[0]; + const auto &DA = vec1[3]; auto ABdotAB = dot_2d<T>(AB, AB); auto ADdotAD = dot_2d<T>(DA, DA); for (int i = 0; i < 4; i++) { @@ -171,11 +166,9 @@ HOST_DEVICE_INLINE int get_intersection_points( } template <typename T> -HOST_DEVICE_INLINE int convex_hull_graham( - const Point<T> (&p)[24], - const int& num_in, - Point<T> (&q)[24], - bool shift_to_zero = false) { +HOST_DEVICE_INLINE int convex_hull_graham(const Point<T> (&p)[24], + const int &num_in, Point<T> (&q)[24], + bool shift_to_zero = false) { 
assert(num_in >= 2); // Step 1: @@ -188,7 +181,7 @@ HOST_DEVICE_INLINE int convex_hull_graham( t = i; } } - auto& start = p[t]; // starting point + auto &start = p[t]; // starting point // Step 2: // Subtract starting point from every points (for sorting in the next step) @@ -230,15 +223,15 @@ HOST_DEVICE_INLINE int convex_hull_graham( } #else // CPU version - std::sort( - q + 1, q + num_in, [](const Point<T>& A, const Point<T>& B) -> bool { - T temp = cross_2d<T>(A, B); - if (fabs(temp) < 1e-6) { - return dot_2d<T>(A, A) < dot_2d<T>(B, B); - } else { - return temp > 0; - } - }); + std::sort(q + 1, q + num_in, + [](const Point<T> &A, const Point<T> &B) -> bool { + T temp = cross_2d<T>(A, B); + if (fabs(temp) < 1e-6) { + return dot_2d<T>(A, A) < dot_2d<T>(B, B); + } else { + return temp > 0; + } + }); #endif // Step 4: @@ -286,7 +279,7 @@ HOST_DEVICE_INLINE int convex_hull_graham( } template <typename T> -HOST_DEVICE_INLINE T polygon_area(const Point<T> (&q)[24], const int& m) { +HOST_DEVICE_INLINE T polygon_area(const Point<T> (&q)[24], const int &m) { if (m <= 2) { return 0; } @@ -300,9 +293,8 @@ HOST_DEVICE_INLINE T polygon_area(const Point<T> (&q)[24], const int& m) { } template <typename T> -HOST_DEVICE_INLINE T rboxes_intersection( - const RotatedBox<T>& box1, - const RotatedBox<T>& box2) { +HOST_DEVICE_INLINE T rboxes_intersection(const RotatedBox<T> &box1, + const RotatedBox<T> &box2) { // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned // from rotated_rect_intersection_pts Point<T> intersectPts[24], orderedPts[24]; @@ -327,8 +319,8 @@ HOST_DEVICE_INLINE T rboxes_intersection( } // namespace template <typename T> -HOST_DEVICE_INLINE T -rbox_iou_single(T const* const box1_raw, T const* const box2_raw) { +HOST_DEVICE_INLINE T rbox_iou_single(T const *const box1_raw, + T const *const box2_raw) { // shift center to the middle point to achieve higher precision in result RotatedBox<T> box1, box2; auto center_shift_x = (box1_raw[0] + box2_raw[0]) / 2.0; diff --git a/ppdet/ext_op/setup.py b/ppdet/ext_op/setup.py index 
d364db7ed37c68227a5ef7d2f8b2c8d5fcad8123..5892f4625c263b9eac19a434aca10968882bc4bc 100644 --- a/ppdet/ext_op/setup.py +++ b/ppdet/ext_op/setup.py @@ -1,14 +1,33 @@ +import os +import glob import paddle from paddle.utils.cpp_extension import CppExtension, CUDAExtension, setup -if __name__ == "__main__": + +def get_extensions(): + root_dir = os.path.dirname(os.path.abspath(__file__)) + ext_root_dir = os.path.join(root_dir, 'csrc') + sources = [] + for ext_name in os.listdir(ext_root_dir): + ext_dir = os.path.join(ext_root_dir, ext_name) + source = glob.glob(os.path.join(ext_dir, '*.cc')) + kwargs = dict() + if paddle.device.is_compiled_with_cuda(): + source += glob.glob(os.path.join(ext_dir, '*.cu')) + + if not source: + continue + + sources += source + if paddle.device.is_compiled_with_cuda(): - setup( - name='rbox_iou_ops', - ext_modules=CUDAExtension( - sources=['rbox_iou_op.cc', 'rbox_iou_op.cu'], - extra_compile_args={'cxx': ['-DPADDLE_WITH_CUDA']})) + extension = CUDAExtension( + sources, extra_compile_args={'cxx': ['-DPADDLE_WITH_CUDA']}) else: - setup( - name='rbox_iou_ops', - ext_modules=CppExtension(sources=['rbox_iou_op.cc'])) + extension = CppExtension(sources) + + return extension + + +if __name__ == "__main__": + setup(name='ext_op', ext_modules=get_extensions()) diff --git a/ppdet/ext_op/unittest/test_matched_rbox_iou.py b/ppdet/ext_op/unittest/test_matched_rbox_iou.py new file mode 100644 index 0000000000000000000000000000000000000000..af7b076da2435a4f025f608430549f2334c22e08 --- /dev/null +++ b/ppdet/ext_op/unittest/test_matched_rbox_iou.py @@ -0,0 +1,149 @@ +import numpy as np +import sys +import time +from shapely.geometry import Polygon +import paddle +import unittest + +from ext_op import matched_rbox_iou + + +def rbox2poly_single(rrect, get_best_begin_point=False): + """ + rrect:[x_ctr,y_ctr,w,h,angle] + to + poly:[x0,y0,x1,y1,x2,y2,x3,y3] + """ + x_ctr, y_ctr, width, height, angle = rrect[:5] + tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, 
width / 2, height / 2 + # rect 2x4 + rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]]) + R = np.array([[np.cos(angle), -np.sin(angle)], + [np.sin(angle), np.cos(angle)]]) + # poly + poly = R.dot(rect) + x0, x1, x2, x3 = poly[0, :4] + x_ctr + y0, y1, y2, y3 = poly[1, :4] + y_ctr + poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float64) + return poly + + +def intersection(g, p): + """ + Intersection. + """ + + g = g[:8].reshape((4, 2)) + p = p[:8].reshape((4, 2)) + + a = g + b = p + + use_filter = True + if use_filter: + # step1: + inter_x1 = np.maximum(np.min(a[:, 0]), np.min(b[:, 0])) + inter_x2 = np.minimum(np.max(a[:, 0]), np.max(b[:, 0])) + inter_y1 = np.maximum(np.min(a[:, 1]), np.min(b[:, 1])) + inter_y2 = np.minimum(np.max(a[:, 1]), np.max(b[:, 1])) + if inter_x1 >= inter_x2 or inter_y1 >= inter_y2: + return 0. + x1 = np.minimum(np.min(a[:, 0]), np.min(b[:, 0])) + x2 = np.maximum(np.max(a[:, 0]), np.max(b[:, 0])) + y1 = np.minimum(np.min(a[:, 1]), np.min(b[:, 1])) + y2 = np.maximum(np.max(a[:, 1]), np.max(b[:, 1])) + if x1 >= x2 or y1 >= y2 or (x2 - x1) < 2 or (y2 - y1) < 2: + return 0. 
+ + g = Polygon(g) + p = Polygon(p) + if not g.is_valid or not p.is_valid: + return 0 + + inter = Polygon(g).intersection(Polygon(p)).area + union = g.area + p.area - inter + if union == 0: + return 0 + else: + return inter / union + + +def matched_rbox_overlaps(anchors, gt_bboxes, use_cv2=False): + """ + + Args: + anchors: [M, 5] x1,y1,x2,y2,angle + gt_bboxes: [M, 5] x1,y1,x2,y2,angle + + Returns: + macthed_iou: [M] + """ + assert anchors.shape[1] == 5 + assert gt_bboxes.shape[1] == 5 + + gt_bboxes_ploy = [rbox2poly_single(e) for e in gt_bboxes] + anchors_ploy = [rbox2poly_single(e) for e in anchors] + + num = len(anchors_ploy) + iou = np.zeros((num, ), dtype=np.float64) + + start_time = time.time() + for i in range(num): + try: + iou[i] = intersection(gt_bboxes_ploy[i], anchors_ploy[i]) + except Exception as e: + print('cur gt_bboxes_ploy[i]', gt_bboxes_ploy[i], + 'anchors_ploy[j]', anchors_ploy[i], e) + return iou + + +def gen_sample(n): + rbox = np.random.rand(n, 5) + rbox[:, 0:4] = rbox[:, 0:4] * 0.45 + 0.001 + rbox[:, 4] = rbox[:, 4] - 0.5 + return rbox + + +class MatchedRBoxIoUTest(unittest.TestCase): + def setUp(self): + self.initTestCase() + self.rbox1 = gen_sample(self.n) + self.rbox2 = gen_sample(self.n) + + def initTestCase(self): + self.n = 1000 + + def assertAllClose(self, x, y, msg, atol=5e-1, rtol=1e-2): + self.assertTrue(np.allclose(x, y, atol=atol, rtol=rtol), msg=msg) + + def get_places(self): + places = [paddle.CPUPlace()] + if paddle.device.is_compiled_with_cuda(): + places.append(paddle.CUDAPlace(0)) + + return places + + def check_output(self, place): + paddle.disable_static() + pd_rbox1 = paddle.to_tensor(self.rbox1, place=place) + pd_rbox2 = paddle.to_tensor(self.rbox2, place=place) + actual_t = matched_rbox_iou(pd_rbox1, pd_rbox2).numpy() + poly_rbox1 = self.rbox1 + poly_rbox2 = self.rbox2 + poly_rbox1[:, 0:4] = self.rbox1[:, 0:4] * 1024 + poly_rbox2[:, 0:4] = self.rbox2[:, 0:4] * 1024 + expect_t = matched_rbox_overlaps(poly_rbox1, 
poly_rbox2, use_cv2=False) + self.assertAllClose( + actual_t, + expect_t, + msg="rbox_iou has diff at {} \nExpect {}\nBut got {}".format( + str(place), str(expect_t), str(actual_t))) + + def test_output(self): + places = self.get_places() + for place in places: + self.check_output(place) + + +if __name__ == "__main__": + unittest.main() diff --git a/ppdet/ext_op/test.py b/ppdet/ext_op/unittest/test_rbox_iou.py similarity index 89% rename from ppdet/ext_op/test.py rename to ppdet/ext_op/unittest/test_rbox_iou.py index 85872e484b8ca6d60a62d311c9fdfc4a9e08b6e2..8ef19ae841d5a73c5b90f1b971ed36d1d7f61a7a 100644 --- a/ppdet/ext_op/test.py +++ b/ppdet/ext_op/unittest/test_rbox_iou.py @@ -5,11 +5,7 @@ from shapely.geometry import Polygon import paddle import unittest -try: - from rbox_iou_ops import rbox_iou -except Exception as e: - print('import rbox_iou_ops error', e) - sys.exit(-1) +from ext_op import rbox_iou def rbox2poly_single(rrect, get_best_begin_point=False): @@ -80,7 +76,7 @@ def rbox_overlaps(anchors, gt_bboxes, use_cv2=False): gt_bboxes: [M, 5] x1,y1,x2,y2,angle Returns: - + iou: [NA, M] """ assert anchors.shape[1] == 5 assert gt_bboxes.shape[1] == 5 @@ -89,17 +85,16 @@ def rbox_overlaps(anchors, gt_bboxes, use_cv2=False): anchors_ploy = [rbox2poly_single(e) for e in anchors] num_gt, num_anchors = len(gt_bboxes_ploy), len(anchors_ploy) - iou = np.zeros((num_gt, num_anchors), dtype=np.float64) + iou = np.zeros((num_anchors, num_gt), dtype=np.float64) start_time = time.time() - for i in range(num_gt): - for j in range(num_anchors): + for i in range(num_anchors): + for j in range(num_gt): try: - iou[i, j] = intersection(gt_bboxes_ploy[i], anchors_ploy[j]) + iou[i, j] = intersection(anchors_ploy[i], gt_bboxes_ploy[j]) except Exception as e: - print('cur gt_bboxes_ploy[i]', gt_bboxes_ploy[i], - 'anchors_ploy[j]', anchors_ploy[j], e) - iou = iou.T + print('cur anchors_ploy[i]', anchors_ploy[i], + 'gt_bboxes_ploy[j]', gt_bboxes_ploy[j], e) return iou diff --git 
a/ppdet/metrics/json_results.py b/ppdet/metrics/json_results.py index c703de63be89e326da979d2edbe0a3e1afca3bec..93354ec1fc592b1567b5f0a3e2044a215d231a30 100755 --- a/ppdet/metrics/json_results.py +++ b/ppdet/metrics/json_results.py @@ -65,6 +65,14 @@ def get_det_poly_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0): return det_res +def strip_mask(mask): + row = mask[0, 0, :] + col = mask[0, :, 0] + im_h = len(col) - np.count_nonzero(col == -1) + im_w = len(row) - np.count_nonzero(row == -1) + return mask[:, :im_h, :im_w] + + def get_seg_res(masks, bboxes, mask_nums, image_id, label_to_cat_id_map): import pycocotools.mask as mask_util seg_res = [] @@ -72,8 +80,10 @@ def get_seg_res(masks, bboxes, mask_nums, image_id, label_to_cat_id_map): for i in range(len(mask_nums)): cur_image_id = int(image_id[i][0]) det_nums = mask_nums[i] + mask_i = masks[k:k + det_nums] + mask_i = strip_mask(mask_i) for j in range(det_nums): - mask = masks[k].astype(np.uint8) + mask = mask_i[j].astype(np.uint8) score = float(bboxes[k][1]) label = int(bboxes[k][0]) k = k + 1 diff --git a/ppdet/metrics/keypoint_metrics.py b/ppdet/metrics/keypoint_metrics.py index 0f5c8c973ea3d6ad05e18a20647df268438ae9e7..cbd52d02d4af1f6dd81edd0ea63a98b7ed77e614 100644 --- a/ppdet/metrics/keypoint_metrics.py +++ b/ppdet/metrics/keypoint_metrics.py @@ -16,6 +16,7 @@ import os import json from collections import defaultdict, OrderedDict import numpy as np +import paddle from pycocotools.coco import COCO from pycocotools.cocoeval import COCOeval from ..modeling.keypoint_utils import oks_nms @@ -70,15 +71,23 @@ class KeyPointTopDownCOCOEval(object): self.results['all_preds'][self.idx:self.idx + num_images, :, 0: 3] = kpts[:, :, 0:3] self.results['all_boxes'][self.idx:self.idx + num_images, 0:2] = inputs[ - 'center'].numpy()[:, 0:2] + 'center'].numpy()[:, 0:2] if isinstance( + inputs['center'], paddle.Tensor) else inputs['center'][:, 0:2] self.results['all_boxes'][self.idx:self.idx + num_images, 2:4] = 
inputs[ - 'scale'].numpy()[:, 0:2] + 'scale'].numpy()[:, 0:2] if isinstance( + inputs['scale'], paddle.Tensor) else inputs['scale'][:, 0:2] self.results['all_boxes'][self.idx:self.idx + num_images, 4] = np.prod( - inputs['scale'].numpy() * 200, 1) - self.results['all_boxes'][self.idx:self.idx + num_images, - 5] = np.squeeze(inputs['score'].numpy()) - self.results['image_path'].extend(inputs['im_id'].numpy()) - + inputs['scale'].numpy() * 200, + 1) if isinstance(inputs['scale'], paddle.Tensor) else np.prod( + inputs['scale'] * 200, 1) + self.results['all_boxes'][ + self.idx:self.idx + num_images, + 5] = np.squeeze(inputs['score'].numpy()) if isinstance( + inputs['score'], paddle.Tensor) else np.squeeze(inputs['score']) + if isinstance(inputs['im_id'], paddle.Tensor): + self.results['image_path'].extend(inputs['im_id'].numpy()) + else: + self.results['image_path'].extend(inputs['im_id']) self.idx += num_images def _write_coco_keypoint_results(self, keypoints): diff --git a/ppdet/metrics/map_utils.py b/ppdet/metrics/map_utils.py index 9c96b9235f4205279e47ff84006351a012d7bf2d..12fb9ba51242bdd244eb60da8b364ab26ddbecba 100644 --- a/ppdet/metrics/map_utils.py +++ b/ppdet/metrics/map_utils.py @@ -121,9 +121,9 @@ def calc_rbox_iou(pred, gt_rbox): pred_rbox = pred_rbox.reshape(-1, 5) pred_rbox = pred_rbox.reshape(-1, 5) try: - from rbox_iou_ops import rbox_iou + from ext_op import rbox_iou except Exception as e: - print("import custom_ops error, try install rbox_iou_ops " \ + print("import custom_ops error, try install ext_op " \ "following ppdet/ext_op/README.md", e) sys.stdout.flush() sys.exit(-1) diff --git a/ppdet/metrics/mcmot_metrics.py b/ppdet/metrics/mcmot_metrics.py index 5bcfb923470a1f94a2bd951fb721221a8f339354..c9b5ef7506e92adcfe58d5dd1f2f2cad0d9d9e70 100644 --- a/ppdet/metrics/mcmot_metrics.py +++ b/ppdet/metrics/mcmot_metrics.py @@ -21,18 +21,21 @@ import copy import sys import math from collections import defaultdict -from motmetrics.math_util import 
quiet_divide import numpy as np import pandas as pd -import paddle -import paddle.nn.functional as F from .metrics import Metric -import motmetrics as mm -import openpyxl -metrics = mm.metrics.motchallenge_metrics -mh = mm.metrics.create() +try: + import motmetrics as mm + from motmetrics.math_util import quiet_divide + metrics = mm.metrics.motchallenge_metrics + mh = mm.metrics.create() +except: + print( + 'Warning: Unable to use MCMOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + pass from ppdet.utils.logger import setup_logger logger = setup_logger(__name__) @@ -302,24 +305,30 @@ class MCMOTEvaluator(object): self.num_classes = num_classes self.load_annotations() + try: + import motmetrics as mm + mm.lap.default_solver = 'lap' + except Exception as e: + raise RuntimeError( + 'Unable to use MCMOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) self.reset_accumulator() self.class_accs = [] def load_annotations(self): assert self.data_type == 'mcmot' - self.gt_filename = os.path.join(self.data_root, '../', - 'sequences', + self.gt_filename = os.path.join(self.data_root, '../', 'sequences', '{}.txt'.format(self.seq_name)) - + if not os.path.exists(self.gt_filename): + logger.warning( + "gt_filename '{}' of MCMOTEvaluator is not exist, so the MOTA will be -INF." 
+ ) + def reset_accumulator(self): - import motmetrics as mm - mm.lap.default_solver = 'lap' self.acc = mm.MOTAccumulator(auto_id=True) def eval_frame_dict(self, trk_objs, gt_objs, rtn_events=False, union=False): - import motmetrics as mm - mm.lap.default_solver = 'lap' if union: trk_tlwhs, trk_ids, trk_cls = unzip_objs_cls(trk_objs)[:3] gt_tlwhs, gt_ids, gt_cls = unzip_objs_cls(gt_objs)[:3] @@ -393,9 +402,6 @@ class MCMOTEvaluator(object): names, metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', 'precision', 'recall')): - import motmetrics as mm - mm.lap.default_solver = 'lap' - names = copy.deepcopy(names) if metrics is None: metrics = mm.metrics.motchallenge_metrics diff --git a/ppdet/metrics/metrics.py b/ppdet/metrics/metrics.py index 3925267d7f9e5656033e4c851d8b52f1031867ab..b20a569a0434ed7bac7f461399cb280f08ff888a 100644 --- a/ppdet/metrics/metrics.py +++ b/ppdet/metrics/metrics.py @@ -22,6 +22,7 @@ import json import paddle import numpy as np import typing +from pathlib import Path from .map_utils import prune_zero_padding, DetectionMAP from .coco_utils import get_infer_results, cocoapi_eval @@ -32,13 +33,8 @@ from ppdet.utils.logger import setup_logger logger = setup_logger(__name__) __all__ = [ - 'Metric', - 'COCOMetric', - 'VOCMetric', - 'WiderFaceMetric', - 'get_infer_results', - 'RBoxMetric', - 'SNIPERCOCOMetric' + 'Metric', 'COCOMetric', 'VOCMetric', 'WiderFaceMetric', 'get_infer_results', + 'RBoxMetric', 'SNIPERCOCOMetric' ] COCO_SIGMAS = np.array([ @@ -74,8 +70,6 @@ class Metric(paddle.metric.Metric): class COCOMetric(Metric): def __init__(self, anno_file, **kwargs): - assert os.path.isfile(anno_file), \ - "anno_file {} not a file".format(anno_file) self.anno_file = anno_file self.clsid2catid = kwargs.get('clsid2catid', None) if self.clsid2catid is None: @@ -86,6 +80,14 @@ class COCOMetric(Metric): self.bias = kwargs.get('bias', 0) self.save_prediction_only = kwargs.get('save_prediction_only', False) self.iou_type = kwargs.get('IouType', 'bbox') 
+ + if not self.save_prediction_only: + assert os.path.isfile(anno_file), \ + "anno_file {} not a file".format(anno_file) + + if self.output_eval is not None: + Path(self.output_eval).mkdir(exist_ok=True) + self.reset() def reset(self): @@ -223,7 +225,9 @@ class VOCMetric(Metric): map_type='11point', is_bbox_normalized=False, evaluate_difficult=False, - classwise=False): + classwise=False, + output_eval=None, + save_prediction_only=False): assert os.path.isfile(label_list), \ "label_list {} not a file".format(label_list) self.clsid2catid, self.catid2name = get_categories('VOC', label_list) @@ -231,6 +235,8 @@ class VOCMetric(Metric): self.overlap_thresh = overlap_thresh self.map_type = map_type self.evaluate_difficult = evaluate_difficult + self.output_eval = output_eval + self.save_prediction_only = save_prediction_only self.detection_map = DetectionMAP( class_num=class_num, overlap_thresh=overlap_thresh, @@ -243,34 +249,52 @@ class VOCMetric(Metric): self.reset() def reset(self): + self.results = {'bbox': [], 'score': [], 'label': []} self.detection_map.reset() def update(self, inputs, outputs): - bbox_np = outputs['bbox'].numpy() + bbox_np = outputs['bbox'].numpy() if isinstance( + outputs['bbox'], paddle.Tensor) else outputs['bbox'] bboxes = bbox_np[:, 2:] scores = bbox_np[:, 1] labels = bbox_np[:, 0] - bbox_lengths = outputs['bbox_num'].numpy() + bbox_lengths = outputs['bbox_num'].numpy() if isinstance( + outputs['bbox_num'], paddle.Tensor) else outputs['bbox_num'] + + self.results['bbox'].append(bboxes.tolist()) + self.results['score'].append(scores.tolist()) + self.results['label'].append(labels.tolist()) if bboxes.shape == (1, 1) or bboxes is None: return + if self.save_prediction_only: + return + gt_boxes = inputs['gt_bbox'] gt_labels = inputs['gt_class'] difficults = inputs['difficult'] if not self.evaluate_difficult \ else None - scale_factor = inputs['scale_factor'].numpy( - ) if 'scale_factor' in inputs else np.ones( - (gt_boxes.shape[0], 
2)).astype('float32') + if 'scale_factor' in inputs: + scale_factor = inputs['scale_factor'].numpy() if isinstance( + inputs['scale_factor'], + paddle.Tensor) else inputs['scale_factor'] + else: + scale_factor = np.ones((gt_boxes.shape[0], 2)).astype('float32') bbox_idx = 0 for i in range(len(gt_boxes)): - gt_box = gt_boxes[i].numpy() + gt_box = gt_boxes[i].numpy() if isinstance( + gt_boxes[i], paddle.Tensor) else gt_boxes[i] h, w = scale_factor[i] gt_box = gt_box / np.array([w, h, w, h]) - gt_label = gt_labels[i].numpy() - difficult = None if difficults is None \ - else difficults[i].numpy() + gt_label = gt_labels[i].numpy() if isinstance( + gt_labels[i], paddle.Tensor) else gt_labels[i] + if difficults is not None: + difficult = difficults[i].numpy() if isinstance( + difficults[i], paddle.Tensor) else difficults[i] + else: + difficult = None bbox_num = bbox_lengths[i] bbox = bboxes[bbox_idx:bbox_idx + bbox_num] score = scores[bbox_idx:bbox_idx + bbox_num] @@ -282,6 +306,15 @@ class VOCMetric(Metric): bbox_idx += bbox_num def accumulate(self): + output = "bbox.json" + if self.output_eval: + output = os.path.join(self.output_eval, output) + with open(output, 'w') as f: + json.dump(self.results, f) + logger.info('The bbox result is saved to bbox.json.') + if self.save_prediction_only: + return + logger.info("Accumulating evaluatation results...") self.detection_map.accumulate() @@ -427,11 +460,13 @@ class SNIPERCOCOMetric(COCOMetric): self.chip_results.append(outs) - def accumulate(self): - results = self.dataset.anno_cropper.aggregate_chips_detections(self.chip_results) + results = self.dataset.anno_cropper.aggregate_chips_detections( + self.chip_results) for outs in results: - infer_results = get_infer_results(outs, self.clsid2catid, bias=self.bias) - self.results['bbox'] += infer_results['bbox'] if 'bbox' in infer_results else [] + infer_results = get_infer_results( + outs, self.clsid2catid, bias=self.bias) + self.results['bbox'] += infer_results[ + 'bbox'] if 
'bbox' in infer_results else [] super(SNIPERCOCOMetric, self).accumulate() diff --git a/ppdet/metrics/mot_metrics.py b/ppdet/metrics/mot_metrics.py index 85cba3630cd428478175ddc6347db4152d47a353..b5ed8a2d4a8f60d37297a94265733970212e24d0 100644 --- a/ppdet/metrics/mot_metrics.py +++ b/ppdet/metrics/mot_metrics.py @@ -22,13 +22,21 @@ import sys import math from collections import defaultdict import numpy as np -import paddle -import paddle.nn.functional as F + from ppdet.modeling.bbox_utils import bbox_iou_np_expand from .map_utils import ap_per_class from .metrics import Metric from .munkres import Munkres +try: + import motmetrics as mm + mm.lap.default_solver = 'lap' +except Exception: + print( + 'Warning: Unable to use MOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) + pass + from ppdet.utils.logger import setup_logger logger = setup_logger(__name__) @@ -36,8 +44,13 @@ __all__ = ['MOTEvaluator', 'MOTMetric', 'JDEDetMetric', 'KITTIMOTMetric'] def read_mot_results(filename, is_gt=False, is_ignore=False): - valid_labels = {1} - ignore_labels = {2, 7, 8, 12} # only in motchallenge datasets like 'MOT16' + valid_label = [1] + ignore_labels = [2, 7, 8, 12] # only in motchallenge datasets like 'MOT16' + if is_gt: + logger.info( + "In MOT16/17 dataset the valid_label of ground truth is '{}', " + "in other datasets it should be '0' for single class MOT.".format( + valid_label[0])) results_dict = dict() if os.path.isfile(filename): with open(filename, 'r') as f: @@ -50,12 +63,10 @@ def read_mot_results(filename, is_gt=False, is_ignore=False): continue results_dict.setdefault(fid, list()) - box_size = float(linelist[4]) * float(linelist[5]) - if is_gt: label = int(float(linelist[7])) mark = int(float(linelist[6])) - if mark == 0 or label not in valid_labels: + if mark == 0 or label not in valid_label: continue score = 1 elif is_ignore: @@ -112,24 +123,31 @@ class MOTEvaluator(object): self.data_type = 
data_type self.load_annotations() + try: + import motmetrics as mm + mm.lap.default_solver = 'lap' + except Exception as e: + raise RuntimeError( + 'Unable to use MOT metric, please install motmetrics, for example: `pip install motmetrics`, see https://github.com/longcw/py-motmetrics' + ) self.reset_accumulator() def load_annotations(self): assert self.data_type == 'mot' gt_filename = os.path.join(self.data_root, self.seq_name, 'gt', 'gt.txt') + if not os.path.exists(gt_filename): + logger.warning( + "gt_filename '{}' of MOTEvaluator does not exist, so the MOTA will be -INF." + .format(gt_filename)) self.gt_frame_dict = read_mot_results(gt_filename, is_gt=True) self.gt_ignore_frame_dict = read_mot_results( gt_filename, is_ignore=True) def reset_accumulator(self): - import motmetrics as mm - mm.lap.default_solver = 'lap' self.acc = mm.MOTAccumulator(auto_id=True) def eval_frame(self, frame_id, trk_tlwhs, trk_ids, rtn_events=False): - import motmetrics as mm - mm.lap.default_solver = 'lap' # results trk_tlwhs = np.copy(trk_tlwhs) trk_ids = np.copy(trk_ids) @@ -187,8 +205,6 @@ class MOTEvaluator(object): names, metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', 'precision', 'recall')): - import motmetrics as mm - mm.lap.default_solver = 'lap' names = copy.deepcopy(names) if metrics is None: metrics = mm.metrics.motchallenge_metrics @@ -225,8 +241,6 @@ class MOTMetric(Metric): self.result_root = result_root def accumulate(self): - import motmetrics as mm - import openpyxl metrics = mm.metrics.motchallenge_metrics mh = mm.metrics.create() summary = self.MOTEvaluator.get_summary(self.accs, self.seqs, metrics) @@ -551,7 +565,7 @@ class KITTIEvaluation(object): "track ids are not unique for sequence %d: frame %d" % (seq, t_data.frame)) logger.info( - "track id %d occured at least twice for this frame" + "track id %d occurred at least twice for this frame" % t_data.track_id) logger.info("Exiting...") #continue # this allows to evaluate non-unique result files diff --git 
a/ppdet/modeling/__init__.py b/ppdet/modeling/__init__.py index 88a9a3570477f55e8f7fbfeae4fd84271a76256d..cdcb5d1bf08d813257dc577366de2efa9da9add7 100644 --- a/ppdet/modeling/__init__.py +++ b/ppdet/modeling/__init__.py @@ -29,7 +29,6 @@ from . import reid from . import mot from . import transformers from . import assigners -from . import coders from .ops import * from .backbones import * @@ -44,4 +43,3 @@ from .reid import * from .mot import * from .transformers import * from .assigners import * -from .coders import * diff --git a/ppdet/modeling/architectures/__init__.py b/ppdet/modeling/architectures/__init__.py index 71c53067f72f96c1d24da19aa4313449e91f4b95..9360a5b7b15596302ae54fc7b375e83718820da0 100644 --- a/ppdet/modeling/architectures/__init__.py +++ b/ppdet/modeling/architectures/__init__.py @@ -5,6 +5,13 @@ # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + from . import meta_arch from . import faster_rcnn from . import mask_rcnn @@ -28,6 +35,7 @@ from . import sparse_rcnn from . import tood from . import retinanet from . import bytetrack +from . 
import yolox from .meta_arch import * from .faster_rcnn import * @@ -53,3 +61,4 @@ from .sparse_rcnn import * from .tood import * from .retinanet import * from .bytetrack import * +from .yolox import * diff --git a/ppdet/modeling/architectures/cascade_rcnn.py b/ppdet/modeling/architectures/cascade_rcnn.py index 4b5caa7a3ad16f535c007ffa0888b44c8958478b..fc5949af0ac4efaea3ea28bbb416859881461f30 100644 --- a/ppdet/modeling/architectures/cascade_rcnn.py +++ b/ppdet/modeling/architectures/cascade_rcnn.py @@ -111,8 +111,8 @@ class CascadeRCNN(BaseArch): bbox, bbox_num = self.bbox_post_process( preds, (refined_rois, rois_num), im_shape, scale_factor) # rescale the prediction back to origin image - bbox_pred = self.bbox_post_process.get_pred(bbox, bbox_num, - im_shape, scale_factor) + bbox, bbox_pred, bbox_num = self.bbox_post_process.get_pred( + bbox, bbox_num, im_shape, scale_factor) if not self.with_mask: return bbox_pred, bbox_num, None mask_out = self.mask_head(body_feats, bbox, bbox_num, self.inputs) diff --git a/ppdet/modeling/architectures/faster_rcnn.py b/ppdet/modeling/architectures/faster_rcnn.py index 26a2672d60f49aa989c7945b65ce3ecd9beec182..ce9a8e4b57d2dfe54fde037fed2dc0156cb71b51 100644 --- a/ppdet/modeling/architectures/faster_rcnn.py +++ b/ppdet/modeling/architectures/faster_rcnn.py @@ -87,8 +87,8 @@ class FasterRCNN(BaseArch): im_shape, scale_factor) # rescale the prediction back to origin image - bbox_pred = self.bbox_post_process.get_pred(bbox, bbox_num, - im_shape, scale_factor) + bboxes, bbox_pred, bbox_num = self.bbox_post_process.get_pred( + bbox, bbox_num, im_shape, scale_factor) return bbox_pred, bbox_num def get_loss(self, ): diff --git a/ppdet/modeling/architectures/keypoint_hrhrnet.py b/ppdet/modeling/architectures/keypoint_hrhrnet.py index 6f62b4b218dfd105e48675eedc71dd9a65d7a62a..366e9e3eed466f5e52503e94e6dea2afbf556c7e 100644 --- a/ppdet/modeling/architectures/keypoint_hrhrnet.py +++ b/ppdet/modeling/architectures/keypoint_hrhrnet.py @@ 
-153,7 +153,7 @@ class HrHRNetPostProcess(object): heat_thresh (float): value of topk below this threshhold will be ignored tag_thresh (float): coord's value sampled in tagmap below this threshold belong to same people for init - inputs(list[heatmap]): the output list of modle, [heatmap, heatmap_maxpool, tagmap], heatmap_maxpool used to get topk + inputs(list[heatmap]): the output list of model, [heatmap, heatmap_maxpool, tagmap], heatmap_maxpool used to get topk original_height, original_width (float): the original image size ''' diff --git a/ppdet/modeling/architectures/mask_rcnn.py b/ppdet/modeling/architectures/mask_rcnn.py index 43b8bff94aaf6f496d978fe755b55ba79f7786b2..a322f9f8e7b41d47d90b03b594fcdb47665c2c45 100644 --- a/ppdet/modeling/architectures/mask_rcnn.py +++ b/ppdet/modeling/architectures/mask_rcnn.py @@ -112,8 +112,8 @@ class MaskRCNN(BaseArch): body_feats, bbox, bbox_num, self.inputs, feat_func=feat_func) # rescale the prediction back to origin image - bbox_pred = self.bbox_post_process.get_pred(bbox, bbox_num, - im_shape, scale_factor) + bbox, bbox_pred, bbox_num = self.bbox_post_process.get_pred( + bbox, bbox_num, im_shape, scale_factor) origin_shape = self.bbox_post_process.get_origin_shape() mask_pred = self.mask_post_process(mask_out, bbox_pred, bbox_num, origin_shape) diff --git a/ppdet/modeling/architectures/meta_arch.py b/ppdet/modeling/architectures/meta_arch.py index 1f13c854072956395e8bb9bbb5b9ad9d43d2eeec..4ff84a97a61739e06f215f56a64daf0459e4a971 100644 --- a/ppdet/modeling/architectures/meta_arch.py +++ b/ppdet/modeling/architectures/meta_arch.py @@ -22,22 +22,23 @@ class BaseArch(nn.Layer): self.fuse_norm = False def load_meanstd(self, cfg_transform): - self.scale = 1. - self.mean = paddle.to_tensor([0.485, 0.456, 0.406]).reshape( - (1, 3, 1, 1)) - self.std = paddle.to_tensor([0.229, 0.224, 0.225]).reshape((1, 3, 1, 1)) + scale = 1. 
+ mean = np.array([0.485, 0.456, 0.406], dtype=np.float32) + std = np.array([0.229, 0.224, 0.225], dtype=np.float32) for item in cfg_transform: if 'NormalizeImage' in item: - self.mean = paddle.to_tensor(item['NormalizeImage'][ - 'mean']).reshape((1, 3, 1, 1)) - self.std = paddle.to_tensor(item['NormalizeImage'][ - 'std']).reshape((1, 3, 1, 1)) + mean = np.array( + item['NormalizeImage']['mean'], dtype=np.float32) + std = np.array(item['NormalizeImage']['std'], dtype=np.float32) if item['NormalizeImage'].get('is_scale', True): - self.scale = 1. / 255. + scale = 1. / 255. break if self.data_format == 'NHWC': - self.mean = self.mean.reshape(1, 1, 1, 3) - self.std = self.std.reshape(1, 1, 1, 3) + self.scale = paddle.to_tensor(scale / std).reshape((1, 1, 1, 3)) + self.bias = paddle.to_tensor(-mean / std).reshape((1, 1, 1, 3)) + else: + self.scale = paddle.to_tensor(scale / std).reshape((1, 3, 1, 1)) + self.bias = paddle.to_tensor(-mean / std).reshape((1, 3, 1, 1)) def forward(self, inputs): if self.data_format == 'NHWC': @@ -46,7 +47,7 @@ class BaseArch(nn.Layer): if self.fuse_norm: image = inputs['image'] - self.inputs['image'] = (image * self.scale - self.mean) / self.std + self.inputs['image'] = image * self.scale + self.bias self.inputs['im_shape'] = inputs['im_shape'] self.inputs['scale_factor'] = inputs['scale_factor'] else: @@ -66,8 +67,7 @@ class BaseArch(nn.Layer): outs = [] for inp in inputs_list: if self.fuse_norm: - self.inputs['image'] = ( - inp['image'] * self.scale - self.mean) / self.std + self.inputs['image'] = inp['image'] * self.scale + self.bias self.inputs['im_shape'] = inp['im_shape'] self.inputs['scale_factor'] = inp['scale_factor'] else: @@ -75,7 +75,7 @@ class BaseArch(nn.Layer): outs.append(self.get_pred()) # multi-scale test - if len(outs)>1: + if len(outs) > 1: out = self.merge_multi_scale_predictions(outs) else: out = outs[0] @@ -92,7 +92,9 @@ class BaseArch(nn.Layer): keep_top_k = self.bbox_post_process.nms.keep_top_k nms_threshold = 
self.bbox_post_process.nms.nms_threshold else: - raise Exception("Multi scale test only supports CascadeRCNN, FasterRCNN and MaskRCNN for now") + raise Exception( + "Multi scale test only supports CascadeRCNN, FasterRCNN and MaskRCNN for now" + ) final_boxes = [] all_scale_outs = paddle.concat([o['bbox'] for o in outs]).numpy() @@ -101,9 +103,11 @@ class BaseArch(nn.Layer): if np.count_nonzero(idxs) == 0: continue r = nms(all_scale_outs[idxs, 1:], nms_threshold) - final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1)) + final_boxes.append( + np.concatenate([np.full((r.shape[0], 1), c), r], 1)) out = np.concatenate(final_boxes) - out = np.concatenate(sorted(out, key=lambda e: e[1])[-keep_top_k:]).reshape((-1, 6)) + out = np.concatenate(sorted( + out, key=lambda e: e[1])[-keep_top_k:]).reshape((-1, 6)) out = { 'bbox': paddle.to_tensor(out), 'bbox_num': paddle.to_tensor(np.array([out.shape[0], ])) diff --git a/ppdet/modeling/architectures/picodet.py b/ppdet/modeling/architectures/picodet.py index 760b8347b76bb6b0acca0aba8727c81bc9397fa2..0b87a4baa429dae1c03286f09243ca3211b199df 100644 --- a/ppdet/modeling/architectures/picodet.py +++ b/ppdet/modeling/architectures/picodet.py @@ -67,10 +67,9 @@ class PicoDet(BaseArch): if self.training or not self.export_post_process: return head_outs, None else: - im_shape = self.inputs['im_shape'] scale_factor = self.inputs['scale_factor'] bboxes, bbox_num = self.head.post_process( - head_outs, im_shape, scale_factor, export_nms=self.export_nms) + head_outs, scale_factor, export_nms=self.export_nms) return bboxes, bbox_num def get_loss(self, ): diff --git a/ppdet/modeling/architectures/retinanet.py b/ppdet/modeling/architectures/retinanet.py index 5e9ce2de4b045abae60434cedb30976ba3398e9d..e774430a03dfebf74c1e91138ed57f2ee52f1c9d 100644 --- a/ppdet/modeling/architectures/retinanet.py +++ b/ppdet/modeling/architectures/retinanet.py @@ -22,14 +22,12 @@ import paddle __all__ = ['RetinaNet'] + @register class 
RetinaNet(BaseArch): __category__ = 'architecture' - def __init__(self, - backbone, - neck, - head): + def __init__(self, backbone, neck, head): super(RetinaNet, self).__init__() self.backbone = backbone self.neck = neck @@ -38,35 +36,33 @@ class RetinaNet(BaseArch): @classmethod def from_config(cls, cfg, *args, **kwargs): backbone = create(cfg['backbone']) + kwargs = {'input_shape': backbone.out_shape} neck = create(cfg['neck'], **kwargs) - head = create(cfg['head']) + + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + return { 'backbone': backbone, 'neck': neck, - 'head': head} + 'head': head, + } def _forward(self): body_feats = self.backbone(self.inputs) neck_feats = self.neck(body_feats) - head_outs = self.head(neck_feats) - if not self.training: - im_shape = self.inputs['im_shape'] - scale_factor = self.inputs['scale_factor'] - bboxes, bbox_num = self.head.post_process(head_outs, im_shape, scale_factor) - return bboxes, bbox_num - return head_outs + + if self.training: + return self.head(neck_feats, self.inputs) + else: + head_outs = self.head(neck_feats) + bbox, bbox_num = self.head.post_process( + head_outs, self.inputs['im_shape'], self.inputs['scale_factor']) + return {'bbox': bbox, 'bbox_num': bbox_num} def get_loss(self): - loss = dict() - head_outs = self._forward() - loss_retina = self.head.get_loss(head_outs, self.inputs) - loss.update(loss_retina) - total_loss = paddle.add_n(list(loss.values())) - loss.update(loss=total_loss) - return loss + return self._forward() def get_pred(self): - bbox_pred, bbox_num = self._forward() - output = dict(bbox=bbox_pred, bbox_num=bbox_num) - return output + return self._forward() diff --git a/ppdet/modeling/architectures/yolo.py b/ppdet/modeling/architectures/yolo.py index 00deb32ee3d0ea92e3bd5b0fcce3e53ea78db407..ce5be21cd48272939fd0e2ea36c5db439cb02081 100644 --- a/ppdet/modeling/architectures/yolo.py +++ b/ppdet/modeling/architectures/yolo.py @@ -115,8 +115,7 @@ class 
YOLOv3(BaseArch): self.inputs['im_shape'], self.inputs['scale_factor']) else: bbox, bbox_num = self.yolo_head.post_process( - yolo_head_outs, self.inputs['im_shape'], - self.inputs['scale_factor']) + yolo_head_outs, self.inputs['scale_factor']) output = {'bbox': bbox, 'bbox_num': bbox_num} return output diff --git a/ppdet/modeling/architectures/yolox.py b/ppdet/modeling/architectures/yolox.py new file mode 100644 index 0000000000000000000000000000000000000000..8e02e9ef7ecce137013ec2e7707dc04e3afabb28 --- /dev/null +++ b/ppdet/modeling/architectures/yolox.py @@ -0,0 +1,138 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
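Review note on the `load_meanstd` change in `ppdet/modeling/architectures/meta_arch.py` above: the hunk replaces `(image * scale - mean) / std` with `image * self.scale + self.bias`, where the new `self.scale` holds `scale / std` and `self.bias` holds `-mean / std`. The sketch below checks that algebraic identity per channel in plain Python. It is only an illustration: the helper names `fold_normalization`, `normalize_reference`, and `normalize_fused` are hypothetical and do not exist in PaddleDetection, and lists stand in for tensors.

```python
import math


def fold_normalization(mean, std, is_scale=True):
    """Precompute per-channel multiplier and offset (hypothetical helper)."""
    scale = 1.0 / 255.0 if is_scale else 1.0
    fused_scale = [scale / s for s in std]             # scale / std
    fused_bias = [-m / s for m, s in zip(mean, std)]   # -mean / std
    return fused_scale, fused_bias


def normalize_reference(pixel, mean, std, is_scale=True):
    """Old form from the diff: (image * scale - mean) / std."""
    scale = 1.0 / 255.0 if is_scale else 1.0
    return [(p * scale - m) / s for p, m, s in zip(pixel, mean, std)]


def normalize_fused(pixel, fused_scale, fused_bias):
    """New form from the diff: image * scale + bias."""
    return [p * a + b for p, a, b in zip(pixel, fused_scale, fused_bias)]


mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
pixel = [128.0, 64.0, 255.0]  # one RGB pixel in the 0..255 range

fused_scale, fused_bias = fold_normalization(mean, std)
old = normalize_reference(pixel, mean, std)
new = normalize_fused(pixel, fused_scale, fused_bias)
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(old, new))
print(same)
```

Both forms agree up to floating-point rounding. The fused form does one multiply and one add per element at inference time instead of a multiply, a subtract, and a divide.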
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from ppdet.core.workspace import register, create +from .meta_arch import BaseArch + +import random +import paddle +import paddle.nn.functional as F +import paddle.distributed as dist + +__all__ = ['YOLOX'] + + +@register +class YOLOX(BaseArch): + """ + YOLOX network, see https://arxiv.org/abs/2107.08430 + + Args: + backbone (nn.Layer): backbone instance + neck (nn.Layer): neck instance + head (nn.Layer): head instance + for_mot (bool): whether used for MOT or not + input_size (list[int]): initial scale, will be reset by self._preprocess() + size_stride (int): stride of the size range + size_range (list[int]): multi-scale range for training + random_interval (int): interval of iter to change self._input_size + """ + __category__ = 'architecture' + + def __init__(self, + backbone='CSPDarkNet', + neck='YOLOCSPPAN', + head='YOLOXHead', + for_mot=False, + input_size=[640, 640], + size_stride=32, + size_range=[15, 25], + random_interval=10): + super(YOLOX, self).__init__() + self.backbone = backbone + self.neck = neck + self.head = head + self.for_mot = for_mot + + self.input_size = input_size + self._input_size = paddle.to_tensor(input_size) + self.size_stride = size_stride + self.size_range = size_range + self.random_interval = random_interval + self._step = 0 + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # backbone + backbone = create(cfg['backbone']) + + # fpn + kwargs = {'input_shape': backbone.out_shape} + neck = create(cfg['neck'], **kwargs) + + # head + kwargs = {'input_shape': neck.out_shape} + head = create(cfg['head'], **kwargs) + + return { + 'backbone': backbone, + 'neck': neck, + "head": head, + } + + def _forward(self): + if self.training: + self._preprocess() + body_feats = self.backbone(self.inputs) + neck_feats = self.neck(body_feats, self.for_mot) + + if self.training: + yolox_losses = self.head(neck_feats, 
self.inputs) + yolox_losses.update({'size': self._input_size[0]}) + return yolox_losses + else: + head_outs = self.head(neck_feats) + bbox, bbox_num = self.head.post_process( + head_outs, self.inputs['im_shape'], self.inputs['scale_factor']) + return {'bbox': bbox, 'bbox_num': bbox_num} + + def get_loss(self): + return self._forward() + + def get_pred(self): + return self._forward() + + def _preprocess(self): + # YOLOX multi-scale training, interpolate resize before inputs of the network. + self._get_size() + scale_y = self._input_size[0] / self.input_size[0] + scale_x = self._input_size[1] / self.input_size[1] + if scale_x != 1 or scale_y != 1: + self.inputs['image'] = F.interpolate( + self.inputs['image'], + size=self._input_size, + mode='bilinear', + align_corners=False) + gt_bboxes = self.inputs['gt_bbox'] + for i in range(len(gt_bboxes)): + if len(gt_bboxes[i]) > 0: + gt_bboxes[i][:, 0::2] = gt_bboxes[i][:, 0::2] * scale_x + gt_bboxes[i][:, 1::2] = gt_bboxes[i][:, 1::2] * scale_y + self.inputs['gt_bbox'] = gt_bboxes + + def _get_size(self): + # random_interval = 10 as default, every 10 iters to change self._input_size + image_ratio = self.input_size[1] * 1.0 / self.input_size[0] + if self._step % self.random_interval == 0: + size_factor = random.randint(*self.size_range) + size = [ + self.size_stride * size_factor, + self.size_stride * int(size_factor * image_ratio) + ] + self._input_size = paddle.to_tensor(size) + self._step += 1 diff --git a/ppdet/modeling/assigners/atss_assigner.py b/ppdet/modeling/assigners/atss_assigner.py index aba857e3d88145151e2246681c2ba673675efde1..6406d7bce5b796c125cd489886e9622a3f4ede97 100644 --- a/ppdet/modeling/assigners/atss_assigner.py +++ b/ppdet/modeling/assigners/atss_assigner.py @@ -22,8 +22,7 @@ import paddle.nn as nn import paddle.nn.functional as F from ppdet.core.workspace import register -from ..ops import iou_similarity -from ..bbox_utils import iou_similarity as batch_iou_similarity +from ..bbox_utils import 
iou_similarity, batch_iou_similarity from ..bbox_utils import bbox_center from .utils import (check_points_inside_bboxes, compute_max_iou_anchor, compute_max_iou_gt) @@ -51,7 +50,6 @@ class ATSSAssigner(nn.Layer): def _gather_topk_pyramid(self, gt2anchor_distances, num_anchors_list, pad_gt_mask): - pad_gt_mask = pad_gt_mask.tile([1, 1, self.topk]).astype(paddle.bool) gt2anchor_distances_list = paddle.split( gt2anchor_distances, num_anchors_list, axis=-1) num_anchors_index = np.cumsum(num_anchors_list).tolist() @@ -61,15 +59,12 @@ class ATSSAssigner(nn.Layer): for distances, anchors_index in zip(gt2anchor_distances_list, num_anchors_index): num_anchors = distances.shape[-1] - topk_metrics, topk_idxs = paddle.topk( + _, topk_idxs = paddle.topk( distances, self.topk, axis=-1, largest=False) topk_idxs_list.append(topk_idxs + anchors_index) - topk_idxs = paddle.where(pad_gt_mask, topk_idxs, - paddle.zeros_like(topk_idxs)) - is_in_topk = F.one_hot(topk_idxs, num_anchors).sum(axis=-2) - is_in_topk = paddle.where(is_in_topk > 1, - paddle.zeros_like(is_in_topk), is_in_topk) - is_in_topk_list.append(is_in_topk.astype(gt2anchor_distances.dtype)) + is_in_topk = F.one_hot(topk_idxs, num_anchors).sum( + axis=-2).astype(gt2anchor_distances.dtype) + is_in_topk_list.append(is_in_topk * pad_gt_mask) is_in_topk_list = paddle.concat(is_in_topk_list, axis=-1) topk_idxs_list = paddle.concat(topk_idxs_list, axis=-1) return is_in_topk_list, topk_idxs_list @@ -124,7 +119,8 @@ class ATSSAssigner(nn.Layer): # negative batch if num_max_boxes == 0: - assigned_labels = paddle.full([batch_size, num_anchors], bg_index) + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype=gt_labels.dtype) assigned_bboxes = paddle.zeros([batch_size, num_anchors, 4]) assigned_scores = paddle.zeros( [batch_size, num_anchors, self.num_classes]) @@ -154,9 +150,8 @@ class ATSSAssigner(nn.Layer): iou_threshold = iou_threshold.reshape([batch_size, num_max_boxes, -1]) iou_threshold = 
iou_threshold.mean(axis=-1, keepdim=True) + \ iou_threshold.std(axis=-1, keepdim=True) - is_in_topk = paddle.where( - iou_candidates > iou_threshold.tile([1, 1, num_anchors]), - is_in_topk, paddle.zeros_like(is_in_topk)) + is_in_topk = paddle.where(iou_candidates > iou_threshold, is_in_topk, + paddle.zeros_like(is_in_topk)) # 6. check the positive sample's center in gt, [B, n, L] is_in_gts = check_points_inside_bboxes(anchor_centers, gt_bboxes) @@ -199,7 +194,11 @@ class ATSSAssigner(nn.Layer): gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0) assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4]) - assigned_scores = F.one_hot(assigned_labels, self.num_classes) + assigned_scores = F.one_hot(assigned_labels, self.num_classes + 1) + ind = list(range(self.num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) if pred_bboxes is not None: # assigned iou ious = batch_iou_similarity(gt_bboxes, pred_bboxes) * mask_positive diff --git a/ppdet/modeling/assigners/task_aligned_assigner.py b/ppdet/modeling/assigners/task_aligned_assigner.py index b1f47e786df0261d3925d1b5bc776683657385c1..5b3368e06814b4f3793e3f38c15d985cf4e8e6bc 100644 --- a/ppdet/modeling/assigners/task_aligned_assigner.py +++ b/ppdet/modeling/assigners/task_aligned_assigner.py @@ -21,7 +21,7 @@ import paddle.nn as nn import paddle.nn.functional as F from ppdet.core.workspace import register -from ..bbox_utils import iou_similarity +from ..bbox_utils import batch_iou_similarity from .utils import (gather_topk_anchors, check_points_inside_bboxes, compute_max_iou_anchor) @@ -85,14 +85,15 @@ class TaskAlignedAssigner(nn.Layer): # negative batch if num_max_boxes == 0: - assigned_labels = paddle.full([batch_size, num_anchors], bg_index) + assigned_labels = paddle.full( + [batch_size, num_anchors], bg_index, dtype=gt_labels.dtype) assigned_bboxes = paddle.zeros([batch_size, num_anchors, 4]) assigned_scores = 
paddle.zeros( [batch_size, num_anchors, num_classes]) return assigned_labels, assigned_bboxes, assigned_scores # compute iou between gt and pred bbox, [B, n, L] - ious = iou_similarity(gt_bboxes, pred_bboxes) + ious = batch_iou_similarity(gt_bboxes, pred_bboxes) # gather pred bboxes class score pred_scores = pred_scores.transpose([0, 2, 1]) batch_ind = paddle.arange( @@ -111,9 +112,7 @@ class TaskAlignedAssigner(nn.Layer): # select topk largest alignment metrics pred bbox as candidates # for each gt, [B, n, L] is_in_topk = gather_topk_anchors( - alignment_metrics * is_in_gts, - self.topk, - topk_mask=pad_gt_mask.tile([1, 1, self.topk]).astype(paddle.bool)) + alignment_metrics * is_in_gts, self.topk, topk_mask=pad_gt_mask) # select positive sample, [B, n, L] mask_positive = is_in_topk * is_in_gts * pad_gt_mask @@ -143,7 +142,11 @@ class TaskAlignedAssigner(nn.Layer): gt_bboxes.reshape([-1, 4]), assigned_gt_index.flatten(), axis=0) assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 4]) - assigned_scores = F.one_hot(assigned_labels, num_classes) + assigned_scores = F.one_hot(assigned_labels, num_classes + 1) + ind = list(range(num_classes + 1)) + ind.remove(bg_index) + assigned_scores = paddle.index_select( + assigned_scores, paddle.to_tensor(ind), axis=-1) # rescale alignment metrics alignment_metrics *= mask_positive max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True) diff --git a/ppdet/modeling/assigners/utils.py b/ppdet/modeling/assigners/utils.py index 0c5e1d94fc06dd2abdf135638f9b17c8ba2ff8cf..0bc399315797b4be04954858fac5cccbbd73ee33 100644 --- a/ppdet/modeling/assigners/utils.py +++ b/ppdet/modeling/assigners/utils.py @@ -88,7 +88,7 @@ def gather_topk_anchors(metrics, topk, largest=True, topk_mask=None, eps=1e-9): largest (bool) : largest is a flag, if set to true, algorithm will sort by descending order, otherwise sort by ascending order. 
Default: True - topk_mask (Tensor, bool|None): shape[B, n, topk], ignore bbox mask, + topk_mask (Tensor, float32): shape[B, n, 1], ignore bbox mask, Default: None eps (float): Default: 1e-9 Returns: @@ -98,13 +98,11 @@ def gather_topk_anchors(metrics, topk, largest=True, topk_mask=None, eps=1e-9): topk_metrics, topk_idxs = paddle.topk( metrics, topk, axis=-1, largest=largest) if topk_mask is None: - topk_mask = (topk_metrics.max(axis=-1, keepdim=True) > eps).tile( - [1, 1, topk]) - topk_idxs = paddle.where(topk_mask, topk_idxs, paddle.zeros_like(topk_idxs)) - is_in_topk = F.one_hot(topk_idxs, num_anchors).sum(axis=-2) - is_in_topk = paddle.where(is_in_topk > 1, - paddle.zeros_like(is_in_topk), is_in_topk) - return is_in_topk.astype(metrics.dtype) + topk_mask = ( + topk_metrics.max(axis=-1, keepdim=True) > eps).astype(metrics.dtype) + is_in_topk = F.one_hot(topk_idxs, num_anchors).sum( + axis=-2).astype(metrics.dtype) + return is_in_topk * topk_mask def check_points_inside_bboxes(points, @@ -115,7 +113,7 @@ def check_points_inside_bboxes(points, Args: points (Tensor, float32): shape[L, 2], "xy" format, L: num_anchors bboxes (Tensor, float32): shape[B, n, 4], "xmin, ymin, xmax, ymax" format - center_radius_tensor (Tensor, float32): shape [L, 1] Default: None. + center_radius_tensor (Tensor, float32): shape [L, 1]. Default: None. eps (float): Default: 1e-9 Returns: is_in_bboxes (Tensor, float32): shape[B, n, L], value=1. means selected @@ -123,25 +121,28 @@ def check_points_inside_bboxes(points, points = points.unsqueeze([0, 1]) x, y = points.chunk(2, axis=-1) xmin, ymin, xmax, ymax = bboxes.unsqueeze(2).chunk(4, axis=-1) - if center_radius_tensor is not None: - center_radius_tensor = center_radius_tensor.unsqueeze([0, 1]) - bboxes_cx = (xmin + xmax) / 2. - bboxes_cy = (ymin + ymax) / 2. 
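Review note on the `gather_topk_anchors` hunk in `ppdet/modeling/assigners/utils.py` above: the `paddle.where`-based masking of `topk_idxs` is replaced by a one-hot membership sum multiplied by a float mask of shape `[B, n, 1]`. The following plain-Python sketch mirrors that idea for a single image, assuming one row of scores per ground-truth box. The function name `is_in_topk_rows` is hypothetical, and nested lists stand in for tensors.

```python
def is_in_topk_rows(metrics, topk, gt_mask, largest=True):
    """Mark the top-k anchors of each gt row, then zero out padded gt rows.

    metrics: list of rows, one per gt box, each a list of per-anchor scores.
    gt_mask: list of 1.0 / 0.0 flags, 0.0 marking a padded (invalid) gt box.
    """
    result = []
    for row, keep in zip(metrics, gt_mask):
        # Indices sorted by score; the first `topk` are the selected anchors.
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=largest)
        chosen = set(order[:topk])
        # Membership is 0/1 because the top-k indices of one row are distinct;
        # multiplying by the mask clears every entry of a padded gt row.
        result.append([keep * (1.0 if i in chosen else 0.0) for i in range(len(row))])
    return result


scores = [
    [0.1, 0.9, 0.5, 0.3],  # real gt box
    [0.4, 0.2, 0.8, 0.6],  # padded gt box
]
selected = is_in_topk_rows(scores, topk=2, gt_mask=[1.0, 0.0])
print(selected)
```

Since the indices returned by a single top-k call are distinct, each anchor is counted at most once per row, which is presumably why the earlier `is_in_topk > 1` clamp could be dropped.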
- xmin_sampling = bboxes_cx - center_radius_tensor - ymin_sampling = bboxes_cy - center_radius_tensor - xmax_sampling = bboxes_cx + center_radius_tensor - ymax_sampling = bboxes_cy + center_radius_tensor - - xmin = paddle.maximum(xmin, xmin_sampling) - ymin = paddle.maximum(ymin, ymin_sampling) - xmax = paddle.minimum(xmax, xmax_sampling) - ymax = paddle.minimum(ymax, ymax_sampling) + # check whether `points` is in `bboxes` l = x - xmin t = y - ymin r = xmax - x b = ymax - y - bbox_ltrb = paddle.concat([l, t, r, b], axis=-1) - return (bbox_ltrb.min(axis=-1) > eps).astype(bboxes.dtype) + delta_ltrb = paddle.concat([l, t, r, b], axis=-1) + is_in_bboxes = (delta_ltrb.min(axis=-1) > eps) + if center_radius_tensor is not None: + # check whether `points` is in `center_radius` + center_radius_tensor = center_radius_tensor.unsqueeze([0, 1]) + cx = (xmin + xmax) * 0.5 + cy = (ymin + ymax) * 0.5 + l = x - (cx - center_radius_tensor) + t = y - (cy - center_radius_tensor) + r = (cx + center_radius_tensor) - x + b = (cy + center_radius_tensor) - y + delta_ltrb_c = paddle.concat([l, t, r, b], axis=-1) + is_in_center = (delta_ltrb_c.min(axis=-1) > eps) + return (paddle.logical_and(is_in_bboxes, is_in_center), + paddle.logical_or(is_in_bboxes, is_in_center)) + + return is_in_bboxes.astype(bboxes.dtype) def compute_max_iou_anchor(ious): @@ -175,7 +176,8 @@ def compute_max_iou_gt(ious): def generate_anchors_for_grid_cell(feats, fpn_strides, grid_cell_size=5.0, - grid_cell_offset=0.5): + grid_cell_offset=0.5, + dtype='float32'): r""" Like ATSS, generate anchors based on grid size. 
Args: @@ -205,16 +207,15 @@ def generate_anchors_for_grid_cell(feats, shift_x - cell_half_size, shift_y - cell_half_size, shift_x + cell_half_size, shift_y + cell_half_size ], - axis=-1).astype(feat.dtype) - anchor_point = paddle.stack( - [shift_x, shift_y], axis=-1).astype(feat.dtype) + axis=-1).astype(dtype) + anchor_point = paddle.stack([shift_x, shift_y], axis=-1).astype(dtype) anchors.append(anchor.reshape([-1, 4])) anchor_points.append(anchor_point.reshape([-1, 2])) num_anchors_list.append(len(anchors[-1])) stride_tensor.append( paddle.full( - [num_anchors_list[-1], 1], stride, dtype=feat.dtype)) + [num_anchors_list[-1], 1], stride, dtype=dtype)) anchors = paddle.concat(anchors) anchors.stop_gradient = True anchor_points = paddle.concat(anchor_points) diff --git a/ppdet/modeling/backbones/__init__.py b/ppdet/modeling/backbones/__init__.py index bbc696dbacd048ce0ca752580ecfc54e11a1433f..12e9354b744dc97e2de584915a8827d137a3f7c2 100644 --- a/ppdet/modeling/backbones/__init__.py +++ b/ppdet/modeling/backbones/__init__.py @@ -1,15 +1,15 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and # limitations under the License. from . import vgg @@ -30,6 +30,10 @@ from . import lcnet from . import hardnet from . import esnet from . import cspresnet +from . import csp_darknet +from . import convnext +from . import vision_transformer +from . import mobileone from .vgg import * from .resnet import * @@ -49,3 +53,8 @@ from .lcnet import * from .hardnet import * from .esnet import * from .cspresnet import * +from .csp_darknet import * +from .convnext import * +from .vision_transformer import * +from .vision_transformer import * +from .mobileone import * diff --git a/ppdet/modeling/backbones/convnext.py b/ppdet/modeling/backbones/convnext.py new file mode 100644 index 0000000000000000000000000000000000000000..476e12b2da50585dd142f3049ba024769e691e8b --- /dev/null +++ b/ppdet/modeling/backbones/convnext.py @@ -0,0 +1,245 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +''' +Modified from https://github.com/facebookresearch/ConvNeXt +Copyright (c) Meta Platforms, Inc. and affiliates. +All rights reserved. 
+This source code is licensed under the license found in the +LICENSE file in the root directory of this source tree. +''' + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.nn.initializer import Constant + +import numpy as np + +from ppdet.core.workspace import register, serializable +from ..shape_spec import ShapeSpec +from .transformer_utils import DropPath, trunc_normal_, zeros_ + +__all__ = ['ConvNeXt'] + + +class Block(nn.Layer): + r""" ConvNeXt Block. There are two equivalent implementations: + (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W) + (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back + We use (2) as we find it slightly faster in Paddle + + Args: + dim (int): Number of input channels. + drop_path (float): Stochastic depth rate. Default: 0.0 + layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6. + """ + + def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6): + super().__init__() + self.dwconv = nn.Conv2D( + dim, dim, kernel_size=7, padding=3, groups=dim) # depthwise conv + self.norm = LayerNorm(dim, eps=1e-6) + self.pwconv1 = nn.Linear( + dim, 4 * dim) # pointwise/1x1 convs, implemented with linear layers + self.act = nn.GELU() + self.pwconv2 = nn.Linear(4 * dim, dim) + + if layer_scale_init_value > 0: + self.gamma = self.create_parameter( + shape=(dim, ), + attr=ParamAttr(initializer=Constant(layer_scale_init_value))) + else: + self.gamma = None + + self.drop_path = DropPath(drop_path) if drop_path > 0.
else nn.Identity( + ) + + def forward(self, x): + input = x + x = self.dwconv(x) + x = x.transpose([0, 2, 3, 1]) + x = self.norm(x) + x = self.pwconv1(x) + x = self.act(x) + x = self.pwconv2(x) + if self.gamma is not None: + x = self.gamma * x + x = x.transpose([0, 3, 1, 2]) + x = input + self.drop_path(x) + return x + + +class LayerNorm(nn.Layer): + r""" LayerNorm that supports two data formats: channels_last (default) or channels_first. + The ordering of the dimensions in the inputs. channels_last corresponds to inputs with + shape (batch_size, height, width, channels) while channels_first corresponds to inputs + with shape (batch_size, channels, height, width). + """ + + def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"): + super().__init__() + + self.weight = self.create_parameter( + shape=(normalized_shape, ), + attr=ParamAttr(initializer=Constant(1.))) + self.bias = self.create_parameter( + shape=(normalized_shape, ), + attr=ParamAttr(initializer=Constant(0.))) + + self.eps = eps + self.data_format = data_format + if self.data_format not in ["channels_last", "channels_first"]: + raise NotImplementedError + self.normalized_shape = (normalized_shape, ) + + def forward(self, x): + if self.data_format == "channels_last": + return F.layer_norm(x, self.normalized_shape, self.weight, + self.bias, self.eps) + elif self.data_format == "channels_first": + u = x.mean(1, keepdim=True) + s = (x - u).pow(2).mean(1, keepdim=True) + x = (x - u) / paddle.sqrt(s + self.eps) + x = self.weight[:, None, None] * x + self.bias[:, None, None] + return x + + +@register +@serializable +class ConvNeXt(nn.Layer): + r""" ConvNeXt + A Paddle impl of: `A ConvNet for the 2020s` - + https://arxiv.org/pdf/2201.03545.pdf + + Args: + in_chans (int): Number of input image channels. Default: 3 + depths (tuple(int)): Number of blocks at each stage. Default: [3, 3, 9, 3] + dims (tuple(int)): Feature dimension at each stage.
Default: [96, 192, 384, 768] + drop_path_rate (float): Stochastic depth rate. Default: 0. + layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6. + """ + + arch_settings = { + 'tiny': { + 'depths': [3, 3, 9, 3], + 'dims': [96, 192, 384, 768] + }, + 'small': { + 'depths': [3, 3, 27, 3], + 'dims': [96, 192, 384, 768] + }, + 'base': { + 'depths': [3, 3, 27, 3], + 'dims': [128, 256, 512, 1024] + }, + 'large': { + 'depths': [3, 3, 27, 3], + 'dims': [192, 384, 768, 1536] + }, + 'xlarge': { + 'depths': [3, 3, 27, 3], + 'dims': [256, 512, 1024, 2048] + }, + } + + def __init__( + self, + arch='tiny', + in_chans=3, + drop_path_rate=0., + layer_scale_init_value=1e-6, + return_idx=[1, 2, 3], + norm_output=True, + pretrained=None, ): + super().__init__() + depths = self.arch_settings[arch]['depths'] + dims = self.arch_settings[arch]['dims'] + self.downsample_layers = nn.LayerList( + ) # stem and 3 intermediate downsampling conv layers + stem = nn.Sequential( + nn.Conv2D( + in_chans, dims[0], kernel_size=4, stride=4), + LayerNorm( + dims[0], eps=1e-6, data_format="channels_first")) + self.downsample_layers.append(stem) + for i in range(3): + downsample_layer = nn.Sequential( + LayerNorm( + dims[i], eps=1e-6, data_format="channels_first"), + nn.Conv2D( + dims[i], dims[i + 1], kernel_size=2, stride=2), ) + self.downsample_layers.append(downsample_layer) + + self.stages = nn.LayerList( + ) # 4 feature resolution stages, each consisting of multiple residual blocks + dp_rates = [x for x in np.linspace(0, drop_path_rate, sum(depths))] + cur = 0 + for i in range(4): + stage = nn.Sequential(* [ + Block( + dim=dims[i], + drop_path=dp_rates[cur + j], + layer_scale_init_value=layer_scale_init_value) + for j in range(depths[i]) + ]) + self.stages.append(stage) + cur += depths[i] + + self.return_idx = return_idx + self.dims = [dims[i] for i in return_idx] # [::-1] + + self.norm_output = norm_output + if norm_output: + self.norms = nn.LayerList([ + LayerNorm( + c, 
eps=1e-6, data_format="channels_first") + for c in self.dims + ]) + + self.apply(self._init_weights) + + if pretrained is not None: + if 'http' in pretrained: #URL + path = paddle.utils.download.get_weights_path_from_url( + pretrained) + else: #model in local path + path = pretrained + self.set_state_dict(paddle.load(path)) + + def _init_weights(self, m): + if isinstance(m, (nn.Conv2D, nn.Linear)): + trunc_normal_(m.weight) + zeros_(m.bias) + + def forward_features(self, x): + output = [] + for i in range(4): + x = self.downsample_layers[i](x) + x = self.stages[i](x) + output.append(x) + + outputs = [output[i] for i in self.return_idx] + if self.norm_output: + outputs = [self.norms[i](out) for i, out in enumerate(outputs)] + + return outputs + + def forward(self, x): + x = self.forward_features(x['image']) + return x + + @property + def out_shape(self): + return [ShapeSpec(channels=c) for c in self.dims] diff --git a/ppdet/modeling/backbones/csp_darknet.py b/ppdet/modeling/backbones/csp_darknet.py new file mode 100644 index 0000000000000000000000000000000000000000..4c225d15c8b560385078b19dd3dfafd272858bd4 --- /dev/null +++ b/ppdet/modeling/backbones/csp_darknet.py @@ -0,0 +1,404 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from ppdet.core.workspace import register, serializable +from ppdet.modeling.initializer import conv_init_ +from ..shape_spec import ShapeSpec + +__all__ = [ + 'CSPDarkNet', 'BaseConv', 'DWConv', 'BottleNeck', 'SPPLayer', 'SPPFLayer' +] + + +class BaseConv(nn.Layer): + def __init__(self, + in_channels, + out_channels, + ksize, + stride, + groups=1, + bias=False, + act="silu"): + super(BaseConv, self).__init__() + self.conv = nn.Conv2D( + in_channels, + out_channels, + kernel_size=ksize, + stride=stride, + padding=(ksize - 1) // 2, + groups=groups, + bias_attr=bias) + self.bn = nn.BatchNorm2D( + out_channels, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + + self._init_weights() + + def _init_weights(self): + conv_init_(self.conv) + + def forward(self, x): + # use 'x * F.sigmoid(x)' replace 'silu' + x = self.bn(self.conv(x)) + y = x * F.sigmoid(x) + return y + + +class DWConv(nn.Layer): + """Depthwise Conv""" + + def __init__(self, + in_channels, + out_channels, + ksize, + stride=1, + bias=False, + act="silu"): + super(DWConv, self).__init__() + self.dw_conv = BaseConv( + in_channels, + in_channels, + ksize=ksize, + stride=stride, + groups=in_channels, + bias=bias, + act=act) + self.pw_conv = BaseConv( + in_channels, + out_channels, + ksize=1, + stride=1, + groups=1, + bias=bias, + act=act) + + def forward(self, x): + return self.pw_conv(self.dw_conv(x)) + + +class Focus(nn.Layer): + """Focus width and height information into channel space, used in YOLOX.""" + + def __init__(self, + in_channels, + out_channels, + ksize=3, + stride=1, + bias=False, + act="silu"): + super(Focus, self).__init__() + self.conv = BaseConv( + in_channels * 4, + out_channels, + ksize=ksize, + stride=stride, + bias=bias, + act=act) + + def forward(self, inputs): + # inputs [bs, C, H, W] -> 
outputs [bs, 4C, H/2, W/2] + top_left = inputs[:, :, 0::2, 0::2] + top_right = inputs[:, :, 0::2, 1::2] + bottom_left = inputs[:, :, 1::2, 0::2] + bottom_right = inputs[:, :, 1::2, 1::2] + outputs = paddle.concat( + [top_left, bottom_left, top_right, bottom_right], 1) + return self.conv(outputs) + + +class BottleNeck(nn.Layer): + def __init__(self, + in_channels, + out_channels, + shortcut=True, + expansion=0.5, + depthwise=False, + bias=False, + act="silu"): + super(BottleNeck, self).__init__() + hidden_channels = int(out_channels * expansion) + Conv = DWConv if depthwise else BaseConv + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.conv2 = Conv( + hidden_channels, + out_channels, + ksize=3, + stride=1, + bias=bias, + act=act) + self.add_shortcut = shortcut and in_channels == out_channels + + def forward(self, x): + y = self.conv2(self.conv1(x)) + if self.add_shortcut: + y = y + x + return y + + +class SPPLayer(nn.Layer): + """Spatial Pyramid Pooling (SPP) layer used in YOLOv3-SPP and YOLOX""" + + def __init__(self, + in_channels, + out_channels, + kernel_sizes=(5, 9, 13), + bias=False, + act="silu"): + super(SPPLayer, self).__init__() + hidden_channels = in_channels // 2 + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.maxpoolings = nn.LayerList([ + nn.MaxPool2D( + kernel_size=ks, stride=1, padding=ks // 2) + for ks in kernel_sizes + ]) + conv2_channels = hidden_channels * (len(kernel_sizes) + 1) + self.conv2 = BaseConv( + conv2_channels, out_channels, ksize=1, stride=1, bias=bias, act=act) + + def forward(self, x): + x = self.conv1(x) + x = paddle.concat([x] + [mp(x) for mp in self.maxpoolings], axis=1) + x = self.conv2(x) + return x + + +class SPPFLayer(nn.Layer): + """ Spatial Pyramid Pooling - Fast (SPPF) layer used in YOLOv5 by Glenn Jocher, + equivalent to SPP(k=(5, 9, 13)) + """ + + def __init__(self, + in_channels, + out_channels, + ksize=5, +
bias=False, + act='silu'): + super(SPPFLayer, self).__init__() + hidden_channels = in_channels // 2 + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.maxpooling = nn.MaxPool2D( + kernel_size=ksize, stride=1, padding=ksize // 2) + conv2_channels = hidden_channels * 4 + self.conv2 = BaseConv( + conv2_channels, out_channels, ksize=1, stride=1, bias=bias, act=act) + + def forward(self, x): + x = self.conv1(x) + y1 = self.maxpooling(x) + y2 = self.maxpooling(y1) + y3 = self.maxpooling(y2) + concats = paddle.concat([x, y1, y2, y3], axis=1) + out = self.conv2(concats) + return out + + +class CSPLayer(nn.Layer): + """CSP (Cross Stage Partial) layer with 3 convs, named C3 in YOLOv5""" + + def __init__(self, + in_channels, + out_channels, + num_blocks=1, + shortcut=True, + expansion=0.5, + depthwise=False, + bias=False, + act="silu"): + super(CSPLayer, self).__init__() + hidden_channels = int(out_channels * expansion) + self.conv1 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.conv2 = BaseConv( + in_channels, hidden_channels, ksize=1, stride=1, bias=bias, act=act) + self.bottlenecks = nn.Sequential(* [ + BottleNeck( + hidden_channels, + hidden_channels, + shortcut=shortcut, + expansion=1.0, + depthwise=depthwise, + bias=bias, + act=act) for _ in range(num_blocks) + ]) + self.conv3 = BaseConv( + hidden_channels * 2, + out_channels, + ksize=1, + stride=1, + bias=bias, + act=act) + + def forward(self, x): + x_1 = self.conv1(x) + x_1 = self.bottlenecks(x_1) + x_2 = self.conv2(x) + x = paddle.concat([x_1, x_2], axis=1) + x = self.conv3(x) + return x + + +@register +@serializable +class CSPDarkNet(nn.Layer): + """ + CSPDarkNet backbone. + Args: + arch (str): Architecture of CSPDarkNet, from {P5, P6, X}, default as X, + and 'X' means used in YOLOX, 'P5/P6' means used in YOLOv5. + depth_mult (float): Depth multiplier, multiply number of blocks in + CSPLayer, default as 1.0.
+ width_mult (float): Width multiplier, multiply number of channels in + each layer, default as 1.0. + depthwise (bool): Whether to use depth-wise conv layer. + act (str): Activation function type, default as 'silu'. + return_idx (list): Index of stages whose feature maps are returned. + """ + + __shared__ = ['depth_mult', 'width_mult', 'act', 'trt'] + + # in_channels, out_channels, num_blocks, add_shortcut, use_spp(use_sppf) + # 'X' means setting used in YOLOX, 'P5/P6' means setting used in YOLOv5. + arch_settings = { + 'X': [[64, 128, 3, True, False], [128, 256, 9, True, False], + [256, 512, 9, True, False], [512, 1024, 3, False, True]], + 'P5': [[64, 128, 3, True, False], [128, 256, 6, True, False], + [256, 512, 9, True, False], [512, 1024, 3, True, True]], + 'P6': [[64, 128, 3, True, False], [128, 256, 6, True, False], + [256, 512, 9, True, False], [512, 768, 3, True, False], + [768, 1024, 3, True, True]], + } + + def __init__(self, + arch='X', + depth_mult=1.0, + width_mult=1.0, + depthwise=False, + act='silu', + trt=False, + return_idx=[2, 3, 4]): + super(CSPDarkNet, self).__init__() + self.arch = arch + self.return_idx = return_idx + Conv = DWConv if depthwise else BaseConv + arch_setting = self.arch_settings[arch] + base_channels = int(arch_setting[0][0] * width_mult) + + # Note: differences between the latest YOLOv5 and the original YOLOX + # 1. self.stem, use Conv(in YOLOv5) or Focus(in YOLOX) + # 2. use SPPF(in YOLOv5) or SPP(in YOLOX) + # 3. put SPPF after(YOLOv5) or SPP before(YOLOX) the last cspdark block's CSPLayer + # 4.
whether SPPF(SPP)'s CSPLayer adds shortcut, True in YOLOv5, False in YOLOX + if arch in ['P5', 'P6']: + # in the latest YOLOv5, use Conv stem, and SPPF (fast, only single spp kernel size) + self.stem = Conv( + 3, base_channels, ksize=6, stride=2, bias=False, act=act) + spp_kernal_sizes = 5 + elif arch in ['X']: + # in the original YOLOX, use Focus stem, and SPP (three spp kernel sizes) + self.stem = Focus( + 3, base_channels, ksize=3, stride=1, bias=False, act=act) + spp_kernal_sizes = (5, 9, 13) + else: + raise AttributeError("Unsupported arch type: {}".format(arch)) + + _out_channels = [base_channels] + layers_num = 1 + self.csp_dark_blocks = [] + + for i, (in_channels, out_channels, num_blocks, shortcut, + use_spp) in enumerate(arch_setting): + in_channels = int(in_channels * width_mult) + out_channels = int(out_channels * width_mult) + _out_channels.append(out_channels) + num_blocks = max(round(num_blocks * depth_mult), 1) + stage = [] + + conv_layer = self.add_sublayer( + 'layers{}.stage{}.conv_layer'.format(layers_num, i + 1), + Conv( + in_channels, out_channels, 3, 2, bias=False, act=act)) + stage.append(conv_layer) + layers_num += 1 + + if use_spp and arch in ['X']: + # in YOLOX use SPPLayer + spp_layer = self.add_sublayer( + 'layers{}.stage{}.spp_layer'.format(layers_num, i + 1), + SPPLayer( + out_channels, + out_channels, + kernel_sizes=spp_kernal_sizes, + bias=False, + act=act)) + stage.append(spp_layer) + layers_num += 1 + + csp_layer = self.add_sublayer( + 'layers{}.stage{}.csp_layer'.format(layers_num, i + 1), + CSPLayer( + out_channels, + out_channels, + num_blocks=num_blocks, + shortcut=shortcut, + depthwise=depthwise, + bias=False, + act=act)) + stage.append(csp_layer) + layers_num += 1 + + if use_spp and arch in ['P5', 'P6']: + # in latest YOLOv5 use SPPFLayer instead of SPPLayer + sppf_layer = self.add_sublayer( + 'layers{}.stage{}.sppf_layer'.format(layers_num, i + 1), + SPPFLayer( + out_channels, + out_channels, + ksize=5, + bias=False, +
act=act)) + stage.append(sppf_layer) + layers_num += 1 + + self.csp_dark_blocks.append(nn.Sequential(*stage)) + + self._out_channels = [_out_channels[i] for i in self.return_idx] + self.strides = [[2, 4, 8, 16, 32, 64][i] for i in self.return_idx] + + def forward(self, inputs): + x = inputs['image'] + outputs = [] + x = self.stem(x) + for i, layer in enumerate(self.csp_dark_blocks): + x = layer(x) + if i + 1 in self.return_idx: + outputs.append(x) + return outputs + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=c, stride=s) + for c, s in zip(self._out_channels, self.strides) + ] diff --git a/ppdet/modeling/backbones/cspresnet.py b/ppdet/modeling/backbones/cspresnet.py index 4e0916320f4d25c84dbc8fac0d0e46b1d6c6f942..5268ec835381052988b9ceaca47c89ab2755bec9 100644 --- a/ppdet/modeling/backbones/cspresnet.py +++ b/ppdet/modeling/backbones/cspresnet.py @@ -21,6 +21,7 @@ import paddle.nn as nn import paddle.nn.functional as F from paddle import ParamAttr from paddle.regularizer import L2Decay +from paddle.nn.initializer import Constant from ppdet.modeling.ops import get_act_fn from ppdet.core.workspace import register, serializable @@ -65,7 +66,7 @@ class ConvBNLayer(nn.Layer): class RepVggBlock(nn.Layer): - def __init__(self, ch_in, ch_out, act='relu'): + def __init__(self, ch_in, ch_out, act='relu', alpha=False): super(RepVggBlock, self).__init__() self.ch_in = ch_in self.ch_out = ch_out @@ -75,12 +76,22 @@ class RepVggBlock(nn.Layer): ch_in, ch_out, 1, stride=1, padding=0, act=None) self.act = get_act_fn(act) if act is None or isinstance(act, ( str, dict)) else act + if alpha: + self.alpha = self.create_parameter( + shape=[1], + attr=ParamAttr(initializer=Constant(value=1.)), + dtype="float32") + else: + self.alpha = None def forward(self, x): if hasattr(self, 'conv'): y = self.conv(x) else: - y = self.conv1(x) + self.conv2(x) + if self.alpha: + y = self.conv1(x) + self.alpha * self.conv2(x) + else: + y = self.conv1(x) + self.conv2(x) y = 
self.act(y) return y @@ -96,12 +107,18 @@ class RepVggBlock(nn.Layer): kernel, bias = self.get_equivalent_kernel_bias() self.conv.weight.set_value(kernel) self.conv.bias.set_value(bias) + self.__delattr__('conv1') + self.__delattr__('conv2') def get_equivalent_kernel_bias(self): kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1) kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2) - return kernel3x3 + self._pad_1x1_to_3x3_tensor( - kernel1x1), bias3x3 + bias1x1 + if self.alpha: + return kernel3x3 + self.alpha * self._pad_1x1_to_3x3_tensor( + kernel1x1), bias3x3 + self.alpha * bias1x1 + else: + return kernel3x3 + self._pad_1x1_to_3x3_tensor( + kernel1x1), bias3x3 + bias1x1 def _pad_1x1_to_3x3_tensor(self, kernel1x1): if kernel1x1 is None: @@ -124,11 +141,16 @@ class RepVggBlock(nn.Layer): class BasicBlock(nn.Layer): - def __init__(self, ch_in, ch_out, act='relu', shortcut=True): + def __init__(self, + ch_in, + ch_out, + act='relu', + shortcut=True, + use_alpha=False): super(BasicBlock, self).__init__() assert ch_in == ch_out self.conv1 = ConvBNLayer(ch_in, ch_out, 3, stride=1, padding=1, act=act) - self.conv2 = RepVggBlock(ch_out, ch_out, act=act) + self.conv2 = RepVggBlock(ch_out, ch_out, act=act, alpha=use_alpha) self.shortcut = shortcut def forward(self, x): @@ -165,7 +187,8 @@ class CSPResStage(nn.Layer): n, stride, act='relu', - attn='eca'): + attn='eca', + use_alpha=False): super(CSPResStage, self).__init__() ch_mid = (ch_in + ch_out) // 2 @@ -176,10 +199,13 @@ class CSPResStage(nn.Layer): self.conv_down = None self.conv1 = ConvBNLayer(ch_mid, ch_mid // 2, 1, act=act) self.conv2 = ConvBNLayer(ch_mid, ch_mid // 2, 1, act=act) - self.blocks = nn.Sequential(* [ + self.blocks = nn.Sequential(*[ block_fn( - ch_mid // 2, ch_mid // 2, act=act, shortcut=True) - for i in range(n) + ch_mid // 2, + ch_mid // 2, + act=act, + shortcut=True, + use_alpha=use_alpha) for i in range(n) ]) if attn: self.attn = EffectiveSELayer(ch_mid, act='hardsigmoid') @@ -209,13 +235,17 @@ 
class CSPResNet(nn.Layer): layers=[3, 6, 6, 3], channels=[64, 128, 256, 512, 1024], act='swish', - return_idx=[0, 1, 2, 3, 4], + return_idx=[1, 2, 3], depth_wise=False, use_large_stem=False, width_mult=1.0, depth_mult=1.0, - trt=False): + trt=False, + use_checkpoint=False, + use_alpha=False, + **args): super(CSPResNet, self).__init__() + self.use_checkpoint = use_checkpoint channels = [max(round(c * width_mult), 1) for c in channels] layers = [max(round(l * depth_mult), 1) for l in layers] act = get_act_fn( @@ -252,20 +282,31 @@ class CSPResNet(nn.Layer): act=act))) n = len(channels) - 1 - self.stages = nn.Sequential(* [(str(i), CSPResStage( - BasicBlock, channels[i], channels[i + 1], layers[i], 2, act=act)) - for i in range(n)]) + self.stages = nn.Sequential(*[(str(i), CSPResStage( + BasicBlock, + channels[i], + channels[i + 1], + layers[i], + 2, + act=act, + use_alpha=use_alpha)) for i in range(n)]) self._out_channels = channels[1:] - self._out_strides = [4, 8, 16, 32] + self._out_strides = [4 * 2**i for i in range(n)] self.return_idx = return_idx + if use_checkpoint: + paddle.seed(0) def forward(self, inputs): x = inputs['image'] x = self.stem(x) outs = [] for idx, stage in enumerate(self.stages): - x = stage(x) + if self.use_checkpoint and self.training: + x = paddle.distributed.fleet.utils.recompute( + stage, x, **{"preserve_rng_state": True}) + else: + x = stage(x) if idx in self.return_idx: outs.append(x) diff --git a/ppdet/modeling/backbones/hardnet.py b/ppdet/modeling/backbones/hardnet.py index 14a1599dfbfac36ed54fd55126e77e9046666e8a..8615fb6a67f316cace6f2e9fb0132becf52f2d71 100644 --- a/ppdet/modeling/backbones/hardnet.py +++ b/ppdet/modeling/backbones/hardnet.py @@ -146,7 +146,7 @@ class HarDBlock(nn.Layer): class HarDNet(nn.Layer): def __init__(self, depth_wise=False, return_idx=[1, 3, 8, 13], arch=85): super(HarDNet, self).__init__() - assert arch in [39, 68, 85], "HarDNet-{} not support.".format(arch) + assert arch in [68, 85], "HarDNet-{} is not 
supported.".format(arch) if arch == 85: first_ch = [48, 96] second_kernel = 3 @@ -161,6 +161,8 @@ class HarDNet(nn.Layer): grmul = 1.7 gr = [14, 16, 20, 40] n_layers = [8, 16, 16, 16] + else: + raise ValueError("HarDNet-{} is not supported.".format(arch)) self.return_idx = return_idx self._out_channels = [96, 214, 458, 784] diff --git a/ppdet/modeling/backbones/mobileone.py b/ppdet/modeling/backbones/mobileone.py new file mode 100644 index 0000000000000000000000000000000000000000..e548badd3ed714946e961bc29459191ec0ab7fcb --- /dev/null +++ b/ppdet/modeling/backbones/mobileone.py @@ -0,0 +1,266 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is the paddle implementation of MobileOne block, see: https://arxiv.org/pdf/2206.04040.pdf. 
+Some code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py +The copyright of microsoft/Swin-Transformer is as follows: +MIT License [see LICENSE for details] +""" + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.regularizer import L2Decay +from paddle.nn.initializer import Normal, Constant + +from ppdet.modeling.ops import get_act_fn +from ppdet.modeling.layers import ConvNormLayer + + +class MobileOneBlock(nn.Layer): + def __init__( + self, + ch_in, + ch_out, + stride, + kernel_size, + conv_num=1, + norm_type='bn', + norm_decay=0., + norm_groups=32, + bias_on=False, + lr_scale=1., + freeze_norm=False, + initializer=Normal( + mean=0., std=0.01), + skip_quant=False, + act='relu', ): + super(MobileOneBlock, self).__init__() + + self.ch_in = ch_in + self.ch_out = ch_out + self.kernel_size = kernel_size + self.stride = stride + self.padding = (kernel_size - 1) // 2 + self.k = conv_num + + self.depth_conv = nn.LayerList() + self.point_conv = nn.LayerList() + for _ in range(self.k): + self.depth_conv.append( + ConvNormLayer( + ch_in, + ch_in, + kernel_size, + stride=stride, + groups=ch_in, + norm_type=norm_type, + norm_decay=norm_decay, + norm_groups=norm_groups, + bias_on=bias_on, + lr_scale=lr_scale, + freeze_norm=freeze_norm, + initializer=initializer, + skip_quant=skip_quant)) + self.point_conv.append( + ConvNormLayer( + ch_in, + ch_out, + 1, + stride=1, + groups=1, + norm_type=norm_type, + norm_decay=norm_decay, + norm_groups=norm_groups, + bias_on=bias_on, + lr_scale=lr_scale, + freeze_norm=freeze_norm, + initializer=initializer, + skip_quant=skip_quant)) + self.rbr_1x1 = ConvNormLayer( + ch_in, + ch_in, + 1, + stride=self.stride, + groups=ch_in, + norm_type=norm_type, + norm_decay=norm_decay, + norm_groups=norm_groups, + bias_on=bias_on, + lr_scale=lr_scale, + freeze_norm=freeze_norm, + initializer=initializer, + skip_quant=skip_quant) + self.rbr_identity_st1 = nn.BatchNorm2D( + num_features=ch_in, +
weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay( + 0.0))) if ch_in == ch_out and self.stride == 1 else None + self.rbr_identity_st2 = nn.BatchNorm2D( + num_features=ch_out, + weight_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay( + 0.0))) if ch_in == ch_out and self.stride == 1 else None + self.act = get_act_fn(act) if act is None or isinstance(act, ( + str, dict)) else act + + def forward(self, x): + if hasattr(self, "conv1") and hasattr(self, "conv2"): + y = self.act(self.conv2(self.act(self.conv1(x)))) + else: + if self.rbr_identity_st1 is None: + id_out_st1 = 0 + else: + id_out_st1 = self.rbr_identity_st1(x) + + x1_1 = 0 + for i in range(self.k): + x1_1 += self.depth_conv[i](x) + + x1_2 = self.rbr_1x1(x) + x1 = self.act(x1_1 + x1_2 + id_out_st1) + + if self.rbr_identity_st2 is None: + id_out_st2 = 0 + else: + id_out_st2 = self.rbr_identity_st2(x1) + + x2_1 = 0 + for i in range(self.k): + x2_1 += self.point_conv[i](x1) + y = self.act(x2_1 + id_out_st2) + + return y + + def convert_to_deploy(self): + if not hasattr(self, 'conv1'): + self.conv1 = nn.Conv2D( + in_channels=self.ch_in, + out_channels=self.ch_in, + kernel_size=self.kernel_size, + stride=self.stride, + padding=self.padding, + groups=self.ch_in, + bias_attr=ParamAttr( + initializer=Constant(value=0.), learning_rate=1.)) + if not hasattr(self, 'conv2'): + self.conv2 = nn.Conv2D( + in_channels=self.ch_in, + out_channels=self.ch_out, + kernel_size=1, + stride=1, + padding='SAME', + groups=1, + bias_attr=ParamAttr( + initializer=Constant(value=0.), learning_rate=1.)) + + conv1_kernel, conv1_bias, conv2_kernel, conv2_bias = self.get_equivalent_kernel_bias( + ) + self.conv1.weight.set_value(conv1_kernel) + self.conv1.bias.set_value(conv1_bias) + self.conv2.weight.set_value(conv2_kernel) + self.conv2.bias.set_value(conv2_bias) + self.__delattr__('depth_conv') + self.__delattr__('point_conv') + self.__delattr__('rbr_1x1') + if 
hasattr(self, 'rbr_identity_st1'): + self.__delattr__('rbr_identity_st1') + if hasattr(self, 'rbr_identity_st2'): + self.__delattr__('rbr_identity_st2') + + def get_equivalent_kernel_bias(self): + st1_kernel3x3, st1_bias3x3 = self._fuse_bn_tensor(self.depth_conv) + st1_kernel1x1, st1_bias1x1 = self._fuse_bn_tensor(self.rbr_1x1) + st1_kernelid, st1_biasid = self._fuse_bn_tensor( + self.rbr_identity_st1, kernel_size=self.kernel_size) + + st2_kernel1x1, st2_bias1x1 = self._fuse_bn_tensor(self.point_conv) + st2_kernelid, st2_biasid = self._fuse_bn_tensor( + self.rbr_identity_st2, kernel_size=1) + + conv1_kernel = st1_kernel3x3 + self._pad_1x1_to_3x3_tensor( + st1_kernel1x1) + st1_kernelid + + conv1_bias = st1_bias3x3 + st1_bias1x1 + st1_biasid + + conv2_kernel = st2_kernel1x1 + st2_kernelid + conv2_bias = st2_bias1x1 + st2_biasid + + return conv1_kernel, conv1_bias, conv2_kernel, conv2_bias + + def _pad_1x1_to_3x3_tensor(self, kernel1x1): + if kernel1x1 is None: + return 0 + else: + padding_size = (self.kernel_size - 1) // 2 + return nn.functional.pad( + kernel1x1, + [padding_size, padding_size, padding_size, padding_size]) + + def _fuse_bn_tensor(self, branch, kernel_size=3): + if branch is None: + return 0, 0 + + if isinstance(branch, nn.LayerList): + fused_kernels = [] + fused_bias = [] + for block in branch: + kernel = block.conv.weight + running_mean = block.norm._mean + running_var = block.norm._variance + gamma = block.norm.weight + beta = block.norm.bias + eps = block.norm._epsilon + + std = (running_var + eps).sqrt() + t = (gamma / std).reshape((-1, 1, 1, 1)) + + fused_kernels.append(kernel * t) + fused_bias.append(beta - running_mean * gamma / std) + + return sum(fused_kernels), sum(fused_bias) + + elif isinstance(branch, ConvNormLayer): + kernel = branch.conv.weight + running_mean = branch.norm._mean + running_var = branch.norm._variance + gamma = branch.norm.weight + beta = branch.norm.bias + eps = branch.norm._epsilon + else: + assert isinstance(branch, 
nn.BatchNorm2D) + input_dim = self.ch_in if kernel_size == 1 else 1 + kernel_value = paddle.zeros( + shape=[self.ch_in, input_dim, kernel_size, kernel_size], + dtype='float32') + if kernel_size > 1: + for i in range(self.ch_in): + kernel_value[i, i % input_dim, (kernel_size - 1) // 2, ( + kernel_size - 1) // 2] = 1 + elif kernel_size == 1: + for i in range(self.ch_in): + kernel_value[i, i % input_dim, 0, 0] = 1 + else: + raise ValueError("Invalid kernel size received!") + kernel = paddle.to_tensor(kernel_value, place=branch.weight.place) + running_mean = branch._mean + running_var = branch._variance + gamma = branch.weight + beta = branch.bias + eps = branch._epsilon + + std = (running_var + eps).sqrt() + t = (gamma / std).reshape((-1, 1, 1, 1)) + + return kernel * t, beta - running_mean * gamma / std diff --git a/ppdet/modeling/backbones/shufflenet_v2.py b/ppdet/modeling/backbones/shufflenet_v2.py index 996697ad719e29f0c4e8c2845dfed4be5e5808fb..ca7ebb93fb8099aa07f348a051d9c9e2f95e3a5f 100644 --- a/ppdet/modeling/backbones/shufflenet_v2.py +++ b/ppdet/modeling/backbones/shufflenet_v2.py @@ -188,11 +188,10 @@ class ShuffleNetV2(nn.Layer): elif scale == 1.5: stage_out_channels = [-1, 24, 176, 352, 704, 1024] elif scale == 2.0: - stage_out_channels = [-1, 24, 224, 488, 976, 2048] + stage_out_channels = [-1, 24, 244, 488, 976, 2048] else: raise NotImplementedError("This scale size:[" + str(scale) + "] is not implemented!") - self._out_channels = [] self._feature_idx = 0 # 1. 
conv1 diff --git a/ppdet/modeling/backbones/swin_transformer.py b/ppdet/modeling/backbones/swin_transformer.py index 8509b5164195581890e52d1ed46f4c04d6e76616..aa4311ff812dffcfe889b843ad9a5ec6a5ce8e48 100644 --- a/ppdet/modeling/backbones/swin_transformer.py +++ b/ppdet/modeling/backbones/swin_transformer.py @@ -20,62 +20,13 @@ MIT License [see LICENSE for details] import paddle import paddle.nn as nn import paddle.nn.functional as F -from paddle.nn.initializer import TruncatedNormal, Constant, Assign from ppdet.modeling.shape_spec import ShapeSpec from ppdet.core.workspace import register, serializable import numpy as np -# Common initializations -ones_ = Constant(value=1.) -zeros_ = Constant(value=0.) -trunc_normal_ = TruncatedNormal(std=.02) - - -# Common Functions -def to_2tuple(x): - return tuple([x] * 2) - - -def add_parameter(layer, datas, name=None): - parameter = layer.create_parameter( - shape=(datas.shape), default_initializer=Assign(datas)) - if name: - layer.add_parameter(name, parameter) - return parameter - - -# Common Layers -def drop_path(x, drop_prob=0., training=False): - """ - Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). - the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... - See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... - """ - if drop_prob == 0. 
or not training: - return x - keep_prob = paddle.to_tensor(1 - drop_prob) - shape = (paddle.shape(x)[0], ) + (1, ) * (x.ndim - 1) - random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype) - random_tensor = paddle.floor(random_tensor) # binarize - output = x.divide(keep_prob) * random_tensor - return output - - -class DropPath(nn.Layer): - def __init__(self, drop_prob=None): - super(DropPath, self).__init__() - self.drop_prob = drop_prob - - def forward(self, x): - return drop_path(x, self.drop_prob, self.training) - - -class Identity(nn.Layer): - def __init__(self): - super(Identity, self).__init__() - - def forward(self, input): - return input +from .transformer_utils import DropPath, Identity +from .transformer_utils import add_parameter, to_2tuple +from .transformer_utils import ones_, zeros_, trunc_normal_ class Mlp(nn.Layer): @@ -112,7 +63,7 @@ def window_partition(x, window_size): """ B, H, W, C = x.shape x = x.reshape( - [B, H // window_size, window_size, W // window_size, window_size, C]) + [-1, H // window_size, window_size, W // window_size, window_size, C]) windows = x.transpose([0, 1, 3, 2, 4, 5]).reshape( [-1, window_size, window_size, C]) return windows @@ -128,10 +79,11 @@ def window_reverse(windows, window_size, H, W): Returns: x: (B, H, W, C) """ + _, _, _, C = windows.shape B = int(windows.shape[0] / (H * W / window_size / window_size)) x = windows.reshape( - [B, H // window_size, W // window_size, window_size, window_size, -1]) - x = x.transpose([0, 1, 3, 2, 4, 5]).reshape([B, H, W, -1]) + [-1, H // window_size, W // window_size, window_size, window_size, C]) + x = x.transpose([0, 1, 3, 2, 4, 5]).reshape([-1, H, W, C]) return x @@ -206,14 +158,14 @@ class WindowAttention(nn.Layer): """ B_, N, C = x.shape qkv = self.qkv(x).reshape( - [B_, N, 3, self.num_heads, C // self.num_heads]).transpose( + [-1, N, 3, self.num_heads, C // self.num_heads]).transpose( [2, 0, 3, 1, 4]) q, k, v = qkv[0], qkv[1], qkv[2] q = q * self.scale attn = paddle.mm(q, 
k.transpose([0, 1, 3, 2])) - index = self.relative_position_index.reshape([-1]) + index = self.relative_position_index.flatten() relative_position_bias = paddle.index_select( self.relative_position_bias_table, index) @@ -227,7 +179,7 @@ class WindowAttention(nn.Layer): if mask is not None: nW = mask.shape[0] - attn = attn.reshape([B_ // nW, nW, self.num_heads, N, N + attn = attn.reshape([-1, nW, self.num_heads, N, N ]) + mask.unsqueeze(1).unsqueeze(0) attn = attn.reshape([-1, self.num_heads, N, N]) attn = self.softmax(attn) @@ -237,7 +189,7 @@ class WindowAttention(nn.Layer): attn = self.attn_drop(attn) # x = (attn @ v).transpose(1, 2).reshape([B_, N, C]) - x = paddle.mm(attn, v).transpose([0, 2, 1, 3]).reshape([B_, N, C]) + x = paddle.mm(attn, v).transpose([0, 2, 1, 3]).reshape([-1, N, C]) x = self.proj(x) x = self.proj_drop(x) return x @@ -315,7 +267,7 @@ class SwinTransformerBlock(nn.Layer): shortcut = x x = self.norm1(x) - x = x.reshape([B, H, W, C]) + x = x.reshape([-1, H, W, C]) # pad feature maps to multiples of window size pad_l = pad_t = 0 @@ -337,7 +289,7 @@ class SwinTransformerBlock(nn.Layer): x_windows = window_partition( shifted_x, self.window_size) # nW*B, window_size, window_size, C x_windows = x_windows.reshape( - [-1, self.window_size * self.window_size, + [x_windows.shape[0], self.window_size * self.window_size, C]) # nW*B, window_size*window_size, C # W-MSA/SW-MSA @@ -346,7 +298,7 @@ class SwinTransformerBlock(nn.Layer): # merge windows attn_windows = attn_windows.reshape( - [-1, self.window_size, self.window_size, C]) + [x_windows.shape[0], self.window_size, self.window_size, C]) shifted_x = window_reverse(attn_windows, self.window_size, Hp, Wp) # B H' W' C @@ -362,7 +314,7 @@ class SwinTransformerBlock(nn.Layer): if pad_r > 0 or pad_b > 0: x = x[:, :H, :W, :] - x = x.reshape([B, H * W, C]) + x = x.reshape([-1, H * W, C]) # FFN x = shortcut + self.drop_path(x) @@ -393,7 +345,7 @@ class PatchMerging(nn.Layer): B, L, C = x.shape assert L == H * 
W, "input feature has wrong size" - x = x.reshape([B, H, W, C]) + x = x.reshape([-1, H, W, C]) # padding pad_input = (H % 2 == 1) or (W % 2 == 1) @@ -405,7 +357,7 @@ class PatchMerging(nn.Layer): x2 = x[:, 0::2, 1::2, :] # B H/2 W/2 C x3 = x[:, 1::2, 1::2, :] # B H/2 W/2 C x = paddle.concat([x0, x1, x2, x3], -1) # B H/2 W/2 4*C - x = x.reshape([B, H * W // 4, 4 * C]) # B H/2*W/2 4*C + x = x.reshape([-1, H * W // 4, 4 * C]) # B H/2*W/2 4*C x = self.norm(x) x = self.reduction(x) @@ -482,8 +434,7 @@ class BasicLayer(nn.Layer): # calculate attention mask for SW-MSA Hp = int(np.ceil(H / self.window_size)) * self.window_size Wp = int(np.ceil(W / self.window_size)) * self.window_size - img_mask = paddle.fluid.layers.zeros( - [1, Hp, Wp, 1], dtype='float32') # 1 Hp Wp 1 + img_mask = paddle.zeros([1, Hp, Wp, 1], dtype='float32') # 1 Hp Wp 1 h_slices = (slice(0, -self.window_size), slice(-self.window_size, -self.shift_size), slice(-self.shift_size, None)) @@ -688,10 +639,10 @@ class SwinTransformer(nn.Layer): if self.frozen_stages >= 0: self.patch_embed.eval() for param in self.patch_embed.parameters(): - param.requires_grad = False + param.stop_gradient = True if self.frozen_stages >= 1 and self.ape: - self.absolute_pos_embed.requires_grad = False + self.absolute_pos_embed.stop_gradient = True if self.frozen_stages >= 2: self.pos_drop.eval() @@ -699,7 +650,7 @@ class SwinTransformer(nn.Layer): m = self.layers[i] m.eval() for param in m.parameters(): - param.requires_grad = False + param.stop_gradient = True def _init_weights(self, m): if isinstance(m, nn.Linear): @@ -713,7 +664,7 @@ class SwinTransformer(nn.Layer): def forward(self, x): """Forward function.""" x = self.patch_embed(x['image']) - _, _, Wh, Ww = x.shape + B, _, Wh, Ww = x.shape if self.ape: # interpolate the position embedding to the corresponding size absolute_pos_embed = F.interpolate( diff --git a/ppdet/modeling/backbones/transformer_utils.py b/ppdet/modeling/backbones/transformer_utils.py new file mode 
100644 index 0000000000000000000000000000000000000000..bc10652d57447c339d923b84ff3e4c39b3c80824 --- /dev/null +++ b/ppdet/modeling/backbones/transformer_utils.py @@ -0,0 +1,74 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn + +from paddle.nn.initializer import TruncatedNormal, Constant, Assign + +# Common initializations +ones_ = Constant(value=1.) +zeros_ = Constant(value=0.) +trunc_normal_ = TruncatedNormal(std=.02) + + +# Common Layers +def drop_path(x, drop_prob=0., training=False): + """ + Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). + the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... + See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... + """ + if drop_prob == 0. 
or not training: + return x + keep_prob = paddle.to_tensor(1 - drop_prob) + shape = (paddle.shape(x)[0], ) + (1, ) * (x.ndim - 1) + random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype) + random_tensor = paddle.floor(random_tensor) # binarize + output = x.divide(keep_prob) * random_tensor + return output + + +class DropPath(nn.Layer): + def __init__(self, drop_prob=None): + super(DropPath, self).__init__() + self.drop_prob = drop_prob + + def forward(self, x): + return drop_path(x, self.drop_prob, self.training) + + +class Identity(nn.Layer): + def __init__(self): + super(Identity, self).__init__() + + def forward(self, input): + return input + + +# common funcs + + +def to_2tuple(x): + if isinstance(x, (list, tuple)): + return x + return tuple([x] * 2) + + +def add_parameter(layer, datas, name=None): + parameter = layer.create_parameter( + shape=(datas.shape), default_initializer=Assign(datas)) + if name: + layer.add_parameter(name, parameter) + return parameter diff --git a/ppdet/modeling/backbones/vision_transformer.py b/ppdet/modeling/backbones/vision_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..798ea376878f09ade54e3a5c9bbc6f825769db72 --- /dev/null +++ b/ppdet/modeling/backbones/vision_transformer.py @@ -0,0 +1,633 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import math + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import numpy as np +from paddle.nn.initializer import Constant + +from ppdet.modeling.shape_spec import ShapeSpec +from ppdet.core.workspace import register, serializable + +from .transformer_utils import zeros_, DropPath, Identity + + +class Mlp(nn.Layer): + def __init__(self, + in_features, + hidden_features=None, + out_features=None, + act_layer=nn.GELU, + drop=0.): + super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +class Attention(nn.Layer): + def __init__(self, + dim, + num_heads=8, + qkv_bias=False, + qk_scale=None, + attn_drop=0., + proj_drop=0., + window_size=None): + super().__init__() + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim**-0.5 + + self.qkv = nn.Linear(dim, dim * 3, bias_attr=False) + + if qkv_bias: + self.q_bias = self.create_parameter( + shape=([dim]), default_initializer=zeros_) + self.v_bias = self.create_parameter( + shape=([dim]), default_initializer=zeros_) + else: + self.q_bias = None + self.v_bias = None + if window_size: + self.window_size = window_size + self.num_relative_distance = (2 * window_size[0] - 1) * ( + 2 * window_size[1] - 1) + 3 + self.relative_position_bias_table = self.create_parameter( + shape=(self.num_relative_distance, num_heads), + default_initializer=zeros_) # 2*Wh-1 * 2*Ww-1, nH + # cls to token & token 2 cls & cls to cls + + # get pair-wise relative position index for each token inside the window + coords_h = paddle.arange(window_size[0]) + coords_w = paddle.arange(window_size[1]) + coords = paddle.stack(paddle.meshgrid( 
+ [coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww + coords_flatten_1 = paddle.unsqueeze(coords_flatten, 2) + coords_flatten_2 = paddle.unsqueeze(coords_flatten, 1) + relative_coords = coords_flatten_1.clone() - coords_flatten_2.clone( + ) + + #relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :] # 2, Wh*Ww, Wh*Wh + relative_coords = relative_coords.transpose( + (1, 2, 0)) #.contiguous() # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += window_size[ + 0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * window_size[1] - 1 + relative_position_index = \ + paddle.zeros(shape=(window_size[0] * window_size[1] + 1, ) * 2, dtype=relative_coords.dtype) + relative_position_index[1:, 1:] = relative_coords.sum( + -1) # Wh*Ww, Wh*Ww + relative_position_index[0, 0:] = self.num_relative_distance - 3 + relative_position_index[0:, 0] = self.num_relative_distance - 2 + relative_position_index[0, 0] = self.num_relative_distance - 1 + + self.register_buffer("relative_position_index", + relative_position_index) + # trunc_normal_(self.relative_position_bias_table, std=.0) + else: + self.window_size = None + self.relative_position_bias_table = None + self.relative_position_index = None + + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + def forward(self, x, rel_pos_bias=None): + x_shape = paddle.shape(x) + N, C = x_shape[1], x_shape[2] + + qkv_bias = None + if self.q_bias is not None: + qkv_bias = paddle.concat( + (self.q_bias, paddle.zeros_like(self.v_bias), self.v_bias)) + qkv = F.linear(x, weight=self.qkv.weight, bias=qkv_bias) + + qkv = qkv.reshape((-1, N, 3, self.num_heads, + C // self.num_heads)).transpose((2, 0, 3, 1, 4)) + q, k, v = qkv[0], qkv[1], qkv[2] + attn = (q.matmul(k.transpose((0, 1, 3, 2)))) * self.scale + + if self.relative_position_bias_table is not None: + 
relative_position_bias = self.relative_position_bias_table[ + self.relative_position_index.reshape([-1])].reshape([ + self.window_size[0] * self.window_size[1] + 1, + self.window_size[0] * self.window_size[1] + 1, -1 + ]) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.transpose( + (2, 0, 1)) #.contiguous() # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + if rel_pos_bias is not None: + attn = attn + rel_pos_bias + + attn = nn.functional.softmax(attn, axis=-1) + attn = self.attn_drop(attn) + + x = (attn.matmul(v)).transpose((0, 2, 1, 3)).reshape((-1, N, C)) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class Block(nn.Layer): + def __init__(self, + dim, + num_heads, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + window_size=None, + init_values=None, + act_layer=nn.GELU, + norm_layer='nn.LayerNorm', + epsilon=1e-5): + super().__init__() + self.norm1 = nn.LayerNorm(dim, epsilon=1e-6) + self.attn = Attention( + dim, + num_heads=num_heads, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + attn_drop=attn_drop, + proj_drop=drop, + window_size=window_size) + # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here + self.drop_path = DropPath(drop_path) if drop_path > 0. 
else Identity() + self.norm2 = eval(norm_layer)(dim, epsilon=epsilon) + mlp_hidden_dim = int(dim * mlp_ratio) + self.mlp = Mlp(in_features=dim, + hidden_features=mlp_hidden_dim, + act_layer=act_layer, + drop=drop) + if init_values is not None: + self.gamma_1 = self.create_parameter( + shape=([dim]), default_initializer=Constant(value=init_values)) + self.gamma_2 = self.create_parameter( + shape=([dim]), default_initializer=Constant(value=init_values)) + else: + self.gamma_1, self.gamma_2 = None, None + + def forward(self, x, rel_pos_bias=None): + + if self.gamma_1 is None: + x = x + self.drop_path( + self.attn( + self.norm1(x), rel_pos_bias=rel_pos_bias)) + x = x + self.drop_path(self.mlp(self.norm2(x))) + else: + x = x + self.drop_path(self.gamma_1 * self.attn( + self.norm1(x), rel_pos_bias=rel_pos_bias)) + x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x))) + return x + + +class PatchEmbed(nn.Layer): + """ Image to Patch Embedding + """ + + def __init__(self, + img_size=[224, 224], + patch_size=16, + in_chans=3, + embed_dim=768): + super().__init__() + self.num_patches_w = img_size[0] // patch_size + self.num_patches_h = img_size[1] // patch_size + + num_patches = self.num_patches_w * self.num_patches_h + self.patch_shape = (img_size[0] // patch_size, + img_size[1] // patch_size) + self.img_size = img_size + self.patch_size = patch_size + self.num_patches = num_patches + + self.proj = nn.Conv2D( + in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) + + @property + def num_patches_in_h(self): + return self.img_size[1] // self.patch_size + + @property + def num_patches_in_w(self): + return self.img_size[0] // self.patch_size + + def forward(self, x, mask=None): + B, C, H, W = x.shape + return self.proj(x) + + +class RelativePositionBias(nn.Layer): + def __init__(self, window_size, num_heads): + super().__init__() + self.window_size = window_size + self.num_relative_distance = (2 * window_size[0] - 1) * ( + 2 * window_size[1] - 1) + 3 + 
self.relative_position_bias_table = self.create_parameter( + shape=(self.num_relative_distance, num_heads), + default_initializer=zeros_) + # cls to token & token 2 cls & cls to cls + + # get pair-wise relative position index for each token inside the window + coords_h = paddle.arange(window_size[0]) + coords_w = paddle.arange(window_size[1]) + coords = paddle.stack(paddle.meshgrid( + [coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = coords.flatten(1) # 2, Wh*Ww + + relative_coords = coords_flatten[:, :, + None] - coords_flatten[:, + None, :] # 2, Wh*Ww, Wh*Ww + relative_coords = relative_coords.transpose( + (1, 2, 0)) # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += window_size[0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * window_size[1] - 1 + relative_position_index = \ + paddle.zeros(shape=(window_size[0] * window_size[1] + 1,) * 2, dtype=relative_coords.dtype) + relative_position_index[1:, 1:] = relative_coords.sum( + -1) # Wh*Ww, Wh*Ww + relative_position_index[0, 0:] = self.num_relative_distance - 3 + relative_position_index[0:, 0] = self.num_relative_distance - 2 + relative_position_index[0, 0] = self.num_relative_distance - 1 + self.register_buffer("relative_position_index", relative_position_index) + + def forward(self): + relative_position_bias = \ + self.relative_position_bias_table[self.relative_position_index.reshape([-1])].reshape([ + self.window_size[0] * self.window_size[1] + 1, + self.window_size[0] * self.window_size[1] + 1, -1]) # Wh*Ww,Wh*Ww,nH + return relative_position_bias.transpose((2, 0, 1)) # nH, Wh*Ww, Wh*Ww + + +def get_sinusoid_encoding_table(n_position, d_hid, token=False): + ''' Sinusoid position encoding table ''' + + def get_position_angle_vec(position): + return [ + position / np.power(10000, 2 * (hid_j // 2) / d_hid) + for hid_j in range(d_hid) + ] + + sinusoid_table = np.array( + [get_position_angle_vec(pos_i) for pos_i in range(n_position)]) + sinusoid_table[:, 0::2] = 
np.sin(sinusoid_table[:, 0::2]) # dim 2i + sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1 + if token: + sinusoid_table = np.concatenate( + [sinusoid_table, np.zeros([1, d_hid])], axis=0) + + return paddle.to_tensor(sinusoid_table, dtype=paddle.float32).unsqueeze(0) + + +@register +@serializable +class VisionTransformer(nn.Layer): + """ Vision Transformer with support for patch input + """ + + def __init__(self, + img_size=[672, 1092], + patch_size=16, + in_chans=3, + embed_dim=768, + depth=12, + num_heads=12, + mlp_ratio=4, + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer='nn.LayerNorm', + init_values=None, + use_rel_pos_bias=False, + use_shared_rel_pos_bias=False, + epsilon=1e-5, + final_norm=False, + pretrained=None, + out_indices=[3, 5, 7, 11], + use_abs_pos_emb=False, + use_sincos_pos_emb=True, + with_fpn=True, + use_checkpoint=False, + **args): + super().__init__() + self.img_size = img_size + self.embed_dim = embed_dim + self.with_fpn = with_fpn + self.use_checkpoint = use_checkpoint + self.use_sincos_pos_emb = use_sincos_pos_emb + self.use_rel_pos_bias = use_rel_pos_bias + self.final_norm = final_norm + + if use_checkpoint: + print('please set: FLAGS_allocator_strategy=naive_best_fit') + self.patch_embed = PatchEmbed( + img_size=img_size, + patch_size=patch_size, + in_chans=in_chans, + embed_dim=embed_dim) + + self.pos_w = self.patch_embed.num_patches_in_w + self.pos_h = self.patch_embed.num_patches_in_h + + self.cls_token = self.create_parameter( + shape=(1, 1, embed_dim), + default_initializer=paddle.nn.initializer.Constant(value=0.)) + + if use_abs_pos_emb: + self.pos_embed = self.create_parameter( + shape=(1, self.pos_w * self.pos_h + 1, embed_dim), + default_initializer=paddle.nn.initializer.TruncatedNormal( + std=.02)) + elif use_sincos_pos_emb: + pos_embed = self.build_2d_sincos_position_embedding(embed_dim) + + self.pos_embed = pos_embed + self.pos_embed = 
self.create_parameter(shape=pos_embed.shape) + self.pos_embed.set_value(pos_embed.numpy()) + self.pos_embed.stop_gradient = True + + else: + self.pos_embed = None + + self.pos_drop = nn.Dropout(p=drop_rate) + + if use_shared_rel_pos_bias: + self.rel_pos_bias = RelativePositionBias( + window_size=self.patch_embed.patch_shape, num_heads=num_heads) + else: + self.rel_pos_bias = None + + dpr = np.linspace(0, drop_path_rate, depth) + + self.blocks = nn.LayerList([ + Block( + dim=embed_dim, + num_heads=num_heads, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[i], + norm_layer=norm_layer, + init_values=init_values, + window_size=self.patch_embed.patch_shape + if use_rel_pos_bias else None, + epsilon=epsilon) for i in range(depth) + ]) + + self.pretrained = pretrained + self.init_weight() + + assert len(out_indices) <= 4, '' + self.out_indices = out_indices + self.out_channels = [embed_dim for _ in range(len(out_indices))] + self.out_strides = [4, 8, 16, 32][-len(out_indices):] if with_fpn else [ + 8 for _ in range(len(out_indices)) + ] + + self.norm = Identity() + + if self.with_fpn: + self.init_fpn( + embed_dim=embed_dim, + patch_size=patch_size, ) + + def init_weight(self): + pretrained = self.pretrained + + if pretrained: + if 'http' in pretrained: #URL + path = paddle.utils.download.get_weights_path_from_url( + pretrained) + else: #model in local path + path = pretrained + + load_state_dict = paddle.load(path) + model_state_dict = self.state_dict() + pos_embed_name = "pos_embed" + + if pos_embed_name in load_state_dict.keys(): + load_pos_embed = paddle.to_tensor( + load_state_dict[pos_embed_name], dtype="float32") + if self.pos_embed.shape != load_pos_embed.shape: + pos_size = int(math.sqrt(load_pos_embed.shape[1] - 1)) + model_state_dict[pos_embed_name] = self.resize_pos_embed( + load_pos_embed, (pos_size, pos_size), + (self.pos_h, self.pos_w)) + + # self.set_state_dict(model_state_dict) 
+ load_state_dict[pos_embed_name] = model_state_dict[ + pos_embed_name] + + print("Load pos_embed and resize it from {} to {} .".format( + load_pos_embed.shape, self.pos_embed.shape)) + + self.set_state_dict(load_state_dict) + print("Load load_state_dict....") + + def init_fpn(self, embed_dim=768, patch_size=16, out_with_norm=False): + if patch_size == 16: + self.fpn1 = nn.Sequential( + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), + nn.BatchNorm2D(embed_dim), + nn.GELU(), + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), ) + + self.fpn2 = nn.Sequential( + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), ) + + self.fpn3 = Identity() + + self.fpn4 = nn.MaxPool2D(kernel_size=2, stride=2) + elif patch_size == 8: + self.fpn1 = nn.Sequential( + nn.Conv2DTranspose( + embed_dim, embed_dim, kernel_size=2, stride=2), ) + + self.fpn2 = Identity() + + self.fpn3 = nn.Sequential(nn.MaxPool2D(kernel_size=2, stride=2), ) + + self.fpn4 = nn.Sequential(nn.MaxPool2D(kernel_size=4, stride=4), ) + + if not out_with_norm: + self.norm = Identity() + else: + self.norm = nn.LayerNorm(embed_dim, epsilon=1e-6) + + def interpolate_pos_encoding(self, x, w, h): + npatch = x.shape[1] - 1 + N = self.pos_embed.shape[1] - 1 + w0 = w // self.patch_embed.patch_size + h0 = h // self.patch_embed.patch_size + if npatch == N and w0 == self.patch_embed.num_patches_w and h0 == self.patch_embed.num_patches_h: + return self.pos_embed + class_pos_embed = self.pos_embed[:, 0] + patch_pos_embed = self.pos_embed[:, 1:] + dim = x.shape[-1] + # we add a small number to avoid floating point error in the interpolation + # see discussion at https://github.com/facebookresearch/dino/issues/8 + w0, h0 = w0 + 0.1, h0 + 0.1 + + patch_pos_embed = nn.functional.interpolate( + patch_pos_embed.reshape([ + 1, self.patch_embed.num_patches_w, + self.patch_embed.num_patches_h, dim + ]).transpose((0, 3, 1, 2)), + scale_factor=(w0 / self.patch_embed.num_patches_w, 
+ h0 / self.patch_embed.num_patches_h), + mode='bicubic', ) + assert int(w0) == patch_pos_embed.shape[-2] and int( + h0) == patch_pos_embed.shape[-1] + patch_pos_embed = patch_pos_embed.transpose( + (0, 2, 3, 1)).reshape([1, -1, dim]) + return paddle.concat( + (class_pos_embed.unsqueeze(0), patch_pos_embed), axis=1) + + def resize_pos_embed(self, pos_embed, old_hw, new_hw): + """ + Resize pos_embed weight. + Args: + pos_embed (Tensor): the pos_embed weight + old_hw (list[int]): the height and width of old pos_embed + new_hw (list[int]): the height and width of new pos_embed + Returns: + Tensor: the resized pos_embed weight + """ + cls_pos_embed = pos_embed[:, :1, :] + pos_embed = pos_embed[:, 1:, :] + + pos_embed = pos_embed.transpose([0, 2, 1]) + pos_embed = pos_embed.reshape([1, -1, old_hw[0], old_hw[1]]) + pos_embed = F.interpolate( + pos_embed, new_hw, mode='bicubic', align_corners=False) + pos_embed = pos_embed.flatten(2).transpose([0, 2, 1]) + pos_embed = paddle.concat([cls_pos_embed, pos_embed], axis=1) + + return pos_embed + + def build_2d_sincos_position_embedding( + self, + embed_dim=768, + temperature=10000., ): + h, w = self.patch_embed.patch_shape + grid_w = paddle.arange(w, dtype=paddle.float32) + grid_h = paddle.arange(h, dtype=paddle.float32) + grid_w, grid_h = paddle.meshgrid(grid_w, grid_h) + assert embed_dim % 4 == 0, 'Embed dimension must be divisible by 4 for 2D sin-cos position embedding' + pos_dim = embed_dim // 4 + omega = paddle.arange(pos_dim, dtype=paddle.float32) / pos_dim + omega = 1. 
/ (temperature**omega) + + out_w = grid_w.flatten()[..., None] @omega[None] + out_h = grid_h.flatten()[..., None] @omega[None] + + pos_emb = paddle.concat( + [ + paddle.sin(out_w), paddle.cos(out_w), paddle.sin(out_h), + paddle.cos(out_h) + ], + axis=1)[None, :, :] + + pe_token = paddle.zeros([1, 1, embed_dim], dtype=paddle.float32) + pos_embed = paddle.concat([pe_token, pos_emb], axis=1) + # pos_embed.stop_gradient = True + + return pos_embed + + def forward(self, x): + x = x['image'] if isinstance(x, dict) else x + _, _, h, w = x.shape + + x = self.patch_embed(x) + + B, D, Hp, Wp = x.shape # b * c * h * w + + cls_tokens = self.cls_token.expand( + (B, self.cls_token.shape[-2], self.cls_token.shape[-1])) + x = x.flatten(2).transpose([0, 2, 1]) # b * hw * c + x = paddle.concat([cls_tokens, x], axis=1) + + if self.pos_embed is not None: + # x = x + self.interpolate_pos_encoding(x, w, h) + x = x + self.interpolate_pos_encoding(x, h, w) + + x = self.pos_drop(x) + + rel_pos_bias = self.rel_pos_bias( + ) if self.rel_pos_bias is not None else None + + feats = [] + for idx, blk in enumerate(self.blocks): + if self.use_checkpoint: + x = paddle.distributed.fleet.utils.recompute( + blk, x, rel_pos_bias, **{"preserve_rng_state": True}) + else: + x = blk(x, rel_pos_bias) + + if idx in self.out_indices: + xp = paddle.reshape( + paddle.transpose( + self.norm(x[:, 1:, :]), perm=[0, 2, 1]), + shape=[B, D, Hp, Wp]) + feats.append(xp) + + if self.with_fpn: + fpns = [self.fpn1, self.fpn2, self.fpn3, self.fpn4] + for i in range(len(feats)): + feats[i] = fpns[i](feats[i]) + + return feats + + @property + def num_layers(self): + return len(self.blocks) + + @property + def no_weight_decay(self): + return {'pos_embed', 'cls_token'} + + @property + def out_shape(self): + return [ + ShapeSpec( + channels=c, stride=s) + for c, s in zip(self.out_channels, self.out_strides) + ] diff --git a/ppdet/modeling/bbox_utils.py b/ppdet/modeling/bbox_utils.py index 
11f504f804c98b3f7a0a44d8b6f4481577d5aa6d..f895340c7e8da8606bfd0f55b1e9b84d36bfd549 100644 --- a/ppdet/modeling/bbox_utils.py +++ b/ppdet/modeling/bbox_utils.py @@ -278,8 +278,8 @@ def decode_yolo(box, anchor, downsample_ratio): return [x1, y1, w1, h1] -def iou_similarity(box1, box2, eps=1e-9): - """Calculate iou of box1 and box2 +def batch_iou_similarity(box1, box2, eps=1e-9): + """Calculate iou of box1 and box2 in batch Args: box1 (Tensor): box with the shape [N, M1, 4] @@ -866,3 +866,26 @@ def bbox2delta_v2(src_boxes, stds = paddle.to_tensor(stds, place=src_boxes.place) deltas = (deltas - means) / stds return deltas + + +def iou_similarity(box1, box2, eps=1e-10): + """Calculate iou of box1 and box2 + + Args: + box1 (Tensor): box with the shape [M1, 4] + box2 (Tensor): box with the shape [M2, 4] + + Return: + iou (Tensor): iou between box1 and box2 with the shape [M1, M2] + """ + box1 = box1.unsqueeze(1) # [M1, 4] -> [M1, 1, 4] + box2 = box2.unsqueeze(0) # [M2, 4] -> [1, M2, 4] + px1y1, px2y2 = box1[:, :, 0:2], box1[:, :, 2:4] + gx1y1, gx2y2 = box2[:, :, 0:2], box2[:, :, 2:4] + x1y1 = paddle.maximum(px1y1, gx1y1) + x2y2 = paddle.minimum(px2y2, gx2y2) + overlap = (x2y2 - x1y1).clip(0).prod(-1) + area1 = (px2y2 - px1y1).clip(0).prod(-1) + area2 = (gx2y2 - gx1y1).clip(0).prod(-1) + union = area1 + area2 - overlap + eps + return overlap / union diff --git a/static/ppdet/modeling/tests/decorator_helper.py b/ppdet/modeling/cls_utils.py similarity index 41% rename from static/ppdet/modeling/tests/decorator_helper.py rename to ppdet/modeling/cls_utils.py index 894833ce15eab82ea06c2e66a8e53cb2e7e057b5..3ae8d116959a96bb2bf337dee7330c5909bc61ac 100644 --- a/static/ppdet/modeling/tests/decorator_helper.py +++ b/ppdet/modeling/cls_utils.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,22 +12,29 @@ # See the License for the specific language governing permissions and # limitations under the License. -import paddle.fluid as fluid -__all__ = ['prog_scope'] +def _get_class_default_kwargs(cls, *args, **kwargs): + """ + Get default arguments of a class in dict format, if args and + kwargs is specified, it will replace default arguments + """ + varnames = cls.__init__.__code__.co_varnames + argcount = cls.__init__.__code__.co_argcount + keys = varnames[:argcount] + assert keys[0] == 'self' + keys = keys[1:] + values = list(cls.__init__.__defaults__) + assert len(values) == len(keys) -def prog_scope(): - def __impl__(fn): - def __fn__(*args, **kwargs): - prog = fluid.Program() - startup_prog = fluid.Program() - scope = fluid.core.Scope() - with fluid.scope_guard(scope): - with fluid.program_guard(prog, startup_prog): - with fluid.unique_name.guard(): - fn(*args, **kwargs) + if len(args) > 0: + for i, arg in enumerate(args): + values[i] = arg - return __fn__ + default_kwargs = dict(zip(keys, values)) - return __impl__ + if len(kwargs) > 0: + for k, v in kwargs.items(): + default_kwargs[k] = v + + return default_kwargs diff --git a/ppdet/modeling/coders/__init__.py b/ppdet/modeling/coders/__init__.py deleted file mode 100644 index 7726bb36cb06430b7bccd64ab89c8ef626e47790..0000000000000000000000000000000000000000 --- a/ppdet/modeling/coders/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .delta_bbox_coder import DeltaBBoxCoder diff --git a/ppdet/modeling/coders/delta_bbox_coder.py b/ppdet/modeling/coders/delta_bbox_coder.py deleted file mode 100644 index 0c53ea349eed4799cba164c4544051cb45d60385..0000000000000000000000000000000000000000 --- a/ppdet/modeling/coders/delta_bbox_coder.py +++ /dev/null @@ -1,40 +0,0 @@ -import paddle -import numpy as np -from ppdet.core.workspace import register -from 
ppdet.modeling.bbox_utils import delta2bbox_v2, bbox2delta_v2 - -__all__ = ['DeltaBBoxCoder'] - - -@register -class DeltaBBoxCoder: - """Encode bboxes in terms of delta/offset of a reference bbox. - Args: - norm_mean (list[float]): the mean to normalize delta - norm_std (list[float]): the std to normalize delta - wh_ratio_clip (float): to clip delta wh of decoded bboxes - ctr_clip (float or None): whether to clip delta xy of decoded bboxes - """ - def __init__(self, - norm_mean=[0.0, 0.0, 0.0, 0.0], - norm_std=[1., 1., 1., 1.], - wh_ratio_clip=16/1000.0, - ctr_clip=None): - self.norm_mean = norm_mean - self.norm_std = norm_std - self.wh_ratio_clip = wh_ratio_clip - self.ctr_clip = ctr_clip - - def encode(self, bboxes, tar_bboxes): - return bbox2delta_v2( - bboxes, tar_bboxes, means=self.norm_mean, stds=self.norm_std) - - def decode(self, bboxes, deltas, max_shape=None): - return delta2bbox_v2( - bboxes, - deltas, - max_shape=max_shape, - wh_ratio_clip=self.wh_ratio_clip, - ctr_clip=self.ctr_clip, - means=self.norm_mean, - stds=self.norm_std) diff --git a/ppdet/modeling/heads/bbox_head.py b/ppdet/modeling/heads/bbox_head.py index e4d7d68785d7632b0053f70faf94a5cccdb27713..debd3074c2ad0ae05a26c9ef240d9b4a573846e6 100644 --- a/ppdet/modeling/heads/bbox_head.py +++ b/ppdet/modeling/heads/bbox_head.py @@ -24,6 +24,7 @@ from ppdet.core.workspace import register, create from .roi_extractor import RoIAlign from ..shape_spec import ShapeSpec from ..bbox_utils import bbox2delta +from ..cls_utils import _get_class_default_kwargs from ppdet.modeling.layers import ConvNormLayer __all__ = ['TwoFCHead', 'XConvNormHead', 'BBoxHead'] @@ -178,7 +179,7 @@ class BBoxHead(nn.Layer): def __init__(self, head, in_channel, - roi_extractor=RoIAlign().__dict__, + roi_extractor=_get_class_default_kwargs(RoIAlign), bbox_assigner='BboxAssigner', with_pool=False, num_classes=80, @@ -256,7 +257,13 @@ class BBoxHead(nn.Layer): pred = self.get_prediction(scores, deltas) return pred, self.head - def 
get_loss(self, scores, deltas, targets, rois, bbox_weight): + def get_loss(self, + scores, + deltas, + targets, + rois, + bbox_weight, + loss_normalize_pos=False): """ scores (Tensor): scores from bbox head outputs deltas (Tensor): deltas from bbox head outputs @@ -279,8 +286,15 @@ class BBoxHead(nn.Layer): else: tgt_labels = tgt_labels.cast('int64') tgt_labels.stop_gradient = True - loss_bbox_cls = F.cross_entropy( - input=scores, label=tgt_labels, reduction='mean') + + if not loss_normalize_pos: + loss_bbox_cls = F.cross_entropy( + input=scores, label=tgt_labels, reduction='mean') + else: + loss_bbox_cls = F.cross_entropy( + input=scores, label=tgt_labels, + reduction='none').sum() / (tgt_labels.shape[0] + 1e-7) + loss_bbox[cls_name] = loss_bbox_cls # bbox reg @@ -321,9 +335,16 @@ class BBoxHead(nn.Layer): if self.bbox_loss is not None: reg_delta = self.bbox_transform(reg_delta) reg_target = self.bbox_transform(reg_target) - loss_bbox_reg = self.bbox_loss( - reg_delta, reg_target).sum() / tgt_labels.shape[0] - loss_bbox_reg *= self.num_classes + + if not loss_normalize_pos: + loss_bbox_reg = self.bbox_loss( + reg_delta, reg_target).sum() / tgt_labels.shape[0] + loss_bbox_reg *= self.num_classes + + else: + loss_bbox_reg = self.bbox_loss( + reg_delta, reg_target).sum() / (tgt_labels.shape[0] + 1e-7) + else: loss_bbox_reg = paddle.abs(reg_delta - reg_target).sum( ) / tgt_labels.shape[0] diff --git a/ppdet/modeling/heads/cascade_head.py b/ppdet/modeling/heads/cascade_head.py index 935642bd6d85c402afa84900c72627e97db0f9d6..0498a35da5ce4952739245ba0426a1ac306bf2e3 100644 --- a/ppdet/modeling/heads/cascade_head.py +++ b/ppdet/modeling/heads/cascade_head.py @@ -22,6 +22,7 @@ from .bbox_head import BBoxHead, TwoFCHead, XConvNormHead from .roi_extractor import RoIAlign from ..shape_spec import ShapeSpec from ..bbox_utils import delta2bbox, clip_bbox, nonempty_bbox +from ..cls_utils import _get_class_default_kwargs __all__ = ['CascadeTwoFCHead', 'CascadeXConvNormHead', 
'CascadeHead'] @@ -153,13 +154,17 @@ class CascadeHead(BBoxHead): def __init__(self, head, in_channel, - roi_extractor=RoIAlign().__dict__, + roi_extractor=_get_class_default_kwargs(RoIAlign), bbox_assigner='BboxAssigner', num_classes=80, bbox_weight=[[10., 10., 5., 5.], [20.0, 20.0, 10.0, 10.0], [30.0, 30.0, 15.0, 15.0]], num_cascade_stages=3, - bbox_loss=None): + bbox_loss=None, + reg_class_agnostic=True, + stage_loss_weights=None, + loss_normalize_pos=False): + nn.Layer.__init__(self, ) self.head = head self.roi_extractor = roi_extractor @@ -171,6 +176,16 @@ class CascadeHead(BBoxHead): self.bbox_weight = bbox_weight self.num_cascade_stages = num_cascade_stages self.bbox_loss = bbox_loss + self.stage_loss_weights = [ + 1. / num_cascade_stages for _ in range(num_cascade_stages) + ] if stage_loss_weights is None else stage_loss_weights + assert len( + self.stage_loss_weights + ) == num_cascade_stages, f'stage_loss_weights({len(self.stage_loss_weights)}) do not equal to num_cascade_stages({num_cascade_stages})' + + self.reg_class_agnostic = reg_class_agnostic + num_bbox_delta = 4 if reg_class_agnostic else 4 * num_classes + self.loss_normalize_pos = loss_normalize_pos self.bbox_score_list = [] self.bbox_delta_list = [] @@ -189,7 +204,7 @@ class CascadeHead(BBoxHead): delta_name, nn.Linear( in_channel, - 4, + num_bbox_delta, weight_attr=paddle.ParamAttr(initializer=Normal( mean=0.0, std=0.001)))) self.bbox_score_list.append(bbox_score) @@ -226,6 +241,20 @@ class CascadeHead(BBoxHead): bbox_feat = self.head(rois_feat, i) scores = self.bbox_score_list[i](bbox_feat) deltas = self.bbox_delta_list[i](bbox_feat) + + # TODO (lyuwenyu) Is it correct for only one class ? 
+ if not self.reg_class_agnostic and i < self.num_cascade_stages - 1: + deltas = deltas.reshape([deltas.shape[0], self.num_classes, 4]) + labels = scores[:, :-1].argmax(axis=-1) + + if self.training: + deltas = deltas[paddle.arange(deltas.shape[0]), labels] + else: + deltas = deltas[(deltas * F.one_hot( + labels, num_classes=self.num_classes).unsqueeze(-1) != 0 + ).nonzero(as_tuple=True)].reshape( + [deltas.shape[0], 4]) + head_out_list.append([scores, deltas, rois]) pred_bbox = self._get_pred_bbox(deltas, rois, self.bbox_weight[i]) @@ -233,11 +262,16 @@ class CascadeHead(BBoxHead): loss = {} for stage, value in enumerate(zip(head_out_list, targets_list)): (scores, deltas, rois), targets = value - loss_stage = self.get_loss(scores, deltas, targets, rois, - self.bbox_weight[stage]) + loss_stage = self.get_loss( + scores, + deltas, + targets, + rois, + self.bbox_weight[stage], + loss_normalize_pos=self.loss_normalize_pos) for k, v in loss_stage.items(): loss[k + "_stage{}".format( - stage)] = v / self.num_cascade_stages + stage)] = v * self.stage_loss_weights[stage] return loss, bbox_feat else: @@ -266,6 +300,12 @@ class CascadeHead(BBoxHead): num_prop = [] for p in proposals: num_prop.append(p.shape[0]) + + # NOTE(dev): num_prop will be tagged as LoDTensorArray because it + # depends on batch_size under @to_static. However the argument + # num_or_sections in paddle.split does not support LoDTensorArray, + # so we use [-1] to replace it without losing correctness. 
+ num_prop = [-1] if len(num_prop) == 1 else num_prop return pred_bbox.split(num_prop) def get_prediction(self, head_out_list): diff --git a/ppdet/modeling/heads/face_head.py b/ppdet/modeling/heads/face_head.py index bb51f2eb96fbed3e9696852d011a55c1e2115937..360f909a67fd272acc15cdbcd79c1172e9b1088a 100644 --- a/ppdet/modeling/heads/face_head.py +++ b/ppdet/modeling/heads/face_head.py @@ -17,6 +17,7 @@ import paddle.nn as nn from ppdet.core.workspace import register from ..layers import AnchorGeneratorSSD +from ..cls_utils import _get_class_default_kwargs @register @@ -39,7 +40,7 @@ class FaceHead(nn.Layer): def __init__(self, num_classes=80, in_channels=[96, 96], - anchor_generator=AnchorGeneratorSSD().__dict__, + anchor_generator=_get_class_default_kwargs(AnchorGeneratorSSD), kernel_size=3, padding=1, conv_decay=0., diff --git a/ppdet/modeling/heads/gfl_head.py b/ppdet/modeling/heads/gfl_head.py index 779d739b835b3091ddabf3ab0375973f6bc3b8ab..9c87eecd81ef8bd0be8bb61db385ef844fcff2ff 100644 --- a/ppdet/modeling/heads/gfl_head.py +++ b/ppdet/modeling/heads/gfl_head.py @@ -79,7 +79,9 @@ class Integral(nn.Layer): offsets from the box center in four directions, shape (N, 4). 
""" x = F.softmax(x.reshape([-1, self.reg_max + 1]), axis=1) - x = F.linear(x, self.project).reshape([-1, 4]) + x = F.linear(x, self.project) + if self.training: + x = x.reshape([-1, 4]) return x @@ -386,7 +388,12 @@ class GFLHead(nn.Layer): avg_factor = sum(avg_factor) try: - avg_factor = paddle.distributed.all_reduce(avg_factor.clone()) + avg_factor_clone = avg_factor.clone() + tmp_avg_factor = paddle.distributed.all_reduce(avg_factor_clone) + if tmp_avg_factor is not None: + avg_factor = tmp_avg_factor + else: + avg_factor = avg_factor_clone avg_factor = paddle.clip( avg_factor / paddle.distributed.get_world_size(), min=1) except: diff --git a/ppdet/modeling/heads/mask_head.py b/ppdet/modeling/heads/mask_head.py index 604847a2d07224314b2eba700eefa00729b4f95f..939debbaae129293551394b5571f7da158a0cccb 100644 --- a/ppdet/modeling/heads/mask_head.py +++ b/ppdet/modeling/heads/mask_head.py @@ -20,6 +20,7 @@ from paddle.nn.initializer import KaimingNormal from ppdet.core.workspace import register, create from ppdet.modeling.layers import ConvNormLayer from .roi_extractor import RoIAlign +from ..cls_utils import _get_class_default_kwargs @register @@ -120,7 +121,7 @@ class MaskHead(nn.Layer): def __init__(self, head, - roi_extractor=RoIAlign().__dict__, + roi_extractor=_get_class_default_kwargs(RoIAlign), mask_assigner='MaskAssigner', num_classes=80, share_bbox_feat=False, @@ -221,7 +222,7 @@ class MaskHead(nn.Layer): mask_feat = self.head(rois_feat) mask_logit = self.mask_fcn_logits(mask_feat) if self.num_classes == 1: - mask_out = F.sigmoid(mask_logit) + mask_out = F.sigmoid(mask_logit)[:, 0, :, :] else: num_masks = paddle.shape(mask_logit)[0] index = paddle.arange(num_masks).cast('int32') diff --git a/ppdet/modeling/heads/pico_head.py b/ppdet/modeling/heads/pico_head.py index 98c8c8ef932f3af793bc8d69709420b7930cc6ea..a63e7c90ca76f54934f9e28858e135cdb5c04d16 100644 --- a/ppdet/modeling/heads/pico_head.py +++ b/ppdet/modeling/heads/pico_head.py @@ -23,7 +23,6 @@ 
import paddle.nn as nn import paddle.nn.functional as F from paddle import ParamAttr from paddle.nn.initializer import Normal, Constant -from paddle.fluid.dygraph import parallel_helper from ppdet.modeling.ops import get_static_shape from ..initializer import normal_ @@ -91,7 +90,7 @@ class PicoFeat(nn.Layer): self.reg_convs = [] if use_se: assert share_cls_reg == True, \ - 'In the case of using se, share_cls_reg is not supported' + 'In the case of using se, share_cls_reg must be set to True' self.se = nn.LayerList() for stage_idx in range(num_fpn_stride): cls_subnet_convs = [] @@ -194,7 +193,7 @@ class PicoHead(OTAVFLHead): 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', 'assigner', 'nms' ] - __shared__ = ['num_classes'] + __shared__ = ['num_classes', 'eval_size'] def __init__(self, conv_feat='PicoFeat', @@ -210,7 +209,8 @@ class PicoHead(OTAVFLHead): feat_in_chan=96, nms=None, nms_pre=1000, - cell_offset=0): + cell_offset=0, + eval_size=None): super(PicoHead, self).__init__( conv_feat=conv_feat, dgqp_module=dgqp_module, @@ -239,6 +239,7 @@ class PicoHead(OTAVFLHead): self.nms = nms self.nms_pre = nms_pre self.cell_offset = cell_offset + self.eval_size = eval_size self.use_sigmoid = self.loss_vfl.use_sigmoid if self.use_sigmoid: @@ -282,12 +283,50 @@ class PicoHead(OTAVFLHead): bias_attr=ParamAttr(initializer=Constant(value=0)))) self.head_reg_list.append(head_reg) + # initialize the anchor points + if self.eval_size: + self.anchor_points, self.stride_tensor = self._generate_anchors() + def forward(self, fpn_feats, export_post_process=True): assert len(fpn_feats) == len( self.fpn_stride ), "The size of fpn_feats is not equal to size of fpn_stride" - cls_logits_list = [] - bboxes_reg_list = [] + + if self.training: + return self.forward_train(fpn_feats) + else: + return self.forward_eval( + fpn_feats, export_post_process=export_post_process) + + def forward_train(self, fpn_feats): + cls_logits_list, bboxes_reg_list = [], [] + for i, fpn_feat in 
enumerate(fpn_feats): + conv_cls_feat, conv_reg_feat = self.conv_feat(fpn_feat, i) + if self.conv_feat.share_cls_reg: + cls_logits = self.head_cls_list[i](conv_cls_feat) + cls_score, bbox_pred = paddle.split( + cls_logits, + [self.cls_out_channels, 4 * (self.reg_max + 1)], + axis=1) + else: + cls_score = self.head_cls_list[i](conv_cls_feat) + bbox_pred = self.head_reg_list[i](conv_reg_feat) + + if self.dgqp_module: + quality_score = self.dgqp_module(bbox_pred) + cls_score = F.sigmoid(cls_score) * quality_score + + cls_logits_list.append(cls_score) + bboxes_reg_list.append(bbox_pred) + + return (cls_logits_list, bboxes_reg_list) + + def forward_eval(self, fpn_feats, export_post_process=True): + if self.eval_size: + anchor_points, stride_tensor = self.anchor_points, self.stride_tensor + else: + anchor_points, stride_tensor = self._generate_anchors(fpn_feats) + cls_logits_list, bboxes_reg_list = [], [] for i, fpn_feat in enumerate(fpn_feats): conv_cls_feat, conv_reg_feat = self.conv_feat(fpn_feat, i) if self.conv_feat.share_cls_reg: @@ -307,50 +346,68 @@ class PicoHead(OTAVFLHead): if not export_post_process: # Now only supports batch size = 1 in deploy # TODO(ygh): support batch size > 1 - cls_score = F.sigmoid(cls_score).reshape( + cls_score_out = F.sigmoid(cls_score).reshape( [1, self.cls_out_channels, -1]).transpose([0, 2, 1]) bbox_pred = bbox_pred.reshape([1, (self.reg_max + 1) * 4, -1]).transpose([0, 2, 1]) - elif not self.training: - cls_score = F.sigmoid(cls_score.transpose([0, 2, 3, 1])) + else: + b, _, h, w = fpn_feat.shape + l = h * w + cls_score_out = F.sigmoid( + cls_score.reshape([b, self.cls_out_channels, l])) bbox_pred = bbox_pred.transpose([0, 2, 3, 1]) - stride = self.fpn_stride[i] - b, cell_h, cell_w, _ = paddle.shape(cls_score) - y, x = self.get_single_level_center_point( - [cell_h, cell_w], stride, cell_offset=self.cell_offset) - center_points = paddle.stack([x, y], axis=-1) - cls_score = cls_score.reshape([b, -1, self.cls_out_channels]) - 
bbox_pred = self.distribution_project(bbox_pred) * stride - bbox_pred = bbox_pred.reshape([b, cell_h * cell_w, 4]) - - # NOTE: If keep_ratio=False and image shape value that - # multiples of 32, distance2bbox not set max_shapes parameter - # to speed up model prediction. If need to set max_shapes, - # please use inputs['im_shape']. - bbox_pred = batch_distance2bbox( - center_points, bbox_pred, max_shapes=None) + bbox_pred = self.distribution_project(bbox_pred) + bbox_pred = bbox_pred.reshape([b, l, 4]) - cls_logits_list.append(cls_score) + cls_logits_list.append(cls_score_out) bboxes_reg_list.append(bbox_pred) + if export_post_process: + cls_logits_list = paddle.concat(cls_logits_list, axis=-1) + bboxes_reg_list = paddle.concat(bboxes_reg_list, axis=1) + bboxes_reg_list = batch_distance2bbox(anchor_points, + bboxes_reg_list) + bboxes_reg_list *= stride_tensor + return (cls_logits_list, bboxes_reg_list) - def post_process(self, - gfl_head_outs, - im_shape, - scale_factor, - export_nms=True): - cls_scores, bboxes_reg = gfl_head_outs - bboxes = paddle.concat(bboxes_reg, axis=1) - mlvl_scores = paddle.concat(cls_scores, axis=1) - mlvl_scores = mlvl_scores.transpose([0, 2, 1]) + def _generate_anchors(self, feats=None): + # just use in eval time + anchor_points = [] + stride_tensor = [] + for i, stride in enumerate(self.fpn_stride): + if feats is not None: + _, _, h, w = feats[i].shape + else: + h = math.ceil(self.eval_size[0] / stride) + w = math.ceil(self.eval_size[1] / stride) + shift_x = paddle.arange(end=w) + self.cell_offset + shift_y = paddle.arange(end=h) + self.cell_offset + shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype='float32') + anchor_points.append(anchor_point.reshape([-1, 2])) + stride_tensor.append( + paddle.full( + [h * w, 1], stride, dtype='float32')) + anchor_points = paddle.concat(anchor_points) + stride_tensor = paddle.concat(stride_tensor) + return 
anchor_points, stride_tensor + + def post_process(self, head_outs, scale_factor, export_nms=True): + pred_scores, pred_bboxes = head_outs if not export_nms: - return bboxes, mlvl_scores + return pred_bboxes, pred_scores else: # rescale: [h_scale, w_scale] -> [w_scale, h_scale, w_scale, h_scale] - im_scale = scale_factor.flip([1]).tile([1, 2]).unsqueeze(1) - bboxes /= im_scale - bbox_pred, bbox_num, _ = self.nms(bboxes, mlvl_scores) + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], + axis=-1).reshape([-1, 1, 4]) + # scale bbox to origin image size. + pred_bboxes /= scale_factor + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) return bbox_pred, bbox_num @@ -374,29 +431,29 @@ class PicoHeadV2(GFLHead): 'conv_feat', 'dgqp_module', 'loss_class', 'loss_dfl', 'loss_bbox', 'static_assigner', 'assigner', 'nms' ] - __shared__ = ['num_classes'] - - def __init__( - self, - conv_feat='PicoFeatV2', - dgqp_module=None, - num_classes=80, - fpn_stride=[8, 16, 32], - prior_prob=0.01, - use_align_head=True, - loss_class='VariFocalLoss', - loss_dfl='DistributionFocalLoss', - loss_bbox='GIoULoss', - static_assigner_epoch=60, - static_assigner='ATSSAssigner', - assigner='TaskAlignedAssigner', - reg_max=16, - feat_in_chan=96, - nms=None, - nms_pre=1000, - cell_offset=0, - act='hard_swish', - grid_cell_scale=5.0, ): + __shared__ = ['num_classes', 'eval_size'] + + def __init__(self, + conv_feat='PicoFeatV2', + dgqp_module=None, + num_classes=80, + fpn_stride=[8, 16, 32], + prior_prob=0.01, + use_align_head=True, + loss_class='VariFocalLoss', + loss_dfl='DistributionFocalLoss', + loss_bbox='GIoULoss', + static_assigner_epoch=60, + static_assigner='ATSSAssigner', + assigner='TaskAlignedAssigner', + reg_max=16, + feat_in_chan=96, + nms=None, + nms_pre=1000, + cell_offset=0, + act='hard_swish', + grid_cell_scale=5.0, + eval_size=None): super(PicoHeadV2, self).__init__( conv_feat=conv_feat, 
dgqp_module=dgqp_module, @@ -432,6 +489,7 @@ class PicoHeadV2(GFLHead): self.grid_cell_scale = grid_cell_scale self.use_align_head = use_align_head self.cls_out_channels = self.num_classes + self.eval_size = eval_size bias_init_value = -math.log((1 - self.prior_prob) / self.prior_prob) # Clear the super class initialization @@ -478,11 +536,22 @@ class PicoHeadV2(GFLHead): act=self.act, use_act_in_out=False)) + # initialize the anchor points + if self.eval_size: + self.anchor_points, self.stride_tensor = self._generate_anchors() + def forward(self, fpn_feats, export_post_process=True): assert len(fpn_feats) == len( self.fpn_stride ), "The size of fpn_feats is not equal to size of fpn_stride" + if self.training: + return self.forward_train(fpn_feats) + else: + return self.forward_eval( + fpn_feats, export_post_process=export_post_process) + + def forward_train(self, fpn_feats): cls_score_list, reg_list, box_list = [], [], [] for i, (fpn_feat, stride) in enumerate(zip(fpn_feats, self.fpn_stride)): b, _, h, w = get_static_shape(fpn_feat) @@ -498,7 +567,48 @@ class PicoHeadV2(GFLHead): else: cls_score = F.sigmoid(cls_logit) - if not export_post_process and not self.training: + cls_score_out = cls_score.transpose([0, 2, 3, 1]) + bbox_pred = reg_pred.transpose([0, 2, 3, 1]) + b, cell_h, cell_w, _ = paddle.shape(cls_score_out) + y, x = self.get_single_level_center_point( + [cell_h, cell_w], stride, cell_offset=self.cell_offset) + center_points = paddle.stack([x, y], axis=-1) + cls_score_out = cls_score_out.reshape( + [b, -1, self.cls_out_channels]) + bbox_pred = self.distribution_project(bbox_pred) * stride + bbox_pred = bbox_pred.reshape([b, cell_h * cell_w, 4]) + bbox_pred = batch_distance2bbox( + center_points, bbox_pred, max_shapes=None) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + reg_list.append(reg_pred.flatten(2).transpose([0, 2, 1])) + box_list.append(bbox_pred / stride) + + cls_score_list = paddle.concat(cls_score_list, axis=1) + box_list 
= paddle.concat(box_list, axis=1) + reg_list = paddle.concat(reg_list, axis=1) + return cls_score_list, reg_list, box_list, fpn_feats + + def forward_eval(self, fpn_feats, export_post_process=True): + if self.eval_size: + anchor_points, stride_tensor = self.anchor_points, self.stride_tensor + else: + anchor_points, stride_tensor = self._generate_anchors(fpn_feats) + cls_score_list, box_list = [], [] + for i, (fpn_feat, stride) in enumerate(zip(fpn_feats, self.fpn_stride)): + b, _, h, w = fpn_feat.shape + # task decomposition + conv_cls_feat, se_feat = self.conv_feat(fpn_feat, i) + cls_logit = self.head_cls_list[i](se_feat) + reg_pred = self.head_reg_list[i](se_feat) + + # cls prediction and alignment + if self.use_align_head: + cls_prob = F.sigmoid(self.cls_align[i](conv_cls_feat)) + cls_score = (F.sigmoid(cls_logit) * cls_prob + eps).sqrt() + else: + cls_score = F.sigmoid(cls_logit) + + if not export_post_process: # Now only supports batch size = 1 in deploy cls_score_list.append( cls_score.reshape([1, self.cls_out_channels, -1]).transpose( @@ -507,34 +617,21 @@ class PicoHeadV2(GFLHead): reg_pred.reshape([1, (self.reg_max + 1) * 4, -1]).transpose( [0, 2, 1])) else: - cls_score_out = cls_score.transpose([0, 2, 3, 1]) + l = h * w + cls_score_out = cls_score.reshape([b, self.cls_out_channels, l]) bbox_pred = reg_pred.transpose([0, 2, 3, 1]) - b, cell_h, cell_w, _ = paddle.shape(cls_score_out) - y, x = self.get_single_level_center_point( - [cell_h, cell_w], stride, cell_offset=self.cell_offset) - center_points = paddle.stack([x, y], axis=-1) - cls_score_out = cls_score_out.reshape( - [b, -1, self.cls_out_channels]) - bbox_pred = self.distribution_project(bbox_pred) * stride - bbox_pred = bbox_pred.reshape([b, cell_h * cell_w, 4]) - bbox_pred = batch_distance2bbox( - center_points, bbox_pred, max_shapes=None) - if not self.training: - cls_score_list.append(cls_score_out) - box_list.append(bbox_pred) - else: - cls_score_list.append( - cls_score.flatten(2).transpose([0, 
2, 1])) - reg_list.append(reg_pred.flatten(2).transpose([0, 2, 1])) - box_list.append(bbox_pred / stride) - - if not self.training: - return cls_score_list, box_list - else: - cls_score_list = paddle.concat(cls_score_list, axis=1) + bbox_pred = self.distribution_project(bbox_pred) + bbox_pred = bbox_pred.reshape([b, l, 4]) + cls_score_list.append(cls_score_out) + box_list.append(bbox_pred) + + if export_post_process: + cls_score_list = paddle.concat(cls_score_list, axis=-1) box_list = paddle.concat(box_list, axis=1) - reg_list = paddle.concat(reg_list, axis=1) - return cls_score_list, reg_list, box_list, fpn_feats + box_list = batch_distance2bbox(anchor_points, box_list) + box_list *= stride_tensor + + return cls_score_list, box_list def get_loss(self, head_outs, gt_meta): pred_scores, pred_regs, pred_bboxes, fpn_feats = head_outs @@ -628,8 +725,7 @@ class PicoHeadV2(GFLHead): loss_dfl = paddle.zeros([1]) avg_factor = flatten_assigned_scores.sum() - if paddle.fluid.core.is_compiled_with_dist( - ) and parallel_helper._is_parallel_ctx_initialized(): + if paddle.distributed.get_world_size() > 1: paddle.distributed.all_reduce(avg_factor) avg_factor = paddle.clip( avg_factor / paddle.distributed.get_world_size(), min=1) @@ -644,20 +740,41 @@ class PicoHeadV2(GFLHead): return loss_states - def post_process(self, - gfl_head_outs, - im_shape, - scale_factor, - export_nms=True): - cls_scores, bboxes_reg = gfl_head_outs - bboxes = paddle.concat(bboxes_reg, axis=1) - mlvl_scores = paddle.concat(cls_scores, axis=1) - mlvl_scores = mlvl_scores.transpose([0, 2, 1]) + def _generate_anchors(self, feats=None): + # just use in eval time + anchor_points = [] + stride_tensor = [] + for i, stride in enumerate(self.fpn_stride): + if feats is not None: + _, _, h, w = feats[i].shape + else: + h = math.ceil(self.eval_size[0] / stride) + w = math.ceil(self.eval_size[1] / stride) + shift_x = paddle.arange(end=w) + self.cell_offset + shift_y = paddle.arange(end=h) + self.cell_offset + 
shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) + anchor_point = paddle.cast( + paddle.stack( + [shift_x, shift_y], axis=-1), dtype='float32') + anchor_points.append(anchor_point.reshape([-1, 2])) + stride_tensor.append( + paddle.full( + [h * w, 1], stride, dtype='float32')) + anchor_points = paddle.concat(anchor_points) + stride_tensor = paddle.concat(stride_tensor) + return anchor_points, stride_tensor + + def post_process(self, head_outs, scale_factor, export_nms=True): + pred_scores, pred_bboxes = head_outs if not export_nms: - return bboxes, mlvl_scores + return pred_bboxes, pred_scores else: # rescale: [h_scale, w_scale] -> [w_scale, h_scale, w_scale, h_scale] - im_scale = scale_factor.flip([1]).tile([1, 2]).unsqueeze(1) - bboxes /= im_scale - bbox_pred, bbox_num, _ = self.nms(bboxes, mlvl_scores) + scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1) + scale_factor = paddle.concat( + [scale_x, scale_y, scale_x, scale_y], + axis=-1).reshape([-1, 1, 4]) + # scale bbox to origin image size. 
+ pred_bboxes /= scale_factor + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) return bbox_pred, bbox_num diff --git a/ppdet/modeling/heads/ppyoloe_head.py b/ppdet/modeling/heads/ppyoloe_head.py index 920bb2298909e5275c9bc04f3c73cce3f4c8ff36..4e9c303dc64252b26a9ff3153cf65f34b53196a4 100644 --- a/ppdet/modeling/heads/ppyoloe_head.py +++ b/ppdet/modeling/heads/ppyoloe_head.py @@ -22,7 +22,8 @@ from ..losses import GIoULoss from ..initializer import bias_init_with_prob, constant_, normal_ from ..assigners.utils import generate_anchors_for_grid_cell from ppdet.modeling.backbones.cspresnet import ConvBNLayer -from ppdet.modeling.ops import get_static_shape, paddle_distributed_is_initialized, get_act_fn +from ppdet.modeling.ops import get_static_shape, get_act_fn +from ppdet.modeling.layers import MultiClassNMS __all__ = ['PPYOLOEHead'] @@ -45,7 +46,7 @@ class ESEAttn(nn.Layer): @register class PPYOLOEHead(nn.Layer): - __shared__ = ['num_classes', 'trt', 'exclude_nms'] + __shared__ = ['num_classes', 'eval_size', 'trt', 'exclude_nms'] __inject__ = ['static_assigner', 'assigner', 'nms'] def __init__(self, @@ -61,7 +62,7 @@ class PPYOLOEHead(nn.Layer): static_assigner='ATSSAssigner', assigner='TaskAlignedAssigner', nms='MultiClassNMS', - eval_input_size=[], + eval_size=None, loss_weight={ 'class': 1.0, 'iou': 2.5, @@ -80,12 +81,14 @@ class PPYOLOEHead(nn.Layer): self.iou_loss = GIoULoss() self.loss_weight = loss_weight self.use_varifocal_loss = use_varifocal_loss - self.eval_input_size = eval_input_size + self.eval_size = eval_size self.static_assigner_epoch = static_assigner_epoch self.static_assigner = static_assigner self.assigner = assigner self.nms = nms + if isinstance(self.nms, MultiClassNMS) and trt: + self.nms.trt = trt self.exclude_nms = exclude_nms # stem self.stem_cls = nn.LayerList() @@ -108,6 +111,7 @@ class PPYOLOEHead(nn.Layer): in_c, 4 * (self.reg_max + 1), 3, padding=1)) # projection conv self.proj_conv = nn.Conv2D(self.reg_max + 1, 1, 1, 
bias_attr=False) + self.proj_conv.skip_quant = True self._init_weights() @classmethod @@ -127,10 +131,10 @@ class PPYOLOEHead(nn.Layer): self.proj.reshape([1, self.reg_max + 1, 1, 1])) self.proj_conv.weight.stop_gradient = True - if self.eval_input_size: + if self.eval_size: anchor_points, stride_tensor = self._generate_anchors() - self.register_buffer('anchor_points', anchor_points) - self.register_buffer('stride_tensor', stride_tensor) + self.anchor_points = anchor_points + self.stride_tensor = stride_tensor def forward_train(self, feats, targets): anchors, anchor_points, num_anchors_list, stride_tensor = \ @@ -156,7 +160,7 @@ class PPYOLOEHead(nn.Layer): num_anchors_list, stride_tensor ], targets) - def _generate_anchors(self, feats=None): + def _generate_anchors(self, feats=None, dtype='float32'): # just use in eval time anchor_points = [] stride_tensor = [] @@ -164,24 +168,22 @@ class PPYOLOEHead(nn.Layer): if feats is not None: _, _, h, w = feats[i].shape else: - h = int(self.eval_input_size[0] / stride) - w = int(self.eval_input_size[1] / stride) + h = int(self.eval_size[0] / stride) + w = int(self.eval_size[1] / stride) shift_x = paddle.arange(end=w) + self.grid_cell_offset shift_y = paddle.arange(end=h) + self.grid_cell_offset shift_y, shift_x = paddle.meshgrid(shift_y, shift_x) anchor_point = paddle.cast( paddle.stack( - [shift_x, shift_y], axis=-1), dtype='float32') + [shift_x, shift_y], axis=-1), dtype=dtype) anchor_points.append(anchor_point.reshape([-1, 2])) - stride_tensor.append( - paddle.full( - [h * w, 1], stride, dtype='float32')) + stride_tensor.append(paddle.full([h * w, 1], stride, dtype=dtype)) anchor_points = paddle.concat(anchor_points) stride_tensor = paddle.concat(stride_tensor) return anchor_points, stride_tensor def forward_eval(self, feats): - if self.eval_input_size: + if self.eval_size: anchor_points, stride_tensor = self.anchor_points, self.stride_tensor else: anchor_points, stride_tensor = self._generate_anchors(feats) @@ -290,7 
+292,7 @@ class PPYOLOEHead(nn.Layer): else: loss_l1 = paddle.zeros([1]) loss_iou = paddle.zeros([1]) - loss_dfl = paddle.zeros([1]) + loss_dfl = pred_dist.sum() * 0. return loss_l1, loss_iou, loss_dfl def get_loss(self, head_outs, gt_meta): @@ -331,14 +333,15 @@ class PPYOLOEHead(nn.Layer): assigned_bboxes /= stride_tensor # cls loss if self.use_varifocal_loss: - one_hot_label = F.one_hot(assigned_labels, self.num_classes) + one_hot_label = F.one_hot(assigned_labels, + self.num_classes + 1)[..., :-1] loss_cls = self._varifocal_loss(pred_scores, assigned_scores, one_hot_label) else: loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha_l) assigned_scores_sum = assigned_scores.sum() - if paddle_distributed_is_initialized(): + if paddle.distributed.get_world_size() > 1: paddle.distributed.all_reduce(assigned_scores_sum) assigned_scores_sum = paddle.clip( assigned_scores_sum / paddle.distributed.get_world_size(), @@ -361,7 +364,7 @@ class PPYOLOEHead(nn.Layer): } return out_dict - def post_process(self, head_outs, img_shape, scale_factor): + def post_process(self, head_outs, scale_factor): pred_scores, pred_dist, anchor_points, stride_tensor = head_outs pred_bboxes = batch_distance2bbox(anchor_points, pred_dist.transpose([0, 2, 1])) diff --git a/ppdet/modeling/heads/retina_head.py b/ppdet/modeling/heads/retina_head.py index e8f5cbd0ac194d5adcaa0893cf12f0ffaa0161e9..8705e86febb30d06fcbbd06187a76548450c9600 100644 --- a/ppdet/modeling/heads/retina_head.py +++ b/ppdet/modeling/heads/retina_head.py @@ -1,4 +1,4 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
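Note on the `F.one_hot(assigned_labels, self.num_classes + 1)[..., :-1]` change in the `ppyoloe_head.py` hunks above (the same pattern appears in the YOLOX and DETR loss hunks of this diff). This is a minimal pure-Python sketch, assuming background anchors carry the label `num_classes`, as the YOLOX check `labels != self.num_classes` suggests. Encoding into `num_classes + 1` columns and then dropping the last one presumably turns every background row into an all-zero target. The helper name `one_hot_drop_background` is hypothetical and is not part of PaddleDetection.

```python
# Hypothetical helper for illustration only; it imitates
# F.one_hot(labels, num_classes + 1)[..., :-1] on a flat list of labels.
def one_hot_drop_background(labels, num_classes):
    rows = []
    for label in labels:
        if not 0 <= label <= num_classes:
            raise ValueError("label %d outside [0, %d]" % (label, num_classes))
        # One extra column so that the background index is a valid position.
        row = [0.0] * (num_classes + 1)
        row[label] = 1.0
        # Drop the background column: background rows become all zeros.
        rows.append(row[:-1])
    return rows


num_classes = 3
labels = [0, 2, 3]  # assumption: 3 == num_classes marks a background anchor
targets = one_hot_drop_background(labels, num_classes)
for label, row in zip(labels, targets):
    print(label, row)
```

Under this assumption, foreground rows keep a single 1 in their class column, while the background row contributes only negative targets to the classification loss.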
@@ -16,17 +16,20 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function -import math, paddle +import math +import paddle import paddle.nn as nn import paddle.nn.functional as F from paddle import ParamAttr from paddle.nn.initializer import Normal, Constant -from ppdet.modeling.proposal_generator import AnchorGenerator -from ppdet.core.workspace import register +from ppdet.modeling.bbox_utils import bbox2delta, delta2bbox from ppdet.modeling.heads.fcos_head import FCOSFeat +from ppdet.core.workspace import register + __all__ = ['RetinaHead'] + @register class RetinaFeat(FCOSFeat): """We use FCOSFeat to construct conv layers in RetinaNet. @@ -34,72 +37,49 @@ class RetinaFeat(FCOSFeat): """ pass -@register -class RetinaAnchorGenerator(AnchorGenerator): - def __init__(self, - octave_base_scale=4, - scales_per_octave=3, - aspect_ratios=[0.5, 1.0, 2.0], - strides=[8.0, 16.0, 32.0, 64.0, 128.0], - variance=[1.0, 1.0, 1.0, 1.0], - offset=0.0): - anchor_sizes = [] - for s in strides: - anchor_sizes.append([ - s * octave_base_scale * 2**(i/scales_per_octave) \ - for i in range(scales_per_octave)]) - super(RetinaAnchorGenerator, self).__init__( - anchor_sizes=anchor_sizes, - aspect_ratios=aspect_ratios, - strides=strides, - variance=variance, - offset=offset) @register class RetinaHead(nn.Layer): """Used in RetinaNet proposed in paper https://arxiv.org/pdf/1708.02002.pdf """ + __shared__ = ['num_classes'] __inject__ = [ - 'conv_feat', 'anchor_generator', 'bbox_assigner', - 'bbox_coder', 'loss_class', 'loss_bbox', 'nms'] + 'conv_feat', 'anchor_generator', 'bbox_assigner', 'loss_class', + 'loss_bbox', 'nms' + ] + def __init__(self, num_classes=80, + conv_feat='RetinaFeat', + anchor_generator='RetinaAnchorGenerator', + bbox_assigner='MaxIoUAssigner', + loss_class='FocalLoss', + loss_bbox='SmoothL1Loss', + nms='MultiClassNMS', prior_prob=0.01, - decode_reg_out=False, - conv_feat=None, - anchor_generator=None, - 
bbox_assigner=None, - bbox_coder=None, - loss_class=None, - loss_bbox=None, nms_pre=1000, - nms=None): + weights=[1., 1., 1., 1.]): super(RetinaHead, self).__init__() self.num_classes = num_classes - self.prior_prob = prior_prob - # allow RetinaNet to use IoU based losses. - self.decode_reg_out = decode_reg_out self.conv_feat = conv_feat self.anchor_generator = anchor_generator self.bbox_assigner = bbox_assigner - self.bbox_coder = bbox_coder self.loss_class = loss_class self.loss_bbox = loss_bbox - self.nms_pre = nms_pre self.nms = nms - self.cls_out_channels = num_classes - self.init_layers() + self.nms_pre = nms_pre + self.weights = weights - def init_layers(self): - bias_init_value = -math.log((1 - self.prior_prob) / self.prior_prob) + bias_init_value = -math.log((1 - prior_prob) / prior_prob) num_anchors = self.anchor_generator.num_anchors self.retina_cls = nn.Conv2D( in_channels=self.conv_feat.feat_out, - out_channels=self.cls_out_channels * num_anchors, + out_channels=self.num_classes * num_anchors, kernel_size=3, stride=1, padding=1, - weight_attr=ParamAttr(initializer=Normal(mean=0.0, std=0.01)), + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), bias_attr=ParamAttr(initializer=Constant(value=bias_init_value))) self.retina_reg = nn.Conv2D( in_channels=self.conv_feat.feat_out, @@ -107,10 +87,11 @@ class RetinaHead(nn.Layer): kernel_size=3, stride=1, padding=1, - weight_attr=ParamAttr(initializer=Normal(mean=0.0, std=0.01)), + weight_attr=ParamAttr(initializer=Normal( + mean=0.0, std=0.01)), bias_attr=ParamAttr(initializer=Constant(value=0))) - def forward(self, neck_feats): + def forward(self, neck_feats, targets=None): cls_logits_list = [] bboxes_reg_list = [] for neck_feat in neck_feats: @@ -119,33 +100,40 @@ class RetinaHead(nn.Layer): bbox_reg = self.retina_reg(conv_reg_feat) cls_logits_list.append(cls_logits) bboxes_reg_list.append(bbox_reg) - return (cls_logits_list, bboxes_reg_list) - def get_loss(self, head_outputs, meta): + if 
self.training: + return self.get_loss([cls_logits_list, bboxes_reg_list], targets) + else: + return [cls_logits_list, bboxes_reg_list] + + def get_loss(self, head_outputs, targets): """Here we calculate loss for a batch of images. We assign anchors to gts in each image and gather all the assigned postive and negative samples. Then loss is calculated on the gathered samples. """ - cls_logits, bboxes_reg = head_outputs - # we use the same anchor for all images - anchors = self.anchor_generator(cls_logits) + cls_logits_list, bboxes_reg_list = head_outputs + anchors = self.anchor_generator(cls_logits_list) anchors = paddle.concat(anchors) # matches: contain gt_inds # match_labels: -1(ignore), 0(neg) or 1(pos) matches_list, match_labels_list = [], [] # assign anchors to gts, no sampling is involved - for gt_bbox in meta['gt_bbox']: + for gt_bbox in targets['gt_bbox']: matches, match_labels = self.bbox_assigner(anchors, gt_bbox) matches_list.append(matches) match_labels_list.append(match_labels) + # reshape network outputs - cls_logits = [_.transpose([0, 2, 3, 1]) for _ in cls_logits] - cls_logits = [_.reshape([0, -1, self.cls_out_channels]) \ - for _ in cls_logits] - bboxes_reg = [_.transpose([0, 2, 3, 1]) for _ in bboxes_reg] - bboxes_reg = [_.reshape([0, -1, 4]) for _ in bboxes_reg] + cls_logits = [ + _.transpose([0, 2, 3, 1]).reshape([0, -1, self.num_classes]) + for _ in cls_logits_list + ] + bboxes_reg = [ + _.transpose([0, 2, 3, 1]).reshape([0, -1, 4]) + for _ in bboxes_reg_list + ] cls_logits = paddle.concat(cls_logits, axis=1) bboxes_reg = paddle.concat(bboxes_reg, axis=1) @@ -154,7 +142,7 @@ class RetinaHead(nn.Layer): # find and gather preds and targets in each image for matches, match_labels, cls_logit, bbox_reg, gt_bbox, gt_class in \ zip(matches_list, match_labels_list, cls_logits, bboxes_reg, - meta['gt_bbox'], meta['gt_class']): + targets['gt_bbox'], targets['gt_class']): pos_mask = (match_labels == 1) neg_mask = (match_labels == 0) chosen_mask = 
paddle.logical_or(pos_mask, neg_mask) @@ -163,59 +151,65 @@ class RetinaHead(nn.Layer): bg_class = paddle.to_tensor( [self.num_classes], dtype=gt_class.dtype) # a trick to assign num_classes to negative targets - gt_class = paddle.concat([gt_class, bg_class]) - matches = paddle.where( - neg_mask, paddle.full_like(matches, gt_class.size-1), matches) + gt_class = paddle.concat([gt_class, bg_class], axis=-1) + matches = paddle.where(neg_mask, + paddle.full_like(matches, gt_class.size - 1), + matches) cls_pred = cls_logit[chosen_mask] - cls_tar = gt_class[matches[chosen_mask]] + cls_tar = gt_class[matches[chosen_mask]] reg_pred = bbox_reg[pos_mask].reshape([-1, 4]) reg_tar = gt_bbox[matches[pos_mask]].reshape([-1, 4]) - if self.decode_reg_out: - reg_pred = self.bbox_coder.decode( - anchors[pos_mask], reg_pred) - else: - reg_tar = self.bbox_coder.encode(anchors[pos_mask], reg_tar) + reg_tar = bbox2delta(anchors[pos_mask], reg_tar, self.weights) cls_pred_list.append(cls_pred) cls_tar_list.append(cls_tar) reg_pred_list.append(reg_pred) reg_tar_list.append(reg_tar) cls_pred = paddle.concat(cls_pred_list) - cls_tar = paddle.concat(cls_tar_list) + cls_tar = paddle.concat(cls_tar_list) reg_pred = paddle.concat(reg_pred_list) - reg_tar = paddle.concat(reg_tar_list) + reg_tar = paddle.concat(reg_tar_list) + avg_factor = max(1.0, reg_pred.shape[0]) cls_loss = self.loss_class( - cls_pred, cls_tar, reduction='sum')/avg_factor - if reg_pred.size == 0: - reg_loss = bboxes_reg[0][0].sum() * 0 + cls_pred, cls_tar, reduction='sum') / avg_factor + + if reg_pred.shape[0] == 0: + reg_loss = paddle.zeros([1]) + reg_loss.stop_gradient = False else: reg_loss = self.loss_bbox( - reg_pred, reg_tar, reduction='sum')/avg_factor - return dict(loss_cls=cls_loss, loss_reg=reg_loss) + reg_pred, reg_tar, reduction='sum') / avg_factor + + loss = cls_loss + reg_loss + out_dict = { + 'loss_cls': cls_loss, + 'loss_reg': reg_loss, + 'loss': loss, + } + return out_dict def get_bboxes_single(self, anchors, 
- cls_scores, - bbox_preds, + cls_scores_list, + bbox_preds_list, im_shape, scale_factor, rescale=True): - assert len(cls_scores) == len(bbox_preds) + assert len(cls_scores_list) == len(bbox_preds_list) mlvl_bboxes = [] mlvl_scores = [] - for anchor, cls_score, bbox_pred in zip(anchors, cls_scores, bbox_preds): + for anchor, cls_score, bbox_pred in zip(anchors, cls_scores_list, + bbox_preds_list): cls_score = cls_score.reshape([-1, self.num_classes]) bbox_pred = bbox_pred.reshape([-1, 4]) if self.nms_pre is not None and cls_score.shape[0] > self.nms_pre: max_score = cls_score.max(axis=1) _, topk_inds = max_score.topk(self.nms_pre) bbox_pred = bbox_pred.gather(topk_inds) - anchor = anchor.gather(topk_inds) + anchor = anchor.gather(topk_inds) cls_score = cls_score.gather(topk_inds) - bbox_pred = self.bbox_coder.decode( - anchor, bbox_pred, max_shape=im_shape) - bbox_pred = bbox_pred.squeeze() + bbox_pred = delta2bbox(bbox_pred, anchor, self.weights).squeeze() mlvl_bboxes.append(bbox_pred) mlvl_scores.append(F.sigmoid(cls_score)) mlvl_bboxes = paddle.concat(mlvl_bboxes) @@ -227,18 +221,15 @@ class RetinaHead(nn.Layer): mlvl_scores = mlvl_scores.transpose([1, 0]) return mlvl_bboxes, mlvl_scores - def decode(self, anchors, cls_scores, bbox_preds, im_shape, scale_factor): + def decode(self, anchors, cls_logits, bboxes_reg, im_shape, scale_factor): batch_bboxes = [] batch_scores = [] - for img_id in range(cls_scores[0].shape[0]): - num_lvls = len(cls_scores) - cls_score_list = [cls_scores[i][img_id] for i in range(num_lvls)] - bbox_pred_list = [bbox_preds[i][img_id] for i in range(num_lvls)] + for img_id in range(cls_logits[0].shape[0]): + num_lvls = len(cls_logits) + cls_scores_list = [cls_logits[i][img_id] for i in range(num_lvls)] + bbox_preds_list = [bboxes_reg[i][img_id] for i in range(num_lvls)] bboxes, scores = self.get_bboxes_single( - anchors, - cls_score_list, - bbox_pred_list, - im_shape[img_id], + anchors, cls_scores_list, bbox_preds_list, im_shape[img_id], 
scale_factor[img_id]) batch_bboxes.append(bboxes) batch_scores.append(scores) @@ -247,11 +238,12 @@ class RetinaHead(nn.Layer): return batch_bboxes, batch_scores def post_process(self, head_outputs, im_shape, scale_factor): - cls_scores, bbox_preds = head_outputs - anchors = self.anchor_generator(cls_scores) - cls_scores = [_.transpose([0, 2, 3, 1]) for _ in cls_scores] - bbox_preds = [_.transpose([0, 2, 3, 1]) for _ in bbox_preds] - bboxes, scores = self.decode( - anchors, cls_scores, bbox_preds, im_shape, scale_factor) + cls_logits_list, bboxes_reg_list = head_outputs + anchors = self.anchor_generator(cls_logits_list) + cls_logits = [_.transpose([0, 2, 3, 1]) for _ in cls_logits_list] + bboxes_reg = [_.transpose([0, 2, 3, 1]) for _ in bboxes_reg_list] + bboxes, scores = self.decode(anchors, cls_logits, bboxes_reg, im_shape, + scale_factor) + bbox_pred, bbox_num, _ = self.nms(bboxes, scores) return bbox_pred, bbox_num diff --git a/ppdet/modeling/heads/roi_extractor.py b/ppdet/modeling/heads/roi_extractor.py index 35c3924e36c60ddbc82f38f6b828197e31833b01..5d2b1528f07003193b03a02bc1320bfb2d304a6d 100644 --- a/ppdet/modeling/heads/roi_extractor.py +++ b/ppdet/modeling/heads/roi_extractor.py @@ -29,7 +29,7 @@ class RoIAlign(object): RoI Align module For more details, please refer to the document of roi_align in - in ppdet/modeing/ops.py + in https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/vision/ops.py Args: resolution (int): The output size, default 14 @@ -76,12 +76,12 @@ class RoIAlign(object): def __call__(self, feats, roi, rois_num): roi = paddle.concat(roi) if len(roi) > 1 else roi[0] if len(feats) == 1: - rois_feat = ops.roi_align( - feats[self.start_level], - roi, - self.resolution, - self.spatial_scale[0], - rois_num=rois_num, + rois_feat = paddle.vision.ops.roi_align( + x=feats[self.start_level], + boxes=roi, + boxes_num=rois_num, + output_size=self.resolution, + spatial_scale=self.spatial_scale[0], aligned=self.aligned) else: offset = 2 @@ 
-96,13 +96,13 @@ class RoIAlign(object): rois_num=rois_num) rois_feat_list = [] for lvl in range(self.start_level, self.end_level + 1): - roi_feat = ops.roi_align( - feats[lvl], - rois_dist[lvl], - self.resolution, - self.spatial_scale[lvl], + roi_feat = paddle.vision.ops.roi_align( + x=feats[lvl], + boxes=rois_dist[lvl], + boxes_num=rois_num_dist[lvl], + output_size=self.resolution, + spatial_scale=self.spatial_scale[lvl], sampling_ratio=self.sampling_ratio, - rois_num=rois_num_dist[lvl], aligned=self.aligned) rois_feat_list.append(roi_feat) rois_feat_shuffle = paddle.concat(rois_feat_list) diff --git a/ppdet/modeling/heads/s2anet_head.py b/ppdet/modeling/heads/s2anet_head.py index 7910379c402214090b211ed05e3b68a482f5c581..e17023d672532fb7aa786a98f95bdc3315906964 100644 --- a/ppdet/modeling/heads/s2anet_head.py +++ b/ppdet/modeling/heads/s2anet_head.py @@ -23,6 +23,7 @@ from ppdet.core.workspace import register from ppdet.modeling import ops from ppdet.modeling import bbox_utils from ppdet.modeling.proposal_generator.target_layer import RBoxAssigner +from ..cls_utils import _get_class_default_kwargs import numpy as np @@ -230,7 +231,7 @@ class S2ANetHead(nn.Layer): align_conv_type='AlignConv', align_conv_size=3, use_sigmoid_cls=True, - anchor_assign=RBoxAssigner().__dict__, + anchor_assign=_get_class_default_kwargs(RBoxAssigner), reg_loss_weight=[1.0, 1.0, 1.0, 1.0, 1.1], cls_loss_weight=[1.1, 1.05], reg_loss_type='l1'): @@ -600,9 +601,9 @@ class S2ANetHead(nn.Layer): fam_bbox = paddle.sum(fam_bbox, axis=-1) feat_bbox_weights = paddle.sum(feat_bbox_weights, axis=-1) try: - from rbox_iou_ops import rbox_iou + from ext_op import rbox_iou except Exception as e: - print("import custom_ops error, try install rbox_iou_ops " \ + print("import custom_ops error, try install ext_op " \ "following ppdet/ext_op/README.md", e) sys.stdout.flush() sys.exit(-1) @@ -715,9 +716,9 @@ class S2ANetHead(nn.Layer): odm_bbox = paddle.sum(odm_bbox, axis=-1) feat_bbox_weights = 
paddle.sum(feat_bbox_weights, axis=-1) try: - from rbox_iou_ops import rbox_iou + from ext_op import rbox_iou except Exception as e: - print("import custom_ops error, try install rbox_iou_ops " \ + print("import custom_ops error, try install ext_op " \ "following ppdet/ext_op/README.md", e) sys.stdout.flush() sys.exit(-1) diff --git a/ppdet/modeling/heads/simota_head.py b/ppdet/modeling/heads/simota_head.py index a1485f3905625cb579d95ae4465ca22fe777314f..77e515bbc9c2770dd371fec28d3fe9a628b4bd3a 100644 --- a/ppdet/modeling/heads/simota_head.py +++ b/ppdet/modeling/heads/simota_head.py @@ -179,8 +179,15 @@ class OTAHead(GFLHead): num_level_anchors) num_total_pos = sum(pos_num_l) try: - num_total_pos = paddle.distributed.all_reduce(num_total_pos.clone( - )) / paddle.distributed.get_world_size() + cloned_num_total_pos = num_total_pos.clone() + reduced_cloned_num_total_pos = paddle.distributed.all_reduce( + cloned_num_total_pos) + if reduced_cloned_num_total_pos is not None: + num_total_pos = reduced_cloned_num_total_pos / paddle.distributed.get_world_size( + ) + else: + num_total_pos = cloned_num_total_pos / paddle.distributed.get_world_size( + ) except: num_total_pos = max(num_total_pos, 1) @@ -255,7 +262,12 @@ class OTAHead(GFLHead): avg_factor = sum(avg_factor) try: - avg_factor = paddle.distributed.all_reduce(avg_factor.clone()) + avg_factor_clone = avg_factor.clone() + tmp_avg_factor = paddle.distributed.all_reduce(avg_factor_clone) + if tmp_avg_factor is not None: + avg_factor = tmp_avg_factor + else: + avg_factor = avg_factor_clone avg_factor = paddle.clip( avg_factor / paddle.distributed.get_world_size(), min=1) except: @@ -396,8 +408,15 @@ class OTAVFLHead(OTAHead): num_level_anchors) num_total_pos = sum(pos_num_l) try: - num_total_pos = paddle.distributed.all_reduce(num_total_pos.clone( - )) / paddle.distributed.get_world_size() + cloned_num_total_pos = num_total_pos.clone() + reduced_cloned_num_total_pos = paddle.distributed.all_reduce( + 
cloned_num_total_pos) + if reduced_cloned_num_total_pos is not None: + num_total_pos = reduced_cloned_num_total_pos / paddle.distributed.get_world_size( + ) + else: + num_total_pos = cloned_num_total_pos / paddle.distributed.get_world_size( + ) except: num_total_pos = max(num_total_pos, 1) @@ -475,7 +494,12 @@ class OTAVFLHead(OTAHead): avg_factor = sum(avg_factor) try: - avg_factor = paddle.distributed.all_reduce(avg_factor.clone()) + avg_factor_clone = avg_factor.clone() + tmp_avg_factor = paddle.distributed.all_reduce(avg_factor_clone) + if tmp_avg_factor is not None: + avg_factor = tmp_avg_factor + else: + avg_factor = avg_factor_clone avg_factor = paddle.clip( avg_factor / paddle.distributed.get_world_size(), min=1) except: diff --git a/ppdet/modeling/heads/ssd_head.py b/ppdet/modeling/heads/ssd_head.py index 07e7e92f9b98a26a708c32c103755c3923b356b5..a6df4824dc036d6419f73ec82dc00e8adf0bd780 100644 --- a/ppdet/modeling/heads/ssd_head.py +++ b/ppdet/modeling/heads/ssd_head.py @@ -20,6 +20,7 @@ from paddle.regularizer import L2Decay from paddle import ParamAttr from ..layers import AnchorGeneratorSSD +from ..cls_utils import _get_class_default_kwargs class SepConvLayer(nn.Layer): @@ -113,7 +114,7 @@ class SSDHead(nn.Layer): def __init__(self, num_classes=80, in_channels=(512, 1024, 512, 256, 256, 256), - anchor_generator=AnchorGeneratorSSD().__dict__, + anchor_generator=_get_class_default_kwargs(AnchorGeneratorSSD), kernel_size=3, padding=1, use_sepconv=False, diff --git a/ppdet/modeling/heads/yolo_head.py b/ppdet/modeling/heads/yolo_head.py index 7b4e9bc3353edfe42c0c35db6fb38b67de03c730..0a63060d02aab1d20901ab7c4422d58e55166c3d 100644 --- a/ppdet/modeling/heads/yolo_head.py +++ b/ppdet/modeling/heads/yolo_head.py @@ -1,3 +1,17 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import paddle import paddle.nn as nn import paddle.nn.functional as F @@ -5,6 +19,17 @@ from paddle import ParamAttr from paddle.regularizer import L2Decay from ppdet.core.workspace import register +import math +import numpy as np +from ..initializer import bias_init_with_prob, constant_ +from ..backbones.csp_darknet import BaseConv, DWConv +from ..losses import IouLoss +from ppdet.modeling.assigners.simota_assigner import SimOTAAssigner +from ppdet.modeling.bbox_utils import bbox_overlaps +from ppdet.modeling.layers import MultiClassNMS + +__all__ = ['YOLOv3Head', 'YOLOXHead'] + def _de_sigmoid(x, eps=1e-7): x = paddle.clip(x, eps, 1. 
/ eps) @@ -122,3 +147,270 @@ class YOLOv3Head(nn.Layer): @classmethod def from_config(cls, cfg, input_shape): return {'in_channels': [i.channels for i in input_shape], } + + +@register +class YOLOXHead(nn.Layer): + __shared__ = ['num_classes', 'width_mult', 'act', 'trt', 'exclude_nms'] + __inject__ = ['assigner', 'nms'] + + def __init__(self, + num_classes=80, + width_mult=1.0, + depthwise=False, + in_channels=[256, 512, 1024], + feat_channels=256, + fpn_strides=(8, 16, 32), + l1_epoch=285, + act='silu', + assigner=SimOTAAssigner(use_vfl=False), + nms='MultiClassNMS', + loss_weight={ + 'cls': 1.0, + 'obj': 1.0, + 'iou': 5.0, + 'l1': 1.0, + }, + trt=False, + exclude_nms=False): + super(YOLOXHead, self).__init__() + self._dtype = paddle.framework.get_default_dtype() + self.num_classes = num_classes + assert len(in_channels) > 0, "in_channels length should > 0" + self.in_channels = in_channels + feat_channels = int(feat_channels * width_mult) + self.fpn_strides = fpn_strides + self.l1_epoch = l1_epoch + self.assigner = assigner + self.nms = nms + if isinstance(self.nms, MultiClassNMS) and trt: + self.nms.trt = trt + self.exclude_nms = exclude_nms + self.loss_weight = loss_weight + self.iou_loss = IouLoss(loss_weight=1.0) # default loss_weight 2.5 + + ConvBlock = DWConv if depthwise else BaseConv + + self.stem_conv = nn.LayerList() + self.conv_cls = nn.LayerList() + self.conv_reg = nn.LayerList() # reg [x,y,w,h] + obj + for in_c in self.in_channels: + self.stem_conv.append(BaseConv(in_c, feat_channels, 1, 1, act=act)) + + self.conv_cls.append( + nn.Sequential(* [ + ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), + nn.Conv2D( + feat_channels, + self.num_classes, + 1, + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + ])) + + self.conv_reg.append( + nn.Sequential(* [ + ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), + ConvBlock( + feat_channels, feat_channels, 3, 1, act=act), + nn.Conv2D( + 
feat_channels, + 4 + 1, # reg [x,y,w,h] + obj + 1, + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + ])) + + self._init_weights() + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape], } + + def _init_weights(self): + bias_cls = bias_init_with_prob(0.01) + bias_reg = paddle.full([5], math.log(5.), dtype=self._dtype) + bias_reg[:2] = 0. + bias_reg[-1] = bias_cls + for cls_, reg_ in zip(self.conv_cls, self.conv_reg): + constant_(cls_[-1].weight) + constant_(cls_[-1].bias, bias_cls) + constant_(reg_[-1].weight) + reg_[-1].bias.set_value(bias_reg) + + def _generate_anchor_point(self, feat_sizes, strides, offset=0.): + anchor_points, stride_tensor = [], [] + num_anchors_list = [] + for feat_size, stride in zip(feat_sizes, strides): + h, w = feat_size + x = (paddle.arange(w) + offset) * stride + y = (paddle.arange(h) + offset) * stride + y, x = paddle.meshgrid(y, x) + anchor_points.append(paddle.stack([x, y], axis=-1).reshape([-1, 2])) + stride_tensor.append( + paddle.full( + [len(anchor_points[-1]), 1], stride, dtype=self._dtype)) + num_anchors_list.append(len(anchor_points[-1])) + anchor_points = paddle.concat(anchor_points).astype(self._dtype) + anchor_points.stop_gradient = True + stride_tensor = paddle.concat(stride_tensor) + stride_tensor.stop_gradient = True + return anchor_points, stride_tensor, num_anchors_list + + def forward(self, feats, targets=None): + assert len(feats) == len(self.fpn_strides), \ + "The size of feats is not equal to size of fpn_strides" + + feat_sizes = [[f.shape[-2], f.shape[-1]] for f in feats] + cls_score_list, reg_pred_list = [], [] + obj_score_list = [] + for i, feat in enumerate(feats): + feat = self.stem_conv[i](feat) + cls_logit = self.conv_cls[i](feat) + reg_pred = self.conv_reg[i](feat) + # cls prediction + cls_score = F.sigmoid(cls_logit) + cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1])) + # reg prediction + reg_xywh, obj_logit = 
paddle.split(reg_pred, [4, 1], axis=1) + reg_xywh = reg_xywh.flatten(2).transpose([0, 2, 1]) + reg_pred_list.append(reg_xywh) + # obj prediction + obj_score = F.sigmoid(obj_logit) + obj_score_list.append(obj_score.flatten(2).transpose([0, 2, 1])) + + cls_score_list = paddle.concat(cls_score_list, axis=1) + reg_pred_list = paddle.concat(reg_pred_list, axis=1) + obj_score_list = paddle.concat(obj_score_list, axis=1) + + # bbox decode + anchor_points, stride_tensor, _ =\ + self._generate_anchor_point(feat_sizes, self.fpn_strides) + reg_xy, reg_wh = paddle.split(reg_pred_list, 2, axis=-1) + reg_xy += (anchor_points / stride_tensor) + reg_wh = paddle.exp(reg_wh) * 0.5 + bbox_pred_list = paddle.concat( + [reg_xy - reg_wh, reg_xy + reg_wh], axis=-1) + + if self.training: + anchor_points, stride_tensor, num_anchors_list =\ + self._generate_anchor_point(feat_sizes, self.fpn_strides, 0.5) + yolox_losses = self.get_loss([ + cls_score_list, bbox_pred_list, obj_score_list, anchor_points, + stride_tensor, num_anchors_list + ], targets) + return yolox_losses + else: + pred_scores = (cls_score_list * obj_score_list).sqrt() + return pred_scores, bbox_pred_list, stride_tensor + + def get_loss(self, head_outs, targets): + pred_cls, pred_bboxes, pred_obj,\ + anchor_points, stride_tensor, num_anchors_list = head_outs + gt_labels = targets['gt_class'] + gt_bboxes = targets['gt_bbox'] + pred_scores = (pred_cls * pred_obj).sqrt() + # label assignment + center_and_strides = paddle.concat( + [anchor_points, stride_tensor, stride_tensor], axis=-1) + pos_num_list, label_list, bbox_target_list = [], [], [] + for pred_score, pred_bbox, gt_box, gt_label in zip( + pred_scores.detach(), + pred_bboxes.detach() * stride_tensor, gt_bboxes, gt_labels): + pos_num, label, _, bbox_target = self.assigner( + pred_score, center_and_strides, pred_bbox, gt_box, gt_label) + pos_num_list.append(pos_num) + label_list.append(label) + bbox_target_list.append(bbox_target) + labels = 
paddle.to_tensor(np.stack(label_list, axis=0)) + bbox_targets = paddle.to_tensor(np.stack(bbox_target_list, axis=0)) + bbox_targets /= stride_tensor # rescale bbox + + # 1. obj score loss + mask_positive = (labels != self.num_classes) + loss_obj = F.binary_cross_entropy( + pred_obj, + mask_positive.astype(pred_obj.dtype).unsqueeze(-1), + reduction='sum') + + num_pos = sum(pos_num_list) + + if num_pos > 0: + num_pos = paddle.to_tensor(num_pos, dtype=self._dtype).clip(min=1) + loss_obj /= num_pos + + # 2. iou loss + bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 4]) + pred_bboxes_pos = paddle.masked_select(pred_bboxes, + bbox_mask).reshape([-1, 4]) + assigned_bboxes_pos = paddle.masked_select( + bbox_targets, bbox_mask).reshape([-1, 4]) + bbox_iou = bbox_overlaps(pred_bboxes_pos, assigned_bboxes_pos) + bbox_iou = paddle.diag(bbox_iou) + + loss_iou = self.iou_loss( + pred_bboxes_pos.split( + 4, axis=-1), + assigned_bboxes_pos.split( + 4, axis=-1)) + loss_iou = loss_iou.sum() / num_pos + + # 3. cls loss + cls_mask = mask_positive.unsqueeze(-1).tile( + [1, 1, self.num_classes]) + pred_cls_pos = paddle.masked_select( + pred_cls, cls_mask).reshape([-1, self.num_classes]) + assigned_cls_pos = paddle.masked_select(labels, mask_positive) + assigned_cls_pos = F.one_hot(assigned_cls_pos, + self.num_classes + 1)[..., :-1] + assigned_cls_pos *= bbox_iou.unsqueeze(-1) + loss_cls = F.binary_cross_entropy( + pred_cls_pos, assigned_cls_pos, reduction='sum') + loss_cls /= num_pos + + # 4. 
l1 loss + if targets['epoch_id'] >= self.l1_epoch: + loss_l1 = F.l1_loss( + pred_bboxes_pos, assigned_bboxes_pos, reduction='sum') + loss_l1 /= num_pos + else: + loss_l1 = paddle.zeros([1]) + loss_l1.stop_gradient = False + else: + loss_cls = paddle.zeros([1]) + loss_iou = paddle.zeros([1]) + loss_l1 = paddle.zeros([1]) + loss_cls.stop_gradient = False + loss_iou.stop_gradient = False + loss_l1.stop_gradient = False + + loss = self.loss_weight['obj'] * loss_obj + \ + self.loss_weight['cls'] * loss_cls + \ + self.loss_weight['iou'] * loss_iou + + if targets['epoch_id'] >= self.l1_epoch: + loss += (self.loss_weight['l1'] * loss_l1) + + yolox_losses = { + 'loss': loss, + 'loss_cls': loss_cls, + 'loss_obj': loss_obj, + 'loss_iou': loss_iou, + 'loss_l1': loss_l1, + } + return yolox_losses + + def post_process(self, head_outs, img_shape, scale_factor): + pred_scores, pred_bboxes, stride_tensor = head_outs + pred_scores = pred_scores.transpose([0, 2, 1]) + pred_bboxes *= stride_tensor + # scale bbox to origin image + scale_factor = scale_factor.flip(-1).tile([1, 2]).unsqueeze(1) + pred_bboxes /= scale_factor + if self.exclude_nms: + # `exclude_nms=True` just use in benchmark + return pred_bboxes.sum(), pred_scores.sum() + else: + bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores) + return bbox_pred, bbox_num diff --git a/ppdet/modeling/initializer.py b/ppdet/modeling/initializer.py index b7a135dcca7309be09fa4819e21d77a02d332d57..b482f133dd9ac1e2568f5c971f004117c56a5368 100644 --- a/ppdet/modeling/initializer.py +++ b/ppdet/modeling/initializer.py @@ -273,7 +273,8 @@ def linear_init_(module): def conv_init_(module): bound = 1 / np.sqrt(np.prod(module.weight.shape[1:])) uniform_(module.weight, -bound, bound) - uniform_(module.bias, -bound, bound) + if module.bias is not None: + uniform_(module.bias, -bound, bound) def bias_init_with_prob(prior_prob=0.01): diff --git a/ppdet/modeling/layers.py b/ppdet/modeling/layers.py index 
055cbf4f6388ce2acf6ef5d4cd58ab93dbbc9fcb..4afc7f560d1b69ed690eec05d53efd40283a33d5 100644 --- a/ppdet/modeling/layers.py +++ b/ppdet/modeling/layers.py @@ -440,7 +440,8 @@ class MultiClassNMS(object): normalized=True, nms_eta=1.0, return_index=False, - return_rois_num=True): + return_rois_num=True, + trt=False): super(MultiClassNMS, self).__init__() self.score_threshold = score_threshold self.nms_top_k = nms_top_k @@ -450,6 +451,7 @@ class MultiClassNMS(object): self.nms_eta = nms_eta self.return_index = return_index self.return_rois_num = return_rois_num + self.trt = trt def __call__(self, bboxes, score, background_label=-1): """ @@ -471,7 +473,19 @@ class MultiClassNMS(object): kwargs.update({'rois_num': bbox_num}) if background_label > -1: kwargs.update({'background_label': background_label}) - return ops.multiclass_nms(bboxes, score, **kwargs) + kwargs.pop('trt') + # TODO(wangxinxin08): paddle version should be develop or 2.3 and above to run nms on tensorrt + if self.trt and (int(paddle.version.major) == 0 or + (int(paddle.version.major) >= 2 and + int(paddle.version.minor) >= 3)): + # TODO(wangxinxin08): tricky switch to run nms on tensorrt + kwargs.update({'nms_eta': 1.1}) + bbox, bbox_num, _ = ops.multiclass_nms(bboxes, score, **kwargs) + mask = paddle.slice(bbox, [-1], [0], [1]) != -1 + bbox = paddle.masked_select(bbox, mask).reshape((-1, 6)) + return bbox, bbox_num, None + else: + return ops.multiclass_nms(bboxes, score, **kwargs) @register @@ -540,10 +554,15 @@ class YOLOBox(object): origin_shape = im_shape / scale_factor origin_shape = paddle.cast(origin_shape, 'int32') for i, head_out in enumerate(yolo_head_out): - boxes, scores = ops.yolo_box(head_out, origin_shape, anchors[i], - self.num_classes, self.conf_thresh, - self.downsample_ratio // 2**i, - self.clip_bbox, self.scale_x_y) + boxes, scores = paddle.vision.ops.yolo_box( + head_out, + origin_shape, + anchors[i], + self.num_classes, + self.conf_thresh, + self.downsample_ratio // 2**i, + 
self.clip_bbox, + scale_x_y=self.scale_x_y) boxes_list.append(boxes) scores_list.append(paddle.transpose(scores, perm=[0, 2, 1])) yolo_boxes = paddle.concat(boxes_list, axis=1) @@ -608,94 +627,6 @@ class SSDBox(object): return output_boxes, output_scores -@register -@serializable -class AnchorGrid(object): - """Generate anchor grid - - Args: - image_size (int or list): input image size, may be a single integer or - list of [h, w]. Default: 512 - min_level (int): min level of the feature pyramid. Default: 3 - max_level (int): max level of the feature pyramid. Default: 7 - anchor_base_scale: base anchor scale. Default: 4 - num_scales: number of anchor scales. Default: 3 - aspect_ratios: aspect ratios. default: [[1, 1], [1.4, 0.7], [0.7, 1.4]] - """ - - def __init__(self, - image_size=512, - min_level=3, - max_level=7, - anchor_base_scale=4, - num_scales=3, - aspect_ratios=[[1, 1], [1.4, 0.7], [0.7, 1.4]]): - super(AnchorGrid, self).__init__() - if isinstance(image_size, Integral): - self.image_size = [image_size, image_size] - else: - self.image_size = image_size - for dim in self.image_size: - assert dim % 2 ** max_level == 0, \ - "image size should be multiple of the max level stride" - self.min_level = min_level - self.max_level = max_level - self.anchor_base_scale = anchor_base_scale - self.num_scales = num_scales - self.aspect_ratios = aspect_ratios - - @property - def base_cell(self): - if not hasattr(self, '_base_cell'): - self._base_cell = self.make_cell() - return self._base_cell - - def make_cell(self): - scales = [2**(i / self.num_scales) for i in range(self.num_scales)] - scales = np.array(scales) - ratios = np.array(self.aspect_ratios) - ws = np.outer(scales, ratios[:, 0]).reshape(-1, 1) - hs = np.outer(scales, ratios[:, 1]).reshape(-1, 1) - anchors = np.hstack((-0.5 * ws, -0.5 * hs, 0.5 * ws, 0.5 * hs)) - return anchors - - def make_grid(self, stride): - cell = self.base_cell * stride * self.anchor_base_scale - x_steps = np.arange(stride // 2, 
self.image_size[1], stride) - y_steps = np.arange(stride // 2, self.image_size[0], stride) - offset_x, offset_y = np.meshgrid(x_steps, y_steps) - offset_x = offset_x.flatten() - offset_y = offset_y.flatten() - offsets = np.stack((offset_x, offset_y, offset_x, offset_y), axis=-1) - offsets = offsets[:, np.newaxis, :] - return (cell + offsets).reshape(-1, 4) - - def generate(self): - return [ - self.make_grid(2**l) - for l in range(self.min_level, self.max_level + 1) - ] - - def __call__(self): - if not hasattr(self, '_anchor_vars'): - anchor_vars = [] - helper = LayerHelper('anchor_grid') - for idx, l in enumerate(range(self.min_level, self.max_level + 1)): - stride = 2**l - anchors = self.make_grid(stride) - var = helper.create_parameter( - attr=ParamAttr(name='anchors_{}'.format(idx)), - shape=anchors.shape, - dtype='float32', - stop_gradient=True, - default_initializer=NumpyArrayInitializer(anchors)) - anchor_vars.append(var) - var.persistable = True - self._anchor_vars = anchor_vars - - return self._anchor_vars - - @register @serializable class FCOSBox(object): @@ -818,7 +749,6 @@ class TTFBox(object): # batch size is 1 scores_r = paddle.reshape(scores, [cat, -1]) topk_scores, topk_inds = paddle.topk(scores_r, k) - topk_scores, topk_inds = paddle.topk(scores_r, k) topk_ys = topk_inds // width topk_xs = topk_inds % width diff --git a/ppdet/modeling/losses/detr_loss.py b/ppdet/modeling/losses/detr_loss.py index 5a589d4a2b4dae5644dc8b8ecf6f839c68559bdb..e22c5d8b101234e8b1032a540e8c98d290631f02 100644 --- a/ppdet/modeling/losses/detr_loss.py +++ b/ppdet/modeling/losses/detr_loss.py @@ -80,7 +80,7 @@ class DETRLoss(nn.Layer): target_label = target_label.reshape([bs, num_query_objects]) if self.use_focal_loss: target_label = F.one_hot(target_label, - self.num_classes + 1)[:, :, :-1] + self.num_classes + 1)[..., :-1] return { 'loss_class': self.loss_coeff['class'] * sigmoid_focal_loss( logits, target_label, num_gts / num_query_objects) diff --git 
a/ppdet/modeling/losses/iou_loss.py b/ppdet/modeling/losses/iou_loss.py index 9b8da6c059c90d7bbba7b8e5688aa7556a38ac63..b5cac22e342e633b5c413805623ba4015073b3b1 100644 --- a/ppdet/modeling/losses/iou_loss.py +++ b/ppdet/modeling/losses/iou_loss.py @@ -17,13 +17,13 @@ from __future__ import division from __future__ import print_function import numpy as np - +import math import paddle from ppdet.core.workspace import register, serializable from ..bbox_utils import bbox_iou -__all__ = ['IouLoss', 'GIoULoss', 'DIouLoss'] +__all__ = ['IouLoss', 'GIoULoss', 'DIouLoss', 'SIoULoss'] @register @@ -208,3 +208,88 @@ class DIouLoss(GIoULoss): diou = paddle.mean((1 - iouk + ciou_term + diou_term) * iou_weight) return diou * self.loss_weight + + +@register +@serializable +class SIoULoss(GIoULoss): + """ + see https://arxiv.org/pdf/2205.12740.pdf + Args: + loss_weight (float): siou loss weight, default as 1 + eps (float): epsilon to avoid divide by zero, default as 1e-10 + theta (float): default as 4 + reduction (str): Options are "none", "mean" and "sum". 
default as none + """ + + def __init__(self, loss_weight=1., eps=1e-10, theta=4., reduction='none'): + super(SIoULoss, self).__init__(loss_weight=loss_weight, eps=eps) + self.loss_weight = loss_weight + self.eps = eps + self.theta = theta + self.reduction = reduction + + def __call__(self, pbox, gbox): + x1, y1, x2, y2 = paddle.split(pbox, num_or_sections=4, axis=-1) + x1g, y1g, x2g, y2g = paddle.split(gbox, num_or_sections=4, axis=-1) + + box1 = [x1, y1, x2, y2] + box2 = [x1g, y1g, x2g, y2g] + iou = bbox_iou(box1, box2) + + cx = (x1 + x2) / 2 + cy = (y1 + y2) / 2 + w = x2 - x1 + self.eps + h = y2 - y1 + self.eps + + cxg = (x1g + x2g) / 2 + cyg = (y1g + y2g) / 2 + wg = x2g - x1g + self.eps + hg = y2g - y1g + self.eps + + x2 = paddle.maximum(x1, x2) + y2 = paddle.maximum(y1, y2) + + # A or B + xc1 = paddle.minimum(x1, x1g) + yc1 = paddle.minimum(y1, y1g) + xc2 = paddle.maximum(x2, x2g) + yc2 = paddle.maximum(y2, y2g) + + cw_out = xc2 - xc1 + ch_out = yc2 - yc1 + + ch = paddle.maximum(cy, cyg) - paddle.minimum(cy, cyg) + cw = paddle.maximum(cx, cxg) - paddle.minimum(cx, cxg) + + # angle cost + dist_intersection = paddle.sqrt((cx - cxg)**2 + (cy - cyg)**2) + sin_angle_alpha = ch / dist_intersection + sin_angle_beta = cw / dist_intersection + thred = paddle.pow(paddle.to_tensor(2), 0.5) / 2 + thred.stop_gradient = True + sin_alpha = paddle.where(sin_angle_alpha > thred, sin_angle_beta, + sin_angle_alpha) + angle_cost = paddle.cos(paddle.asin(sin_alpha) * 2 - math.pi / 2) + + # distance cost + gamma = 2 - angle_cost + # gamma.stop_gradient = True + beta_x = ((cxg - cx) / cw_out)**2 + beta_y = ((cyg - cy) / ch_out)**2 + dist_cost = 1 - paddle.exp(-gamma * beta_x) + 1 - paddle.exp(-gamma * + beta_y) + + # shape cost + omega_w = paddle.abs(w - wg) / paddle.maximum(w, wg) + omega_h = paddle.abs(hg - h) / paddle.maximum(h, hg) + omega = (1 - paddle.exp(-omega_w))**self.theta + ( + 1 - paddle.exp(-omega_h))**self.theta + siou_loss = 1 - iou + (omega + dist_cost) / 2 + + if 
self.reduction == 'mean': + siou_loss = paddle.mean(siou_loss) + elif self.reduction == 'sum': + siou_loss = paddle.sum(siou_loss) + + return siou_loss * self.loss_weight diff --git a/ppdet/modeling/losses/sparsercnn_loss.py b/ppdet/modeling/losses/sparsercnn_loss.py index 2d36b21a2302d6c070728cfb4a213c09232f1853..8b7db92fada6f6e3f3dd7999fda35f6e750a1f12 100644 --- a/ppdet/modeling/losses/sparsercnn_loss.py +++ b/ppdet/modeling/losses/sparsercnn_loss.py @@ -198,7 +198,7 @@ class SparseRCNNLoss(nn.Layer): # Retrieve the matching between the outputs of the last layer and the targets indices = self.matcher(outputs_without_aux, targets) - # Compute the average number of target boxes accross all nodes, for normalization purposes + # Compute the average number of target boxes across all nodes, for normalization purposes num_boxes = sum(len(t["labels"]) for t in targets) num_boxes = paddle.to_tensor( [num_boxes], diff --git a/ppdet/modeling/losses/ssd_loss.py b/ppdet/modeling/losses/ssd_loss.py index 62aecc1f33a104531edc2a77015e27847bb92506..2ab94f2b5bbf1f31fe47d186a92ac805cdf6daf3 100644 --- a/ppdet/modeling/losses/ssd_loss.py +++ b/ppdet/modeling/losses/ssd_loss.py @@ -20,8 +20,7 @@ import paddle import paddle.nn as nn import paddle.nn.functional as F from ppdet.core.workspace import register -from ..ops import iou_similarity -from ..bbox_utils import bbox2delta +from ..bbox_utils import iou_similarity, bbox2delta __all__ = ['SSDLoss'] diff --git a/ppdet/modeling/losses/yolo_loss.py b/ppdet/modeling/losses/yolo_loss.py index 657959cd7e55cf43d6362f03e1a4c1204b814c07..1ba05f2c8eae530e44e20d21375f7cf9b9cd1fb0 100644 --- a/ppdet/modeling/losses/yolo_loss.py +++ b/ppdet/modeling/losses/yolo_loss.py @@ -21,7 +21,7 @@ import paddle.nn as nn import paddle.nn.functional as F from ppdet.core.workspace import register -from ..bbox_utils import decode_yolo, xywh2xyxy, iou_similarity +from ..bbox_utils import decode_yolo, xywh2xyxy, batch_iou_similarity __all__ = ['YOLOv3Loss'] @@ 
-80,7 +80,7 @@ class YOLOv3Loss(nn.Layer): gwh = gbox[:, :, 0:2] + gbox[:, :, 2:4] * 0.5 gbox = paddle.concat([gxy, gwh], axis=-1) - iou = iou_similarity(pbox, gbox) + iou = batch_iou_similarity(pbox, gbox) iou.stop_gradient = True iou_max = iou.max(2) # [N, M1] iou_mask = paddle.cast(iou_max <= self.ignore_thresh, dtype=pbox.dtype) diff --git a/ppdet/modeling/mot/matching/__init__.py b/ppdet/modeling/mot/matching/__init__.py index 54c6680f79f16247c562a9da1024dd3e1de4c57f..f6a88c5673a50452415b1f86f7b18bac12297f49 100644 --- a/ppdet/modeling/mot/matching/__init__.py +++ b/ppdet/modeling/mot/matching/__init__.py @@ -14,6 +14,8 @@ from . import jde_matching from . import deepsort_matching +from . import ocsort_matching from .jde_matching import * from .deepsort_matching import * +from .ocsort_matching import * diff --git a/ppdet/modeling/mot/matching/jde_matching.py b/ppdet/modeling/mot/matching/jde_matching.py index e9c40dba4d3f2a82f8138229ff20b6d27cc1a0e5..3b1cf02edd75cb960e433926274b761d49136033 100644 --- a/ppdet/modeling/mot/matching/jde_matching.py +++ b/ppdet/modeling/mot/matching/jde_matching.py @@ -15,7 +15,14 @@ This code is based on https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/matching.py """ -import lap +try: + import lap +except: + print( + 'Warning: Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' + ) + pass + import scipy import numpy as np from scipy.spatial.distance import cdist @@ -26,7 +33,7 @@ warnings.filterwarnings("ignore") __all__ = [ 'merge_matches', 'linear_assignment', - 'cython_bbox_ious', + 'bbox_ious', 'iou_distance', 'embedding_distance', 'fuse_motion', @@ -53,6 +60,12 @@ def merge_matches(m1, m2, shape): def linear_assignment(cost_matrix, thresh): + try: + import lap + except Exception as e: + raise RuntimeError( + 'Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: `pip install lap`, see https://github.com/gatagat/lap' 
+ ) if cost_matrix.size == 0: return np.empty( (0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple( @@ -68,22 +81,28 @@ def linear_assignment(cost_matrix, thresh): return matches, unmatched_a, unmatched_b -def cython_bbox_ious(atlbrs, btlbrs): - ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float) - if ious.size == 0: +def bbox_ious(atlbrs, btlbrs): + boxes = np.ascontiguousarray(atlbrs, dtype=np.float) + query_boxes = np.ascontiguousarray(btlbrs, dtype=np.float) + N = boxes.shape[0] + K = query_boxes.shape[0] + ious = np.zeros((N, K), dtype=boxes.dtype) + if N * K == 0: return ious - try: - import cython_bbox - except Exception as e: - print('cython_bbox not found, please install cython_bbox.' - 'for example: `pip install cython_bbox`.') - raise e - - ious = cython_bbox.bbox_overlaps( - np.ascontiguousarray( - atlbrs, dtype=np.float), - np.ascontiguousarray( - btlbrs, dtype=np.float)) + + for k in range(K): + box_area = ((query_boxes[k, 2] - query_boxes[k, 0] + 1) * + (query_boxes[k, 3] - query_boxes[k, 1] + 1)) + for n in range(N): + iw = (min(boxes[n, 2], query_boxes[k, 2]) - max( + boxes[n, 0], query_boxes[k, 0]) + 1) + if iw > 0: + ih = (min(boxes[n, 3], query_boxes[k, 3]) - max( + boxes[n, 1], query_boxes[k, 1]) + 1) + if ih > 0: + ua = float((boxes[n, 2] - boxes[n, 0] + 1) * (boxes[ + n, 3] - boxes[n, 1] + 1) + box_area - iw * ih) + ious[n, k] = iw * ih / ua return ious @@ -98,7 +117,7 @@ def iou_distance(atracks, btracks): else: atlbrs = [track.tlbr for track in atracks] btlbrs = [track.tlbr for track in btracks] - _ious = cython_bbox_ious(atlbrs, btlbrs) + _ious = bbox_ious(atlbrs, btlbrs) cost_matrix = 1 - _ious return cost_matrix diff --git a/ppdet/modeling/mot/matching/ocsort_matching.py b/ppdet/modeling/mot/matching/ocsort_matching.py new file mode 100644 index 0000000000000000000000000000000000000000..a32d76155b985f03b8ecbaedd88df70eaa9fd0fa --- /dev/null +++ b/ppdet/modeling/mot/matching/ocsort_matching.py @@ -0,0 +1,124 @@ +# 
Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/association.py +""" + +import os +import numpy as np + + +def iou_batch(bboxes1, bboxes2): + bboxes2 = np.expand_dims(bboxes2, 0) + bboxes1 = np.expand_dims(bboxes1, 1) + + xx1 = np.maximum(bboxes1[..., 0], bboxes2[..., 0]) + yy1 = np.maximum(bboxes1[..., 1], bboxes2[..., 1]) + xx2 = np.minimum(bboxes1[..., 2], bboxes2[..., 2]) + yy2 = np.minimum(bboxes1[..., 3], bboxes2[..., 3]) + w = np.maximum(0., xx2 - xx1) + h = np.maximum(0., yy2 - yy1) + area = w * h + iou_matrix = area / ((bboxes1[..., 2] - bboxes1[..., 0]) * + (bboxes1[..., 3] - bboxes1[..., 1]) + + (bboxes2[..., 2] - bboxes2[..., 0]) * + (bboxes2[..., 3] - bboxes2[..., 1]) - area) + return iou_matrix + + +def speed_direction_batch(dets, tracks): + tracks = tracks[..., np.newaxis] + CX1, CY1 = (dets[:, 0] + dets[:, 2]) / 2.0, (dets[:, 1] + dets[:, 3]) / 2.0 + CX2, CY2 = (tracks[:, 0] + tracks[:, 2]) / 2.0, ( + tracks[:, 1] + tracks[:, 3]) / 2.0 + dx = CX1 - CX2 + dy = CY1 - CY2 + norm = np.sqrt(dx**2 + dy**2) + 1e-6 + dx = dx / norm + dy = dy / norm + return dy, dx + + +def linear_assignment(cost_matrix): + try: + import lap + _, x, y = lap.lapjv(cost_matrix, extend_cost=True) + return np.array([[y[i], i] for i in x if i >= 0]) + except ImportError: + from scipy.optimize import linear_sum_assignment + x, 
y = linear_sum_assignment(cost_matrix) + return np.array(list(zip(x, y))) + + +def associate(detections, trackers, iou_threshold, velocities, previous_obs, + vdc_weight): + if (len(trackers) == 0): + return np.empty( + (0, 2), dtype=int), np.arange(len(detections)), np.empty( + (0, 5), dtype=int) + + Y, X = speed_direction_batch(detections, previous_obs) + inertia_Y, inertia_X = velocities[:, 0], velocities[:, 1] + inertia_Y = np.repeat(inertia_Y[:, np.newaxis], Y.shape[1], axis=1) + inertia_X = np.repeat(inertia_X[:, np.newaxis], X.shape[1], axis=1) + diff_angle_cos = inertia_X * X + inertia_Y * Y + diff_angle_cos = np.clip(diff_angle_cos, a_min=-1, a_max=1) + diff_angle = np.arccos(diff_angle_cos) + diff_angle = (np.pi / 2.0 - np.abs(diff_angle)) / np.pi + + valid_mask = np.ones(previous_obs.shape[0]) + valid_mask[np.where(previous_obs[:, 4] < 0)] = 0 + + iou_matrix = iou_batch(detections, trackers) + scores = np.repeat( + detections[:, -1][:, np.newaxis], trackers.shape[0], axis=1) + # iou_matrix = iou_matrix * scores # a trick that sometimes works, we don't encourage this + valid_mask = np.repeat(valid_mask[:, np.newaxis], X.shape[1], axis=1) + + angle_diff_cost = (valid_mask * diff_angle) * vdc_weight + angle_diff_cost = angle_diff_cost.T + angle_diff_cost = angle_diff_cost * scores + + if min(iou_matrix.shape) > 0: + a = (iou_matrix > iou_threshold).astype(np.int32) + if a.sum(1).max() == 1 and a.sum(0).max() == 1: + matched_indices = np.stack(np.where(a), axis=1) + else: + matched_indices = linear_assignment(-(iou_matrix + angle_diff_cost)) + else: + matched_indices = np.empty(shape=(0, 2)) + + unmatched_detections = [] + for d, det in enumerate(detections): + if (d not in matched_indices[:, 0]): + unmatched_detections.append(d) + unmatched_trackers = [] + for t, trk in enumerate(trackers): + if (t not in matched_indices[:, 1]): + unmatched_trackers.append(t) + + # filter out matched with low IOU + matches = [] + for m in matched_indices: + if (iou_matrix[m[0], 
m[1]] < iou_threshold): + unmatched_detections.append(m[0]) + unmatched_trackers.append(m[1]) + else: + matches.append(m.reshape(1, 2)) + if (len(matches) == 0): + matches = np.empty((0, 2), dtype=int) + else: + matches = np.concatenate(matches, axis=0) + + return matches, np.array(unmatched_detections), np.array(unmatched_trackers) diff --git a/ppdet/modeling/mot/tracker/__init__.py b/ppdet/modeling/mot/tracker/__init__.py index b74593b4126d878cd655326e58369f5b6f76a2ae..03a5dd0a169203b86edbc6c81a44a095ebe9b3cc 100644 --- a/ppdet/modeling/mot/tracker/__init__.py +++ b/ppdet/modeling/mot/tracker/__init__.py @@ -16,8 +16,10 @@ from . import base_jde_tracker from . import base_sde_tracker from . import jde_tracker from . import deepsort_tracker +from . import ocsort_tracker from .base_jde_tracker import * from .base_sde_tracker import * from .jde_tracker import * from .deepsort_tracker import * +from .ocsort_tracker import * diff --git a/ppdet/modeling/mot/tracker/jde_tracker.py b/ppdet/modeling/mot/tracker/jde_tracker.py index 5d2dcb9a018e3c0a26af837c9abd0a965fdbc7df..9796e6ceb328d5bcbec256aedb6654e53d1bc850 100644 --- a/ppdet/modeling/mot/tracker/jde_tracker.py +++ b/ppdet/modeling/mot/tracker/jde_tracker.py @@ -44,7 +44,7 @@ class JDETracker(object): track_buffer (int): buffer for tracker min_box_area (int): min box area to filter out low quality boxes vertical_ratio (float): w/h, the vertical ratio of the bbox to filter - bad results. If set <0 means no need to filter bboxes,usually set + bad results. If set <= 0 means no need to filter bboxes,usually set 1.6 for pedestrian tracking. 
tracked_thresh (float): linear assignment threshold of tracked stracks and detections @@ -70,8 +70,8 @@ class JDETracker(object): num_classes=1, det_thresh=0.3, track_buffer=30, - min_box_area=200, - vertical_ratio=1.6, + min_box_area=0, + vertical_ratio=0, tracked_thresh=0.7, r_tracked_thresh=0.5, unconfirmed_thresh=0.7, @@ -122,7 +122,7 @@ class JDETracker(object): Return: output_stracks_dict (dict(list)): The list contains information - regarding the online_tracklets for the recieved image tensor. + regarding the online_tracklets for the received image tensor. """ self.frame_id += 1 if self.frame_id == 1: @@ -167,9 +167,8 @@ class JDETracker(object): detections = [ STrack( STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], cls_id, - 30, temp_feat) - for (tlbrs, temp_feat - ) in zip(pred_dets_cls, pred_embs_cls) + 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls, pred_embs_cls) ] else: detections = [] @@ -244,15 +243,13 @@ class JDETracker(object): for tlbrs in pred_dets_cls_second ] else: - pred_embs_cls_second = pred_embs_dict[cls_id][inds_second] + pred_embs_cls_second = pred_embs_dict[cls_id][ + inds_second] detections_second = [ STrack( - STrack.tlbr_to_tlwh(tlbrs[2:6]), - tlbrs[1], - cls_id, - 30, - temp_feat) - for (tlbrs, temp_feat) in zip(pred_dets_cls_second, pred_embs_cls_second) + STrack.tlbr_to_tlwh(tlbrs[2:6]), tlbrs[1], + cls_id, 30, temp_feat) for (tlbrs, temp_feat) in + zip(pred_dets_cls_second, pred_embs_cls_second) ] else: detections_second = [] diff --git a/ppdet/modeling/mot/tracker/ocsort_tracker.py b/ppdet/modeling/mot/tracker/ocsort_tracker.py new file mode 100644 index 0000000000000000000000000000000000000000..350e62c9cba46d3cd18d4bad97cc8b4dd0a8bdd7 --- /dev/null +++ b/ppdet/modeling/mot/tracker/ocsort_tracker.py @@ -0,0 +1,369 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +This code is based on https://github.com/noahcao/OC_SORT/blob/master/trackers/ocsort_tracker/ocsort.py +""" + +import numpy as np +try: + from filterpy.kalman import KalmanFilter +except: + print( + 'Warning: Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy' + ) + pass + +from ..matching.ocsort_matching import associate, linear_assignment, iou_batch +from ppdet.core.workspace import register, serializable + + +def k_previous_obs(observations, cur_age, k): + if len(observations) == 0: + return [-1, -1, -1, -1, -1] + for i in range(k): + dt = k - i + if cur_age - dt in observations: + return observations[cur_age - dt] + max_age = max(observations.keys()) + return observations[max_age] + + +def convert_bbox_to_z(bbox): + """ + Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form + [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is + the aspect ratio + """ + w = bbox[2] - bbox[0] + h = bbox[3] - bbox[1] + x = bbox[0] + w / 2. + y = bbox[1] + h / 2. 
+ s = w * h # scale is just area + r = w / float(h + 1e-6) + return np.array([x, y, s, r]).reshape((4, 1)) + + +def convert_x_to_bbox(x, score=None): + """ + Takes a bounding box in the centre form [x,y,s,r] and returns it in the form + [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right + """ + w = np.sqrt(x[2] * x[3]) + h = x[2] / w + if (score is None): + return np.array( + [x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., + x[1] + h / 2.]).reshape((1, 4)) + else: + score = np.array([score]) + return np.array([ + x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score + ]).reshape((1, 5)) + + +def speed_direction(bbox1, bbox2): + cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0 + cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0 + speed = np.array([cy2 - cy1, cx2 - cx1]) + norm = np.sqrt((cy2 - cy1)**2 + (cx2 - cx1)**2) + 1e-6 + return speed / norm + + +class KalmanBoxTracker(object): + """ + This class represents the internal state of individual tracked objects observed as bbox. + + Args: + bbox (np.array): bbox in [x1,y1,x2,y2,score] format. + delta_t (int): delta_t of previous observation + """ + count = 0 + + def __init__(self, bbox, delta_t=3): + try: + from filterpy.kalman import KalmanFilter + except Exception as e: + raise RuntimeError( + 'Unable to use OC-SORT, please install filterpy, for example: `pip install filterpy`, see https://github.com/rlabbe/filterpy' + ) + self.kf = KalmanFilter(dim_x=7, dim_z=4) + self.kf.F = np.array([[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0], + [0, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0], + [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1, 0], + [0, 0, 0, 0, 0, 0, 1]]) + self.kf.H = np.array([[1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], + [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]]) + self.kf.R[2:, 2:] *= 10. + self.kf.P[4:, 4:] *= 1000. + # give high uncertainty to the unobservable initial velocities + self.kf.P *= 10. 
+ self.kf.Q[-1, -1] *= 0.01 + self.kf.Q[4:, 4:] *= 0.01 + + self.score = bbox[4] + self.kf.x[:4] = convert_bbox_to_z(bbox) + self.time_since_update = 0 + self.id = KalmanBoxTracker.count + KalmanBoxTracker.count += 1 + self.history = [] + self.hits = 0 + self.hit_streak = 0 + self.age = 0 + """ + NOTE: [-1,-1,-1,-1,-1] is a compromising placeholder for non-observation status, the same for the return of + function k_previous_obs. It is ugly and I do not like it. But to support generating the observation array in a + fast and unified way, which you would see below k_observations = np.array([k_previous_obs(...]]), let's bear it for now. + """ + self.last_observation = np.array([-1, -1, -1, -1, -1]) # placeholder + self.observations = dict() + self.history_observations = [] + self.velocity = None + self.delta_t = delta_t + + def update(self, bbox): + """ + Updates the state vector with observed bbox. + """ + if bbox is not None: + if self.last_observation.sum() >= 0: # a previous observation exists + previous_box = None + for i in range(self.delta_t): + dt = self.delta_t - i + if self.age - dt in self.observations: + previous_box = self.observations[self.age - dt] + break + if previous_box is None: + previous_box = self.last_observation + """ + Estimate the track speed direction with observations \Delta t steps away + """ + self.velocity = speed_direction(previous_box, bbox) + """ + Insert new observations. This is an ugly way to maintain both self.observations + and self.history_observations. Bear it for the moment. + """ + self.last_observation = bbox + self.observations[self.age] = bbox + self.history_observations.append(bbox) + + self.time_since_update = 0 + self.history = [] + self.hits += 1 + self.hit_streak += 1 + self.kf.update(convert_bbox_to_z(bbox)) + else: + self.kf.update(bbox) + + def predict(self): + """ + Advances the state vector and returns the predicted bounding box estimate. 
+ """ + if ((self.kf.x[6] + self.kf.x[2]) <= 0): + self.kf.x[6] *= 0.0 + + self.kf.predict() + self.age += 1 + if (self.time_since_update > 0): + self.hit_streak = 0 + self.time_since_update += 1 + self.history.append(convert_x_to_bbox(self.kf.x, score=self.score)) + return self.history[-1] + + def get_state(self): + return convert_x_to_bbox(self.kf.x, score=self.score) + + +@register +@serializable +class OCSORTTracker(object): + """ + OCSORT tracker, supports a single class + + Args: + det_thresh (float): threshold of detection score + max_age (int): maximum number of consecutive missed frames before a track is deleted + min_hits (int): minimum hits for associate + iou_threshold (float): iou threshold for associate + delta_t (int): delta_t of previous observation + inertia (float): vdc_weight of angle_diff_cost for associate + vertical_ratio (float): w/h, the vertical ratio of the bbox to filter + bad results. If set <= 0 means no need to filter bboxes, usually set + 1.6 for pedestrian tracking. + min_box_area (int): min box area to filter out low quality boxes + use_byte (bool): Whether to use ByteTracker, default False + """ + + def __init__(self, + det_thresh=0.6, + max_age=30, + min_hits=3, + iou_threshold=0.3, + delta_t=3, + inertia=0.2, + vertical_ratio=-1, + min_box_area=0, + use_byte=False): + self.det_thresh = det_thresh + self.max_age = max_age + self.min_hits = min_hits + self.iou_threshold = iou_threshold + self.delta_t = delta_t + self.inertia = inertia + self.vertical_ratio = vertical_ratio + self.min_box_area = min_box_area + self.use_byte = use_byte + + self.trackers = [] + self.frame_count = 0 + KalmanBoxTracker.count = 0 + + def update(self, pred_dets, pred_embs=None): + """ + Args: + pred_dets (np.array): Detection results of the image, the shape is + [N, 6], means 'cls_id, score, x0, y0, x1, y1'. + pred_embs (np.array): Embedding results of the image, the shape is + [N, 128] or [N, 512], default as None. 
+ + Return: + tracking boxes (np.array): [M, 6], means 'x0, y0, x1, y1, score, id'. + """ + if pred_dets is None: + return np.empty((0, 6)) + + self.frame_count += 1 + + bboxes = pred_dets[:, 2:] + scores = pred_dets[:, 1:2] + dets = np.concatenate((bboxes, scores), axis=1) + scores = scores.squeeze(-1) + + inds_low = scores > 0.1 + inds_high = scores < self.det_thresh + inds_second = np.logical_and(inds_low, inds_high) + # self.det_thresh > score > 0.1, for second matching + dets_second = dets[inds_second] # detections for second matching + remain_inds = scores > self.det_thresh + dets = dets[remain_inds] + + # get predicted locations from existing trackers. + trks = np.zeros((len(self.trackers), 5)) + to_del = [] + ret = [] + for t, trk in enumerate(trks): + pos = self.trackers[t].predict()[0] + trk[:] = [pos[0], pos[1], pos[2], pos[3], 0] + if np.any(np.isnan(pos)): + to_del.append(t) + trks = np.ma.compress_rows(np.ma.masked_invalid(trks)) + for t in reversed(to_del): + self.trackers.pop(t) + + velocities = np.array([ + trk.velocity if trk.velocity is not None else np.array((0, 0)) + for trk in self.trackers + ]) + last_boxes = np.array([trk.last_observation for trk in self.trackers]) + k_observations = np.array([ + k_previous_obs(trk.observations, trk.age, self.delta_t) + for trk in self.trackers + ]) + """ + First round of association + """ + matched, unmatched_dets, unmatched_trks = associate( + dets, trks, self.iou_threshold, velocities, k_observations, + self.inertia) + for m in matched: + self.trackers[m[1]].update(dets[m[0], :]) + """ + Second round of association by OCR + """ + # BYTE association + if self.use_byte and len(dets_second) > 0 and unmatched_trks.shape[ + 0] > 0: + u_trks = trks[unmatched_trks] + iou_left = iou_batch( + dets_second, + u_trks) # iou between low score detections and unmatched tracks + iou_left = np.array(iou_left) + if iou_left.max() > self.iou_threshold: + """ + NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, 
you may + get a higher performance especially on MOT17/MOT20 datasets. But we keep it + uniform here for simplicity + """ + matched_indices = linear_assignment(-iou_left) + to_remove_trk_indices = [] + for m in matched_indices: + det_ind, trk_ind = m[0], unmatched_trks[m[1]] + if iou_left[m[0], m[1]] < self.iou_threshold: + continue + self.trackers[trk_ind].update(dets_second[det_ind, :]) + to_remove_trk_indices.append(trk_ind) + unmatched_trks = np.setdiff1d(unmatched_trks, + np.array(to_remove_trk_indices)) + + if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0: + left_dets = dets[unmatched_dets] + left_trks = last_boxes[unmatched_trks] + iou_left = iou_batch(left_dets, left_trks) + iou_left = np.array(iou_left) + if iou_left.max() > self.iou_threshold: + """ + NOTE: by using a lower threshold, e.g., self.iou_threshold - 0.1, you may + get a higher performance especially on MOT17/MOT20 datasets. But we keep it + uniform here for simplicity + """ + rematched_indices = linear_assignment(-iou_left) + to_remove_det_indices = [] + to_remove_trk_indices = [] + for m in rematched_indices: + det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[ + 1]] + if iou_left[m[0], m[1]] < self.iou_threshold: + continue + self.trackers[trk_ind].update(dets[det_ind, :]) + to_remove_det_indices.append(det_ind) + to_remove_trk_indices.append(trk_ind) + unmatched_dets = np.setdiff1d(unmatched_dets, + np.array(to_remove_det_indices)) + unmatched_trks = np.setdiff1d(unmatched_trks, + np.array(to_remove_trk_indices)) + + for m in unmatched_trks: + self.trackers[m].update(None) + + # create and initialise new trackers for unmatched detections + for i in unmatched_dets: + trk = KalmanBoxTracker(dets[i, :], delta_t=self.delta_t) + self.trackers.append(trk) + i = len(self.trackers) + for trk in reversed(self.trackers): + if trk.last_observation.sum() < 0: + d = trk.get_state()[0] + else: + d = trk.last_observation # tlbr + score + if (trk.time_since_update < 1) and ( + 
                     trk.hit_streak >= self.min_hits or
+                    self.frame_count <= self.min_hits):
+                # +1 as MOT benchmark requires positive
+                ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1))
+            i -= 1
+            # remove dead tracklet
+            if (trk.time_since_update > self.max_age):
+                self.trackers.pop(i)
+        if (len(ret) > 0):
+            return np.concatenate(ret)
+        return np.empty((0, 6))
diff --git a/ppdet/modeling/necks/hrfpn.py b/ppdet/modeling/necks/hrfpn.py
index eb4768b8ecba7c11dcd834265e1289db8d6ec7b0..5c45c9974b3bd213747cc7b6f0f5f670f38c61bf 100644
--- a/ppdet/modeling/necks/hrfpn.py
+++ b/ppdet/modeling/necks/hrfpn.py
@@ -37,7 +37,8 @@ class HRFPN(nn.Layer):
                  out_channel=256,
                  share_conv=False,
                  extra_stage=1,
-                 spatial_scales=[1. / 4, 1. / 8, 1. / 16, 1. / 32]):
+                 spatial_scales=[1. / 4, 1. / 8, 1. / 16, 1. / 32],
+                 use_bias=False):
         super(HRFPN, self).__init__()
         in_channel = sum(in_channels)
         self.in_channel = in_channel
@@ -47,12 +48,14 @@ class HRFPN(nn.Layer):
             spatial_scales = spatial_scales + [spatial_scales[-1] / 2.]
         self.spatial_scales = spatial_scales
         self.num_out = len(self.spatial_scales)
+        self.use_bias = use_bias
+        bias_attr = False if use_bias is False else None
 
         self.reduction = nn.Conv2D(
             in_channels=in_channel,
             out_channels=out_channel,
             kernel_size=1,
-            bias_attr=False)
+            bias_attr=bias_attr)
 
         if share_conv:
             self.fpn_conv = nn.Conv2D(
@@ -60,7 +63,7 @@ class HRFPN(nn.Layer):
                 out_channels=out_channel,
                 kernel_size=3,
                 padding=1,
-                bias_attr=False)
+                bias_attr=bias_attr)
         else:
             self.fpn_conv = []
             for i in range(self.num_out):
@@ -72,7 +75,7 @@ class HRFPN(nn.Layer):
                         out_channels=out_channel,
                         kernel_size=3,
                         padding=1,
-                        bias_attr=False))
+                        bias_attr=bias_attr))
                 self.fpn_conv.append(conv)
 
     def forward(self, body_feats):
diff --git a/ppdet/modeling/necks/yolo_fpn.py b/ppdet/modeling/necks/yolo_fpn.py
index bc06b14dba4fb5a2e5570162a6151927cb2df8da..79f4cead360f872233f48be739e2357d4c9e1121 100644
--- a/ppdet/modeling/necks/yolo_fpn.py
+++ b/ppdet/modeling/necks/yolo_fpn.py
@@ -1,15 +1,15 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
 # limitations under the License.
 
 import paddle
@@ -17,10 +17,12 @@ import paddle.nn as nn
 import paddle.nn.functional as F
 from ppdet.core.workspace import register, serializable
 from ppdet.modeling.layers import DropBlock
+from ppdet.modeling.ops import get_act_fn
 from ..backbones.darknet import ConvBNLayer
 from ..shape_spec import ShapeSpec
+from ..backbones.csp_darknet import BaseConv, DWConv, CSPLayer
 
-__all__ = ['YOLOv3FPN', 'PPYOLOFPN', 'PPYOLOTinyFPN', 'PPYOLOPAN']
+__all__ = ['YOLOv3FPN', 'PPYOLOFPN', 'PPYOLOTinyFPN', 'PPYOLOPAN', 'YOLOCSPPAN']
 
 
 def add_coord(x, data_format):
@@ -986,3 +988,112 @@ class PPYOLOPAN(nn.Layer):
     @property
     def out_shape(self):
         return [ShapeSpec(channels=c) for c in self._out_channels]
+
+
+@register
+@serializable
+class YOLOCSPPAN(nn.Layer):
+    """
+    YOLO CSP-PAN, used in YOLOv5 and YOLOX.
+    """
+    __shared__ = ['depth_mult', 'data_format', 'act', 'trt']
+
+    def __init__(self,
+                 depth_mult=1.0,
+                 in_channels=[256, 512, 1024],
+                 depthwise=False,
+                 data_format='NCHW',
+                 act='silu',
+                 trt=False):
+        super(YOLOCSPPAN, self).__init__()
+        self.in_channels = in_channels
+        self._out_channels = in_channels
+        Conv = DWConv if depthwise else BaseConv
+
+        self.data_format = data_format
+        act = get_act_fn(
+            act, trt=trt) if act is None or isinstance(act,
+                                                       (str, dict)) else act
+        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
+
+        # top-down fpn
+        self.lateral_convs = nn.LayerList()
+        self.fpn_blocks = nn.LayerList()
+        for idx in range(len(in_channels) - 1, 0, -1):
+            self.lateral_convs.append(
+                BaseConv(
+                    int(in_channels[idx]),
+                    int(in_channels[idx - 1]),
+                    1,
+                    1,
+                    act=act))
+            self.fpn_blocks.append(
+                CSPLayer(
+                    int(in_channels[idx - 1] * 2),
+                    int(in_channels[idx - 1]),
+                    round(3 * depth_mult),
+                    shortcut=False,
+                    depthwise=depthwise,
+                    act=act))
+
+        # bottom-up pan
+        self.downsample_convs = nn.LayerList()
+        self.pan_blocks = nn.LayerList()
+        for idx in range(len(in_channels) - 1):
+            self.downsample_convs.append(
+                Conv(
+                    int(in_channels[idx]),
+                    int(in_channels[idx]),
+                    3,
+                    stride=2,
+                    act=act))
+            self.pan_blocks.append(
+                CSPLayer(
+                    int(in_channels[idx] * 2),
+                    int(in_channels[idx + 1]),
+                    round(3 * depth_mult),
+                    shortcut=False,
+                    depthwise=depthwise,
+                    act=act))
+
+    def forward(self, feats, for_mot=False):
+        assert len(feats) == len(self.in_channels)
+
+        # top-down fpn
+        inner_outs = [feats[-1]]
+        for idx in range(len(self.in_channels) - 1, 0, -1):
+            feat_heigh = inner_outs[0]
+            feat_low = feats[idx - 1]
+            feat_heigh = self.lateral_convs[len(self.in_channels) - 1 - idx](
+                feat_heigh)
+            inner_outs[0] = feat_heigh
+
+            upsample_feat = F.interpolate(
+                feat_heigh,
+                scale_factor=2.,
+                mode="nearest",
+                data_format=self.data_format)
+            inner_out = self.fpn_blocks[len(self.in_channels) - 1 - idx](
+                paddle.concat(
+                    [upsample_feat, feat_low], axis=1))
+            inner_outs.insert(0, inner_out)
+
+        # bottom-up pan
+        outs = [inner_outs[0]]
+        for idx in range(len(self.in_channels) - 1):
+            feat_low = outs[-1]
+            feat_height = inner_outs[idx + 1]
+            downsample_feat = self.downsample_convs[idx](feat_low)
+            out = self.pan_blocks[idx](paddle.concat(
+                [downsample_feat, feat_height], axis=1))
+            outs.append(out)
+
+        return outs
+
+    @classmethod
+    def from_config(cls, cfg, input_shape):
+        return {'in_channels': [i.channels for i in input_shape], }
+
+    @property
+    def out_shape(self):
+        return [ShapeSpec(channels=c) for c in self._out_channels]
diff --git a/ppdet/modeling/ops.py b/ppdet/modeling/ops.py
index 52a4f33962a49c45a3597c372842d965a0023166..567c26d7233e561118e22ca3a8e7d74f7b7cf686 100644
--- a/ppdet/modeling/ops.py
+++ b/ppdet/modeling/ops.py
@@ -1,15 +1,15 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
 # limitations under the License.
 
 import paddle
@@ -17,18 +17,23 @@ import paddle.nn.functional as F
 import paddle.nn as nn
 from paddle import ParamAttr
 from paddle.regularizer import L2Decay
+from paddle import _C_ops
 
-from paddle.fluid.framework import Variable, in_dygraph_mode
-from paddle.fluid import core
-from paddle.fluid.dygraph import parallel_helper
-from paddle.fluid.layer_helper import LayerHelper
-from paddle.fluid.data_feeder import check_variable_and_dtype, check_type, check_dtype
+from paddle import in_dynamic_mode
+from paddle.common_ops_import import Variable, LayerHelper, check_variable_and_dtype, check_type, check_dtype
 
 __all__ = [
-    'roi_pool', 'roi_align', 'prior_box', 'generate_proposals',
-    'iou_similarity', 'box_coder', 'yolo_box', 'multiclass_nms',
-    'distribute_fpn_proposals', 'collect_fpn_proposals', 'matrix_nms',
-    'batch_norm', 'mish', 'swish', 'identity'
+    'prior_box',
+    'generate_proposals',
+    'box_coder',
+    'multiclass_nms',
+    'distribute_fpn_proposals',
+    'matrix_nms',
+    'batch_norm',
+    'mish',
+    'silu',
+    'swish',
+    'identity',
 ]
 
 
@@ -40,13 +45,17 @@ def mish(x):
     return F.mish(x) if hasattr(F, mish) else x * F.tanh(F.softplus(x))
 
 
+def silu(x):
+    return F.silu(x)
+
+
 def swish(x):
     return x * F.sigmoid(x)
 
 
-TRT_ACT_SPEC = {'swish': swish}
+TRT_ACT_SPEC = {'swish': swish, 'silu': swish}
 
-ACT_SPEC = {'mish': mish}
+ACT_SPEC = {'mish': mish, 'silu': silu}
 
 
 def get_act_fn(act=None, trt=False):
@@ -106,392 +115,6 @@ def batch_norm(ch,
     return norm_layer
 
 
-@paddle.jit.not_to_static
-def roi_pool(input,
-             rois,
-             output_size,
-             spatial_scale=1.0,
-             rois_num=None,
-             name=None):
-    """
-
-    This operator implements the roi_pooling layer.
-    Region of interest pooling (also known as RoI pooling) is to perform max pooling on inputs of nonuniform sizes to obtain fixed-size feature maps (e.g. 7*7).
-
-    The operator has three steps:
-
-        1.
Dividing each region proposal into equal-sized sections with output_size(h, w); - 2. Finding the largest value in each section; - 3. Copying these max values to the output buffer. - - For more information, please refer to https://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn - - Args: - input (Tensor): Input feature, 4D-Tensor with the shape of [N,C,H,W], - where N is the batch size, C is the input channel, H is Height, W is weight. - The data type is float32 or float64. - rois (Tensor): ROIs (Regions of Interest) to pool over. - 2D-Tensor or 2D-LoDTensor with the shape of [num_rois,4], the lod level is 1. - Given as [[x1, y1, x2, y2], ...], (x1, y1) is the top left coordinates, - and (x2, y2) is the bottom right coordinates. - output_size (int or tuple[int, int]): The pooled output size(h, w), data type is int32. If int, h and w are both equal to output_size. - spatial_scale (float, optional): Multiplicative spatial scale factor to translate ROI coords from their input scale to the scale used when pooling. Default: 1.0 - rois_num (Tensor): The number of RoIs in each image. Default: None - name(str, optional): For detailed information, please refer - to :ref:`api_guide_Name`. Usually name is no need to set and - None by default. - - - Returns: - Tensor: The pooled feature, 4D-Tensor with the shape of [num_rois, C, output_size[0], output_size[1]]. - - - Examples: - - .. 
code-block:: python - - import paddle - from ppdet.modeling import ops - paddle.enable_static() - - x = paddle.static.data( - name='data', shape=[None, 256, 32, 32], dtype='float32') - rois = paddle.static.data( - name='rois', shape=[None, 4], dtype='float32') - rois_num = paddle.static.data(name='rois_num', shape=[None], dtype='int32') - - pool_out = ops.roi_pool( - input=x, - rois=rois, - output_size=(1, 1), - spatial_scale=1.0, - rois_num=rois_num) - """ - check_type(output_size, 'output_size', (int, tuple), 'roi_pool') - if isinstance(output_size, int): - output_size = (output_size, output_size) - - pooled_height, pooled_width = output_size - if in_dygraph_mode(): - assert rois_num is not None, "rois_num should not be None in dygraph mode." - pool_out, argmaxes = core.ops.roi_pool( - input, rois, rois_num, "pooled_height", pooled_height, - "pooled_width", pooled_width, "spatial_scale", spatial_scale) - return pool_out, argmaxes - - else: - check_variable_and_dtype(input, 'input', ['float32'], 'roi_pool') - check_variable_and_dtype(rois, 'rois', ['float32'], 'roi_pool') - helper = LayerHelper('roi_pool', **locals()) - dtype = helper.input_dtype() - pool_out = helper.create_variable_for_type_inference(dtype) - argmaxes = helper.create_variable_for_type_inference(dtype='int32') - - inputs = { - "X": input, - "ROIs": rois, - } - if rois_num is not None: - inputs['RoisNum'] = rois_num - helper.append_op( - type="roi_pool", - inputs=inputs, - outputs={"Out": pool_out, - "Argmax": argmaxes}, - attrs={ - "pooled_height": pooled_height, - "pooled_width": pooled_width, - "spatial_scale": spatial_scale - }) - return pool_out, argmaxes - - -@paddle.jit.not_to_static -def roi_align(input, - rois, - output_size, - spatial_scale=1.0, - sampling_ratio=-1, - rois_num=None, - aligned=True, - name=None): - """ - - Region of interest align (also known as RoI align) is to perform - bilinear interpolation on inputs of nonuniform sizes to obtain - fixed-size feature maps (e.g. 
7*7) - - Dividing each region proposal into equal-sized sections with - the pooled_width and pooled_height. Location remains the origin - result. - - In each ROI bin, the value of the four regularly sampled locations - are computed directly through bilinear interpolation. The output is - the mean of four locations. - Thus avoid the misaligned problem. - - Args: - input (Tensor): Input feature, 4D-Tensor with the shape of [N,C,H,W], - where N is the batch size, C is the input channel, H is Height, W is weight. - The data type is float32 or float64. - rois (Tensor): ROIs (Regions of Interest) to pool over.It should be - a 2-D Tensor or 2-D LoDTensor of shape (num_rois, 4), the lod level is 1. - The data type is float32 or float64. Given as [[x1, y1, x2, y2], ...], - (x1, y1) is the top left coordinates, and (x2, y2) is the bottom right coordinates. - output_size (int or tuple[int, int]): The pooled output size(h, w), data type is int32. If int, h and w are both equal to output_size. - spatial_scale (float32, optional): Multiplicative spatial scale factor to translate ROI coords - from their input scale to the scale used when pooling. Default: 1.0 - sampling_ratio(int32, optional): number of sampling points in the interpolation grid. - If <=0, then grid points are adaptive to roi_width and pooled_w, likewise for height. Default: -1 - rois_num (Tensor): The number of RoIs in each image. Default: None - name(str, optional): For detailed information, please refer - to :ref:`api_guide_Name`. Usually name is no need to set and - None by default. - - Returns: - Tensor: - - Output: The output of ROIAlignOp is a 4-D tensor with shape (num_rois, channels, pooled_h, pooled_w). The data type is float32 or float64. - - - Examples: - .. 
code-block:: python - - import paddle - from ppdet.modeling import ops - paddle.enable_static() - - x = paddle.static.data( - name='data', shape=[None, 256, 32, 32], dtype='float32') - rois = paddle.static.data( - name='rois', shape=[None, 4], dtype='float32') - rois_num = paddle.static.data(name='rois_num', shape=[None], dtype='int32') - align_out = ops.roi_align(input=x, - rois=rois, - ouput_size=(7, 7), - spatial_scale=0.5, - sampling_ratio=-1, - rois_num=rois_num) - """ - check_type(output_size, 'output_size', (int, tuple), 'roi_align') - if isinstance(output_size, int): - output_size = (output_size, output_size) - - pooled_height, pooled_width = output_size - - if in_dygraph_mode(): - assert rois_num is not None, "rois_num should not be None in dygraph mode." - align_out = core.ops.roi_align( - input, rois, rois_num, "pooled_height", pooled_height, - "pooled_width", pooled_width, "spatial_scale", spatial_scale, - "sampling_ratio", sampling_ratio, "aligned", aligned) - return align_out - - else: - check_variable_and_dtype(input, 'input', ['float32', 'float64'], - 'roi_align') - check_variable_and_dtype(rois, 'rois', ['float32', 'float64'], - 'roi_align') - helper = LayerHelper('roi_align', **locals()) - dtype = helper.input_dtype() - align_out = helper.create_variable_for_type_inference(dtype) - inputs = { - "X": input, - "ROIs": rois, - } - if rois_num is not None: - inputs['RoisNum'] = rois_num - helper.append_op( - type="roi_align", - inputs=inputs, - outputs={"Out": align_out}, - attrs={ - "pooled_height": pooled_height, - "pooled_width": pooled_width, - "spatial_scale": spatial_scale, - "sampling_ratio": sampling_ratio, - "aligned": aligned, - }) - return align_out - - -@paddle.jit.not_to_static -def iou_similarity(x, y, box_normalized=True, name=None): - """ - Computes intersection-over-union (IOU) between two box lists. - Box list 'X' should be a LoDTensor and 'Y' is a common Tensor, - boxes in 'Y' are shared by all instance of the batched inputs of X. 
- Given two boxes A and B, the calculation of IOU is as follows: - - $$ - IOU(A, B) = - \\frac{area(A\\cap B)}{area(A)+area(B)-area(A\\cap B)} - $$ - - Args: - x (Tensor): Box list X is a 2-D Tensor with shape [N, 4] holds N - boxes, each box is represented as [xmin, ymin, xmax, ymax], - the shape of X is [N, 4]. [xmin, ymin] is the left top - coordinate of the box if the input is image feature map, they - are close to the origin of the coordinate system. - [xmax, ymax] is the right bottom coordinate of the box. - The data type is float32 or float64. - y (Tensor): Box list Y holds M boxes, each box is represented as - [xmin, ymin, xmax, ymax], the shape of X is [N, 4]. - [xmin, ymin] is the left top coordinate of the box if the - input is image feature map, and [xmax, ymax] is the right - bottom coordinate of the box. The data type is float32 or float64. - box_normalized(bool): Whether treat the priorbox as a normalized box. - Set true by default. - name(str, optional): For detailed information, please refer - to :ref:`api_guide_Name`. Usually name is no need to set and - None by default. - - Returns: - Tensor: The output of iou_similarity op, a tensor with shape [N, M] - representing pairwise iou scores. The data type is same with x. - - Examples: - .. 
code-block:: python - - import paddle - from ppdet.modeling import ops - paddle.enable_static() - - x = paddle.static.data(name='x', shape=[None, 4], dtype='float32') - y = paddle.static.data(name='y', shape=[None, 4], dtype='float32') - iou = ops.iou_similarity(x=x, y=y) - """ - - if in_dygraph_mode(): - out = core.ops.iou_similarity(x, y, 'box_normalized', box_normalized) - return out - else: - helper = LayerHelper("iou_similarity", **locals()) - out = helper.create_variable_for_type_inference(dtype=x.dtype) - - helper.append_op( - type="iou_similarity", - inputs={"X": x, - "Y": y}, - attrs={"box_normalized": box_normalized}, - outputs={"Out": out}) - return out - - -@paddle.jit.not_to_static -def collect_fpn_proposals(multi_rois, - multi_scores, - min_level, - max_level, - post_nms_top_n, - rois_num_per_level=None, - name=None): - """ - - **This OP only supports LoDTensor as input**. Concat multi-level RoIs - (Region of Interest) and select N RoIs with respect to multi_scores. - This operation performs the following steps: - - 1. Choose num_level RoIs and scores as input: num_level = max_level - min_level - 2. Concat multi-level RoIs and scores - 3. Sort scores and select post_nms_top_n scores - 4. Gather RoIs by selected indices from scores - 5. Re-sort RoIs by corresponding batch_id - - Args: - multi_rois(list): List of RoIs to collect. Element in list is 2-D - LoDTensor with shape [N, 4] and data type is float32 or float64, - N is the number of RoIs. - multi_scores(list): List of scores of RoIs to collect. Element in list - is 2-D LoDTensor with shape [N, 1] and data type is float32 or - float64, N is the number of RoIs. - min_level(int): The lowest level of FPN layer to collect - max_level(int): The highest level of FPN layer to collect - post_nms_top_n(int): The number of selected RoIs - rois_num_per_level(list, optional): The List of RoIs' numbers. 
- Each element is 1-D Tensor which contains the RoIs' number of each - image on each level and the shape is [B] and data type is - int32, B is the number of images. If it is not None then return - a 1-D Tensor contains the output RoIs' number of each image and - the shape is [B]. Default: None - name(str, optional): For detailed information, please refer - to :ref:`api_guide_Name`. Usually name is no need to set and - None by default. - - Returns: - Variable: - - fpn_rois(Variable): 2-D LoDTensor with shape [N, 4] and data type is - float32 or float64. Selected RoIs. - - rois_num(Tensor): 1-D Tensor contains the RoIs's number of each - image. The shape is [B] and data type is int32. B is the number of - images. - - Examples: - .. code-block:: python - - import paddle - from ppdet.modeling import ops - paddle.enable_static() - multi_rois = [] - multi_scores = [] - for i in range(4): - multi_rois.append(paddle.static.data( - name='roi_'+str(i), shape=[None, 4], dtype='float32', lod_level=1)) - for i in range(4): - multi_scores.append(paddle.static.data( - name='score_'+str(i), shape=[None, 1], dtype='float32', lod_level=1)) - - fpn_rois = ops.collect_fpn_proposals( - multi_rois=multi_rois, - multi_scores=multi_scores, - min_level=2, - max_level=5, - post_nms_top_n=2000) - """ - check_type(multi_rois, 'multi_rois', list, 'collect_fpn_proposals') - check_type(multi_scores, 'multi_scores', list, 'collect_fpn_proposals') - num_lvl = max_level - min_level + 1 - input_rois = multi_rois[:num_lvl] - input_scores = multi_scores[:num_lvl] - - if in_dygraph_mode(): - assert rois_num_per_level is not None, "rois_num_per_level should not be None in dygraph mode." 
- attrs = ('post_nms_topN', post_nms_top_n) - output_rois, rois_num = core.ops.collect_fpn_proposals( - input_rois, input_scores, rois_num_per_level, *attrs) - return output_rois, rois_num - - else: - helper = LayerHelper('collect_fpn_proposals', **locals()) - dtype = helper.input_dtype('multi_rois') - check_dtype(dtype, 'multi_rois', ['float32', 'float64'], - 'collect_fpn_proposals') - output_rois = helper.create_variable_for_type_inference(dtype) - output_rois.stop_gradient = True - - inputs = { - 'MultiLevelRois': input_rois, - 'MultiLevelScores': input_scores, - } - outputs = {'FpnRois': output_rois} - if rois_num_per_level is not None: - inputs['MultiLevelRoIsNum'] = rois_num_per_level - rois_num = helper.create_variable_for_type_inference(dtype='int32') - rois_num.stop_gradient = True - outputs['RoisNum'] = rois_num - else: - rois_num = None - helper.append_op( - type='collect_fpn_proposals', - inputs=inputs, - outputs=outputs, - attrs={'post_nms_topN': post_nms_top_n}) - return output_rois, rois_num - - @paddle.jit.not_to_static def distribute_fpn_proposals(fpn_rois, min_level, @@ -570,12 +193,12 @@ def distribute_fpn_proposals(fpn_rois, """ num_lvl = max_level - min_level + 1 - if in_dygraph_mode(): + if in_dynamic_mode(): assert rois_num is not None, "rois_num should not be None in dygraph mode." 
attrs = ('min_level', min_level, 'max_level', max_level, 'refer_level', refer_level, 'refer_scale', refer_scale, 'pixel_offset', pixel_offset) - multi_rois, restore_ind, rois_num_per_level = core.ops.distribute_fpn_proposals( + multi_rois, restore_ind, rois_num_per_level = _C_ops.distribute_fpn_proposals( fpn_rois, rois_num, num_lvl, num_lvl, *attrs) return multi_rois, restore_ind, rois_num_per_level @@ -621,143 +244,6 @@ def distribute_fpn_proposals(fpn_rois, return multi_rois, restore_ind, rois_num_per_level -@paddle.jit.not_to_static -def yolo_box( - x, - origin_shape, - anchors, - class_num, - conf_thresh, - downsample_ratio, - clip_bbox=True, - scale_x_y=1., - name=None, ): - """ - - This operator generates YOLO detection boxes from output of YOLOv3 network. - - The output of previous network is in shape [N, C, H, W], while H and W - should be the same, H and W specify the grid size, each grid point predict - given number boxes, this given number, which following will be represented as S, - is specified by the number of anchors. In the second dimension(the channel - dimension), C should be equal to S * (5 + class_num), class_num is the object - category number of source dataset(such as 80 in coco dataset), so the - second(channel) dimension, apart from 4 box location coordinates x, y, w, h, - also includes confidence score of the box and class one-hot key of each anchor - box. - Assume the 4 location coordinates are :math:`t_x, t_y, t_w, t_h`, the box - predictions should be as follows: - $$ - b_x = \\sigma(t_x) + c_x - $$ - $$ - b_y = \\sigma(t_y) + c_y - $$ - $$ - b_w = p_w e^{t_w} - $$ - $$ - b_h = p_h e^{t_h} - $$ - in the equation above, :math:`c_x, c_y` is the left top corner of current grid - and :math:`p_w, p_h` is specified by anchors. 
- The logistic regression value of the 5th channel of each anchor prediction boxes - represents the confidence score of each prediction box, and the logistic - regression value of the last :attr:`class_num` channels of each anchor prediction - boxes represents the classifcation scores. Boxes with confidence scores less than - :attr:`conf_thresh` should be ignored, and box final scores is the product of - confidence scores and classification scores. - $$ - score_{pred} = score_{conf} * score_{class} - $$ - - Args: - x (Tensor): The input tensor of YoloBox operator is a 4-D tensor with shape of [N, C, H, W]. - The second dimension(C) stores box locations, confidence score and - classification one-hot keys of each anchor box. Generally, X should be the output of YOLOv3 network. - The data type is float32 or float64. - origin_shape (Tensor): The image size tensor of YoloBox operator, This is a 2-D tensor with shape of [N, 2]. - This tensor holds height and width of each input image used for resizing output box in input image - scale. The data type is int32. - anchors (list|tuple): The anchor width and height, it will be parsed pair by pair. - class_num (int): The number of classes to predict. - conf_thresh (float): The confidence scores threshold of detection boxes. Boxes with confidence scores - under threshold should be ignored. - downsample_ratio (int): The downsample ratio from network input to YoloBox operator input, - so 32, 16, 8 should be set for the first, second, and thrid YoloBox operators. - clip_bbox (bool): Whether clip output bonding box in Input(ImgSize) boundary. Default true. - scale_x_y (float): Scale the center point of decoded bounding box. Default 1.0. - name (string): The default value is None. Normally there is no need - for user to set this property. 
For more information, - please refer to :ref:`api_guide_Name` - - Returns: - boxes Tensor: A 3-D tensor with shape [N, M, 4], the coordinates of boxes, N is the batch num, - M is output box number, and the 3rd dimension stores [xmin, ymin, xmax, ymax] coordinates of boxes. - scores Tensor: A 3-D tensor with shape [N, M, :attr:`class_num`], the coordinates of boxes, N is the batch num, - M is output box number. - - Raises: - TypeError: Attr anchors of yolo box must be list or tuple - TypeError: Attr class_num of yolo box must be an integer - TypeError: Attr conf_thresh of yolo box must be a float number - - Examples: - - .. code-block:: python - - import paddle - from ppdet.modeling import ops - - paddle.enable_static() - x = paddle.static.data(name='x', shape=[None, 255, 13, 13], dtype='float32') - img_size = paddle.static.data(name='img_size',shape=[None, 2],dtype='int64') - anchors = [10, 13, 16, 30, 33, 23] - boxes,scores = ops.yolo_box(x=x, img_size=img_size, class_num=80, anchors=anchors, - conf_thresh=0.01, downsample_ratio=32) - """ - helper = LayerHelper('yolo_box', **locals()) - - if not isinstance(anchors, list) and not isinstance(anchors, tuple): - raise TypeError("Attr anchors of yolo_box must be list or tuple") - if not isinstance(class_num, int): - raise TypeError("Attr class_num of yolo_box must be an integer") - if not isinstance(conf_thresh, float): - raise TypeError("Attr ignore_thresh of yolo_box must be a float number") - - if in_dygraph_mode(): - attrs = ('anchors', anchors, 'class_num', class_num, 'conf_thresh', - conf_thresh, 'downsample_ratio', downsample_ratio, 'clip_bbox', - clip_bbox, 'scale_x_y', scale_x_y) - boxes, scores = core.ops.yolo_box(x, origin_shape, *attrs) - return boxes, scores - else: - boxes = helper.create_variable_for_type_inference(dtype=x.dtype) - scores = helper.create_variable_for_type_inference(dtype=x.dtype) - - attrs = { - "anchors": anchors, - "class_num": class_num, - "conf_thresh": conf_thresh, - 
"downsample_ratio": downsample_ratio, - "clip_bbox": clip_bbox, - "scale_x_y": scale_x_y, - } - - helper.append_op( - type='yolo_box', - inputs={ - "X": x, - "ImgSize": origin_shape, - }, - outputs={ - 'Boxes': boxes, - 'Scores': scores, - }, - attrs=attrs) - return boxes, scores - - @paddle.jit.not_to_static def prior_box(input, image, @@ -860,14 +346,14 @@ def prior_box(input, max_sizes = [max_sizes] cur_max_sizes = max_sizes - if in_dygraph_mode(): + if in_dynamic_mode(): attrs = ('min_sizes', min_sizes, 'aspect_ratios', aspect_ratios, 'variances', variance, 'flip', flip, 'clip', clip, 'step_w', steps[0], 'step_h', steps[1], 'offset', offset, 'min_max_aspect_ratios_order', min_max_aspect_ratios_order) if cur_max_sizes is not None: attrs += ('max_sizes', cur_max_sizes) - box, var = core.ops.prior_box(input, image, *attrs) + box, var = _C_ops.prior_box(input, image, *attrs) return box, var else: attrs = { @@ -1005,13 +491,13 @@ def multiclass_nms(bboxes, """ helper = LayerHelper('multiclass_nms3', **locals()) - if in_dygraph_mode(): + if in_dynamic_mode(): attrs = ('background_label', background_label, 'score_threshold', score_threshold, 'nms_top_k', nms_top_k, 'nms_threshold', nms_threshold, 'keep_top_k', keep_top_k, 'nms_eta', nms_eta, 'normalized', normalized) - output, index, nms_rois_num = core.ops.multiclass_nms3(bboxes, scores, - rois_num, *attrs) + output, index, nms_rois_num = _C_ops.multiclass_nms3(bboxes, scores, + rois_num, *attrs) if not return_index: index = None return output, nms_rois_num, index @@ -1146,13 +632,13 @@ def matrix_nms(bboxes, check_type(gaussian_sigma, 'gaussian_sigma', float, 'matrix_nms') check_type(background_label, 'background_label', int, 'matrix_nms') - if in_dygraph_mode(): + if in_dynamic_mode(): attrs = ('background_label', background_label, 'score_threshold', score_threshold, 'post_threshold', post_threshold, 'nms_top_k', nms_top_k, 'gaussian_sigma', gaussian_sigma, 'use_gaussian', use_gaussian, 'keep_top_k', keep_top_k, 
'normalized', normalized) - out, index, rois_num = core.ops.matrix_nms(bboxes, scores, *attrs) + out, index, rois_num = _C_ops.matrix_nms(bboxes, scores, *attrs) if not return_index: index = None if not return_rois_num: @@ -1191,111 +677,6 @@ def matrix_nms(bboxes, return output, rois_num, index -def bipartite_match(dist_matrix, - match_type=None, - dist_threshold=None, - name=None): - """ - - This operator implements a greedy bipartite matching algorithm, which is - used to obtain the matching with the maximum distance based on the input - distance matrix. For input 2D matrix, the bipartite matching algorithm can - find the matched column for each row (matched means the largest distance), - also can find the matched row for each column. And this operator only - calculate matched indices from column to row. For each instance, - the number of matched indices is the column number of the input distance - matrix. **The OP only supports CPU**. - - There are two outputs, matched indices and distance. - A simple description, this algorithm matched the best (maximum distance) - row entity to the column entity and the matched indices are not duplicated - in each row of ColToRowMatchIndices. If the column entity is not matched - any row entity, set -1 in ColToRowMatchIndices. - - NOTE: the input DistMat can be LoDTensor (with LoD) or Tensor. - If LoDTensor with LoD, the height of ColToRowMatchIndices is batch size. - If Tensor, the height of ColToRowMatchIndices is 1. - - NOTE: This API is a very low level API. It is used by :code:`ssd_loss` - layer. Please consider to use :code:`ssd_loss` instead. - - Args: - dist_matrix(Tensor): This input is a 2-D LoDTensor with shape - [K, M]. The data type is float32 or float64. It is pair-wise - distance matrix between the entities represented by each row and - each column. For example, assumed one entity is A with shape [K], - another entity is B with shape [M]. The dist_matrix[i][j] is the - distance between A[i] and B[j]. 
The bigger the distance is, the - better matching the pairs are. NOTE: This tensor can contain LoD - information to represent a batch of inputs. One instance of this - batch can contain different numbers of entities. - match_type(str, optional): The type of matching method, should be - 'bipartite' or 'per_prediction'. None ('bipartite') by default. - dist_threshold(float32, optional): If `match_type` is 'per_prediction', - this threshold is to determine the extra matching bboxes based - on the maximum distance, 0.5 by default. - name(str, optional): For detailed information, please refer - to :ref:`api_guide_Name`. Usually name is no need to set and - None by default. - - Returns: - Tuple: - - matched_indices(Tensor): A 2-D Tensor with shape [N, M]. The data - type is int32. N is the batch size. If match_indices[i][j] is -1, it - means B[j] does not match any entity in i-th instance. - Otherwise, it means B[j] is matched to row - match_indices[i][j] in i-th instance. The row number of - i-th instance is saved in match_indices[i][j]. - - matched_distance(Tensor): A 2-D Tensor with shape [N, M]. The data - type is float32. N is batch size. If match_indices[i][j] is -1, - match_distance[i][j] is also -1.0. Otherwise, assumed - match_distance[i][j] = d, and the row offsets of each instance - are called LoD. Then match_distance[i][j] = - dist_matrix[d+LoD[i]][j]. - - Examples: - - .. 
code-block:: python - import paddle - from ppdet.modeling import ops - from ppdet.modeling.utils import iou_similarity - - paddle.enable_static() - - x = paddle.static.data(name='x', shape=[None, 4], dtype='float32') - y = paddle.static.data(name='y', shape=[None, 4], dtype='float32') - iou = iou_similarity(x=x, y=y) - matched_indices, matched_dist = ops.bipartite_match(iou) - """ - check_variable_and_dtype(dist_matrix, 'dist_matrix', - ['float32', 'float64'], 'bipartite_match') - - if in_dygraph_mode(): - match_indices, match_distance = core.ops.bipartite_match( - dist_matrix, "match_type", match_type, "dist_threshold", - dist_threshold) - return match_indices, match_distance - - helper = LayerHelper('bipartite_match', **locals()) - match_indices = helper.create_variable_for_type_inference(dtype='int32') - match_distance = helper.create_variable_for_type_inference( - dtype=dist_matrix.dtype) - helper.append_op( - type='bipartite_match', - inputs={'DistMat': dist_matrix}, - attrs={ - 'match_type': match_type, - 'dist_threshold': dist_threshold, - }, - outputs={ - 'ColToRowMatchIndices': match_indices, - 'ColToRowMatchDist': match_distance - }) - return match_indices, match_distance - - @paddle.jit.not_to_static def box_coder(prior_box, prior_box_var, @@ -1408,14 +789,14 @@ def box_coder(prior_box, check_variable_and_dtype(target_box, 'target_box', ['float32', 'float64'], 'box_coder') - if in_dygraph_mode(): + if in_dynamic_mode(): if isinstance(prior_box_var, Variable): - output_box = core.ops.box_coder( + output_box = _C_ops.box_coder( prior_box, prior_box_var, target_box, "code_type", code_type, "box_normalized", box_normalized, "axis", axis) elif isinstance(prior_box_var, list): - output_box = core.ops.box_coder( + output_box = _C_ops.box_coder( prior_box, None, target_box, "code_type", code_type, "box_normalized", box_normalized, "axis", axis, "variance", prior_box_var) @@ -1533,12 +914,12 @@ def generate_proposals(scores, rois, roi_probs = 
ops.generate_proposals(scores, bbox_deltas, im_shape, anchors, variances) """ - if in_dygraph_mode(): + if in_dynamic_mode(): assert return_rois_num, "return_rois_num should be True in dygraph mode." attrs = ('pre_nms_topN', pre_nms_top_n, 'post_nms_topN', post_nms_top_n, 'nms_thresh', nms_thresh, 'min_size', min_size, 'eta', eta, 'pixel_offset', pixel_offset) - rpn_rois, rpn_roi_probs, rpn_rois_num = core.ops.generate_proposals_v2( + rpn_rois, rpn_roi_probs, rpn_rois_num = _C_ops.generate_proposals_v2( scores, bbox_deltas, im_shape, anchors, variances, *attrs) if not return_rois_num: rpn_rois_num = None @@ -1639,8 +1020,3 @@ def get_static_shape(tensor): shape = paddle.shape(tensor) shape.stop_gradient = True return shape - - -def paddle_distributed_is_initialized(): - return core.is_compiled_with_dist( - ) and parallel_helper._is_parallel_ctx_initialized() diff --git a/ppdet/modeling/post_process.py b/ppdet/modeling/post_process.py index 72e409e4008ea55b4e84a09125a069215a8f34c3..27890c17ec39f3e29a3126adab173bc9e3596bc2 100644 --- a/ppdet/modeling/post_process.py +++ b/ppdet/modeling/post_process.py @@ -33,7 +33,7 @@ __all__ = [ @register -class BBoxPostProcess(nn.Layer): +class BBoxPostProcess(object): __shared__ = ['num_classes', 'export_onnx'] __inject__ = ['decode', 'nms'] @@ -45,9 +45,9 @@ class BBoxPostProcess(nn.Layer): self.nms = nms self.export_onnx = export_onnx - def forward(self, head_out, rois, im_shape, scale_factor): + def __call__(self, head_out, rois, im_shape, scale_factor): """ - Decode the bbox and do NMS if needed. + Decode the bbox and do NMS if needed. Args: head_out (tuple): bbox_pred and cls_prob of bbox_head output. @@ -85,7 +85,7 @@ class BBoxPostProcess(nn.Layer): """ Rescale, clip and filter the bbox from the output of NMS to get final prediction. - + Notes: Currently only support bs = 1. 
@@ -171,7 +171,7 @@ class BBoxPostProcess(nn.Layer): pred_label = paddle.where(keep_mask, pred_label, paddle.ones_like(pred_label) * -1) pred_result = paddle.concat([pred_label, pred_score, pred_bbox], axis=1) - return pred_result + return bboxes, pred_result, bbox_num def get_origin_shape(self, ): return self.origin_shape_list @@ -179,6 +179,7 @@ class BBoxPostProcess(nn.Layer): @register class MaskPostProcess(object): + __shared__ = ['export_onnx', 'assign_on_cpu'] """ refer to: https://github.com/facebookresearch/detectron2/layers/mask_ops.py @@ -186,9 +187,14 @@ class MaskPostProcess(object): Get Mask output according to the output from model """ - def __init__(self, binary_thresh=0.5): + def __init__(self, + binary_thresh=0.5, + export_onnx=False, + assign_on_cpu=False): super(MaskPostProcess, self).__init__() self.binary_thresh = binary_thresh + self.export_onnx = export_onnx + self.assign_on_cpu = assign_on_cpu def paste_mask(self, masks, boxes, im_h, im_w): """ @@ -200,10 +206,13 @@ class MaskPostProcess(object): N = masks.shape[0] img_y = paddle.arange(y0_int, y1_int) + 0.5 img_x = paddle.arange(x0_int, x1_int) + 0.5 + img_y = (img_y - y0) / (y1 - y0) * 2 - 1 img_x = (img_x - x0) / (x1 - x0) * 2 - 1 # img_x, img_y have shapes (N, w), (N, h) + if self.assign_on_cpu: + paddle.set_device('cpu') gx = img_x[:, None, :].expand( [N, paddle.shape(img_y)[1], paddle.shape(img_x)[1]]) gy = img_y[:, :, None].expand( @@ -230,15 +239,37 @@ class MaskPostProcess(object): """ num_mask = mask_out.shape[0] origin_shape = paddle.cast(origin_shape, 'int32') - # TODO: support bs > 1 and mask output dtype is bool - pred_result = paddle.zeros( - [num_mask, origin_shape[0][0], origin_shape[0][1]], dtype='int32') + device = paddle.device.get_device() - im_h, im_w = origin_shape[0][0], origin_shape[0][1] - pred_mask = self.paste_mask(mask_out[:, None, :, :], bboxes[:, 2:], - im_h, im_w) - pred_mask = pred_mask >= self.binary_thresh - pred_result = paddle.cast(pred_mask, 'int32') + 
if self.export_onnx: + h, w = origin_shape[0][0], origin_shape[0][1] + mask_onnx = self.paste_mask(mask_out[:, None, :, :], bboxes[:, 2:], + h, w) + mask_onnx = mask_onnx >= self.binary_thresh + pred_result = paddle.cast(mask_onnx, 'int32') + + else: + max_h = paddle.max(origin_shape[:, 0]) + max_w = paddle.max(origin_shape[:, 1]) + pred_result = paddle.zeros( + [num_mask, max_h, max_w], dtype='int32') - 1 + + id_start = 0 + for i in range(paddle.shape(bbox_num)[0]): + bboxes_i = bboxes[id_start:id_start + bbox_num[i], :] + mask_out_i = mask_out[id_start:id_start + bbox_num[i], :, :] + im_h = origin_shape[i, 0] + im_w = origin_shape[i, 1] + bbox_num_i = bbox_num[id_start] + pred_mask = self.paste_mask(mask_out_i[:, None, :, :], + bboxes_i[:, 2:], im_h, im_w) + pred_mask = paddle.cast(pred_mask >= self.binary_thresh, + 'int32') + pred_result[id_start:id_start + bbox_num[i], :im_h, : + im_w] = pred_mask + id_start += bbox_num[i] + if self.assign_on_cpu: + paddle.set_device(device) return pred_result @@ -502,7 +533,7 @@ class CenterNetPostProcess(TTFBox): x2 = xs + wh[:, 0:1] / 2 y2 = ys + wh[:, 1:2] / 2 - n, c, feat_h, feat_w = hm.shape[:] + n, c, feat_h, feat_w = paddle.shape(hm) padw = (feat_w * self.down_ratio - im_shape[0, 1]) / 2 padh = (feat_h * self.down_ratio - im_shape[0, 0]) / 2 x1 = x1 * self.down_ratio diff --git a/ppdet/modeling/proposal_generator/anchor_generator.py b/ppdet/modeling/proposal_generator/anchor_generator.py index 34f03c0ef084d1976f7f6879caf3e25b1f67d7de..94fd346002562fd772a21f525f7ad4f50f4c4680 100644 --- a/ppdet/modeling/proposal_generator/anchor_generator.py +++ b/ppdet/modeling/proposal_generator/anchor_generator.py @@ -22,6 +22,8 @@ import paddle.nn as nn from ppdet.core.workspace import register +__all__ = ['AnchorGenerator', 'RetinaAnchorGenerator'] + @register class AnchorGenerator(nn.Layer): @@ -129,3 +131,25 @@ class AnchorGenerator(nn.Layer): For FPN models, `num_anchors` on every feature map is the same. 
""" return len(self.cell_anchors[0]) + + +@register +class RetinaAnchorGenerator(AnchorGenerator): + def __init__(self, + octave_base_scale=4, + scales_per_octave=3, + aspect_ratios=[0.5, 1.0, 2.0], + strides=[8.0, 16.0, 32.0, 64.0, 128.0], + variance=[1.0, 1.0, 1.0, 1.0], + offset=0.0): + anchor_sizes = [] + for s in strides: + anchor_sizes.append([ + s * octave_base_scale * 2**(i/scales_per_octave) \ + for i in range(scales_per_octave)]) + super(RetinaAnchorGenerator, self).__init__( + anchor_sizes=anchor_sizes, + aspect_ratios=aspect_ratios, + strides=strides, + variance=variance, + offset=offset) diff --git a/ppdet/modeling/proposal_generator/rpn_head.py b/ppdet/modeling/proposal_generator/rpn_head.py index 608d5425065ba98fcb678252e5a568f3b21a88b3..8a431eeac208a052ed8de5dfb7278948cfbcf042 100644 --- a/ppdet/modeling/proposal_generator/rpn_head.py +++ b/ppdet/modeling/proposal_generator/rpn_head.py @@ -21,6 +21,7 @@ from ppdet.core.workspace import register from .anchor_generator import AnchorGenerator from .target_layer import RPNTargetAssign from .proposal_generator import ProposalGenerator +from ..cls_utils import _get_class_default_kwargs class RPNFeat(nn.Layer): @@ -67,14 +68,17 @@ class RPNHead(nn.Layer): derived by from_config """ __shared__ = ['export_onnx'] + __inject__ = ['loss_rpn_bbox'] def __init__(self, - anchor_generator=AnchorGenerator().__dict__, - rpn_target_assign=RPNTargetAssign().__dict__, - train_proposal=ProposalGenerator(12000, 2000).__dict__, - test_proposal=ProposalGenerator().__dict__, + anchor_generator=_get_class_default_kwargs(AnchorGenerator), + rpn_target_assign=_get_class_default_kwargs(RPNTargetAssign), + train_proposal=_get_class_default_kwargs(ProposalGenerator, + 12000, 2000), + test_proposal=_get_class_default_kwargs(ProposalGenerator), in_channel=1024, - export_onnx=False): + export_onnx=False, + loss_rpn_bbox=None): super(RPNHead, self).__init__() self.anchor_generator = anchor_generator self.rpn_target_assign = 
rpn_target_assign @@ -89,6 +93,7 @@ class RPNHead(nn.Layer): self.train_proposal = ProposalGenerator(**train_proposal) if isinstance(test_proposal, dict): self.test_proposal = ProposalGenerator(**test_proposal) + self.loss_rpn_bbox = loss_rpn_bbox num_anchors = self.anchor_generator.num_anchors self.rpn_feat = RPNFeat(in_channel, in_channel) @@ -296,7 +301,12 @@ class RPNHead(nn.Layer): loc_tgt = paddle.concat(loc_tgt) loc_tgt = paddle.gather(loc_tgt, pos_ind) loc_tgt.stop_gradient = True - loss_rpn_reg = paddle.abs(loc_pred - loc_tgt).sum() + + if self.loss_rpn_bbox is None: + loss_rpn_reg = paddle.abs(loc_pred - loc_tgt).sum() + else: + loss_rpn_reg = self.loss_rpn_bbox(loc_pred, loc_tgt).sum() + return { 'loss_rpn_cls': loss_rpn_cls / norm, 'loss_rpn_reg': loss_rpn_reg / norm diff --git a/ppdet/modeling/proposal_generator/target.py b/ppdet/modeling/proposal_generator/target.py index 7af30f64153acf5e9c68c51981a02c76acbe50f0..fd04f052219a00c919b945d8838436de018af873 100644 --- a/ppdet/modeling/proposal_generator/target.py +++ b/ppdet/modeling/proposal_generator/target.py @@ -74,9 +74,11 @@ def label_box(anchors, is_crowd=None, assign_on_cpu=False): if assign_on_cpu: + device = paddle.device.get_device() paddle.set_device("cpu") iou = bbox_overlaps(gt_boxes, anchors) - paddle.set_device("gpu") + paddle.set_device(device) + else: iou = bbox_overlaps(gt_boxes, anchors) n_gt = gt_boxes.shape[0] @@ -356,7 +358,7 @@ def generate_mask_target(gt_segms, rois, labels_int32, sampled_gt_inds, fg_inds_new = fg_inds.reshape([-1]).numpy() results = [] if len(gt_segms_per_im) > 0: - for j in fg_inds_new: + for j in range(fg_inds_new.shape[0]): results.append( rasterize_polygons_within_box(new_segm[j], boxes[j], resolution)) diff --git a/ppdet/modeling/proposal_generator/target_layer.py b/ppdet/modeling/proposal_generator/target_layer.py index 3b5a09601682151afcd47a0ea0db4fd0f03440a9..201c8bf86b14ee19f4398d2451dabdc886e9af98 100644 --- 
a/ppdet/modeling/proposal_generator/target_layer.py +++ b/ppdet/modeling/proposal_generator/target_layer.py @@ -392,9 +392,9 @@ class RBoxAssigner(object): gt_bboxes_xc_yc = paddle.to_tensor(gt_bboxes_xc_yc) try: - from rbox_iou_ops import rbox_iou + from ext_op import rbox_iou except Exception as e: - print("import custom_ops error, try install rbox_iou_ops " \ + print("import custom_ops error, try install ext_op " \ "following ppdet/ext_op/README.md", e) sys.stdout.flush() sys.exit(-1) diff --git a/ppdet/modeling/tests/test_architectures.py b/ppdet/modeling/tests/test_architectures.py index 25767e74abd9fce29c51adbc4b5109e17a50aa0b..5de79b2cedb3fffac0ce853406560821a9142363 100644 --- a/ppdet/modeling/tests/test_architectures.py +++ b/ppdet/modeling/tests/test_architectures.py @@ -62,7 +62,7 @@ class TestGFL(TestFasterRCNN): class TestPicoDet(TestFasterRCNN): def set_config(self): - self.cfg_file = 'configs/picodet/picodet_s_320_coco.yml' + self.cfg_file = 'configs/picodet/picodet_s_320_coco_lcnet.yml' if __name__ == '__main__': diff --git a/ppdet/modeling/tests/test_base.py b/ppdet/modeling/tests/test_base.py index cbb9033b393a24167ec1ebc32a4d924fa564f929..451aa78e32ce0682f55a2ab0f9d1ea03e939e481 100644 --- a/ppdet/modeling/tests/test_base.py +++ b/ppdet/modeling/tests/test_base.py @@ -18,9 +18,7 @@ import unittest import contextlib import paddle -import paddle.fluid as fluid -from paddle.fluid.framework import Program -from paddle.fluid import core +from paddle.static import Program class LayerTest(unittest.TestCase): @@ -35,19 +33,17 @@ class LayerTest(unittest.TestCase): def _get_place(self, force_to_use_cpu=False): # this option for ops that only have cpu kernel if force_to_use_cpu: - return core.CPUPlace() + return 'cpu' else: - if core.is_compiled_with_cuda(): - return core.CUDAPlace(0) - return core.CPUPlace() + return paddle.device.get_device() @contextlib.contextmanager def static_graph(self): paddle.enable_static() - scope = fluid.core.Scope() + scope = 
paddle.static.Scope() program = Program() - with fluid.scope_guard(scope): - with fluid.program_guard(program): + with paddle.static.scope_guard(scope): + with paddle.static.program_guard(program): paddle.seed(self.seed) paddle.framework.random._manual_program_seed(self.seed) yield @@ -57,9 +53,9 @@ class LayerTest(unittest.TestCase): fetch_list, with_lod=False, force_to_use_cpu=False): - exe = fluid.Executor(self._get_place(force_to_use_cpu)) - exe.run(fluid.default_startup_program()) - return exe.run(fluid.default_main_program(), + exe = paddle.static.Executor(self._get_place(force_to_use_cpu)) + exe.run(paddle.static.default_startup_program()) + return exe.run(paddle.static.default_main_program(), feed=feed, fetch_list=fetch_list, return_numpy=(not with_lod)) @@ -67,8 +63,8 @@ class LayerTest(unittest.TestCase): @contextlib.contextmanager def dynamic_graph(self, force_to_use_cpu=False): paddle.disable_static() - with fluid.dygraph.guard( - self._get_place(force_to_use_cpu=force_to_use_cpu)): - paddle.seed(self.seed) - paddle.framework.random._manual_program_seed(self.seed) - yield + place = self._get_place(force_to_use_cpu=force_to_use_cpu) + paddle.device.set_device(place) + paddle.seed(self.seed) + paddle.framework.random._manual_program_seed(self.seed) + yield diff --git a/ppdet/modeling/tests/test_ops.py b/ppdet/modeling/tests/test_ops.py index d4b5747487d3ee49627e4fe8aecec31cf2759ae2..4ef9cbc28c0f1dcc4268571c7206d70b306682cd 100644 --- a/ppdet/modeling/tests/test_ops.py +++ b/ppdet/modeling/tests/test_ops.py @@ -23,8 +23,6 @@ import unittest import numpy as np import paddle -import paddle.fluid as fluid -from paddle.fluid.dygraph import base import ppdet.modeling.ops as ops from ppdet.modeling.tests.test_base import LayerTest @@ -50,127 +48,6 @@ def softmax(x): return exps / np.sum(exps) -class TestCollectFpnProposals(LayerTest): - def test_collect_fpn_proposals(self): - multi_bboxes_np = [] - multi_scores_np = [] - rois_num_per_level_np = [] - for i in 
range(4): - bboxes_np = np.random.rand(5, 4).astype('float32') - scores_np = np.random.rand(5, 1).astype('float32') - rois_num = np.array([2, 3]).astype('int32') - multi_bboxes_np.append(bboxes_np) - multi_scores_np.append(scores_np) - rois_num_per_level_np.append(rois_num) - - with self.static_graph(): - multi_bboxes = [] - multi_scores = [] - rois_num_per_level = [] - for i in range(4): - bboxes = paddle.static.data( - name='rois' + str(i), - shape=[5, 4], - dtype='float32', - lod_level=1) - scores = paddle.static.data( - name='scores' + str(i), - shape=[5, 1], - dtype='float32', - lod_level=1) - rois_num = paddle.static.data( - name='rois_num' + str(i), shape=[None], dtype='int32') - - multi_bboxes.append(bboxes) - multi_scores.append(scores) - rois_num_per_level.append(rois_num) - - fpn_rois, rois_num = ops.collect_fpn_proposals( - multi_bboxes, - multi_scores, - 2, - 5, - 10, - rois_num_per_level=rois_num_per_level) - feed = {} - for i in range(4): - feed['rois' + str(i)] = multi_bboxes_np[i] - feed['scores' + str(i)] = multi_scores_np[i] - feed['rois_num' + str(i)] = rois_num_per_level_np[i] - fpn_rois_stat, rois_num_stat = self.get_static_graph_result( - feed=feed, fetch_list=[fpn_rois, rois_num], with_lod=True) - fpn_rois_stat = np.array(fpn_rois_stat) - rois_num_stat = np.array(rois_num_stat) - - with self.dynamic_graph(): - multi_bboxes_dy = [] - multi_scores_dy = [] - rois_num_per_level_dy = [] - for i in range(4): - bboxes_dy = base.to_variable(multi_bboxes_np[i]) - scores_dy = base.to_variable(multi_scores_np[i]) - rois_num_dy = base.to_variable(rois_num_per_level_np[i]) - multi_bboxes_dy.append(bboxes_dy) - multi_scores_dy.append(scores_dy) - rois_num_per_level_dy.append(rois_num_dy) - fpn_rois_dy, rois_num_dy = ops.collect_fpn_proposals( - multi_bboxes_dy, - multi_scores_dy, - 2, - 5, - 10, - rois_num_per_level=rois_num_per_level_dy) - fpn_rois_dy = fpn_rois_dy.numpy() - rois_num_dy = rois_num_dy.numpy() - - 
self.assertTrue(np.array_equal(fpn_rois_stat, fpn_rois_dy)) - self.assertTrue(np.array_equal(rois_num_stat, rois_num_dy)) - - def test_collect_fpn_proposals_error(self): - def generate_input(bbox_type, score_type, name): - multi_bboxes = [] - multi_scores = [] - for i in range(4): - bboxes = paddle.static.data( - name='rois' + name + str(i), - shape=[10, 4], - dtype=bbox_type, - lod_level=1) - scores = paddle.static.data( - name='scores' + name + str(i), - shape=[10, 1], - dtype=score_type, - lod_level=1) - multi_bboxes.append(bboxes) - multi_scores.append(scores) - return multi_bboxes, multi_scores - - with self.static_graph(): - bbox1 = paddle.static.data( - name='rois', shape=[5, 10, 4], dtype='float32', lod_level=1) - score1 = paddle.static.data( - name='scores', shape=[5, 10, 1], dtype='float32', lod_level=1) - bbox2, score2 = generate_input('int32', 'float32', '2') - self.assertRaises( - TypeError, - ops.collect_fpn_proposals, - multi_rois=bbox1, - multi_scores=score1, - min_level=2, - max_level=5, - post_nms_top_n=2000) - self.assertRaises( - TypeError, - ops.collect_fpn_proposals, - multi_rois=bbox2, - multi_scores=score2, - min_level=2, - max_level=5, - post_nms_top_n=2000) - - paddle.disable_static() - - class TestDistributeFpnProposals(LayerTest): def test_distribute_fpn_proposals(self): rois_np = np.random.rand(10, 4).astype('float32') @@ -200,8 +77,8 @@ class TestDistributeFpnProposals(LayerTest): output_stat_np.append(output_np) with self.dynamic_graph(): - rois_dy = base.to_variable(rois_np) - rois_num_dy = base.to_variable(rois_num_np) + rois_dy = paddle.to_tensor(rois_np) + rois_num_dy = paddle.to_tensor(rois_num_np) multi_rois_dy, restore_ind_dy, rois_num_per_level_dy = ops.distribute_fpn_proposals( fpn_rois=rois_dy, min_level=2, @@ -251,11 +128,11 @@ class TestROIAlign(LayerTest): rois_num = paddle.static.data( name='rois_num', shape=[None], dtype='int32') - output = ops.roi_align( - input=inputs, - rois=rois, - output_size=output_size, - 
rois_num=rois_num) + output = paddle.vision.ops.roi_align( + x=inputs, + boxes=rois, + boxes_num=rois_num, + output_size=output_size) output_np, = self.get_static_graph_result( feed={ 'inputs': inputs_np, @@ -266,15 +143,15 @@ class TestROIAlign(LayerTest): with_lod=False) with self.dynamic_graph(): - inputs_dy = base.to_variable(inputs_np) - rois_dy = base.to_variable(rois_np) - rois_num_dy = base.to_variable(rois_num_np) - - output_dy = ops.roi_align( - input=inputs_dy, - rois=rois_dy, - output_size=output_size, - rois_num=rois_num_dy) + inputs_dy = paddle.to_tensor(inputs_np) + rois_dy = paddle.to_tensor(rois_np) + rois_num_dy = paddle.to_tensor(rois_num_np) + + output_dy = paddle.vision.ops.roi_align( + x=inputs_dy, + boxes=rois_dy, + boxes_num=rois_num_dy, + output_size=output_size) output_dy_np = output_dy.numpy() self.assertTrue(np.array_equal(output_np, output_dy_np)) @@ -287,7 +164,7 @@ class TestROIAlign(LayerTest): name='data_error', shape=[10, 4], dtype='int32', lod_level=1) self.assertRaises( TypeError, - ops.roi_align, + paddle.vision.ops.roi_align, input=inputs, rois=rois, output_size=(7, 7)) @@ -311,11 +188,11 @@ class TestROIPool(LayerTest): rois_num = paddle.static.data( name='rois_num', shape=[None], dtype='int32') - output, _ = ops.roi_pool( - input=inputs, - rois=rois, - output_size=output_size, - rois_num=rois_num) + output = paddle.vision.ops.roi_pool( + x=inputs, + boxes=rois, + boxes_num=rois_num, + output_size=output_size) output_np, = self.get_static_graph_result( feed={ 'inputs': inputs_np, @@ -326,15 +203,15 @@ class TestROIPool(LayerTest): with_lod=False) with self.dynamic_graph(): - inputs_dy = base.to_variable(inputs_np) - rois_dy = base.to_variable(rois_np) - rois_num_dy = base.to_variable(rois_num_np) - - output_dy, _ = ops.roi_pool( - input=inputs_dy, - rois=rois_dy, - output_size=output_size, - rois_num=rois_num_dy) + inputs_dy = paddle.to_tensor(inputs_np) + rois_dy = paddle.to_tensor(rois_np) + rois_num_dy = 
paddle.to_tensor(rois_num_np) + + output_dy = paddle.vision.ops.roi_pool( + x=inputs_dy, + boxes=rois_dy, + boxes_num=rois_num_dy, + output_size=output_size) output_dy_np = output_dy.numpy() self.assertTrue(np.array_equal(output_np, output_dy_np)) @@ -347,7 +224,7 @@ class TestROIPool(LayerTest): name='data_error', shape=[10, 4], dtype='int32', lod_level=1) self.assertRaises( TypeError, - ops.roi_pool, + paddle.vision.ops.roi_pool, input=inputs, rois=rois, output_size=(7, 7)) @@ -355,134 +232,6 @@ class TestROIPool(LayerTest): paddle.disable_static() -class TestIoUSimilarity(LayerTest): - def test_iou_similarity(self): - b, c, h, w = 2, 12, 20, 20 - inputs_np = np.random.rand(b, c, h, w).astype('float32') - output_size = (7, 7) - x_np = make_rois(h, w, [20], output_size) - y_np = make_rois(h, w, [10], output_size) - with self.static_graph(): - x = paddle.static.data(name='x', shape=[20, 4], dtype='float32') - y = paddle.static.data(name='y', shape=[10, 4], dtype='float32') - - iou = ops.iou_similarity(x=x, y=y) - iou_np, = self.get_static_graph_result( - feed={ - 'x': x_np, - 'y': y_np, - }, fetch_list=[iou], with_lod=False) - - with self.dynamic_graph(): - x_dy = base.to_variable(x_np) - y_dy = base.to_variable(y_np) - - iou_dy = ops.iou_similarity(x=x_dy, y=y_dy) - iou_dy_np = iou_dy.numpy() - - self.assertTrue(np.array_equal(iou_np, iou_dy_np)) - - -class TestBipartiteMatch(LayerTest): - def test_bipartite_match(self): - distance = np.random.random((20, 10)).astype('float32') - with self.static_graph(): - x = paddle.static.data(name='x', shape=[20, 10], dtype='float32') - - match_indices, match_dist = ops.bipartite_match( - x, match_type='per_prediction', dist_threshold=0.5) - match_indices_np, match_dist_np = self.get_static_graph_result( - feed={'x': distance, }, - fetch_list=[match_indices, match_dist], - with_lod=False) - - with self.dynamic_graph(): - x_dy = base.to_variable(distance) - - match_indices_dy, match_dist_dy = ops.bipartite_match( - x_dy, 
match_type='per_prediction', dist_threshold=0.5) - match_indices_dy_np = match_indices_dy.numpy() - match_dist_dy_np = match_dist_dy.numpy() - - self.assertTrue(np.array_equal(match_indices_np, match_indices_dy_np)) - self.assertTrue(np.array_equal(match_dist_np, match_dist_dy_np)) - - -class TestYoloBox(LayerTest): - def test_yolo_box(self): - - # x shape [N C H W], C=K * (5 + class_num), class_num=10, K=2 - np_x = np.random.random([1, 30, 7, 7]).astype('float32') - np_origin_shape = np.array([[608, 608]], dtype='int32') - class_num = 10 - conf_thresh = 0.01 - downsample_ratio = 32 - scale_x_y = 1.2 - - # static - with self.static_graph(): - # x shape [N C H W], C=K * (5 + class_num), class_num=10, K=2 - x = paddle.static.data( - name='x', shape=[1, 30, 7, 7], dtype='float32') - origin_shape = paddle.static.data( - name='origin_shape', shape=[1, 2], dtype='int32') - - boxes, scores = ops.yolo_box( - x, - origin_shape, [10, 13, 30, 13], - class_num, - conf_thresh, - downsample_ratio, - scale_x_y=scale_x_y) - - boxes_np, scores_np = self.get_static_graph_result( - feed={ - 'x': np_x, - 'origin_shape': np_origin_shape, - }, - fetch_list=[boxes, scores], - with_lod=False) - - # dygraph - with self.dynamic_graph(): - x_dy = fluid.layers.assign(np_x) - origin_shape_dy = fluid.layers.assign(np_origin_shape) - - boxes_dy, scores_dy = ops.yolo_box( - x_dy, - origin_shape_dy, [10, 13, 30, 13], - 10, - 0.01, - 32, - scale_x_y=scale_x_y) - - boxes_dy_np = boxes_dy.numpy() - scores_dy_np = scores_dy.numpy() - - self.assertTrue(np.array_equal(boxes_np, boxes_dy_np)) - self.assertTrue(np.array_equal(scores_np, scores_dy_np)) - - def test_yolo_box_error(self): - with self.static_graph(): - # x shape [N C H W], C=K * (5 + class_num), class_num=10, K=2 - x = paddle.static.data( - name='x', shape=[1, 30, 7, 7], dtype='float32') - origin_shape = paddle.static.data( - name='origin_shape', shape=[1, 2], dtype='int32') - - self.assertRaises( - TypeError, - ops.yolo_box, - x, - 
origin_shape, [10, 13, 30, 13], - 10.123, - 0.01, - 32, - scale_x_y=1.2) - - paddle.disable_static() - - class TestPriorBox(LayerTest): def test_prior_box(self): input_np = np.random.rand(2, 10, 32, 32).astype('float32') @@ -509,8 +258,8 @@ class TestPriorBox(LayerTest): with_lod=False) with self.dynamic_graph(): - inputs_dy = base.to_variable(input_np) - image_dy = base.to_variable(image_np) + inputs_dy = paddle.to_tensor(input_np) + image_dy = paddle.to_tensor(image_np) box_dy, var_dy = ops.prior_box( input=inputs_dy, @@ -582,9 +331,9 @@ class TestMulticlassNms(LayerTest): nms_rois_num_np = np.array(nms_rois_num_np) with self.dynamic_graph(): - boxes_dy = base.to_variable(boxes_np) - scores_dy = base.to_variable(scores_np) - rois_num_dy = base.to_variable(rois_num_np) + boxes_dy = paddle.to_tensor(boxes_np) + scores_dy = paddle.to_tensor(scores_np) + rois_num_dy = paddle.to_tensor(rois_num_np) out_dy, index_dy, nms_rois_num_dy = ops.multiclass_nms( bboxes=boxes_dy, @@ -666,8 +415,8 @@ class TestMatrixNMS(LayerTest): with_lod=True) with self.dynamic_graph(): - boxes_dy = base.to_variable(boxes_np) - scores_dy = base.to_variable(scores_np) + boxes_dy = paddle.to_tensor(boxes_np) + scores_dy = paddle.to_tensor(scores_np) out_dy, index_dy, _ = ops.matrix_nms( bboxes=boxes_dy, @@ -737,9 +486,9 @@ class TestBoxCoder(LayerTest): # dygraph with self.dynamic_graph(): - prior_box_dy = base.to_variable(prior_box_np) - prior_box_var_dy = base.to_variable(prior_box_var_np) - target_box_dy = base.to_variable(target_box_np) + prior_box_dy = paddle.to_tensor(prior_box_np) + prior_box_var_dy = paddle.to_tensor(prior_box_var_np) + target_box_dy = paddle.to_tensor(target_box_np) boxes_dy = ops.box_coder( prior_box=prior_box_dy, @@ -808,11 +557,11 @@ class TestGenerateProposals(LayerTest): with_lod=True) with self.dynamic_graph(): - scores_dy = base.to_variable(scores_np) - bbox_deltas_dy = base.to_variable(bbox_deltas_np) - im_shape_dy = base.to_variable(im_shape_np) - anchors_dy = 
base.to_variable(anchors_np) - variances_dy = base.to_variable(variances_np) + scores_dy = paddle.to_tensor(scores_np) + bbox_deltas_dy = paddle.to_tensor(bbox_deltas_np) + im_shape_dy = paddle.to_tensor(im_shape_np) + anchors_dy = paddle.to_tensor(anchors_np) + variances_dy = paddle.to_tensor(variances_np) rois, roi_probs, rois_num = ops.generate_proposals( scores_dy, bbox_deltas_dy, diff --git a/ppdet/modeling/tests/test_yolov3_loss.py b/ppdet/modeling/tests/test_yolov3_loss.py index cec8bc940a4abb852d0b210b76ffe4386b8fc12e..433b3cf2cb95c2a1dd27da30ef9b99f3148e004f 100644 --- a/ppdet/modeling/tests/test_yolov3_loss.py +++ b/ppdet/modeling/tests/test_yolov3_loss.py @@ -17,7 +17,7 @@ from __future__ import division import unittest import paddle -from paddle import fluid +import paddle.nn.functional as F # add python path of PadleDetection to sys.path import os import sys @@ -27,19 +27,9 @@ if parent_path not in sys.path: from ppdet.modeling.losses import YOLOv3Loss from ppdet.data.transform.op_helper import jaccard_overlap +from ppdet.modeling.bbox_utils import iou_similarity import numpy as np - - -def _split_ioup(output, an_num, num_classes): - """ - Split output feature map to output, predicted iou - along channel dimension - """ - ioup = fluid.layers.slice(output, axes=[1], starts=[0], ends=[an_num]) - ioup = fluid.layers.sigmoid(ioup) - oriout = fluid.layers.slice( - output, axes=[1], starts=[an_num], ends=[an_num * (num_classes + 6)]) - return (ioup, oriout) +np.random.seed(0) def _split_output(output, an_num, num_classes): @@ -47,31 +37,31 @@ def _split_output(output, an_num, num_classes): Split output feature map to x, y, w, h, objectness, classification along channel dimension """ - x = fluid.layers.strided_slice( + x = paddle.strided_slice( output, axes=[1], starts=[0], ends=[output.shape[1]], strides=[5 + num_classes]) - y = fluid.layers.strided_slice( + y = paddle.strided_slice( output, axes=[1], starts=[1], ends=[output.shape[1]], strides=[5 + 
num_classes]) - w = fluid.layers.strided_slice( + w = paddle.strided_slice( output, axes=[1], starts=[2], ends=[output.shape[1]], strides=[5 + num_classes]) - h = fluid.layers.strided_slice( + h = paddle.strided_slice( output, axes=[1], starts=[3], ends=[output.shape[1]], strides=[5 + num_classes]) - obj = fluid.layers.strided_slice( + obj = paddle.strided_slice( output, axes=[1], starts=[4], @@ -81,14 +71,12 @@ def _split_output(output, an_num, num_classes): stride = output.shape[1] // an_num for m in range(an_num): clss.append( - fluid.layers.slice( + paddle.slice( output, axes=[1], starts=[stride * m + 5], ends=[stride * m + 5 + num_classes])) - cls = fluid.layers.transpose( - fluid.layers.stack( - clss, axis=1), perm=[0, 1, 3, 4, 2]) + cls = paddle.transpose(paddle.stack(clss, axis=1), perm=[0, 1, 3, 4, 2]) return (x, y, w, h, obj, cls) @@ -104,7 +92,7 @@ def _split_target(target): th = target[:, :, 3, :, :] tscale = target[:, :, 4, :, :] tobj = target[:, :, 5, :, :] - tcls = fluid.layers.transpose(target[:, :, 6:, :, :], perm=[0, 1, 3, 4, 2]) + tcls = paddle.transpose(target[:, :, 6:, :, :], perm=[0, 1, 3, 4, 2]) tcls.stop_gradient = True return (tx, ty, tw, th, tscale, tobj, tcls) @@ -115,9 +103,9 @@ def _calc_obj_loss(output, obj, tobj, gt_box, batch_size, anchors, num_classes, # objectness loss will be ignored, process as follows: # 1. get pred bbox, which is same with YOLOv3 infer mode, use yolo_box here # NOTE: img_size is set as 1.0 to get noramlized pred bbox - bbox, prob = fluid.layers.yolo_box( + bbox, prob = paddle.vision.ops.yolo_box( x=output, - img_size=fluid.layers.ones( + img_size=paddle.ones( shape=[batch_size, 2], dtype="int32"), anchors=anchors, class_num=num_classes, @@ -128,8 +116,8 @@ def _calc_obj_loss(output, obj, tobj, gt_box, batch_size, anchors, num_classes, # 2. 
split pred bbox and gt bbox by sample, calculate IoU between pred bbox # and gt bbox in each sample if batch_size > 1: - preds = fluid.layers.split(bbox, batch_size, dim=0) - gts = fluid.layers.split(gt_box, batch_size, dim=0) + preds = paddle.split(bbox, batch_size, axis=0) + gts = paddle.split(gt_box, batch_size, axis=0) else: preds = [bbox] gts = [gt_box] @@ -142,7 +130,7 @@ def _calc_obj_loss(output, obj, tobj, gt_box, batch_size, anchors, num_classes, y = box[:, 1] w = box[:, 2] h = box[:, 3] - return fluid.layers.stack( + return paddle.stack( [ x - w / 2., y - h / 2., @@ -150,28 +138,29 @@ def _calc_obj_loss(output, obj, tobj, gt_box, batch_size, anchors, num_classes, y + h / 2., ], axis=1) - pred = fluid.layers.squeeze(pred, axes=[0]) - gt = box_xywh2xyxy(fluid.layers.squeeze(gt, axes=[0])) - ious.append(fluid.layers.iou_similarity(pred, gt)) - iou = fluid.layers.stack(ious, axis=0) + pred = paddle.squeeze(pred, axis=[0]) + gt = box_xywh2xyxy(paddle.squeeze(gt, axis=[0])) + ious.append(iou_similarity(pred, gt)) + iou = paddle.stack(ious, axis=0) # 3. 
Get iou_mask by IoU between gt bbox and prediction bbox, # Get obj_mask by tobj(holds gt_score), calculate objectness loss - max_iou = fluid.layers.reduce_max(iou, dim=-1) - iou_mask = fluid.layers.cast(max_iou <= ignore_thresh, dtype="float32") - output_shape = fluid.layers.shape(output) + max_iou = paddle.max(iou, axis=-1) + iou_mask = paddle.cast(max_iou <= ignore_thresh, dtype="float32") + output_shape = paddle.shape(output) an_num = len(anchors) // 2 - iou_mask = fluid.layers.reshape(iou_mask, (-1, an_num, output_shape[2], - output_shape[3])) + iou_mask = paddle.reshape(iou_mask, (-1, an_num, output_shape[2], + output_shape[3])) iou_mask.stop_gradient = True # NOTE: tobj holds gt_score, obj_mask holds object existence mask - obj_mask = fluid.layers.cast(tobj > 0., dtype="float32") + obj_mask = paddle.cast(tobj > 0., dtype="float32") obj_mask.stop_gradient = True # For positive objectness grids, objectness loss should be calculated # For negative objectness grids, objectness loss is calculated only iou_mask == 1.0 - loss_obj = fluid.layers.sigmoid_cross_entropy_with_logits(obj, obj_mask) - loss_obj_pos = fluid.layers.reduce_sum(loss_obj * tobj, dim=[1, 2, 3]) - loss_obj_neg = fluid.layers.reduce_sum( - loss_obj * (1.0 - obj_mask) * iou_mask, dim=[1, 2, 3]) + obj_sigmoid = F.sigmoid(obj) + loss_obj = F.binary_cross_entropy(obj_sigmoid, obj_mask, reduction='none') + loss_obj_pos = paddle.sum(loss_obj * tobj, axis=[1, 2, 3]) + loss_obj_neg = paddle.sum(loss_obj * (1.0 - obj_mask) * iou_mask, + axis=[1, 2, 3]) return loss_obj_pos, loss_obj_neg @@ -194,45 +183,48 @@ def fine_grained_loss(output, scale_x_y = scale_x_y if (abs(scale_x_y - 1.0) < eps): - loss_x = fluid.layers.sigmoid_cross_entropy_with_logits( - x, tx) * tscale_tobj - loss_x = fluid.layers.reduce_sum(loss_x, dim=[1, 2, 3]) - loss_y = fluid.layers.sigmoid_cross_entropy_with_logits( - y, ty) * tscale_tobj - loss_y = fluid.layers.reduce_sum(loss_y, dim=[1, 2, 3]) + x = F.sigmoid(x) + y = F.sigmoid(y) + 
loss_x = F.binary_cross_entropy(x, tx, reduction='none') * tscale_tobj + loss_x = paddle.sum(loss_x, axis=[1, 2, 3]) + loss_y = F.binary_cross_entropy(y, ty, reduction='none') * tscale_tobj + loss_y = paddle.sum(loss_y, axis=[1, 2, 3]) else: - dx = scale_x_y * fluid.layers.sigmoid(x) - 0.5 * (scale_x_y - 1.0) - dy = scale_x_y * fluid.layers.sigmoid(y) - 0.5 * (scale_x_y - 1.0) - loss_x = fluid.layers.abs(dx - tx) * tscale_tobj - loss_x = fluid.layers.reduce_sum(loss_x, dim=[1, 2, 3]) - loss_y = fluid.layers.abs(dy - ty) * tscale_tobj - loss_y = fluid.layers.reduce_sum(loss_y, dim=[1, 2, 3]) + dx = scale_x_y * F.sigmoid(x) - 0.5 * (scale_x_y - 1.0) + dy = scale_x_y * F.sigmoid(y) - 0.5 * (scale_x_y - 1.0) + loss_x = paddle.abs(dx - tx) * tscale_tobj + loss_x = paddle.sum(loss_x, axis=[1, 2, 3]) + loss_y = paddle.abs(dy - ty) * tscale_tobj + loss_y = paddle.sum(loss_y, axis=[1, 2, 3]) # NOTE: we refined loss function of (w, h) as L1Loss - loss_w = fluid.layers.abs(w - tw) * tscale_tobj - loss_w = fluid.layers.reduce_sum(loss_w, dim=[1, 2, 3]) - loss_h = fluid.layers.abs(h - th) * tscale_tobj - loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3]) + loss_w = paddle.abs(w - tw) * tscale_tobj + loss_w = paddle.sum(loss_w, axis=[1, 2, 3]) + loss_h = paddle.abs(h - th) * tscale_tobj + loss_h = paddle.sum(loss_h, axis=[1, 2, 3]) loss_obj_pos, loss_obj_neg = _calc_obj_loss( output, obj, tobj, gt_box, batch_size, anchors, num_classes, downsample, ignore_thresh, scale_x_y) - loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls, tcls) - loss_cls = fluid.layers.elementwise_mul(loss_cls, tobj, axis=0) - loss_cls = fluid.layers.reduce_sum(loss_cls, dim=[1, 2, 3, 4]) + cls = F.sigmoid(cls) + loss_cls = F.binary_cross_entropy(cls, tcls, reduction='none') + tobj = paddle.unsqueeze(tobj, axis=-1) + + loss_cls = paddle.multiply(loss_cls, tobj) + loss_cls = paddle.sum(loss_cls, axis=[1, 2, 3, 4]) - loss_xys = fluid.layers.reduce_mean(loss_x + loss_y) - loss_whs = 
fluid.layers.reduce_mean(loss_w + loss_h) - loss_objs = fluid.layers.reduce_mean(loss_obj_pos + loss_obj_neg) - loss_clss = fluid.layers.reduce_mean(loss_cls) + loss_xys = paddle.mean(loss_x + loss_y) + loss_whs = paddle.mean(loss_w + loss_h) + loss_objs = paddle.mean(loss_obj_pos + loss_obj_neg) + loss_clss = paddle.mean(loss_cls) losses_all = { - "loss_xy": fluid.layers.sum(loss_xys), - "loss_wh": fluid.layers.sum(loss_whs), - "loss_loc": fluid.layers.sum(loss_xys) + fluid.layers.sum(loss_whs), - "loss_obj": fluid.layers.sum(loss_objs), - "loss_cls": fluid.layers.sum(loss_clss), + "loss_xy": paddle.sum(loss_xys), + "loss_wh": paddle.sum(loss_whs), + "loss_loc": paddle.sum(loss_xys) + paddle.sum(loss_whs), + "loss_obj": paddle.sum(loss_objs), + "loss_cls": paddle.sum(loss_clss), } return losses_all, x, y, tx, ty diff --git a/static/ppdet/utils/__init__.py b/ppdet/optimizer/__init__.py similarity index 82% rename from static/ppdet/utils/__init__.py rename to ppdet/optimizer/__init__.py index d0c32e26092f6ea25771279418582a24ea449ab2..61737923ef3dfe25eed969e1f5807e61c9094758 100644 --- a/static/ppdet/utils/__init__.py +++ b/ppdet/optimizer/__init__.py @@ -1,4 +1,4 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -11,3 +11,6 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. + +from .optimizer import * +from .ema import ModelEMA diff --git a/ppdet/optimizer/adamw.py b/ppdet/optimizer/adamw.py new file mode 100644 index 0000000000000000000000000000000000000000..821135da02a9368de407593877987101228daf35 --- /dev/null +++ b/ppdet/optimizer/adamw.py @@ -0,0 +1,244 @@ +# Copyright (c) 2022 PaddlePaddle Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from paddle.optimizer import AdamW +from functools import partial +import re + + +def layerwise_lr_decay(decay_rate, name_dict, n_layers, param): + """ + Args: + decay_rate (float): + The layer-wise decay ratio. + name_dict (dict): + The keys of name_dict is dynamic name of model while the value + of name_dict is static name. + Use model.named_parameters() to get name_dict. + n_layers (int): + Total number of layers in the transformer encoder. + """ + ratio = 1.0 + static_name = name_dict[param.name] + if 'blocks.' in static_name or 'layers.' in static_name: + idx_1 = static_name.find('blocks.') + idx_2 = static_name.find('layers.') + assert any([x >= 0 for x in [idx_1, idx_2]]), '' + idx = idx_1 if idx_1 >= 0 else idx_2 + # idx = re.findall('[blocks|layers]\.(\d+)\.', static_name)[0] + + layer = int(static_name[idx:].split('.')[1]) + ratio = decay_rate**(n_layers - layer) + + elif 'cls_token' in static_name or 'patch_embed' in static_name: + ratio = decay_rate**(n_layers + 1) + + param.optimize_attr['learning_rate'] *= ratio + + +class AdamWDL(AdamW): + r""" + The AdamWDL optimizer is implemented based on the AdamW Optimization with dynamic lr setting. + Generally it's used for transformer model. + + We use "layerwise_lr_decay" as default dynamic lr setting method of AdamWDL. 
+ “Layer-wise decay” means exponentially decaying the learning rates of individual + layers in a top-down manner. For example, suppose the 24-th layer uses a learning + rate l, and the Layer-wise decay rate is α, then the learning rate of layer m + is lα^(24-m). See more details on: https://arxiv.org/abs/1906.08237. + + .. math:: + & t = t + 1 + + & moment\_1\_out = {\beta}_1 * moment\_1 + (1 - {\beta}_1) * grad + + & moment\_2\_out = {\beta}_2 * moment\_2 + (1 - {\beta}_2) * grad * grad + + & learning\_rate = learning\_rate * \frac{\sqrt{1 - {\beta}_2^t}}{1 - {\beta}_1^t} + + & param\_out = param - learning\_rate * (\frac{moment\_1}{\sqrt{moment\_2} + \epsilon} + \lambda * param) + + Args: + learning_rate (float|LRScheduler, optional): The learning rate used to update ``Parameter``. + It can be a float value or a LRScheduler. The default value is 0.001. + beta1 (float, optional): The exponential decay rate for the 1st moment estimates. + It should be a float number or a Tensor with shape [1] and data type as float32. + The default value is 0.9. + beta2 (float, optional): The exponential decay rate for the 2nd moment estimates. + It should be a float number or a Tensor with shape [1] and data type as float32. + The default value is 0.999. + epsilon (float, optional): A small float value for numerical stability. + It should be a float number or a Tensor with shape [1] and data type as float32. + The default value is 1e-08. + parameters (list|tuple, optional): List/Tuple of ``Tensor`` to update to minimize ``loss``. \ + This parameter is required in dygraph mode. \ + The default value is None in static mode, at this time all parameters will be updated. + weight_decay (float, optional): The weight decay coefficient, it can be float or Tensor. The default value is 0.01. + apply_decay_param_fun (function|None, optional): If it is not None, + only tensors that makes apply_decay_param_fun(Tensor.name)==True + will be updated. It only works when we want to specify tensors. 
+ Default: None. + grad_clip (GradientClipBase, optional): Gradient cliping strategy, it's an instance of + some derived class of ``GradientClipBase`` . There are three cliping strategies + ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` , + :ref:`api_fluid_clip_GradientClipByValue` ). Default None, meaning there is no gradient clipping. + lazy_mode (bool, optional): The official Adam algorithm has two moving-average accumulators. + The accumulators are updated at every step. Every element of the two moving-average + is updated in both dense mode and sparse mode. If the size of parameter is very large, + then the update may be very slow. The lazy mode only update the element that has + gradient in current mini-batch, so it will be much more faster. But this mode has + different semantics with the original Adam algorithm and may lead to different result. + The default value is False. + multi_precision (bool, optional): Whether to use multi-precision during weight updating. Default is false. + layerwise_decay (float, optional): The layer-wise decay ratio. Defaults to 1.0. + n_layers (int, optional): The total number of encoder layers. Defaults to 12. + set_param_lr_fun (function|None, optional): If it's not None, set_param_lr_fun() will set the the parameter + learning rate before it executes Adam Operator. Defaults to :ref:`layerwise_lr_decay`. + name_dict (dict, optional): The keys of name_dict is dynamic name of model while the value + of name_dict is static name. Use model.named_parameters() to get name_dict. + name (str, optional): Normally there is no need for user to set this property. + For more information, please refer to :ref:`api_guide_Name`. + The default value is None. + + Examples: + .. 
code-block:: python + + import paddle + from paddlenlp.ops.optimizer import AdamWDL + def simple_lr_setting(decay_rate, name_dict, n_layers, param): + ratio = 1.0 + static_name = name_dict[param.name] + if "weight" in static_name: + ratio = decay_rate**0.5 + param.optimize_attr["learning_rate"] *= ratio + + linear = paddle.nn.Linear(10, 10) + + name_dict = dict() + for n, p in linear.named_parameters(): + name_dict[p.name] = n + + inp = paddle.rand([10,10], dtype="float32") + out = linear(inp) + loss = paddle.mean(out) + + adamwdl = AdamWDL( + learning_rate=1e-4, + parameters=linear.parameters(), + set_param_lr_func=simple_lr_setting, + layerwise_decay=0.8, + name_dict=name_dict) + + loss.backward() + adamwdl.step() + adamwdl.clear_grad() + """ + + def __init__(self, + learning_rate=0.001, + beta1=0.9, + beta2=0.999, + epsilon=1e-8, + parameters=None, + weight_decay=0.01, + apply_decay_param_fun=None, + grad_clip=None, + lazy_mode=False, + multi_precision=False, + layerwise_decay=1.0, + n_layers=12, + set_param_lr_func=None, + name_dict=None, + name=None): + if not isinstance(layerwise_decay, float): + raise TypeError("layerwise_decay should be float.") + self.layerwise_decay = layerwise_decay + self.n_layers = n_layers + self.set_param_lr_func = partial( + set_param_lr_func, layerwise_decay, name_dict, + n_layers) if set_param_lr_func is not None else set_param_lr_func + super(AdamWDL, self).__init__( + learning_rate=learning_rate, + parameters=parameters, + beta1=beta1, + beta2=beta2, + epsilon=epsilon, + grad_clip=grad_clip, + name=name, + apply_decay_param_fun=apply_decay_param_fun, + weight_decay=weight_decay, + lazy_mode=lazy_mode, + multi_precision=multi_precision) + + def _append_optimize_op(self, block, param_and_grad): + if self.set_param_lr_func is None: + return super(AdamWDL, self)._append_optimize_op(block, + param_and_grad) + + self._append_decoupled_weight_decay(block, param_and_grad) + prev_lr = param_and_grad[0].optimize_attr["learning_rate"] + 
self.set_param_lr_func(param_and_grad[0]) + # excute Adam op + res = super(AdamW, self)._append_optimize_op(block, param_and_grad) + param_and_grad[0].optimize_attr["learning_rate"] = prev_lr + return res + + +def build_adamwdl(model, + lr=1e-4, + weight_decay=0.05, + betas=(0.9, 0.999), + layer_decay=0.65, + num_layers=None, + filter_bias_and_bn=True, + skip_decay_names=None, + set_param_lr_func='layerwise_lr_decay'): + + if skip_decay_names and filter_bias_and_bn: + decay_dict = { + param.name: not (len(param.shape) == 1 or name.endswith('.bias') or + any([_n in name for _n in skip_decay_names])) + for name, param in model.named_parameters() + } + parameters = [p for p in model.parameters()] + + else: + parameters = model.parameters() + + opt_args = dict( + parameters=parameters, learning_rate=lr, weight_decay=weight_decay) + + if decay_dict is not None: + opt_args['apply_decay_param_fun'] = lambda n: decay_dict[n] + + if isinstance(set_param_lr_func, str): + set_param_lr_func = eval(set_param_lr_func) + opt_args['set_param_lr_func'] = set_param_lr_func + + opt_args['beta1'] = betas[0] + opt_args['beta2'] = betas[1] + + opt_args['layerwise_decay'] = layer_decay + name_dict = {p.name: n for n, p in model.named_parameters()} + + opt_args['name_dict'] = name_dict + opt_args['n_layers'] = num_layers + + optimizer = AdamWDL(**opt_args) + + return optimizer diff --git a/ppdet/optimizer/ema.py b/ppdet/optimizer/ema.py new file mode 100644 index 0000000000000000000000000000000000000000..bd8cb825ca0ecd33ca174acea7adb7ad37ba6185 --- /dev/null +++ b/ppdet/optimizer/ema.py @@ -0,0 +1,110 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import paddle +import weakref + + +class ModelEMA(object): + """ + Exponential Weighted Average for Deep Neural Networks + Args: + model (nn.Layer): Detector of model. + decay (float): The decay used for updating ema parameter. + EMA parameters are updated with the formula: + `ema_param = decay * ema_param + (1 - decay) * cur_param`. + Default is 0.9998. + ema_decay_type (str): type in ['threshold', 'normal', 'exponential'], + 'threshold' as default. + cycle_epoch (int): The epoch of interval to reset ema_param and + step. Default is -1, which means not reset. Its function is to + add a regular effect to ema, which is set according to experience + and is effective when the total training epoch is large. 
+ """ + + def __init__(self, + model, + decay=0.9998, + ema_decay_type='threshold', + cycle_epoch=-1): + self.step = 0 + self.epoch = 0 + self.decay = decay + self.state_dict = dict() + for k, v in model.state_dict().items(): + self.state_dict[k] = paddle.zeros_like(v) + self.ema_decay_type = ema_decay_type + self.cycle_epoch = cycle_epoch + + self._model_state = { + k: weakref.ref(p) + for k, p in model.state_dict().items() + } + + def reset(self): + self.step = 0 + self.epoch = 0 + for k, v in self.state_dict.items(): + self.state_dict[k] = paddle.zeros_like(v) + + def resume(self, state_dict, step=0): + for k, v in state_dict.items(): + if k in self.state_dict: + if self.state_dict[k].dtype == v.dtype: + self.state_dict[k] = v + else: + self.state_dict[k] = v.astype(self.state_dict[k].dtype) + self.step = step + + def update(self, model=None): + if self.ema_decay_type == 'threshold': + decay = min(self.decay, (1 + self.step) / (10 + self.step)) + elif self.ema_decay_type == 'exponential': + decay = self.decay * (1 - math.exp(-(self.step + 1) / 2000)) + else: + decay = self.decay + self._decay = decay + + if model is not None: + model_dict = model.state_dict() + else: + model_dict = {k: p() for k, p in self._model_state.items()} + assert all( + [v is not None for _, v in model_dict.items()]), 'python gc.' 
+ + for k, v in self.state_dict.items(): + v = decay * v + (1 - decay) * model_dict[k] + v.stop_gradient = True + self.state_dict[k] = v + self.step += 1 + + def apply(self): + if self.step == 0: + return self.state_dict + state_dict = dict() + for k, v in self.state_dict.items(): + if self.ema_decay_type != 'exponential': + v = v / (1 - self._decay**self.step) + v.stop_gradient = True + state_dict[k] = v + self.epoch += 1 + if self.cycle_epoch > 0 and self.epoch == self.cycle_epoch: + self.reset() + + return state_dict diff --git a/ppdet/optimizer.py b/ppdet/optimizer/optimizer.py similarity index 78% rename from ppdet/optimizer.py rename to ppdet/optimizer/optimizer.py index 1305b76fed48c6acd3967928ffc82b74ef5a881d..e8a0dd8c880699044a7af52a314b33bff27c683c 100644 --- a/ppdet/optimizer.py +++ b/ppdet/optimizer/optimizer.py @@ -16,8 +16,8 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function +import sys import math -import weakref import paddle import paddle.nn as nn @@ -25,6 +25,9 @@ import paddle.optimizer as optimizer import paddle.regularizer as regularizer from ppdet.core.workspace import register, serializable +import copy + +from .adamw import AdamWDL, build_adamwdl __all__ = ['LearningRate', 'OptimizerBuilder'] @@ -209,6 +212,33 @@ class BurninWarmup(object): return boundary, value +@serializable +class ExpWarmup(object): + """ + Warm up learning rate in exponential mode + Args: + steps (int): warm up steps. + epochs (int|None): use epochs as warm up steps, the priority + of `epochs` is higher than `steps`. Default: None. 
+ """ + + def __init__(self, steps=5, epochs=None): + super(ExpWarmup, self).__init__() + self.steps = steps + self.epochs = epochs + + def __call__(self, base_lr, step_per_epoch): + boundary = [] + value = [] + warmup_steps = self.epochs * step_per_epoch if self.epochs is not None else self.steps + for i in range(warmup_steps + 1): + factor = (i / float(warmup_steps))**2 + value.append(base_lr * factor) + if i > 0: + boundary.append(i) + return boundary, value + + @register class LearningRate(object): """ @@ -225,7 +255,18 @@ class LearningRate(object): schedulers=[PiecewiseDecay(), LinearWarmup()]): super(LearningRate, self).__init__() self.base_lr = base_lr - self.schedulers = schedulers + self.schedulers = [] + + schedulers = copy.deepcopy(schedulers) + for sched in schedulers: + if isinstance(sched, dict): + # support dict sched instantiate + module = sys.modules[__name__] + type = sched.pop("name") + scheduler = getattr(module, type)(**sched) + self.schedulers.append(scheduler) + else: + self.schedulers.append(sched) def __call__(self, step_per_epoch): assert len(self.schedulers) >= 1 @@ -278,8 +319,13 @@ class OptimizerBuilder(): optim_args = self.optimizer.copy() optim_type = optim_args['type'] del optim_args['type'] + + if optim_type == 'AdamWDL': + return build_adamwdl(model, lr=learning_rate, **optim_args) + if optim_type != 'AdamW': optim_args['weight_decay'] = regularization + op = getattr(optimizer, optim_type) if 'param_groups' in optim_args: @@ -295,7 +341,8 @@ class OptimizerBuilder(): _params = { n: p for n, p in model.named_parameters() - if any([k in n for k in group['params']]) + if any([k in n + for k in group['params']]) and p.trainable is True } _group = group.copy() _group.update({'params': list(_params.values())}) @@ -304,7 +351,8 @@ class OptimizerBuilder(): visited.extend(list(_params.keys())) ext_params = [ - p for n, p in model.named_parameters() if n not in visited + p for n, p in model.named_parameters() + if n not in visited and 
p.trainable is True ] if len(ext_params) < len(model.parameters()): @@ -314,91 +362,10 @@ class OptimizerBuilder(): raise RuntimeError else: - params = model.parameters() + _params = model.parameters() + params = [param for param in _params if param.trainable is True] return op(learning_rate=learning_rate, parameters=params, grad_clip=grad_clip, **optim_args) - - -class ModelEMA(object): - """ - Exponential Weighted Average for Deep Neutal Networks - Args: - model (nn.Layer): Detector of model. - decay (int): The decay used for updating ema parameter. - Ema's parameter are updated with the formula: - `ema_param = decay * ema_param + (1 - decay) * cur_param`. - Defaults is 0.9998. - use_thres_step (bool): Whether set decay by thres_step or not - cycle_epoch (int): The epoch of interval to reset ema_param and - step. Defaults is -1, which means not reset. Its function is to - add a regular effect to ema, which is set according to experience - and is effective when the total training epoch is large. 
- """ - - def __init__(self, - model, - decay=0.9998, - use_thres_step=False, - cycle_epoch=-1): - self.step = 0 - self.epoch = 0 - self.decay = decay - self.state_dict = dict() - for k, v in model.state_dict().items(): - self.state_dict[k] = paddle.zeros_like(v) - self.use_thres_step = use_thres_step - self.cycle_epoch = cycle_epoch - - self._model_state = { - k: weakref.ref(p) - for k, p in model.state_dict().items() - } - - def reset(self): - self.step = 0 - self.epoch = 0 - for k, v in self.state_dict.items(): - self.state_dict[k] = paddle.zeros_like(v) - - def resume(self, state_dict, step=0): - for k, v in state_dict.items(): - if k in self.state_dict: - self.state_dict[k] = v - self.step = step - - def update(self, model=None): - if self.use_thres_step: - decay = min(self.decay, (1 + self.step) / (10 + self.step)) - else: - decay = self.decay - self._decay = decay - - if model is not None: - model_dict = model.state_dict() - else: - model_dict = {k: p() for k, p in self._model_state.items()} - assert all( - [v is not None for _, v in model_dict.items()]), 'python gc.' 
- - for k, v in self.state_dict.items(): - v = decay * v + (1 - decay) * model_dict[k] - v.stop_gradient = True - self.state_dict[k] = v - self.step += 1 - - def apply(self): - if self.step == 0: - return self.state_dict - state_dict = dict() - for k, v in self.state_dict.items(): - v = v / (1 - self._decay**self.step) - v.stop_gradient = True - state_dict[k] = v - self.epoch += 1 - if self.cycle_epoch > 0 and self.epoch == self.cycle_epoch: - self.reset() - - return state_dict diff --git a/ppdet/slim/__init__.py b/ppdet/slim/__init__.py index e71481d1c8fd61c646f4919ddb52c85020f11725..81ced2dd9f50534e1a3794bc2821d4313c5e91c2 100644 --- a/ppdet/slim/__init__.py +++ b/ppdet/slim/__init__.py @@ -35,16 +35,21 @@ def build_slim_model(cfg, slim_cfg, mode='train'): return cfg if slim_load_cfg['slim'] == 'Distill': - model = DistillModel(cfg, slim_cfg) + if "slim_method" in slim_load_cfg and slim_load_cfg[ + 'slim_method'] == "FGD": + model = FGDDistillModel(cfg, slim_cfg) + else: + model = DistillModel(cfg, slim_cfg) cfg['model'] = model + cfg['slim_type'] = cfg.slim elif slim_load_cfg['slim'] == 'OFA': load_config(slim_cfg) model = create(cfg.architecture) load_pretrain_weight(model, cfg.weights) slim = create(cfg.slim) - cfg['slim_type'] = cfg.slim - cfg['model'] = slim(model, model.state_dict()) cfg['slim'] = slim + cfg['model'] = slim(model, model.state_dict()) + cfg['slim_type'] = cfg.slim elif slim_load_cfg['slim'] == 'DistillPrune': if mode == 'train': model = DistillModel(cfg, slim_cfg) @@ -64,9 +69,9 @@ def build_slim_model(cfg, slim_cfg, mode='train'): load_config(slim_cfg) load_pretrain_weight(model, cfg.weights) slim = create(cfg.slim) - cfg['slim_type'] = cfg.slim - cfg['model'] = slim(model) cfg['slim'] = slim + cfg['model'] = slim(model) + cfg['slim_type'] = cfg.slim elif slim_load_cfg['slim'] == 'UnstructuredPruner': load_config(slim_cfg) slim = create(cfg.slim) @@ -81,7 +86,7 @@ def build_slim_model(cfg, slim_cfg, mode='train'): slim = create(cfg.slim) 
cfg['slim_type'] = cfg.slim # TODO: fix quant export model in framework. - if mode == 'test' and slim_load_cfg['slim'] == 'QAT': + if mode == 'test' and 'QAT' in slim_load_cfg['slim']: slim.quant_config['activation_preprocess_type'] = None cfg['model'] = slim(model) cfg['slim'] = slim diff --git a/ppdet/slim/distill.py b/ppdet/slim/distill.py index b808553dd0c0b6a8285b8090385ac6e1cc4b8e69..808713ffeb6cf2c7536309a5977b1b712b3b4320 100644 --- a/ppdet/slim/distill.py +++ b/ppdet/slim/distill.py @@ -19,6 +19,7 @@ from __future__ import print_function import paddle import paddle.nn as nn import paddle.nn.functional as F +from paddle import ParamAttr from ppdet.core.workspace import register, create, load_config from ppdet.modeling import ops @@ -63,6 +64,98 @@ class DistillModel(nn.Layer): return self.student_model(inputs) +class FGDDistillModel(nn.Layer): + """ + Build FGD distill model. + Args: + cfg: The student config. + slim_cfg: The teacher and distill config. + """ + + def __init__(self, cfg, slim_cfg): + super(FGDDistillModel, self).__init__() + + self.is_inherit = True + # build student model before load slim config + self.student_model = create(cfg.architecture) + self.arch = cfg.architecture + stu_pretrain = cfg['pretrain_weights'] + slim_cfg = load_config(slim_cfg) + self.teacher_cfg = slim_cfg + self.loss_cfg = slim_cfg + tea_pretrain = cfg['pretrain_weights'] + + self.teacher_model = create(self.teacher_cfg.architecture) + self.teacher_model.eval() + + for param in self.teacher_model.parameters(): + param.trainable = False + + if 'pretrain_weights' in cfg and stu_pretrain: + if self.is_inherit and 'pretrain_weights' in self.teacher_cfg and self.teacher_cfg.pretrain_weights: + load_pretrain_weight(self.student_model, + self.teacher_cfg.pretrain_weights) + logger.debug( + "Inheriting! 
loading teacher weights to student model!") + + load_pretrain_weight(self.student_model, stu_pretrain) + + if 'pretrain_weights' in self.teacher_cfg and self.teacher_cfg.pretrain_weights: + load_pretrain_weight(self.teacher_model, + self.teacher_cfg.pretrain_weights) + + self.fgd_loss_dic = self.build_loss( + self.loss_cfg.distill_loss, + name_list=self.loss_cfg['distill_loss_name']) + + def build_loss(self, + cfg, + name_list=[ + 'neck_f_4', 'neck_f_3', 'neck_f_2', 'neck_f_1', + 'neck_f_0' + ]): + loss_func = dict() + for idx, k in enumerate(name_list): + loss_func[k] = create(cfg) + return loss_func + + def forward(self, inputs): + if self.training: + s_body_feats = self.student_model.backbone(inputs) + s_neck_feats = self.student_model.neck(s_body_feats) + + with paddle.no_grad(): + t_body_feats = self.teacher_model.backbone(inputs) + t_neck_feats = self.teacher_model.neck(t_body_feats) + + loss_dict = {} + for idx, k in enumerate(self.fgd_loss_dic): + loss_dict[k] = self.fgd_loss_dic[k](s_neck_feats[idx], + t_neck_feats[idx], inputs) + if self.arch == "RetinaNet": + loss = self.student_model.head(s_neck_feats, inputs) + elif self.arch == "PicoDet": + loss = self.student_model.get_loss() + else: + raise ValueError(f"Unsupported model {self.arch}") + for k in loss_dict: + loss['loss'] += loss_dict[k] + loss[k] = loss_dict[k] + return loss + else: + body_feats = self.student_model.backbone(inputs) + neck_feats = self.student_model.neck(body_feats) + head_outs = self.student_model.head(neck_feats) + if self.arch == "RetinaNet": + bbox, bbox_num = self.student_model.head.post_process( + head_outs, inputs['im_shape'], inputs['scale_factor']) + return {'bbox': bbox, 'bbox_num': bbox_num} + elif self.arch == "PicoDet": + return self.student_model.head.get_pred() + else: + raise ValueError(f"Unsupported model {self.arch}") + + @register class DistillYOLOv3Loss(nn.Layer): def __init__(self, weight=1000): @@ -107,3 +200,254 @@ class DistillYOLOv3Loss(nn.Layer): loss = 
(distill_reg_loss + distill_cls_loss + distill_obj_loss ) * self.weight return loss + + +def parameter_init(mode="kaiming", value=0.): + if mode == "kaiming": + weight_attr = paddle.nn.initializer.KaimingUniform() + elif mode == "constant": + weight_attr = paddle.nn.initializer.Constant(value=value) + else: + weight_attr = paddle.nn.initializer.KaimingUniform() + + weight_init = ParamAttr(initializer=weight_attr) + return weight_init + + +@register +class FGDFeatureLoss(nn.Layer): + """ + The code is reference from https://github.com/yzd-v/FGD/blob/master/mmdet/distillation/losses/fgd.py + Paddle version of `Focal and Global Knowledge Distillation for Detectors` + + Args: + student_channels(int): The number of channels in the student's FPN feature map. Default to 256. + teacher_channels(int): The number of channels in the teacher's FPN feature map. Default to 256. + temp (float, optional): The temperature coefficient. Defaults to 0.5. + alpha_fgd (float, optional): The weight of fg_loss. Defaults to 0.001 + beta_fgd (float, optional): The weight of bg_loss. Defaults to 0.0005 + gamma_fgd (float, optional): The weight of mask_loss. Defaults to 0.001 + lambda_fgd (float, optional): The weight of relation_loss. 
Defaults to 0.000005 + """ + + def __init__(self, + student_channels=256, + teacher_channels=256, + temp=0.5, + alpha_fgd=0.001, + beta_fgd=0.0005, + gamma_fgd=0.001, + lambda_fgd=0.000005): + super(FGDFeatureLoss, self).__init__() + self.temp = temp + self.alpha_fgd = alpha_fgd + self.beta_fgd = beta_fgd + self.gamma_fgd = gamma_fgd + self.lambda_fgd = lambda_fgd + + kaiming_init = parameter_init("kaiming") + zeros_init = parameter_init("constant", 0.0) + + if student_channels != teacher_channels: + self.align = nn.Conv2D( + student_channels, + teacher_channels, + kernel_size=1, + stride=1, + padding=0, + weight_attr=kaiming_init) + student_channels = teacher_channels + else: + self.align = None + + self.conv_mask_s = nn.Conv2D( + student_channels, 1, kernel_size=1, weight_attr=kaiming_init) + self.conv_mask_t = nn.Conv2D( + teacher_channels, 1, kernel_size=1, weight_attr=kaiming_init) + + self.stu_conv_block = nn.Sequential( + nn.Conv2D( + student_channels, + student_channels // 2, + kernel_size=1, + weight_attr=zeros_init), + nn.LayerNorm([student_channels // 2, 1, 1]), + nn.ReLU(), + nn.Conv2D( + student_channels // 2, + student_channels, + kernel_size=1, + weight_attr=zeros_init)) + self.tea_conv_block = nn.Sequential( + nn.Conv2D( + teacher_channels, + teacher_channels // 2, + kernel_size=1, + weight_attr=zeros_init), + nn.LayerNorm([teacher_channels // 2, 1, 1]), + nn.ReLU(), + nn.Conv2D( + teacher_channels // 2, + teacher_channels, + kernel_size=1, + weight_attr=zeros_init)) + + def spatial_channel_attention(self, x, t=0.5): + shape = paddle.shape(x) + N, C, H, W = shape + + _f = paddle.abs(x) + spatial_map = paddle.reshape( + paddle.mean( + _f, axis=1, keepdim=True) / t, [N, -1]) + spatial_map = F.softmax(spatial_map, axis=1, dtype="float32") * H * W + spatial_att = paddle.reshape(spatial_map, [N, H, W]) + + channel_map = paddle.mean( + paddle.mean( + _f, axis=2, keepdim=False), axis=2, keepdim=False) + channel_att = F.softmax(channel_map / t, axis=1, 
dtype="float32") * C + return [spatial_att, channel_att] + + def spatial_pool(self, x, mode="teacher"): + batch, channel, height, width = x.shape + x_copy = x + x_copy = paddle.reshape(x_copy, [batch, channel, height * width]) + x_copy = x_copy.unsqueeze(1) + if mode.lower() == "student": + context_mask = self.conv_mask_s(x) + else: + context_mask = self.conv_mask_t(x) + + context_mask = paddle.reshape(context_mask, [batch, 1, height * width]) + context_mask = F.softmax(context_mask, axis=2) + context_mask = context_mask.unsqueeze(-1) + context = paddle.matmul(x_copy, context_mask) + context = paddle.reshape(context, [batch, channel, 1, 1]) + + return context + + def mask_loss(self, stu_channel_att, tea_channel_att, stu_spatial_att, + tea_spatial_att): + def _func(a, b): + return paddle.sum(paddle.abs(a - b)) / len(a) + + mask_loss = _func(stu_channel_att, tea_channel_att) + _func( + stu_spatial_att, tea_spatial_att) + + return mask_loss + + def feature_loss(self, stu_feature, tea_feature, Mask_fg, Mask_bg, + tea_channel_att, tea_spatial_att): + + Mask_fg = Mask_fg.unsqueeze(axis=1) + Mask_bg = Mask_bg.unsqueeze(axis=1) + + tea_channel_att = tea_channel_att.unsqueeze(axis=-1) + tea_channel_att = tea_channel_att.unsqueeze(axis=-1) + + tea_spatial_att = tea_spatial_att.unsqueeze(axis=1) + + fea_t = paddle.multiply(tea_feature, paddle.sqrt(tea_spatial_att)) + fea_t = paddle.multiply(fea_t, paddle.sqrt(tea_channel_att)) + fg_fea_t = paddle.multiply(fea_t, paddle.sqrt(Mask_fg)) + bg_fea_t = paddle.multiply(fea_t, paddle.sqrt(Mask_bg)) + + fea_s = paddle.multiply(stu_feature, paddle.sqrt(tea_spatial_att)) + fea_s = paddle.multiply(fea_s, paddle.sqrt(tea_channel_att)) + fg_fea_s = paddle.multiply(fea_s, paddle.sqrt(Mask_fg)) + bg_fea_s = paddle.multiply(fea_s, paddle.sqrt(Mask_bg)) + + fg_loss = F.mse_loss(fg_fea_s, fg_fea_t, reduction="sum") / len(Mask_fg) + bg_loss = F.mse_loss(bg_fea_s, bg_fea_t, reduction="sum") / len(Mask_bg) + + return fg_loss, bg_loss + + def
relation_loss(self, stu_feature, tea_feature): + context_s = self.spatial_pool(stu_feature, "student") + context_t = self.spatial_pool(tea_feature, "teacher") + + out_s = stu_feature + self.stu_conv_block(context_s) + out_t = tea_feature + self.tea_conv_block(context_t) + + rela_loss = F.mse_loss(out_s, out_t, reduction="sum") / len(out_s) + + return rela_loss + + def mask_value(self, mask, xl, xr, yl, yr, value): + mask[xl:xr, yl:yr] = paddle.maximum(mask[xl:xr, yl:yr], value) + return mask + + def forward(self, stu_feature, tea_feature, inputs): + """Forward function. + Args: + stu_feature(Tensor): Bs*C*H*W, student's feature map + tea_feature(Tensor): Bs*C*H*W, teacher's feature map + inputs: The inputs with gt bbox and input shape info. + """ + assert stu_feature.shape[-2:] == tea_feature.shape[-2:], \ + f'The shape of Student feature {stu_feature.shape} and Teacher feature {tea_feature.shape} should be the same.' + assert "gt_bbox" in inputs.keys() and "im_shape" in inputs.keys( + ), "ERROR! FGDFeatureLoss needs gt_bbox and im_shape as inputs."
+ gt_bboxes = inputs['gt_bbox'] + ins_shape = [ + inputs['im_shape'][i] for i in range(inputs['im_shape'].shape[0]) + ] + + if self.align is not None: + stu_feature = self.align(stu_feature) + + N, C, H, W = stu_feature.shape + + tea_spatial_att, tea_channel_att = self.spatial_channel_attention( + tea_feature, self.temp) + stu_spatial_att, stu_channel_att = self.spatial_channel_attention( + stu_feature, self.temp) + + Mask_fg = paddle.zeros(tea_spatial_att.shape) + Mask_bg = paddle.ones_like(tea_spatial_att) + one_tmp = paddle.ones([*tea_spatial_att.shape[1:]]) + zero_tmp = paddle.zeros([*tea_spatial_att.shape[1:]]) + wmin, wmax, hmin, hmax, area = [], [], [], [], [] + + for i in range(N): + tmp_box = paddle.ones_like(gt_bboxes[i]) + tmp_box[:, 0] = gt_bboxes[i][:, 0] / ins_shape[i][1] * W + tmp_box[:, 2] = gt_bboxes[i][:, 2] / ins_shape[i][1] * W + tmp_box[:, 1] = gt_bboxes[i][:, 1] / ins_shape[i][0] * H + tmp_box[:, 3] = gt_bboxes[i][:, 3] / ins_shape[i][0] * H + + zero = paddle.zeros_like(tmp_box[:, 0], dtype="int32") + ones = paddle.ones_like(tmp_box[:, 2], dtype="int32") + wmin.append( + paddle.cast(paddle.floor(tmp_box[:, 0]), "int32").maximum(zero)) + wmax.append(paddle.cast(paddle.ceil(tmp_box[:, 2]), "int32")) + hmin.append( + paddle.cast(paddle.floor(tmp_box[:, 1]), "int32").maximum(zero)) + hmax.append(paddle.cast(paddle.ceil(tmp_box[:, 3]), "int32")) + + area_recip = 1.0 / ( + hmax[i].reshape([1, -1]) + 1 - hmin[i].reshape([1, -1])) / ( + wmax[i].reshape([1, -1]) + 1 - wmin[i].reshape([1, -1])) + + for j in range(len(gt_bboxes[i])): + Mask_fg[i] = self.mask_value(Mask_fg[i], hmin[i][j], + hmax[i][j] + 1, wmin[i][j], + wmax[i][j] + 1, area_recip[0][j]) + + Mask_bg[i] = paddle.where(Mask_fg[i] > zero_tmp, zero_tmp, one_tmp) + + if paddle.sum(Mask_bg[i]): + Mask_bg[i] /= paddle.sum(Mask_bg[i]) + + fg_loss, bg_loss = self.feature_loss(stu_feature, tea_feature, Mask_fg, + Mask_bg, tea_channel_att, + tea_spatial_att) + mask_loss = 
self.mask_loss(stu_channel_att, tea_channel_att, + stu_spatial_att, tea_spatial_att) + rela_loss = self.relation_loss(stu_feature, tea_feature) + + loss = self.alpha_fgd * fg_loss + self.beta_fgd * bg_loss \ + + self.gamma_fgd * mask_loss + self.lambda_fgd * rela_loss + + return loss diff --git a/ppdet/slim/prune.py b/ppdet/slim/prune.py index 70d3de3692707b132ff398babf5a795a5a9e81ba..28ffb7588d1e596e5883072b3bd2b5e6ba80ed7f 100644 --- a/ppdet/slim/prune.py +++ b/ppdet/slim/prune.py @@ -83,3 +83,69 @@ class Pruner(object): pruned_flops, (ori_flops - pruned_flops) / ori_flops)) return model + + +@register +@serializable +class PrunerQAT(object): + def __init__(self, criterion, pruned_params, pruned_ratios, + print_prune_params, quant_config, print_qat_model): + super(PrunerQAT, self).__init__() + assert criterion in ['l1_norm', 'fpgm'], \ + "unsupported prune criterion: {}".format(criterion) + # Pruner hyperparameter + self.criterion = criterion + self.pruned_params = pruned_params + self.pruned_ratios = pruned_ratios + self.print_prune_params = print_prune_params + # QAT hyperparameter + self.quant_config = quant_config + self.print_qat_model = print_qat_model + + def __call__(self, model): + # FIXME: adapt to network graph when Training and inference are + # inconsistent, now only supports prune inference network graph. 
+ model.eval() + paddleslim = try_import('paddleslim') + from paddleslim.analysis import dygraph_flops as flops + input_spec = [{ + "image": paddle.ones( + shape=[1, 3, 640, 640], dtype='float32'), + "im_shape": paddle.full( + [1, 2], 640, dtype='float32'), + "scale_factor": paddle.ones( + shape=[1, 2], dtype='float32') + }] + if self.print_prune_params: + print_prune_params(model) + + ori_flops = flops(model, input_spec) / 1000 + logger.info("FLOPs before pruning: {}GFLOPs".format(ori_flops)) + if self.criterion == 'fpgm': + pruner = paddleslim.dygraph.FPGMFilterPruner(model, input_spec) + elif self.criterion == 'l1_norm': + pruner = paddleslim.dygraph.L1NormFilterPruner(model, input_spec) + + logger.info("pruned params: {}".format(self.pruned_params)) + pruned_ratios = [float(n) for n in self.pruned_ratios] + ratios = {} + for i, param in enumerate(self.pruned_params): + ratios[param] = pruned_ratios[i] + pruner.prune_vars(ratios, [0]) + pruned_flops = flops(model, input_spec) / 1000 + logger.info("FLOPs after pruning: {}GFLOPs; pruned ratio: {}".format( + pruned_flops, (ori_flops - pruned_flops) / ori_flops)) + + self.quanter = paddleslim.dygraph.quant.QAT(config=self.quant_config) + + self.quanter.quantize(model) + + if self.print_qat_model: + logger.info("Quantized model:") + logger.info(model) + + return model + + def save_quantized_model(self, layer, path, input_spec=None, **config): + self.quanter.save_quantized_model( + model=layer, path=path, input_spec=input_spec, **config) diff --git a/ppdet/slim/quant.py b/ppdet/slim/quant.py index ab81127aea9eced229122858c0e5912b80caee85..44508198c46b77485d61e2b4e4d2804c62f96622 100644 --- a/ppdet/slim/quant.py +++ b/ppdet/slim/quant.py @@ -38,6 +38,11 @@ class QAT(object): logger.info("Model before quant:") logger.info(model) + # For PP-YOLOE, convert model to deploy firstly. 
+ for layer in model.sublayers(): + if hasattr(layer, 'convert_to_deploy'): + layer.convert_to_deploy() + self.quanter.quantize(model) if self.print_model: diff --git a/ppdet/utils/check.py b/ppdet/utils/check.py index 45da8857e50e59fafde91b6875eff738a9a7a286..58c48806c82d8616d65f02fec80725dadd45d52b 100644 --- a/ppdet/utils/check.py +++ b/ppdet/utils/check.py @@ -20,7 +20,7 @@ import sys import paddle import six -import paddle.version as fluid_version +import paddle.version as paddle_version from .logger import setup_logger logger = setup_logger(__name__) @@ -97,8 +97,8 @@ def check_version(version='2.0'): "Please make sure the version is good with your code.".format(version) version_installed = [ - fluid_version.major, fluid_version.minor, fluid_version.patch, - fluid_version.rc + paddle_version.major, paddle_version.minor, paddle_version.patch, + paddle_version.rc ] if version_installed == ['0', '0', '0', '0']: return diff --git a/ppdet/utils/checkpoint.py b/ppdet/utils/checkpoint.py index e4325de8bb3988495fe90b3ab078805718408cbc..add087c890d4fbe82ecaec5635c19fc2c2090059 100644 --- a/ppdet/utils/checkpoint.py +++ b/ppdet/utils/checkpoint.py @@ -84,9 +84,14 @@ def load_weight(model, weight, optimizer=None, ema=None): model_weight = {} incorrect_keys = 0 - for key in model_dict.keys(): + for key, value in model_dict.items(): if key in param_state_dict.keys(): - model_weight[key] = param_state_dict[key] + if isinstance(param_state_dict[key], np.ndarray): + param_state_dict[key] = paddle.to_tensor(param_state_dict[key]) + if value.dtype == param_state_dict[key].dtype: + model_weight[key] = param_state_dict[key] + else: + model_weight[key] = param_state_dict[key].astype(value.dtype) else: logger.info('Unmatched key: {}'.format(key)) incorrect_keys += 1 @@ -209,6 +214,12 @@ def load_pretrain_weight(model, pretrain_weight): param_state_dict = paddle.load(weights_path) param_state_dict = match_state_dict(model_dict, param_state_dict) + for k, v in 
param_state_dict.items(): + if isinstance(v, np.ndarray): + v = paddle.to_tensor(v) + if model_dict[k].dtype != v.dtype: + param_state_dict[k] = v.astype(model_dict[k].dtype) + model.set_dict(param_state_dict) logger.info('Finish loading model weights: {}'.format(weights_path)) diff --git a/ppdet/utils/cli.py b/ppdet/utils/cli.py index b8ba59d78f1ddf606012fd0cf6d71a71d79eea05..2c5acc0e591af4bbd07a1d22e1237656ac47da65 100644 --- a/ppdet/utils/cli.py +++ b/ppdet/utils/cli.py @@ -81,6 +81,13 @@ class ArgsParser(ArgumentParser): return config +def merge_args(config, args, exclude_args=['config', 'opt', 'slim_config']): + for k, v in vars(args).items(): + if k not in exclude_args: + config[k] = v + return config + + def print_total_cfg(config): modules = get_registered_modules() color_tty = ColorTTY() diff --git a/ppdet/utils/download.py b/ppdet/utils/download.py index 54a74c92e9d768463107d5c81882ca140f81f516..71720f5e058df4335c8cde85d63eb615ff20cfca 100644 --- a/ppdet/utils/download.py +++ b/ppdet/utils/download.py @@ -97,7 +97,7 @@ DATASETS = { '49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']), 'spine_coco': ([( 'https://paddledet.bj.bcebos.com/data/spine_coco.tar', - '7ed69ae73f842cd2a8cf4f58dc3c5535', ), ], ['annotations', 'images']), + '03030f42d9b6202a6e425d4becefda0d', ), ], ['annotations', 'images']), 'mot': (), 'objects365': (), 'coco_ce': ([( @@ -393,7 +393,12 @@ def _download(url, path, md5sum=None): def _download_dist(url, path, md5sum=None): env = os.environ if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env: - trainer_id = int(env['PADDLE_TRAINER_ID']) + # Mainly used to solve the problem of downloading data from + # different machines in the case of multiple machines. + # Different nodes will download data, and the same node + # will only download data once. 
+ # Reference https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/utils/download.py#L108 + rank_id_curr_node = int(os.environ.get("PADDLE_RANK_IN_NODE", 0)) num_trainers = int(env['PADDLE_TRAINERS_NUM']) if num_trainers <= 1: return _download(url, path, md5sum) @@ -406,12 +411,9 @@ def _download_dist(url, path, md5sum=None): os.makedirs(path) if not osp.exists(fullname): - from paddle.distributed import ParallelEnv - unique_endpoints = _get_unique_endpoints(ParallelEnv() - .trainer_endpoints[:]) with open(lock_path, 'w'): # touch os.utime(lock_path, None) - if ParallelEnv().current_endpoint in unique_endpoints: + if rank_id_curr_node == 0: _download(url, path, md5sum) os.remove(lock_path) else: diff --git a/ppdet/utils/fuse_utils.py b/ppdet/utils/fuse_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..647fa995da615fcb2bcdca13f4296f73e3204628 --- /dev/null +++ b/ppdet/utils/fuse_utils.py @@ -0,0 +1,179 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import copy +import paddle +import paddle.nn as nn + +__all__ = ['fuse_conv_bn'] + + +def fuse_conv_bn(model): + is_train = False + if model.training: + model.eval() + is_train = True + fuse_list = [] + tmp_pair = [None, None] + for name, layer in model.named_sublayers(): + if isinstance(layer, nn.Conv2D): + tmp_pair[0] = name + if isinstance(layer, nn.BatchNorm2D): + tmp_pair[1] = name + + if tmp_pair[0] and tmp_pair[1] and len(tmp_pair) == 2: + fuse_list.append(tmp_pair) + tmp_pair = [None, None] + model = fuse_layers(model, fuse_list) + if is_train: + model.train() + return model + + +def find_parent_layer_and_sub_name(model, name): + """ + Given the model and the name of a layer, find the parent layer and + the sub_name of the layer. + For example, if name is 'block_1.convbn_1.conv_1', the parent layer is + 'block_1.convbn_1' and the sub_name is `conv_1`. + Args: + model(paddle.nn.Layer): the model that contains the layer. + name(string): the dot-separated name of a layer + + Returns: + parent_layer, subname + """ + assert isinstance(model, nn.Layer), \ + "The model must be an instance of paddle.nn.Layer." + assert len(name) > 0, "The input (name) should not be empty." + + last_idx = 0 + idx = 0 + parent_layer = model + while idx < len(name): + if name[idx] == '.': + sub_name = name[last_idx:idx] + if hasattr(parent_layer, sub_name): + parent_layer = getattr(parent_layer, sub_name) + last_idx = idx + 1 + idx += 1 + sub_name = name[last_idx:idx] + return parent_layer, sub_name + + +class Identity(nn.Layer): + '''a layer to replace bn or relu layers''' + + def __init__(self, *args, **kwargs): + super(Identity, self).__init__() + + def forward(self, input): + return input + + +def fuse_layers(model, layers_to_fuse, inplace=False): + ''' + fuse layers in layers_to_fuse + + Args: + model(nn.Layer): The model to be fused. + layers_to_fuse(list): The layers' names to be fused. For + example,"fuse_list = [["conv1", "bn1"], ["conv2", "bn2"]]".
+ A TypeError would be raised if "fuse" was set as + True but "fuse_list" was None. + Default: None. + inplace(bool): Whether to apply fusing in place on the input model. + Default: False. + + Return + fused_model(paddle.nn.Layer): The fused model. + ''' + if not inplace: + model = copy.deepcopy(model) + for layers_list in layers_to_fuse: + layer_list = [] + for layer_name in layers_list: + parent_layer, sub_name = find_parent_layer_and_sub_name(model, + layer_name) + layer_list.append(getattr(parent_layer, sub_name)) + new_layers = _fuse_func(layer_list) + for i, item in enumerate(layers_list): + parent_layer, sub_name = find_parent_layer_and_sub_name(model, item) + setattr(parent_layer, sub_name, new_layers[i]) + return model + + +def _fuse_func(layer_list): + '''choose the fuser method and fuse layers''' + types = tuple(type(m) for m in layer_list) + fusion_method = types_to_fusion_method.get(types, None) + new_layers = [None] * len(layer_list) + fused_layer = fusion_method(*layer_list) + # Iterate over a copy of the items: deleting from a dict while iterating it raises RuntimeError. + for handle_id, pre_hook_fn in list(layer_list[0]._forward_pre_hooks.items()): + fused_layer.register_forward_pre_hook(pre_hook_fn) + del layer_list[0]._forward_pre_hooks[handle_id] + for handle_id, hook_fn in list(layer_list[-1]._forward_post_hooks.items()): + fused_layer.register_forward_post_hook(hook_fn) + del layer_list[-1]._forward_post_hooks[handle_id] + new_layers[0] = fused_layer + for i in range(1, len(layer_list)): + identity = Identity() + identity.training = layer_list[0].training + new_layers[i] = identity + return new_layers + + +def _fuse_conv_bn(conv, bn): + '''fuse conv and bn for train or eval''' + assert(conv.training == bn.training),\ + "Conv and BN both must be in the same mode (train or eval)."
+ if conv.training: + assert bn._num_features == conv._out_channels, 'Output channel of Conv2d must match num_features of BatchNorm2d' + raise NotImplementedError + else: + return _fuse_conv_bn_eval(conv, bn) + + +def _fuse_conv_bn_eval(conv, bn): + '''fuse conv and bn for eval''' + assert (not (conv.training or bn.training)), "Fusion only for eval!" + fused_conv = copy.deepcopy(conv) + + fused_weight, fused_bias = _fuse_conv_bn_weights( + fused_conv.weight, fused_conv.bias, bn._mean, bn._variance, bn._epsilon, + bn.weight, bn.bias) + fused_conv.weight.set_value(fused_weight) + if fused_conv.bias is None: + fused_conv.bias = paddle.create_parameter( + shape=[fused_conv._out_channels], is_bias=True, dtype=bn.bias.dtype) + fused_conv.bias.set_value(fused_bias) + return fused_conv + + +def _fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b): + '''fuse weights and bias of conv and bn''' + if conv_b is None: + conv_b = paddle.zeros_like(bn_rm) + if bn_w is None: + bn_w = paddle.ones_like(bn_rm) + if bn_b is None: + bn_b = paddle.zeros_like(bn_rm) + bn_var_rsqrt = paddle.rsqrt(bn_rv + bn_eps) + conv_w = conv_w * \ + (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1)) + conv_b = (conv_b - bn_rm) * bn_var_rsqrt * bn_w + bn_b + return conv_w, conv_b + + +types_to_fusion_method = {(nn.Conv2D, nn.BatchNorm2D): _fuse_conv_bn, } diff --git a/requirements.txt b/requirements.txt index 91c79fc0f396546bb86f26abbaebd4a503d2ebbe..ae1b657ac7d37c3e5ae9c2230030f1f27c55504e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,6 +1,6 @@ tqdm -typeguard ; python_version >= '3.4' -visualdl>=2.1.0 ; python_version <= '3.7' +typeguard +visualdl>=2.2.0 opencv-python PyYAML shapely @@ -8,10 +8,13 @@ scipy terminaltables Cython pycocotools -#xtcocotools==1.6 #only for crowdpose -setuptools>=42.0.0 +setuptools + +# for vehicleplate +pyclipper + +# for mot lap -sklearn motmetrics -openpyxl -cython_bbox +sklearn +filterpy diff --git a/scripts/build_wheel.sh 
b/scripts/build_wheel.sh index 6fa9c0b20b7bf622543a8abc42a76bdb722a20e8..c3445cd750647629c36131fb4b95981713ba02d8 100644 --- a/scripts/build_wheel.sh +++ b/scripts/build_wheel.sh @@ -91,9 +91,8 @@ function unittest() { if [ $? != 0 ]; then exit 1 fi - find "../ppdet" -name 'tests' -type d -print0 | \ - xargs -0 -I{} -n1 bash -c \ - 'python -m unittest discover -v -s {}' + find "../ppdet" -wholename '*tests/test_*' -type f -print0 | \ + xargs -0 -I{} -n1 -t bash -c 'python -u -s {}' # clean TEST_DIR cd .. diff --git a/static/LICENSE b/static/LICENSE deleted file mode 100644 index 261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64..0000000000000000000000000000000000000000 --- a/static/LICENSE +++ /dev/null @@ -1,201 +0,0 @@ - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. 
- - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." 
- - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. 
You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. 
- - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. 
In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. 
- - Copyright [yyyy] [name of copyright owner] - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. diff --git a/static/README.md b/static/README.md deleted file mode 120000 index 4015683cfa5969297febc12e7ca1264afabbc0b5..0000000000000000000000000000000000000000 --- a/static/README.md +++ /dev/null @@ -1 +0,0 @@ -README_cn.md \ No newline at end of file diff --git a/static/README_cn.md b/static/README_cn.md deleted file mode 100644 index 9a9696ea2ee2b79e67cf681ec786adaf563a4012..0000000000000000000000000000000000000000 --- a/static/README_cn.md +++ /dev/null @@ -1,261 +0,0 @@ -Simplified Chinese | [English](README_en.md) - -Documentation: [https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io) - -# Introduction - -PaddleDetection is PaddlePaddle's object detection development kit. It aims to help developers complete the whole development process of building, training, optimizing, and deploying detection models faster and better. - -PaddleDetection implements many mainstream object detection algorithms in a modular way, provides a rich set of data augmentation strategies, network components (such as backbones), and loss functions, and integrates model compression and cross-platform high-performance deployment. - -After long-term refinement in industrial practice, PaddleDetection offers a smooth, polished user experience and is widely used by developers in more than ten industries, including industrial quality inspection, remote sensing image detection, unmanned inspection, new retail, the Internet, and scientific research. - -
    - -
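    - ### Product News -- 2021.02.07: Released release/2.0-rc, a trial version of PaddleDetection for dynamic graph; for details see [PaddleDetection dynamic graph](dygraph). -- 2020.11.20: Released release/0.5; for details see the [changelog](docs/CHANGELOG.md). -- 2020.11.10: Added the instance segmentation model [SOLOv2](configs/solov2), which reaches 38.6 FPS on a Tesla V100 and a mask AP of 38.8 on the COCO-val dataset; inference is 24% faster and mAP is 2.4 percentage points higher. -- 2020.10.30: PP-YOLO supports rectangular image input, and a PACT model quantization strategy has been added. -- 2020.09.30: Released the [mobile detection demo](deploy/android_demo), which can be installed directly by scanning a QR code. -- 2020.09.21-27: [7-day object detection check-in course] A hands-on course that takes you from beginner to advanced and covers the past and present of object detection algorithms. Join the course QQ group (1136406895) to learn together :) -- 2020.07.24: Released the **most practical for industry** object detection model [PP-YOLO](https://arxiv.org/abs/2007.12099), designed around the dual industrial demands of accuracy and speed: 45.2% accuracy on the COCO dataset (latest 45.9%) and 72.9 FPS inference on a Tesla V100; see the [documentation](configs/ppyolo/README_cn.md) for details. -- 2020.06.11: Released a 676-class large-scale server-side practical object detection model, suitable for most scenarios; it can be used directly for prediction or fine-tuned for other tasks. - -### Features - -- **Rich models**: **100+ pre-trained models** covering **object detection**, **instance segmentation**, **face detection**, and more, including a variety of **global competition champion** solutions -- **Easy to use**: Modular design decouples the network components, so developers can easily build and try out detection models and optimization strategies and quickly obtain high-performance, customized algorithms. -- **End to end**: Covers data augmentation, model construction, training, compression, and deployment end to end, with full support for multi-architecture, multi-device deployment on **cloud** and **edge**. -- **High performance**: Built on PaddlePaddle's high-performance core, with clear advantages in training speed and GPU memory usage. Supports FP16 training and multi-machine training. - -#### Overview of Kit Structures - - - - - - - - - - - - - - - - - - - - -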
    - Architectures - - Backbones - - Components - - Data Augmentation -
    -
    • Two-Stage Detection
    • -
        -
      • Faster RCNN
      • -
      • FPN
      • -
      • Cascade-RCNN
      • -
      • Libra RCNN
      • -
      • Hybrid Task RCNN
      • -
      • PSS-Det RCNN
      • -
      -
    -
    • One-Stage Detection
    • -
        -
      • RetinaNet
      • -
      • YOLOv3
      • -
      • YOLOv4
      • -
      • PP-YOLO
      • -
      • SSD
      • -
      -
    -
    • Anchor Free
    • -
        -
      • CornerNet-Squeeze
      • -
      • FCOS
      • -
      • TTFNet
      • -
      -
    -
      -
    • Instance Segmentation
    • -
        -
      • Mask RCNN
      • -
      • SOLOv2
      • -
      -
    -
      -
    • Face-Detection
    • -
        -
      • FaceBoxes
      • -
      • BlazeFace
      • -
      • BlazeFace-NAS
      • -
      -
    -
    -
      -
    • ResNet(&vd)
    • -
    • ResNeXt(&vd)
    • -
    • SENet
    • -
    • Res2Net
    • -
    • HRNet
    • -
    • Hourglass
    • -
    • CBNet
    • -
    • GCNet
    • -
    • DarkNet
    • -
    • CSPDarkNet
    • -
    • VGG
    • -
    • MobileNetv1/v3
    • -
    • GhostNet
    • -
    • EfficientNet
    • -
    -
    -
    • Common
    • -
        -
      • Sync-BN
      • -
      • Group Norm
      • -
      • DCNv2
      • -
      • Non-local
      • -
      -
    -
    • FPN
    • -
        -
      • BiFPN
      • -
      • BFP
      • -
      • HRFPN
      • -
      • ACFPN
      • -
      -
    -
    • Loss
    • -
        -
      • Smooth-L1
      • -
      • GIoU/DIoU/CIoU
      • -
      • IoUAware
      • -
      -
    -
    • Post-processing
    • -
        -
      • SoftNMS
      • -
      • MatrixNMS
      • -
      -
    -
    • Speed
    • -
        -
      • FP16 training
      • -
      • Multi-machine training
      • -
      -
    -
    -
      -
    • Resize
    • -
    • Flipping
    • -
    • Expand
    • -
    • Crop
    • -
    • Color Distort
    • -
    • Random Erasing
    • -
    • Mixup
    • -
    • Cutmix
    • -
    • Grid Mask
    • -
    • Auto Augment
    • -
    -
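    - -#### Overview of Model Performance - -A comparison of COCO mAP and inference speed (FPS) on a single Tesla V100 for representative models of each architecture and backbone. - -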
    - -
    - -**Notes:** -- `CBResNet` is the `Cascade-Faster-RCNN-CBResNet200vd-FPN` model, which reaches a COCO mAP of 53.3% -- `Cascade-Faster-RCNN` is `Cascade-Faster-RCNN-ResNet50vd-DCN`, which PaddleDetection has optimized to an inference speed of 20 FPS at a COCO mAP of 47.8% -- The PaddleDetection enhanced `YOLOv3-ResNet50vd-DCN` has a COCO mAP 10.6 absolute percentage points higher than the original, with an inference speed of 61.3 FPS, about 70% faster than the original -- All models in the figure are available in the [Model Zoo](#model-zoo) - - -## Tutorials - -### Getting Started - -- [Installation guide](docs/tutorials/INSTALL_cn.md) -- [Quick start](docs/tutorials/QUICK_STARTED_cn.md) -- [How to prepare data](docs/tutorials/PrepareDataSet.md) -- [Training/evaluation/inference/deployment workflow](docs/tutorials/DetectionPipeline.md) -- [How to use a custom dataset](docs/tutorials/Custom_DataSet.md) -- [FAQ](docs/FAQ.md) - -### Advanced Tutorials -- Parameter configuration - - [Design and introduction of the configuration module](docs/advanced_tutorials/config_doc/CONFIG_cn.md) - - [RCNN parameter description](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md) - - [YOLOv3 parameter description](docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md) -- Transfer learning - - [How to load a pretrained model](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md) -- Model compression (based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)) - - [Compression benchmark](slim) - - [Quantization](slim/quantization), [pruning](slim/prune), [distillation](slim/distillation), [search](slim/nas) -- Inference and deployment - - [Model export tutorial](docs/advanced_tutorials/deploy/EXPORT_MODEL.md) - - [Server-side Python deployment](deploy/python) - - [Server-side C++ deployment](deploy/cpp) - - [Mobile deployment](https://github.com/PaddlePaddle/Paddle-Lite-Demo) - - [Online Serving deployment](deploy/serving) - - [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md) -- Advanced development - - [Adding new data preprocessing](docs/advanced_tutorials/READER.md) - - [Adding new detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL.md) - - -## Model Zoo - -- General object detection: - - [Model zoo and baselines](docs/MODEL_ZOO_cn.md) - - [Mobile models](configs/mobile/README.md) - - [Anchor Free](configs/anchor_free/README.md) - - [PP-YOLO models](configs/ppyolo/README_cn.md) - - [676-class object detection](docs/featured_model/LARGE_SCALE_DET_MODEL.md) - - [Two-stage practical model PSS-Det](configs/rcnn_enhance/README.md) -- General instance segmentation: - - [SOLOv2](configs/solov2/README.md) -- Vertical domains - - [Face detection](docs/featured_model/FACE_DETECTION.md) - - [Pedestrian detection](docs/featured_model/CONTRIB_cn.md) - - 
[Vehicle detection](docs/featured_model/CONTRIB_cn.md) -- Competition solutions - - [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md) - - [Best single model of the Open Images 2019-Object Detection competition](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) - -## Applications - -- [Automatic Christmas portrait effects generation tool](application/christmas) - -## Recommended Third-Party Tutorials - -- [Deploying PaddleDetection on Windows (Part 1)](https://zhuanlan.zhihu.com/p/268657833) -- [Deploying PaddleDetection on Windows (Part 2)](https://zhuanlan.zhihu.com/p/280206376) -- [Experience deploying PaddleDetection on Jetson Nano](https://zhuanlan.zhihu.com/p/319371293) -- [Deploying a YOLOv3 safety helmet detection model on Raspberry Pi](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/yolov3_for_raspi.md) -- [Completing a project with SSD-MobileNetv1: from dataset preparation to Raspberry Pi deployment](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/ssd_mobilenet_v1_for_raspi.md) - -## Updates -v2.0-rc was released in `02/2021`. It adds a dynamic graph version that supports the RCNN, YOLOv3, PP-YOLO, SSD/SSDLite, FCOS, TTFNet, and SOLOv2 model series, along with model pruning and quantization, inference deployment, and TensorRT inference acceleration. See the [change log](docs/CHANGELOG.md) for details. - -## License -This project is released under the [Apache 2.0 license](LICENSE). - - -## Contributing - -We warmly welcome code contributions to PaddleDetection and greatly appreciate your feedback. diff --git a/static/README_en.md b/static/README_en.md deleted file mode 100644 index 29e689906f5f587501fe1a84f671a50e105952dc..0000000000000000000000000000000000000000 --- a/static/README_en.md +++ /dev/null @@ -1,273 +0,0 @@ -English | [简体中文](README_cn.md) - -Documentation: [https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io) - -# Introduction - -PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle, which aims to help developers build, train, optimize, and deploy detection models faster and better across the whole development workflow. 
- -PaddleDetection implements a variety of mainstream object detection algorithms in a modular design, provides rich data augmentation methods, network components (such as backbones), loss functions, etc., and integrates model compression and cross-platform high-performance deployment capabilities. - -After extensive refinement in industrial practice, PaddleDetection offers a smooth, excellent user experience and has been widely used by developers in more than ten industries, such as industrial quality inspection, remote sensing image object detection, automatic inspection, new retail, Internet, and scientific research. - -
    - -
    - -### Product News - -- 2020.11.20: Released `release/0.5`. Please refer to the [change log](docs/CHANGELOG.md) for details. -- 2020.11.10: Added [SOLOv2](configs/solov2) as an instance segmentation model, which reaches 38.6 FPS on a single Tesla V100 and 38.8 mask AP on the COCO-val dataset, with inference speed increased by 24% and mAP by 2.4 percentage points. -- 2020.10.30: PP-YOLO supports rectangular image input, and a new PACT quantization strategy was added for slim. -- 2020.09.30: Released the [mobile-side detection demo](deploy/android_demo), which you can install directly by scanning the code. -- 2020.09.21-27: [7-day object detection check-in course] A hands-on course from beginner to advanced level, giving an in-depth understanding of the history of object detection algorithms. Join the course QQ group (1136406895) to study together :) -- 2020.07.24: Released [PP-YOLO](https://arxiv.org/abs/2007.12099), **the most practical** object detection model, which takes into account the dual demands of industrial applications for accuracy and speed. It reaches 45.2% accuracy (latest: 45.9%) on the COCO dataset and an inference speed of 72.9 FPS on a single Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details. -- 2020.06.11: Released a 676-class large-scale server-side practical object detection model that is applicable to most application scenarios and can be used directly for prediction or for fine-tuning on other tasks. - -### Features - -- **Rich Models** -PaddleDetection provides a rich set of models, including **100+ pre-trained models** for **object detection**, **instance segmentation**, **face detection**, etc. It covers a variety of **global competition champion** schemes. - -- **Easy to Use** -The modular design decouples each network component, so developers can easily build and try various detection models and optimization strategies and quickly get high-performance, customized algorithms. 
- -- **End to End** -Data augmentation, model construction, training, compression, and deployment are connected end to end, with complete support for multi-architecture, multi-device deployment on **cloud and edge devices**. - -- **High Performance** -Based on the high-performance core of PaddlePaddle, with clear advantages in training speed and memory usage. Supports FP16 training and multi-machine training. - -#### Overview of Kit Structures - - - - - - - - - - - - - - - - - - - - -
    - Architectures - - Backbones - - Components - - Data Augmentation -
    -
    • Two-Stage Detection
    • -
        -
      • Faster RCNN
      • -
      • FPN
      • -
      • Cascade-RCNN
      • -
      • Libra RCNN
      • -
      • Hybrid Task RCNN
      • -
      • PSS-Det RCNN
      • -
      -
    -
    • One-Stage Detection
    • -
        -
      • RetinaNet
      • -
      • YOLOv3
      • -
      • YOLOv4
      • -
      • PP-YOLO
      • -
      • SSD
      • -
      -
    -
    • Anchor Free
    • -
        -
      • CornerNet-Squeeze
      • -
      • FCOS
      • -
      • TTFNet
      • -
      -
    -
      -
    • Instance Segmentation
    • -
        -
      • Mask RCNN
      • -
      • SOLOv2
      • -
      -
    -
      -
    • Face-Detection
    • -
        -
      • FaceBoxes
      • -
      • BlazeFace
      • -
      • BlazeFace-NAS
      • -
      -
    -
    -
      -
    • ResNet(&vd)
    • -
    • ResNeXt(&vd)
    • -
    • SENet
    • -
    • Res2Net
    • -
    • HRNet
    • -
    • Hourglass
    • -
    • CBNet
    • -
    • GCNet
    • -
    • DarkNet
    • -
    • CSPDarkNet
    • -
    • VGG
    • -
    • MobileNetv1/v3
    • -
    • GhostNet
    • -
    • EfficientNet
    • -
    -
    -
    • Common
    • -
        -
      • Sync-BN
      • -
      • Group Norm
      • -
      • DCNv2
      • -
      • Non-local
      • -
      -
    -
    • FPN
    • -
        -
      • BiFPN
      • -
      • BFP
      • -
      • HRFPN
      • -
      • ACFPN
      • -
      -
    -
    • Loss
    • -
        -
      • Smooth-L1
      • -
      • GIoU/DIoU/CIoU
      • -
      • IoUAware
      • -
      -
    -
    • Post-processing
    • -
        -
      • SoftNMS
      • -
      • MatrixNMS
      • -
      -
    -
    • Speed
    • -
        -
      • FP16 training
      • -
      • Multi-machine training
      • -
      -
    -
    -
      -
    • Resize
    • -
    • Flipping
    • -
    • Expand
    • -
    • Crop
    • -
    • Color Distort
    • -
    • Random Erasing
    • -
    • Mixup
    • -
    • Cutmix
    • -
    • Grid Mask
    • -
    • Auto Augment
    • -
    -
    - -#### Overview of Model Performance -The relationship between COCO mAP and FPS on a Tesla V100 for representative models of each architecture and backbone. - -
    - -
    - -**NOTE:** - -- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has the highest mAP on COCO, at 53.3% - -- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized in PaddleDetection to an inference speed of 20 FPS at a COCO mAP of 47.8% - -- The enhanced PaddleDetection model `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher than the paper on COCO mAP, and its inference speed is 61.3 FPS, nearly 70% faster than the darknet framework. -All these models are available in the [Model Zoo](#model-zoo) - - -## Tutorials - -### Get Started - -- [Installation guide](docs/tutorials/INSTALL_cn.md) -- [Quick start on small dataset](docs/tutorials/QUICK_STARTED_cn.md) -- [Prepare dataset](docs/tutorials/PrepareDataSet.md) -- [Train/Evaluation/Inference/Deploy](docs/tutorials/DetectionPipeline.md) -- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md) -- [FAQ](docs/FAQ.md) - -### Advanced Tutorials - -- Parameter configuration - - [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG_cn.md) - - [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md) - - [Parameter configuration for YOLOv3 model](docs/advanced_tutorials/config_doc/yolov3_mobilenet_v1.md) - -- Transfer learning - - [How to load pretrained model](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md) - -- Model compression (based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)) - - [Model compression benchmark](slim) - - [Quantization](slim/quantization) - - [Model pruning](slim/prune) - - [Model distillation](slim/distillation) - - [Neural Architecture Search](slim/nas) - -- Inference and deployment - - [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md) - - [Python inference](deploy/python) - - [C++ inference](deploy/cpp) - - [Mobile](https://github.com/PaddlePaddle/Paddle-Lite-Demo) - - [Serving](deploy/serving) - - [Inference 
benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md) - -- Advanced development - - [New data augmentations](docs/advanced_tutorials/READER.md) - - [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL.md) - - -## Model Zoo - -- Universal object detection - - [Model library and baselines](docs/MODEL_ZOO_cn.md) - - [Mobile models](configs/mobile/README.md) - - [Anchor free models](configs/anchor_free/README.md) - - [PP-YOLO](configs/ppyolo/README_cn.md) - - [676 classes of object detection](docs/featured_model/LARGE_SCALE_DET_MODEL.md) - - [Two-stage practical PSS-Det](configs/rcnn_enhance/README.md) -- Universal instance segmentation - - [SOLOv2](configs/solov2/README.md) -- Vertical domains - - [Face detection](docs/featured_model/FACE_DETECTION.md) - - [Pedestrian detection](docs/featured_model/CONTRIB_cn.md) - - [Vehicle detection](docs/featured_model/CONTRIB_cn.md) -- Competition solutions - - [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md) - - [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) - -## Applications - -- [Christmas portrait automatic generation tool](application/christmas) - -## Updates - -v2.0-rc was released in `02/2021`. It adds a dygraph version, which supports RCNN, YOLOv3, PP-YOLO, SSD/SSDLite, FCOS, TTFNet, SOLOv2, etc., supports model pruning and quantization, and supports deployment and acceleration with TensorRT. Please refer to the [change log](docs/CHANGELOG.md) for details. - - -## License - -PaddleDetection is released under the [Apache 2.0 license](LICENSE). - - -## Contributing - -Contributions are highly welcome, and we would really appreciate your feedback! 
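diff --git a/static/application/christmas/README.md b/static/application/christmas/README.md deleted file mode 100644 index 1500544d9d521955f56fd065af6bc395d1240278..0000000000000000000000000000000000000000 --- a/static/application/christmas/README.md +++ /dev/null @@ -1,65 +0,0 @@ -# Automatic Christmas Portrait Effects Generation Tool -This tool segments the portrait with the SOLOv2 instance segmentation model and detects facial keypoints with the BlazeFace keypoint model, then uses the outputs of the two models to replace the background with a Christmas-style one and add effects such as a Santa Claus beard, Christmas glasses, and a Christmas hat to the face. With PaddleHub, the project can be published directly as a server service for local debugging and for direct API calls from a front end. You can try it directly through the WeChat mini program in the QR code below: - -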
    - -
    - -## Environment Setup - -### Dependencies - -- paddlepaddle >= 2.0.0rc0 - -- paddlehub >= 2.0.0b1 - -### Model Preparation -- First, obtain the models. You can configure `solov2` and `blazeface_keypoint` in the [model configuration files](../../configs), train the models, and [export the models](../../docs/advanced_tutorials/deploy/EXPORT_MODEL.md). Alternatively, download the models we have prepared: -[blazeface_keypoint model](https://paddlemodels.bj.bcebos.com/object_detection/application/blazeface_keypoint.tar) and -[solov2 model](https://paddlemodels.bj.bcebos.com/object_detection/application/solov2_r101_vd_fpn_3x.tar). -**Note:** The downloaded models must be extracted before use. - -- Then copy the files in the two model folders (`infer_cfg.yml`, `__model__`, and `__params__`) into the `blazeface/blazeface_keypoint/` and `solov2/solov2_r101_vd_fpn_3x/` folders, respectively. - -### Install the blazeface and solov2 models with hub - -```shell -hub install solov2 -hub install blazeface -``` - -### Install the solov2_blazeface cascaded model for automatic Christmas effects generation with hub - -```shell -$ hub install solov2_blazeface -``` -## Testing - -### Local test - -```shell -python test_main.py -``` -After a successful run, the prediction result is saved to `chrismas_final.png`. - -### Serving test - -- step1: Start the service - -```shell -export CUDA_VISIBLE_DEVICES=0 -hub serving start -m solov2_blazeface -p 8880 -``` - -- step2: Send a prediction request to the server - -```shell -python test_server.py -``` -After a successful run, the prediction result is saved to `chrismas_final.png`. - -## Results - -
    - -
    diff --git a/static/application/christmas/blazeface/data_feed.py b/static/application/christmas/blazeface/data_feed.py deleted file mode 100644 index c7eb4c5473e80290e6e50ee0350e86e350ab9fe9..0000000000000000000000000000000000000000 --- a/static/application/christmas/blazeface/data_feed.py +++ /dev/null @@ -1,371 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import base64 - -import cv2 -import numpy as np -from PIL import Image, ImageDraw -import paddle.fluid as fluid - - -def create_inputs(im, im_info): - """generate input for different model type - Args: - im (np.ndarray): image (np.ndarray) - im_info (dict): info of image - Returns: - inputs (dict): input of model - """ - inputs = {} - inputs['image'] = im - origin_shape = list(im_info['origin_shape']) - resize_shape = list(im_info['resize_shape']) - pad_shape = list(im_info['pad_shape']) if im_info[ - 'pad_shape'] is not None else list(im_info['resize_shape']) - scale_x, scale_y = im_info['scale'] - scale = scale_x - im_info = np.array([resize_shape + [scale]]).astype('float32') - inputs['im_info'] = im_info - return inputs - - -def visualize_box_mask(im, - results, - labels=None, - mask_resolution=14, - threshold=0.5): - """ - Args: - im (str/np.ndarray): path of image/np.ndarray read by cv2 - results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, 
y_max] - MaskRCNN's results include 'masks': np.ndarray: - shape:[N, class_num, mask_resolution, mask_resolution] - labels (list): labels:['class1', ..., 'classn'] - mask_resolution (int): shape of a mask is:[mask_resolution, mask_resolution] - threshold (float): Threshold of score. - Returns: - im (PIL.Image.Image): visualized image - """ - if not labels: - labels = ['background', 'person'] - if isinstance(im, str): - im = Image.open(im).convert('RGB') - else: - im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) - im = Image.fromarray(im) - if 'masks' in results and 'boxes' in results: - im = draw_mask( - im, - results['boxes'], - results['masks'], - labels, - resolution=mask_resolution) - if 'boxes' in results: - im = draw_box(im, results['boxes'], labels) - if 'segm' in results: - im = draw_segm( - im, - results['segm'], - results['label'], - results['score'], - labels, - threshold=threshold) - if 'landmark' in results: - im = draw_lmk(im, results['landmark']) - return im - - -def get_color_map_list(num_classes): - """ - Args: - num_classes (int): number of class - Returns: - color_map (list): RGB color list - """ - color_map = num_classes * [0, 0, 0] - for i in range(0, num_classes): - j = 0 - lab = i - while lab: - color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j)) - color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)) - color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)) - j += 1 - lab >>= 3 - color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)] - return color_map - - -def expand_boxes(boxes, scale=0.0): - """ - Args: - boxes (np.ndarray): shape:[N,4], N:number of box, - matix element:[x_min, y_min, x_max, y_max] - scale (float): scale of boxes - Returns: - boxes_exp (np.ndarray): expanded boxes - """ - w_half = (boxes[:, 2] - boxes[:, 0]) * .5 - h_half = (boxes[:, 3] - boxes[:, 1]) * .5 - x_c = (boxes[:, 2] + boxes[:, 0]) * .5 - y_c = (boxes[:, 3] + boxes[:, 1]) * .5 - w_half *= scale - h_half *= scale - boxes_exp = np.zeros(boxes.shape) - 
boxes_exp[:, 0] = x_c - w_half - boxes_exp[:, 2] = x_c + w_half - boxes_exp[:, 1] = y_c - h_half - boxes_exp[:, 3] = y_c + h_half - return boxes_exp - - -def draw_mask(im, np_boxes, np_masks, labels, resolution=14, threshold=0.5): - """ - Args: - im (PIL.Image.Image): PIL image - np_boxes (np.ndarray): shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - np_masks (np.ndarray): shape:[N, class_num, resolution, resolution] - labels (list): labels:['class1', ..., 'classn'] - resolution (int): shape of a mask is:[resolution, resolution] - threshold (float): threshold of mask - Returns: - im (PIL.Image.Image): visualized image - """ - color_list = get_color_map_list(len(labels)) - scale = (resolution + 2.0) / resolution - im_w, im_h = im.size - w_ratio = 0.4 - alpha = 0.7 - im = np.array(im).astype('float32') - rects = np_boxes[:, 2:] - expand_rects = expand_boxes(rects, scale) - expand_rects = expand_rects.astype(np.int32) - clsid_scores = np_boxes[:, 0:2] - padded_mask = np.zeros((resolution + 2, resolution + 2), dtype=np.float32) - clsid2color = {} - for idx in range(len(np_boxes)): - clsid, score = clsid_scores[idx].tolist() - clsid = int(clsid) - xmin, ymin, xmax, ymax = expand_rects[idx].tolist() - w = xmax - xmin + 1 - h = ymax - ymin + 1 - w = np.maximum(w, 1) - h = np.maximum(h, 1) - padded_mask[1:-1, 1:-1] = np_masks[idx, int(clsid), :, :] - resized_mask = cv2.resize(padded_mask, (w, h)) - resized_mask = np.array(resized_mask > threshold, dtype=np.uint8) - x0 = min(max(xmin, 0), im_w) - x1 = min(max(xmax + 1, 0), im_w) - y0 = min(max(ymin, 0), im_h) - y1 = min(max(ymax + 1, 0), im_h) - im_mask = np.zeros((im_h, im_w), dtype=np.uint8) - im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), ( - x0 - xmin):(x1 - xmin)] - if clsid not in clsid2color: - clsid2color[clsid] = color_list[clsid] - color_mask = clsid2color[clsid] - for c in range(3): - color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255 - idx = 
np.nonzero(im_mask) - color_mask = np.array(color_mask) - im[idx[0], idx[1], :] *= 1.0 - alpha - im[idx[0], idx[1], :] += alpha * color_mask - return Image.fromarray(im.astype('uint8')) - - -def draw_box(im, np_boxes, labels): - """ - Args: - im (PIL.Image.Image): PIL image - np_boxes (np.ndarray): shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - labels (list): labels:['class1', ..., 'classn'] - Returns: - im (PIL.Image.Image): visualized image - """ - draw_thickness = min(im.size) // 320 - draw = ImageDraw.Draw(im) - clsid2color = {} - color_list = get_color_map_list(len(labels)) - - for dt in np_boxes: - clsid, bbox, score = int(dt[0]), dt[2:], dt[1] - xmin, ymin, xmax, ymax = bbox - w = xmax - xmin - h = ymax - ymin - if clsid not in clsid2color: - clsid2color[clsid] = color_list[clsid] - color = tuple(clsid2color[clsid]) - - # draw bbox - draw.line( - [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), - (xmin, ymin)], - width=draw_thickness, - fill=color) - - # draw label - text = "{} {:.4f}".format(labels[clsid], score) - tw, th = draw.textsize(text) - draw.rectangle( - [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color) - draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255)) - return im - - -def draw_segm(im, - np_segms, - np_label, - np_score, - labels, - threshold=0.5, - alpha=0.7): - """ - Draw segmentation on image - """ - mask_color_id = 0 - w_ratio = .4 - color_list = get_color_map_list(len(labels)) - im = np.array(im).astype('float32') - clsid2color = {} - np_segms = np_segms.astype(np.uint8) - index = np.where(np_label == 0)[0] - index = np.where(np_score[index] > threshold)[0] - person_segms = np_segms[index] - person_mask = np.sum(person_segms, axis=0) - person_mask[person_mask > 1] = 1 - person_mask = np.expand_dims(person_mask, axis=2) - person_mask = np.repeat(person_mask, 3, axis=2) - im = im * person_mask - - return Image.fromarray(im.astype('uint8')) - - -def 
load_predictor(model_dir, - run_mode='fluid', - batch_size=1, - use_gpu=False, - min_subgraph_size=3): - """set AnalysisConfig, generate AnalysisPredictor - Args: - model_dir (str): root path of __model__ and __params__ - use_gpu (bool): whether use gpu - Returns: - predictor (PaddlePredictor): AnalysisPredictor - Raises: - ValueError: predict by TensorRT need use_gpu == True. - """ - if not use_gpu and not run_mode == 'fluid': - raise ValueError( - "Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}" - .format(run_mode, use_gpu)) - if run_mode == 'trt_int8': - raise ValueError("TensorRT int8 mode is not supported now, " - "please use trt_fp32 or trt_fp16 instead.") - precision_map = { - 'trt_int8': fluid.core.AnalysisConfig.Precision.Int8, - 'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32, - 'trt_fp16': fluid.core.AnalysisConfig.Precision.Half - } - config = fluid.core.AnalysisConfig( - os.path.join(model_dir, '__model__'), - os.path.join(model_dir, '__params__')) - if use_gpu: - # initial GPU memory(M), device ID - config.enable_use_gpu(100, 0) - # optimize graph and fuse op - config.switch_ir_optim(True) - else: - config.disable_gpu() - - if run_mode in precision_map.keys(): - config.enable_tensorrt_engine( - workspace_size=1 << 10, - max_batch_size=batch_size, - min_subgraph_size=min_subgraph_size, - precision_mode=precision_map[run_mode], - use_static=False, - use_calib_mode=False) - - # disable print log when predict - config.disable_glog_info() - # enable shared memory - config.enable_memory_optim() - # disable feed, fetch OP, needed by zero_copy_run - config.switch_use_feed_fetch_ops(False) - predictor = fluid.core.create_paddle_predictor(config) - return predictor - - -def cv2_to_base64(image): - data = cv2.imencode('.jpg', image)[1] - return base64.b64encode(data.tostring()).decode('utf8') - - -def base64_to_cv2(b64str): - data = base64.b64decode(b64str.encode('utf8')) - data = np.fromstring(data, np.uint8) - data = 
cv2.imdecode(data, cv2.IMREAD_COLOR) - return data - - -def lmk2out(bboxes, np_lmk, im_info, threshold=0.5, is_bbox_normalized=True): - image_w, image_h = im_info['origin_shape'] - scale = im_info['scale'] - face_index, landmark, prior_box = np_lmk[:] - xywh_res = [] - if bboxes.shape == (1, 1) or bboxes is None: - return np.array([]) - prior = np.reshape(prior_box, (-1, 4)) - predict_lmk = np.reshape(landmark, (-1, 10)) - k = 0 - for i in range(bboxes.shape[0]): - score = bboxes[i][1] - if score < threshold: - continue - theindex = face_index[i][0] - me_prior = prior[theindex, :] - lmk_pred = predict_lmk[theindex, :] - prior_h = me_prior[2] - me_prior[0] - prior_w = me_prior[3] - me_prior[1] - prior_h_center = (me_prior[2] + me_prior[0]) / 2 - prior_w_center = (me_prior[3] + me_prior[1]) / 2 - lmk_decode = np.zeros((10)) - for j in [0, 2, 4, 6, 8]: - lmk_decode[j] = lmk_pred[j] * 0.1 * prior_w + prior_h_center - for j in [1, 3, 5, 7, 9]: - lmk_decode[j] = lmk_pred[j] * 0.1 * prior_h + prior_w_center - - if is_bbox_normalized: - lmk_decode = lmk_decode * np.array([ - image_h, image_w, image_h, image_w, image_h, image_w, image_h, - image_w, image_h, image_w - ]) - xywh_res.append(lmk_decode) - return np.asarray(xywh_res) - - -def draw_lmk(image, lmk_results): - draw = ImageDraw.Draw(image) - for lmk_decode in lmk_results: - for j in range(5): - x1 = int(round(lmk_decode[2 * j])) - y1 = int(round(lmk_decode[2 * j + 1])) - draw.ellipse( - (x1 - 2, y1 - 2, x1 + 3, y1 + 3), fill='green', outline='green') - return image diff --git a/static/application/christmas/blazeface/module.py b/static/application/christmas/blazeface/module.py deleted file mode 100644 index 40656ff83fab02e54dd2c620575d7384fdbdba5c..0000000000000000000000000000000000000000 --- a/static/application/christmas/blazeface/module.py +++ /dev/null @@ -1,204 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import time -from functools import reduce -import cv2 -import numpy as np -from paddlehub.module.module import moduleinfo - -import blazeface.data_feed as D - - -@moduleinfo( - name="blazeface", - type="CV/image_editing", - author="paddlepaddle", - author_email="", - summary="blazeface is a face key point detection model.", - version="1.0.0") -class Detector(object): - """ - Args: - config (object): config of model, defined by `Config(model_dir)` - model_dir (str): root path of __model__, __params__ and infer_cfg.yml - use_gpu (bool): whether use gpu - run_mode (str): mode of running(fluid/trt_fp32/trt_fp16) - threshold (float): threshold to reserve the result for output. 
- """ - - def __init__(self, - min_subgraph_size=60, - use_gpu=False, - run_mode='fluid', - threshold=0.5): - - model_dir = os.path.join(self.directory, 'blazeface_keypoint') - self.predictor = D.load_predictor( - model_dir, - run_mode=run_mode, - min_subgraph_size=min_subgraph_size, - use_gpu=use_gpu) - - def face_img_process(self, - image, - mean=[104., 117., 123.], - std=[127.502231, 127.502231, 127.502231]): - image = np.array(image) - # HWC to CHW - if len(image.shape) == 3: - image = np.swapaxes(image, 1, 2) - image = np.swapaxes(image, 1, 0) - # RBG to BGR - image = image[[2, 1, 0], :, :] - image = image.astype('float32') - image -= np.array(mean)[:, np.newaxis, np.newaxis].astype('float32') - image /= np.array(std)[:, np.newaxis, np.newaxis].astype('float32') - image = [image] - image = np.array(image) - - return image - - def transform(self, image, shrink): - im_info = { - 'scale': [1., 1.], - 'origin_shape': None, - 'resize_shape': None, - 'pad_shape': None, - } - if isinstance(image, str): - with open(image, 'rb') as f: - im_read = f.read() - image = np.frombuffer(im_read, dtype='uint8') - image = cv2.imdecode(image, 1) # BGR mode, but need RGB mode - image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) - im_info['origin_shape'] = image.shape[:2] - else: - im_info['origin_shape'] = image.shape[:2] - - image_shape = [3, image.shape[0], image.shape[1]] - h, w = shrink, shrink - image = cv2.resize(image, (w, h)) - im_info['resize_shape'] = image.shape[:2] - - image = self.face_img_process(image) - - inputs = D.create_inputs(image, im_info) - return inputs, im_info - - def postprocess(self, boxes_list, lmks_list, im_info, threshold=0.5): - assert len(boxes_list) == len(lmks_list) - best_np_boxes, best_np_lmk = boxes_list[0], lmks_list[0] - for i in range(1, len(boxes_list)): - #judgment detection score - if boxes_list[i][0][1] > 0.9: - break - face_width = boxes_list[i][0][4] - boxes_list[i][0][2] - if boxes_list[i][0][1] - best_np_boxes[0][ - 1] > 0.01 and 
face_width > 0.2: - best_np_boxes, best_np_lmk = boxes_list[i], lmks_list[i] - # postprocess output of predictor - results = {} - results['landmark'] = D.lmk2out(best_np_boxes, best_np_lmk, im_info, - threshold) - - w, h = im_info['origin_shape'] - best_np_boxes[:, 2] *= h - best_np_boxes[:, 3] *= w - best_np_boxes[:, 4] *= h - best_np_boxes[:, 5] *= w - expect_boxes = (best_np_boxes[:, 1] > threshold) & ( - best_np_boxes[:, 0] > -1) - best_np_boxes = best_np_boxes[expect_boxes, :] - for box in best_np_boxes: - print('class_id:{:d}, confidence:{:.4f},' - 'left_top:[{:.2f},{:.2f}],' - ' right_bottom:[{:.2f},{:.2f}]'.format( - int(box[0]), box[1], box[2], box[3], box[4], box[5])) - results['boxes'] = best_np_boxes - return results - - def predict(self, - image, - threshold=0.5, - repeats=1, - visualization=False, - with_lmk=True, - save_dir='blaze_result'): - ''' - Args: - image (str/np.ndarray): path of image/ np.ndarray read by cv2 - threshold (float): threshold of predicted box' score - Returns: - results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - ''' - shrink = [960, 640, 480, 320, 180] - boxes_list = [] - lmks_list = [] - for sh in shrink: - inputs, im_info = self.transform(image, shrink=sh) - np_boxes, np_lmk = None, None - - input_names = self.predictor.get_input_names() - for i in range(len(input_names)): - input_tensor = self.predictor.get_input_tensor(input_names[i]) - input_tensor.copy_from_cpu(inputs[input_names[i]]) - - t1 = time.time() - for i in range(repeats): - self.predictor.zero_copy_run() - output_names = self.predictor.get_output_names() - boxes_tensor = self.predictor.get_output_tensor(output_names[0]) - np_boxes = boxes_tensor.copy_to_cpu() - if with_lmk == True: - face_index = self.predictor.get_output_tensor(output_names[ - 1]) - landmark = self.predictor.get_output_tensor(output_names[2]) - prior_boxes = self.predictor.get_output_tensor(output_names[ - 3]) 
- np_face_index = face_index.copy_to_cpu() - np_prior_boxes = prior_boxes.copy_to_cpu() - np_landmark = landmark.copy_to_cpu() - np_lmk = [np_face_index, np_landmark, np_prior_boxes] - - t2 = time.time() - ms = (t2 - t1) * 1000.0 / repeats - print("Inference: {} ms per batch image".format(ms)) - - # do not perform postprocess in benchmark mode - results = [] - if reduce(lambda x, y: x * y, np_boxes.shape) < 6: - print('[WARNNING] No object detected.') - results = {'boxes': np.array([])} - else: - boxes_list.append(np_boxes) - lmks_list.append(np_lmk) - - results = self.postprocess( - boxes_list, lmks_list, im_info, threshold=threshold) - - if visualization: - if not os.path.exists(save_dir): - os.makedirs(save_dir) - output = D.visualize_box_mask( - im=image, results=results, labels=["background", "face"]) - name = str(time.time()) + '.png' - save_path = os.path.join(save_dir, name) - output.save(save_path) - img = cv2.cvtColor(np.array(output), cv2.COLOR_RGB2BGR) - results['image'] = img - - return results diff --git a/static/application/christmas/demo_images/result.png b/static/application/christmas/demo_images/result.png deleted file mode 100644 index 5b857e4cc1377d3ef3bc0400a9a6bff2f286abb9..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/demo_images/result.png and /dev/null differ diff --git a/static/application/christmas/demo_images/test.jpg b/static/application/christmas/demo_images/test.jpg deleted file mode 100644 index 10030612246c085d6250f34bf1c78207de8e049e..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/demo_images/test.jpg and /dev/null differ diff --git a/static/application/christmas/demo_images/wechat_app.jpeg b/static/application/christmas/demo_images/wechat_app.jpeg deleted file mode 100644 index 3edcb7c6d91e22107538a049a53ab64fa5eb1762..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/demo_images/wechat_app.jpeg and /dev/null differ diff 
--git a/static/application/christmas/element_source/background/1.json b/static/application/christmas/element_source/background/1.json deleted file mode 100644 index 5e9b4d9a36faf0161cfc80b00220f506367da94e..0000000000000000000000000000000000000000 --- a/static/application/christmas/element_source/background/1.json +++ /dev/null @@ -1 +0,0 @@ -{"path":"/Users/yuzhiliang/Downloads/docsmall-2/12.png","outputs":{"object":[{"name":"local","bndbox":{"xmin":282,"ymin":366,"xmax":3451,"ymax":4603}}]},"time_labeled":1608631688933,"labeled":true,"size":{"width":3714,"height":5725,"depth":3}} \ No newline at end of file diff --git a/static/application/christmas/element_source/background/1.png b/static/application/christmas/element_source/background/1.png deleted file mode 100755 index e4bf623b645a1704be2c948a8ba9a1dbfe8662ca..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/background/1.png and /dev/null differ diff --git a/static/application/christmas/element_source/background/2.json b/static/application/christmas/element_source/background/2.json deleted file mode 100644 index 4bb6ebbc16a7ab89a63634e3fff09cc18e0a3b44..0000000000000000000000000000000000000000 --- a/static/application/christmas/element_source/background/2.json +++ /dev/null @@ -1 +0,0 @@ -{"path":"/Users/yuzhiliang/Downloads/docsmall-2/2.png","outputs":{"object":[{"name":"local","bndbox":{"xmin":336,"ymin":512,"xmax":3416,"ymax":4672}}]},"time_labeled":1608631696021,"labeled":true,"size":{"width":3714,"height":5275,"depth":3}} \ No newline at end of file diff --git a/static/application/christmas/element_source/background/2.png b/static/application/christmas/element_source/background/2.png deleted file mode 100755 index f3c4299c97bb9923fa4995336616901f4e0dfc2c..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/background/2.png and /dev/null differ diff --git 
a/static/application/christmas/element_source/background/3.json b/static/application/christmas/element_source/background/3.json deleted file mode 100644 index df28a39f151c6e664bf9493b074a086bfbb4de48..0000000000000000000000000000000000000000 --- a/static/application/christmas/element_source/background/3.json +++ /dev/null @@ -1 +0,0 @@ -{"path":"/Users/yuzhiliang/Downloads/docsmall-2/3.png","outputs":{"object":[{"name":"local","bndbox":{"xmin":376,"ymin":352,"xmax":3448,"ymax":4544}}]},"time_labeled":1608631701740,"labeled":true,"size":{"width":3714,"height":5275,"depth":3}} \ No newline at end of file diff --git a/static/application/christmas/element_source/background/3.png b/static/application/christmas/element_source/background/3.png deleted file mode 100755 index e8ccd1f29a1ffa539b173d990b6a2b726fc9bf01..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/background/3.png and /dev/null differ diff --git a/static/application/christmas/element_source/beard/1.png b/static/application/christmas/element_source/beard/1.png deleted file mode 100644 index c3645f897c2e59266b7dea869bf8eadaa1c7a924..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/beard/1.png and /dev/null differ diff --git a/static/application/christmas/element_source/beard/2.png b/static/application/christmas/element_source/beard/2.png deleted file mode 100644 index 24000ad0b4eeceaa851d0d7f2a5a43c1ad5f60ca..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/beard/2.png and /dev/null differ diff --git a/static/application/christmas/element_source/glasses/1.png b/static/application/christmas/element_source/glasses/1.png deleted file mode 100644 index 385a475d4967796f6220cc9b987a070f87dbd1d4..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/glasses/1.png and /dev/null differ diff --git 
a/static/application/christmas/element_source/glasses/2.png b/static/application/christmas/element_source/glasses/2.png deleted file mode 100644 index b179531f41c2916e9aab340f8882e18a40562b62..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/glasses/2.png and /dev/null differ diff --git a/static/application/christmas/element_source/hat/1.png b/static/application/christmas/element_source/hat/1.png deleted file mode 100644 index 97cb6f314cd1eaed21458f6b0d21c49a4114c802..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/hat/1.png and /dev/null differ diff --git a/static/application/christmas/element_source/hat/2.png b/static/application/christmas/element_source/hat/2.png deleted file mode 100644 index 045cfee71c2b601a18aedc93e0f463c7e65cb8c9..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/hat/2.png and /dev/null differ diff --git a/static/application/christmas/element_source/hat/3.png b/static/application/christmas/element_source/hat/3.png deleted file mode 100644 index a86aaf9adc532817e3b95e0c6d73b1cfbbeeb61b..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/hat/3.png and /dev/null differ diff --git a/static/application/christmas/element_source/hat/4.png b/static/application/christmas/element_source/hat/4.png deleted file mode 100644 index aac53f9fdf29335730863b910d3baa063f548afe..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/hat/4.png and /dev/null differ diff --git a/static/application/christmas/element_source/hat/5.png b/static/application/christmas/element_source/hat/5.png deleted file mode 100644 index 3292a3ed5415aa46040d5bfcb0402c5cd0c49ffe..0000000000000000000000000000000000000000 Binary files a/static/application/christmas/element_source/hat/5.png and /dev/null differ diff --git 
a/static/application/christmas/solov2/data_feed.py b/static/application/christmas/solov2/data_feed.py deleted file mode 100644 index 5fb37a179551edadfb3f63232b6d4d0384d00496..0000000000000000000000000000000000000000 --- a/static/application/christmas/solov2/data_feed.py +++ /dev/null @@ -1,337 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import base64 - -import cv2 -import numpy as np -from PIL import Image, ImageDraw -import paddle.fluid as fluid - - -def create_inputs(im, im_info): - """generate input for different model type - Args: - im (np.ndarray): image (np.ndarray) - im_info (dict): info of image - Returns: - inputs (dict): input of model - """ - inputs = {} - inputs['image'] = im - origin_shape = list(im_info['origin_shape']) - resize_shape = list(im_info['resize_shape']) - pad_shape = list(im_info['pad_shape']) if im_info[ - 'pad_shape'] is not None else list(im_info['resize_shape']) - scale_x, scale_y = im_info['scale'] - scale = scale_x - im_info = np.array([resize_shape + [scale]]).astype('float32') - inputs['im_info'] = im_info - return inputs - - -def visualize_box_mask(im, - results, - labels=None, - mask_resolution=14, - threshold=0.5): - """ - Args: - im (str/np.ndarray): path of image/np.ndarray read by cv2 - results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - MaskRCNN's results 
include 'masks': np.ndarray: - shape:[N, class_num, mask_resolution, mask_resolution] - labels (list): labels:['class1', ..., 'classn'] - mask_resolution (int): shape of a mask is:[mask_resolution, mask_resolution] - threshold (float): Threshold of score. - Returns: - im (PIL.Image.Image): visualized image - """ - if not labels: - labels = [ - 'background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', - 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire', 'hydrant', - 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', - 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', - 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', - 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', - 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', - 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', - 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', - 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', - 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', - 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', - 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', - 'scissors', 'teddy bear', 'hair drier', 'toothbrush' - ] - if isinstance(im, str): - im = Image.open(im).convert('RGB') - else: - im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) - im = Image.fromarray(im) - if 'masks' in results and 'boxes' in results: - im = draw_mask( - im, - results['boxes'], - results['masks'], - labels, - resolution=mask_resolution) - if 'boxes' in results: - im = draw_box(im, results['boxes'], labels) - if 'segm' in results: - im = draw_segm( - im, - results['segm'], - results['label'], - results['score'], - labels, - threshold=threshold) - return im - - -def get_color_map_list(num_classes): - """ - Args: - num_classes (int): number of class - Returns: - color_map (list): RGB color list - """ - color_map = num_classes * [0, 0, 0] - for i in range(0, num_classes): - j = 0 - lab 
= i - while lab: - color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j)) - color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j)) - color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j)) - j += 1 - lab >>= 3 - color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)] - return color_map - - -def expand_boxes(boxes, scale=0.0): - """ - Args: - boxes (np.ndarray): shape:[N,4], N:number of box, - matix element:[x_min, y_min, x_max, y_max] - scale (float): scale of boxes - Returns: - boxes_exp (np.ndarray): expanded boxes - """ - w_half = (boxes[:, 2] - boxes[:, 0]) * .5 - h_half = (boxes[:, 3] - boxes[:, 1]) * .5 - x_c = (boxes[:, 2] + boxes[:, 0]) * .5 - y_c = (boxes[:, 3] + boxes[:, 1]) * .5 - w_half *= scale - h_half *= scale - boxes_exp = np.zeros(boxes.shape) - boxes_exp[:, 0] = x_c - w_half - boxes_exp[:, 2] = x_c + w_half - boxes_exp[:, 1] = y_c - h_half - boxes_exp[:, 3] = y_c + h_half - return boxes_exp - - -def draw_mask(im, np_boxes, np_masks, labels, resolution=14, threshold=0.5): - """ - Args: - im (PIL.Image.Image): PIL image - np_boxes (np.ndarray): shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - np_masks (np.ndarray): shape:[N, class_num, resolution, resolution] - labels (list): labels:['class1', ..., 'classn'] - resolution (int): shape of a mask is:[resolution, resolution] - threshold (float): threshold of mask - Returns: - im (PIL.Image.Image): visualized image - """ - color_list = get_color_map_list(len(labels)) - scale = (resolution + 2.0) / resolution - im_w, im_h = im.size - w_ratio = 0.4 - alpha = 0.7 - im = np.array(im).astype('float32') - rects = np_boxes[:, 2:] - expand_rects = expand_boxes(rects, scale) - expand_rects = expand_rects.astype(np.int32) - clsid_scores = np_boxes[:, 0:2] - padded_mask = np.zeros((resolution + 2, resolution + 2), dtype=np.float32) - clsid2color = {} - for idx in range(len(np_boxes)): - clsid, score = clsid_scores[idx].tolist() - clsid = int(clsid) - xmin, ymin, 
xmax, ymax = expand_rects[idx].tolist() - w = xmax - xmin + 1 - h = ymax - ymin + 1 - w = np.maximum(w, 1) - h = np.maximum(h, 1) - padded_mask[1:-1, 1:-1] = np_masks[idx, int(clsid), :, :] - resized_mask = cv2.resize(padded_mask, (w, h)) - resized_mask = np.array(resized_mask > threshold, dtype=np.uint8) - x0 = min(max(xmin, 0), im_w) - x1 = min(max(xmax + 1, 0), im_w) - y0 = min(max(ymin, 0), im_h) - y1 = min(max(ymax + 1, 0), im_h) - im_mask = np.zeros((im_h, im_w), dtype=np.uint8) - im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), ( - x0 - xmin):(x1 - xmin)] - if clsid not in clsid2color: - clsid2color[clsid] = color_list[clsid] - color_mask = clsid2color[clsid] - for c in range(3): - color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255 - idx = np.nonzero(im_mask) - color_mask = np.array(color_mask) - im[idx[0], idx[1], :] *= 1.0 - alpha - im[idx[0], idx[1], :] += alpha * color_mask - return Image.fromarray(im.astype('uint8')) - - -def draw_box(im, np_boxes, labels): - """ - Args: - im (PIL.Image.Image): PIL image - np_boxes (np.ndarray): shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - labels (list): labels:['class1', ..., 'classn'] - Returns: - im (PIL.Image.Image): visualized image - """ - draw_thickness = min(im.size) // 320 - draw = ImageDraw.Draw(im) - clsid2color = {} - color_list = get_color_map_list(len(labels)) - - for dt in np_boxes: - clsid, bbox, score = int(dt[0]), dt[2:], dt[1] - xmin, ymin, xmax, ymax = bbox - w = xmax - xmin - h = ymax - ymin - if clsid not in clsid2color: - clsid2color[clsid] = color_list[clsid] - color = tuple(clsid2color[clsid]) - - # draw bbox - draw.line( - [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), - (xmin, ymin)], - width=draw_thickness, - fill=color) - - # draw label - text = "{} {:.4f}".format(labels[clsid], score) - tw, th = draw.textsize(text) - draw.rectangle( - [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color) - draw.text((xmin 
+ 1, ymin - th), text, fill=(255, 255, 255)) - return im - - -def draw_segm(im, - np_segms, - np_label, - np_score, - labels, - threshold=0.5, - alpha=0.7): - """ - Draw segmentation on image - """ - mask_color_id = 0 - w_ratio = .4 - color_list = get_color_map_list(len(labels)) - im = np.array(im).astype('float32') - clsid2color = {} - np_segms = np_segms.astype(np.uint8) - index = np.where(np_label == 0)[0] - index = np.where(np_score[index] > threshold)[0] - person_segms = np_segms[index] - person_mask = np.sum(person_segms, axis=0) - person_mask[person_mask > 1] = 1 - person_mask = np.expand_dims(person_mask, axis=2) - person_mask = np.repeat(person_mask, 3, axis=2) - im = im * person_mask - - return Image.fromarray(im.astype('uint8')) - - -def load_predictor(model_dir, - run_mode='fluid', - batch_size=1, - use_gpu=False, - min_subgraph_size=3): - """set AnalysisConfig, generate AnalysisPredictor - Args: - model_dir (str): root path of __model__ and __params__ - use_gpu (bool): whether use gpu - Returns: - predictor (PaddlePredictor): AnalysisPredictor - Raises: - ValueError: predict by TensorRT need use_gpu == True. 
- """ - if not use_gpu and not run_mode == 'fluid': - raise ValueError( - "Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}" - .format(run_mode, use_gpu)) - if run_mode == 'trt_int8': - raise ValueError("TensorRT int8 mode is not supported now, " - "please use trt_fp32 or trt_fp16 instead.") - precision_map = { - 'trt_int8': fluid.core.AnalysisConfig.Precision.Int8, - 'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32, - 'trt_fp16': fluid.core.AnalysisConfig.Precision.Half - } - config = fluid.core.AnalysisConfig( - os.path.join(model_dir, '__model__'), - os.path.join(model_dir, '__params__')) - if use_gpu: - # initial GPU memory(M), device ID - config.enable_use_gpu(100, 0) - # optimize graph and fuse op - config.switch_ir_optim(True) - else: - config.disable_gpu() - - if run_mode in precision_map.keys(): - config.enable_tensorrt_engine( - workspace_size=1 << 10, - max_batch_size=batch_size, - min_subgraph_size=min_subgraph_size, - precision_mode=precision_map[run_mode], - use_static=False, - use_calib_mode=False) - - # disable print log when predict - config.disable_glog_info() - # enable shared memory - config.enable_memory_optim() - # disable feed, fetch OP, needed by zero_copy_run - config.switch_use_feed_fetch_ops(False) - predictor = fluid.core.create_paddle_predictor(config) - return predictor - - -def cv2_to_base64(image): - data = cv2.imencode('.jpg', image)[1] - return base64.b64encode(data.tostring()).decode('utf8') - - -def base64_to_cv2(b64str): - data = base64.b64decode(b64str.encode('utf8')) - data = np.fromstring(data, np.uint8) - data = cv2.imdecode(data, cv2.IMREAD_COLOR) - return data diff --git a/static/application/christmas/solov2/module.py b/static/application/christmas/solov2/module.py deleted file mode 100644 index 333ec6384aa0459b10266f8b7bf2447e3c35ddea..0000000000000000000000000000000000000000 --- a/static/application/christmas/solov2/module.py +++ /dev/null @@ -1,173 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle 
Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import time -from functools import reduce -import cv2 -import numpy as np -from paddlehub.module.module import moduleinfo -import solov2.processor as P -import solov2.data_feed as D - - -class Detector(object): - """ - Args: - model_dir (str): root path of __model__, __params__ and infer_cfg.yml - use_gpu (bool): whether use gpu - run_mode (str): mode of running(fluid/trt_fp32/trt_fp16) - threshold (float): threshold to reserve the result for output. 
- """ - - def __init__(self, - min_subgraph_size=60, - use_gpu=False, - run_mode='fluid', - threshold=0.5): - - model_dir = os.path.join(self.directory, 'solov2_r101_vd_fpn_3x') - self.predictor = D.load_predictor( - model_dir, - run_mode=run_mode, - min_subgraph_size=min_subgraph_size, - use_gpu=use_gpu) - self.compose = [ - P.Resize(max_size=1333), P.Normalize( - mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), - P.Permute(), P.PadStride(stride=32) - ] - - def transform(self, im): - im, im_info = P.preprocess(im, self.compose) - inputs = D.create_inputs(im, im_info) - return inputs, im_info - - def postprocess(self, np_boxes, np_masks, im_info, threshold=0.5): - # postprocess output of predictor - results = {} - expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1) - np_boxes = np_boxes[expect_boxes, :] - for box in np_boxes: - print('class_id:{:d}, confidence:{:.4f},' - 'left_top:[{:.2f},{:.2f}],' - ' right_bottom:[{:.2f},{:.2f}]'.format( - int(box[0]), box[1], box[2], box[3], box[4], box[5])) - results['boxes'] = np_boxes - if np_masks is not None: - np_masks = np_masks[expect_boxes, :, :, :] - results['masks'] = np_masks - return results - - def predict(self, image, threshold=0.5, warmup=0, repeats=1): - ''' - Args: - image (str/np.ndarray): path of image/ np.ndarray read by cv2 - threshold (float): threshold of predicted box' score - Returns: - results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box, - matix element:[class, score, x_min, y_min, x_max, y_max] - MaskRCNN's results include 'masks': np.ndarray: - shape:[N, class_num, mask_resolution, mask_resolution] - ''' - inputs, im_info = self.transform(image) - np_boxes, np_masks = None, None - - input_names = self.predictor.get_input_names() - for i in range(len(input_names)): - input_tensor = self.predictor.get_input_tensor(input_names[i]) - input_tensor.copy_from_cpu(inputs[input_names[i]]) - - for i in range(warmup): - self.predictor.zero_copy_run() - output_names 
= self.predictor.get_output_names() - boxes_tensor = self.predictor.get_output_tensor(output_names[0]) - np_boxes = boxes_tensor.copy_to_cpu() - - for i in range(repeats): - self.predictor.zero_copy_run() - output_names = self.predictor.get_output_names() - boxes_tensor = self.predictor.get_output_tensor(output_names[0]) - np_boxes = boxes_tensor.copy_to_cpu() - - # do not perform postprocess in benchmark mode - results = [] - - if reduce(lambda x, y: x * y, np_boxes.shape) < 6: - print('[WARNNING] No object detected.') - results = {'boxes': np.array([])} - else: - results = self.postprocess( - np_boxes, np_masks, im_info, threshold=threshold) - - return results - - -@moduleinfo( - name="solov2", - type="CV/image_editing", - author="paddlepaddle", - author_email="", - summary="solov2 is a detection model, this module is trained with COCO dataset.", - version="1.0.0") -class DetectorSOLOv2(Detector): - def __init__(self, use_gpu=False, run_mode='fluid', threshold=0.5): - super(DetectorSOLOv2, self).__init__( - use_gpu=use_gpu, run_mode=run_mode, threshold=threshold) - - def predict(self, - image, - threshold=0.5, - warmup=0, - repeats=1, - visualization=False, - save_dir='solov2_result'): - inputs, im_info = self.transform(image) - np_label, np_score, np_segms = None, None, None - - input_names = self.predictor.get_input_names() - for i in range(len(input_names)): - input_tensor = self.predictor.get_input_tensor(input_names[i]) - input_tensor.copy_from_cpu(inputs[input_names[i]]) - for i in range(warmup): - self.predictor.zero_copy_run() - output_names = self.predictor.get_output_names() - np_label = self.predictor.get_output_tensor(output_names[ - 0]).copy_to_cpu() - np_score = self.predictor.get_output_tensor(output_names[ - 1]).copy_to_cpu() - np_segms = self.predictor.get_output_tensor(output_names[ - 2]).copy_to_cpu() - - for i in range(repeats): - self.predictor.zero_copy_run() - output_names = self.predictor.get_output_names() - np_label = 
self.predictor.get_output_tensor(output_names[ - 0]).copy_to_cpu() - np_score = self.predictor.get_output_tensor(output_names[ - 1]).copy_to_cpu() - np_segms = self.predictor.get_output_tensor(output_names[ - 2]).copy_to_cpu() - output = dict(segm=np_segms, label=np_label, score=np_score) - - if visualization: - if not os.path.exists(save_dir): - os.makedirs(save_dir) - image = D.visualize_box_mask(im=image, results=output) - name = str(time.time()) + '.png' - save_path = os.path.join(save_dir, name) - image.save(save_path) - img = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR) - output['image'] = img - return output diff --git a/static/application/christmas/solov2/processor.py b/static/application/christmas/solov2/processor.py deleted file mode 100644 index b2f02c09ac5f027f631b6696ed60876d650b829b..0000000000000000000000000000000000000000 --- a/static/application/christmas/solov2/processor.py +++ /dev/null @@ -1,248 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from PIL import Image -import cv2 -import numpy as np - - -def decode_image(im_file, im_info): - """read rgb image - Args: - im_file (str/np.ndarray): path of image/ np.ndarray read by cv2 - im_info (dict): info of image - Returns: - im (np.ndarray): processed image (np.ndarray) - im_info (dict): info of processed image - """ - if isinstance(im_file, str): - with open(im_file, 'rb') as f: - im_read = f.read() - data = np.frombuffer(im_read, dtype='uint8') - im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode - im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) - - im_info['origin_shape'] = im.shape[:2] - im_info['resize_shape'] = im.shape[:2] - else: - im = im_file - im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) - im_info['origin_shape'] = im.shape[:2] - im_info['resize_shape'] = im.shape[:2] - return im, im_info - - -class Resize(object): - """resize image by target_size and max_size - Args: - arch (str): model type - target_size (int): the target size of image - max_size (int): the max size of image - use_cv2 (bool): whether us cv2 - image_shape (list): input shape of model - interp (int): method of resize - """ - - def __init__(self, - target_size=800, - max_size=1333, - use_cv2=True, - image_shape=None, - interp=cv2.INTER_LINEAR, - resize_box=False): - self.target_size = target_size - self.max_size = max_size - self.image_shape = image_shape - self.use_cv2 = use_cv2 - self.interp = interp - - def __call__(self, im, im_info): - """ - Args: - im (np.ndarray): image (np.ndarray) - im_info (dict): info of image - Returns: - im (np.ndarray): processed image (np.ndarray) - im_info (dict): info of processed image - """ - im_channel = im.shape[2] - im_scale_x, im_scale_y = self.generate_scale(im) - im_info['resize_shape'] = [ - im_scale_x * float(im.shape[0]), im_scale_y * float(im.shape[1]) - ] - if self.use_cv2: - im = cv2.resize( - im, - None, - None, - fx=im_scale_x, - fy=im_scale_y, - interpolation=self.interp) - else: - resize_w = int(im_scale_x * float(im.shape[1])) - 
resize_h = int(im_scale_y * float(im.shape[0])) - if self.max_size != 0: - raise TypeError( - 'If you set max_size to cap the maximum size of image,' - 'please set use_cv2 to True to resize the image.') - im = im.astype('uint8') - im = Image.fromarray(im) - im = im.resize((int(resize_w), int(resize_h)), self.interp) - im = np.array(im) - - # padding im when image_shape fixed by infer_cfg.yml - if self.max_size != 0 and self.image_shape is not None: - padding_im = np.zeros( - (self.max_size, self.max_size, im_channel), dtype=np.float32) - im_h, im_w = im.shape[:2] - padding_im[:im_h, :im_w, :] = im - im = padding_im - - im_info['scale'] = [im_scale_x, im_scale_y] - return im, im_info - - def generate_scale(self, im): - """ - Args: - im (np.ndarray): image (np.ndarray) - Returns: - im_scale_x: the resize ratio of X - im_scale_y: the resize ratio of Y - """ - origin_shape = im.shape[:2] - im_c = im.shape[2] - if self.max_size != 0: - im_size_min = np.min(origin_shape[0:2]) - im_size_max = np.max(origin_shape[0:2]) - im_scale = float(self.target_size) / float(im_size_min) - if np.round(im_scale * im_size_max) > self.max_size: - im_scale = float(self.max_size) / float(im_size_max) - im_scale_x = im_scale - im_scale_y = im_scale - else: - im_scale_x = float(self.target_size) / float(origin_shape[1]) - im_scale_y = float(self.target_size) / float(origin_shape[0]) - return im_scale_x, im_scale_y - - -class Normalize(object): - """normalize image - Args: - mean (list): im - mean - std (list): im / std - is_scale (bool): whether need im / 255 - is_channel_first (bool): if True: image shape is CHW, else: HWC - """ - - def __init__(self, mean, std, is_scale=True, is_channel_first=False): - self.mean = mean - self.std = std - self.is_scale = is_scale - self.is_channel_first = is_channel_first - - def __call__(self, im, im_info): - """ - Args: - im (np.ndarray): image (np.ndarray) - im_info (dict): info of image - Returns: - im (np.ndarray): processed image (np.ndarray) - 
im_info (dict): info of processed image - """ - im = im.astype(np.float32, copy=False) - if self.is_channel_first: - mean = np.array(self.mean)[:, np.newaxis, np.newaxis] - std = np.array(self.std)[:, np.newaxis, np.newaxis] - else: - mean = np.array(self.mean)[np.newaxis, np.newaxis, :] - std = np.array(self.std)[np.newaxis, np.newaxis, :] - if self.is_scale: - im = im / 255.0 - im -= mean - im /= std - return im, im_info - - -class Permute(object): - """permute image - Args: - to_bgr (bool): whether convert RGB to BGR - channel_first (bool): whether convert HWC to CHW - """ - - def __init__(self, to_bgr=False, channel_first=True): - self.to_bgr = to_bgr - self.channel_first = channel_first - - def __call__(self, im, im_info): - """ - Args: - im (np.ndarray): image (np.ndarray) - im_info (dict): info of image - Returns: - im (np.ndarray): processed image (np.ndarray) - im_info (dict): info of processed image - """ - if self.channel_first: - im = im.transpose((2, 0, 1)).copy() - if self.to_bgr: - im = im[[2, 1, 0], :, :] - return im, im_info - - -class PadStride(object): - """ padding image for model with FPN - Args: - stride (bool): model with FPN need image shape % stride == 0 - """ - - def __init__(self, stride=0): - self.coarsest_stride = stride - - def __call__(self, im, im_info): - """ - Args: - im (np.ndarray): image (np.ndarray) - im_info (dict): info of image - Returns: - im (np.ndarray): processed image (np.ndarray) - im_info (dict): info of processed image - """ - coarsest_stride = self.coarsest_stride - if coarsest_stride == 0: - return im - im_c, im_h, im_w = im.shape - pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride) - pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride) - padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32) - padding_im[:, :im_h, :im_w] = im - im_info['pad_shape'] = padding_im.shape[1:] - return padding_im, im_info - - -def preprocess(im, preprocess_ops): - # process image by 
preprocess_ops - im_info = { - 'scale': [1., 1.], - 'origin_shape': None, - 'resize_shape': None, - 'pad_shape': None, - } - im, im_info = decode_image(im, im_info) - count = 0 - for operator in preprocess_ops: - count += 1 - im, im_info = operator(im, im_info) - im = np.array((im, )).astype('float32') - return im, im_info diff --git a/static/application/christmas/solov2_blazeface/face_makeup_main.py b/static/application/christmas/solov2_blazeface/face_makeup_main.py deleted file mode 100644 index c86a179428760a2cf58741341b8bb66f68bcfba9..0000000000000000000000000000000000000000 --- a/static/application/christmas/solov2_blazeface/face_makeup_main.py +++ /dev/null @@ -1,282 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import os -import cv2 -import math -import numpy as np - -HAT_SCALES = { - '1.png': [3.0, 0.9, .0], - '2.png': [3.0, 1.3, .5], - '3.png': [2.2, 1.5, .8], - '4.png': [2.2, 1.8, .0], - '5.png': [1.8, 1.2, .0], -} - -GLASSES_SCALES = { - '1.png': [0.65, 2.5], - '2.png': [0.65, 2.5], -} - -BEARD_SCALES = {'1.png': [700, 0.3], '2.png': [220, 0.2]} - - -def rotate(image, angle): - """ - angle is degree, not radian - """ - (h, w) = image.shape[:2] - (cx, cy) = (w / 2, h / 2) - M = cv2.getRotationMatrix2D((cx, cy), -angle, 1.0) - cos = np.abs(M[0, 0]) - sin = np.abs(M[0, 1]) - nW = int((h * sin) + (w * cos)) - nH = int((h * cos) + (w * sin)) - M[0, 2] += (nW / 2) - cx - M[1, 2] += (nH / 2) - cy - return cv2.warpAffine(image, M, (nW, nH)) - - -def n_rotate_coord(angle, x, y): - """ - angle is radian, not degree - """ - rotatex = math.cos(angle) * x - math.sin(angle) * y - rotatey = math.cos(angle) * y + math.sin(angle) * x - return rotatex, rotatey - - -def r_rotate_coord(angle, x, y): - """ - angle is radian, not degree - """ - rotatex = math.cos(angle) * x + math.sin(angle) * y - rotatey = math.cos(angle) * y - math.sin(angle) * x - return rotatex, rotatey - - -def add_beard(person, kypoint, element_path): - beard_file_name = os.path.split(element_path)[1] - # element_len: top width of beard - # loc_offset_scale: scale relative to nose - element_len, loc_offset_scale = BEARD_SCALES[beard_file_name][:] - - x1, y1, x2, y2, x3, y3, x4, y4, x5, y5 = kypoint[:] - mouth_len = np.sqrt(np.square(np.abs(y4 - y5)) + np.square(x4 - x5)) - - element = cv2.imread(element_path) - h, w, _ = element.shape - resize_scale = mouth_len / float(element_len) - h, w = round(h * resize_scale + 0.5), round(w * resize_scale + 0.5) - resized_element = cv2.resize(element, (w, h)) - resized_ele_h, resized_ele_w, _ = resized_element.shape - - # First find the keypoint of mouth in front face - m_center_x = (x4 + x5) / 2. - m_center_y = (y4 + y5) / 2. 
- # cal degree only according mouth coordinates - degree = np.arccos((x4 - x5) / mouth_len) - - # coordinate of RoI in front face - half_w = int(resized_ele_w // 2) - scale = loc_offset_scale - roi_top_left_y = int(y3 + (((y5 + y4) // 2) - y3) * scale) - roi_top_left_x = int(x3 - half_w) - roi_top_right_y = roi_top_left_y - roi_top_right_x = int(x3 + half_w) - roi_bottom_left_y = roi_top_left_y + resized_ele_h - roi_bottom_left_x = roi_top_left_x - roi_bottom_right_y = roi_bottom_left_y - roi_bottom_right_x = roi_top_right_x - - r_x11, r_y11 = roi_top_left_x - x3, roi_top_left_y - y3 - r_x12, r_y12 = roi_top_right_x - x3, roi_top_right_y - y3 - r_x21, r_y21 = roi_bottom_left_x - x3, roi_bottom_left_y - y3 - r_x22, r_y22 = roi_bottom_right_x - x3, roi_bottom_right_y - y3 - - # coordinate of RoI in raw face - if m_center_x > x3: - x11, y11 = r_rotate_coord(degree, r_x11, r_y11) - x12, y12 = r_rotate_coord(degree, r_x12, r_y12) - x21, y21 = r_rotate_coord(degree, r_x21, r_y21) - x22, y22 = r_rotate_coord(degree, r_x22, r_y22) - else: - x11, y11 = n_rotate_coord(degree, r_x11, r_y11) - x12, y12 = n_rotate_coord(degree, r_x12, r_y12) - x21, y21 = n_rotate_coord(degree, r_x21, r_y21) - x22, y22 = n_rotate_coord(degree, r_x22, r_y22) - - x11, y11 = x11 + x3, y11 + y3 - x12, y12 = x12 + x3, y12 + y3 - x21, y21 = x21 + x3, y21 + y3 - x22, y22 = x22 + x3, y22 + y3 - - min_x = int(min(x11, x12, x21, x22)) - max_x = int(max(x11, x12, x21, x22)) - min_y = int(min(y11, y12, y21, y22)) - max_y = int(max(y11, y12, y21, y22)) - - angle = np.degrees(degree) - - if y4 < y5: - angle = -angle - - rotated_element = rotate(resized_element, angle) - - rotated_ele_h, rotated_ele_w, _ = rotated_element.shape - - max_x = min_x + int(rotated_ele_w) - max_y = min_y + int(rotated_ele_h) - - e2gray = cv2.cvtColor(rotated_element, cv2.COLOR_BGR2GRAY) - ret, mask = cv2.threshold(e2gray, 238, 255, cv2.THRESH_BINARY_INV) - mask_inv = cv2.bitwise_not(mask) - - roi = person[min_y:max_y, min_x:max_x] - 
person_bg = cv2.bitwise_and(roi, roi, mask=mask) - element_fg = cv2.bitwise_and( - rotated_element, rotated_element, mask=mask_inv) - - dst = cv2.add(person_bg, element_fg) - person[min_y:max_y, min_x:max_x] = dst - return person - - -def add_hat(person, kypoint, element_path): - x1, y1, x2, y2, x3, y3, x4, y4, x5, y5 = kypoint[:] - eye_len = np.sqrt(np.square(np.abs(y1 - y2)) + np.square(np.abs(x1 - x2))) - # cal degree only according eye coordinates - degree = np.arccos((x2 - x1) / eye_len) - - angle = np.degrees(degree) - if y2 < y1: - angle = -angle - - element = cv2.imread(element_path) - hat_file_name = os.path.split(element_path)[1] - # head_scale: size scale of hat - # high_scale: height scale above the eyes - # offect_scale: width offect of hat in face - head_scale, high_scale, offect_scale = HAT_SCALES[hat_file_name][:] - h, w, _ = element.shape - - element_len = w - resize_scale = eye_len * head_scale / float(w) - h, w = round(h * resize_scale + 0.5), round(w * resize_scale + 0.5) - resized_element = cv2.resize(element, (w, h)) - resized_ele_h, resized_ele_w, _ = resized_element.shape - - m_center_x = (x1 + x2) / 2. - m_center_y = (y1 + y2) / 2. 
- - head_len = int(eye_len * high_scale) - - if angle > 0: - head_center_x = int(m_center_x + head_len * math.sin(degree)) - head_center_y = int(m_center_y - head_len * math.cos(degree)) - else: - head_center_x = int(m_center_x + head_len * math.sin(degree)) - head_center_y = int(m_center_y - head_len * math.cos(degree)) - - rotated_element = rotate(resized_element, angle) - - rotated_ele_h, rotated_ele_w, _ = rotated_element.shape - max_x = int(head_center_x + (resized_ele_w // 2) * math.cos(degree)) + int( - angle * head_scale) + int(eye_len * offect_scale) - min_y = int(head_center_y - (resized_ele_w // 2) * math.cos(degree)) - - pad_ele_x0 = 0 if (max_x - int(rotated_ele_w)) > 0 else -( - max_x - int(rotated_ele_w)) - pad_ele_y0 = 0 if min_y > 0 else -(min_y) - - min_x = int(max(max_x - int(rotated_ele_w), 0)) - min_y = int(max(min_y, 0)) - max_y = min_y + int(rotated_ele_h) - - pad_y1 = max(max_y - int(person.shape[0]), 0) - pad_x1 = max(max_x - int(person.shape[1]), 0) - pad_w = pad_ele_x0 + pad_x1 - pad_h = pad_ele_y0 + pad_y1 - max_x += pad_w - - pad_person = np.zeros( - (person.shape[0] + pad_h, person.shape[1] + pad_w, 3)).astype(np.uint8) - - pad_person[pad_ele_y0:pad_ele_y0 + person.shape[0], pad_ele_x0:pad_ele_x0 + - person.shape[1], :] = person - - e2gray = cv2.cvtColor(rotated_element, cv2.COLOR_BGR2GRAY) - ret, mask = cv2.threshold(e2gray, 1, 255, cv2.THRESH_BINARY_INV) - mask_inv = cv2.bitwise_not(mask) - - roi = pad_person[min_y:max_y, min_x:max_x] - - person_bg = cv2.bitwise_and(roi, roi, mask=mask) - element_fg = cv2.bitwise_and( - rotated_element, rotated_element, mask=mask_inv) - - dst = cv2.add(person_bg, element_fg) - pad_person[min_y:max_y, min_x:max_x] = dst - - return pad_person, pad_ele_x0, pad_x1, pad_ele_y0, pad_y1, min_x, min_y, max_x, max_y - - -def add_glasses(person, kypoint, element_path): - x1, y1, x2, y2, x3, y3, x4, y4, x5, y5 = kypoint[:] - eye_len = np.sqrt(np.square(np.abs(y1 - y2)) + np.square(np.abs(x1 - x2))) - # cal 
degree only according eye coordinates - degree = np.arccos((x2 - x1) / eye_len) - angle = np.degrees(degree) - if y2 < y1: - angle = -angle - - element = cv2.imread(element_path) - glasses_file_name = os.path.split(element_path)[1] - # height_scale: height scale above the eyes - # glasses_scale: size ratio of glasses - height_scale, glasses_scale = GLASSES_SCALES[glasses_file_name][:] - h, w, _ = element.shape - - element_len = w - resize_scale = eye_len * glasses_scale / float(element_len) - h, w = round(h * resize_scale + 0.5), round(w * resize_scale + 0.5) - resized_element = cv2.resize(element, (w, h)) - resized_ele_h, resized_ele_w, _ = resized_element.shape - - rotated_element = rotate(resized_element, angle) - - rotated_ele_h, rotated_ele_w, _ = rotated_element.shape - - eye_center_x = (x1 + x2) / 2. - eye_center_y = (y1 + y2) / 2. - - min_x = int(eye_center_x) - int(rotated_ele_w * 0.5) + int( - angle * glasses_scale * person.shape[1] / 2000) - min_y = int(eye_center_y) - int(rotated_ele_h * height_scale) - max_x = min_x + rotated_ele_w - max_y = min_y + rotated_ele_h - - e2gray = cv2.cvtColor(rotated_element, cv2.COLOR_BGR2GRAY) - ret, mask = cv2.threshold(e2gray, 1, 255, cv2.THRESH_BINARY_INV) - mask_inv = cv2.bitwise_not(mask) - - roi = person[min_y:max_y, min_x:max_x] - - person_bg = cv2.bitwise_and(roi, roi, mask=mask) - element_fg = cv2.bitwise_and( - rotated_element, rotated_element, mask=mask_inv) - - dst = cv2.add(person_bg, element_fg) - person[min_y:max_y, min_x:max_x] = dst - return person diff --git a/static/application/christmas/solov2_blazeface/module.py b/static/application/christmas/solov2_blazeface/module.py deleted file mode 100644 index 1c63427e514b21ce6505e767b1e02fff71252194..0000000000000000000000000000000000000000 --- a/static/application/christmas/solov2_blazeface/module.py +++ /dev/null @@ -1,157 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import base64 -import json - -import cv2 -import numpy as np -import paddle.nn as nn -import paddlehub as hub -from paddlehub.module.module import moduleinfo, serving -import solov2_blazeface.processor as P - - -def cv2_to_base64(image): - data = cv2.imencode('.jpg', image)[1] - return base64.b64encode(data.tobytes()).decode('utf8') - - -def base64_to_cv2(b64str): - data = base64.b64decode(b64str.encode('utf8')) - data = np.frombuffer(data, np.uint8) - data = cv2.imdecode(data, cv2.IMREAD_COLOR) - return data - - -@moduleinfo( - name="solov2_blazeface", - type="CV/image_editing", - author="paddlepaddle", - author_email="", - summary="solov2_blazeface is a segmentation and face detection model based on solov2 and blazeface.", - version="1.0.0") -class SoloV2BlazeFaceModel(nn.Layer): - """ - SoloV2BlazeFaceModel - """ - - def __init__(self, use_gpu=True): - super(SoloV2BlazeFaceModel, self).__init__() - self.solov2 = hub.Module(name='solov2', use_gpu=use_gpu) - self.blaceface = hub.Module(name='blazeface', use_gpu=use_gpu) - - def predict(self, - image, - background, - beard_file=None, - glasses_file=None, - hat_file=None, - visualization=False, - threshold=0.5): - # instance segmentation - solov2_output = self.solov2.predict( - image=image, threshold=threshold, visualization=visualization) - # Set background pixel to 0 - im_segm, x0, x1, y0, y1, _, _, _, _, flag_seg = P.visualize_box_mask( - image, 
solov2_output, threshold=threshold) - - if flag_seg == 0: - return im_segm - - h, w = y1 - y0, x1 - x0 - back_json = background[:-3] + 'json' - stand_box = json.load(open(back_json)) - stand_box = stand_box['outputs']['object'][0]['bndbox'] - stand_xmin, stand_xmax, stand_ymin, stand_ymax = stand_box[ - 'xmin'], stand_box['xmax'], stand_box['ymin'], stand_box['ymax'] - im_path = np.asarray(im_segm) - - # face detection - blaceface_output = self.blaceface.predict( - image=im_path, threshold=threshold, visualization=visualization) - im_face_kp, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax, flag_face = P.visualize_box_mask( - im_path, - blaceface_output, - threshold=threshold, - beard_file=beard_file, - glasses_file=glasses_file, - hat_file=hat_file) - if flag_face == 1: - if x0 > h_xmin: - shift_x_ = x0 - h_xmin - else: - shift_x_ = 0 - if y0 > h_ymin: - shift_y_ = y0 - h_ymin - else: - shift_y_ = 0 - h += p_up + p_bottom + shift_y_ - w += p_left + p_right + shift_x_ - x0 = min(x0, h_xmin) - y0 = min(y0, h_ymin) - x1 = max(x1, h_xmax) + shift_x_ + p_left + p_right - y1 = max(y1, h_ymax) + shift_y_ + p_up + p_bottom - # Fill the background image - cropped = im_face_kp.crop((x0, y0, x1, y1)) - resize_scale = min((stand_xmax - stand_xmin) / (x1 - x0), - (stand_ymax - stand_ymin) / (y1 - y0)) - h, w = int(h * resize_scale), int(w * resize_scale) - cropped = cropped.resize((w, h), cv2.INTER_LINEAR) - cropped = cv2.cvtColor(np.asarray(cropped), cv2.COLOR_RGB2BGR) - shift_x = int((stand_xmax - stand_xmin - cropped.shape[1]) / 2) - shift_y = int((stand_ymax - stand_ymin - cropped.shape[0]) / 2) - out_image = cv2.imread(background) - e2gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY) - ret, mask = cv2.threshold(e2gray, 1, 255, cv2.THRESH_BINARY_INV) - mask_inv = cv2.bitwise_not(mask) - roi = out_image[stand_ymin + shift_y:stand_ymin + cropped.shape[ - 0] + shift_y, stand_xmin + shift_x:stand_xmin + cropped.shape[1] + - shift_x] - person_bg = 
cv2.bitwise_and(roi, roi, mask=mask) - element_fg = cv2.bitwise_and(cropped, cropped, mask=mask_inv) - dst = cv2.add(person_bg, element_fg) - out_image[stand_ymin + shift_y:stand_ymin + cropped.shape[ - 0] + shift_y, stand_xmin + shift_x:stand_xmin + cropped.shape[1] + - shift_x] = dst - - return out_image - - @serving - def serving_method(self, images, background, beard, glasses, hat, **kwargs): - """ - Run as a service. - """ - final = {} - background_path = os.path.join( - self.directory, - 'element_source/background/{}.png'.format(background)) - beard_path = os.path.join(self.directory, - 'element_source/beard/{}.png'.format(beard)) - glasses_path = os.path.join( - self.directory, 'element_source/glasses/{}.png'.format(glasses)) - hat_path = os.path.join(self.directory, - 'element_source/hat/{}.png'.format(hat)) - images_decode = base64_to_cv2(images[0]) - output = self.predict( - image=images_decode, - background=background_path, - hat_file=hat_path, - beard_file=beard_path, - glasses_file=glasses_path, - **kwargs) - final['image'] = cv2_to_base64(output) - - return final diff --git a/static/application/christmas/solov2_blazeface/processor.py b/static/application/christmas/solov2_blazeface/processor.py deleted file mode 100644 index 702985db392dddfe4b0d0b87e5061b8b447433a5..0000000000000000000000000000000000000000 --- a/static/application/christmas/solov2_blazeface/processor.py +++ /dev/null @@ -1,163 +0,0 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -from __future__ import division - -import cv2 -import numpy as np -from PIL import Image -import solov2_blazeface.face_makeup_main as face_makeup_main - - -def visualize_box_mask(im, - results, - threshold=0.5, - beard_file=None, - glasses_file=None, - hat_file=None): - if isinstance(im, str): - im = Image.open(im).convert('RGB') - else: - im = Image.fromarray(im) - - if 'segm' in results: - im, x0, x1, y0, y1, flag_seg = draw_segm( - im, - results['segm'], - results['label'], - results['score'], - threshold=threshold) - return im, x0, x1, y0, y1, 0, 0, 0, 0, flag_seg - if 'landmark' in results: - im, left, right, up, bottom, h_xmin, h_ymin, h_xmax, h_ymax, flag_face = trans_lmk( - im, results['landmark'], beard_file, glasses_file, hat_file) - return im, left, right, up, bottom, h_xmin, h_ymin, h_xmax, h_ymax, flag_face - else: - return im, 0, 0, 0, 0, 0, 0, 0, 0, 0 - - -def draw_segm(im, np_segms, np_label, np_score, threshold=0.5, alpha=0.7): - """ - Draw segmentation on image - """ - im = np.array(im).astype('float32') - np_segms = np_segms.astype(np.uint8) - index_label = np.where(np_label == 0)[0] - index = np.where(np_score[index_label] > threshold)[0] - index = index_label[index] - if index.size == 0: - im = Image.fromarray(im.astype('uint8')) - return im, 0, 0, 0, 0, 0 - person_segms = np_segms[index] - person_mask_single_channel = np.sum(person_segms, axis=0) - person_mask_single_channel[person_mask_single_channel > 1] = 1 - person_mask = np.expand_dims(person_mask_single_channel, axis=2) - person_mask = np.repeat(person_mask, 3, axis=2) - im = im * person_mask - - sum_x = np.sum(person_mask_single_channel, axis=0) - x = np.where(sum_x > 0.5)[0] - sum_y = np.sum(person_mask_single_channel, axis=1) - y = np.where(sum_y > 0.5)[0] - x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1] - - return Image.fromarray(im.astype('uint8')), x0, x1, y0, y1, 1 - - -def 
lmk2out(bboxes, np_lmk, im_info, threshold=0.5, is_bbox_normalized=True): - image_w, image_h = im_info['origin_shape'] - scale = im_info['scale'] - face_index, landmark, prior_box = np_lmk[:] - xywh_res = [] - if bboxes is None or bboxes.shape == (1, 1): - return np.array([]) - prior = np.reshape(prior_box, (-1, 4)) - predict_lmk = np.reshape(landmark, (-1, 10)) - k = 0 - for i in range(bboxes.shape[0]): - score = bboxes[i][1] - if score < threshold: - continue - theindex = face_index[i][0] - me_prior = prior[theindex, :] - lmk_pred = predict_lmk[theindex, :] - prior_h = me_prior[2] - me_prior[0] - prior_w = me_prior[3] - me_prior[1] - prior_h_center = (me_prior[2] + me_prior[0]) / 2 - prior_w_center = (me_prior[3] + me_prior[1]) / 2 - lmk_decode = np.zeros((10)) - for j in [0, 2, 4, 6, 8]: - lmk_decode[j] = lmk_pred[j] * 0.1 * prior_w + prior_h_center - for j in [1, 3, 5, 7, 9]: - lmk_decode[j] = lmk_pred[j] * 0.1 * prior_h + prior_w_center - if is_bbox_normalized: - lmk_decode = lmk_decode * np.array([ - image_h, image_w, image_h, image_w, image_h, image_w, image_h, - image_w, image_h, image_w - ]) - xywh_res.append(lmk_decode) - return np.asarray(xywh_res) - - -def post_processing(image, lmk_decode, hat_path, beard_path, glasses_path): - image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR) - p_left, p_right, p_up, p_bottom, h_xmax, h_ymax = [0] * 6 - h_xmin, h_ymin = 10000, 10000 - # Add beard on the face - if beard_path is not None: - image = face_makeup_main.add_beard(image, lmk_decode, beard_path) - # Add glasses on the face - if glasses_path is not None: - image = face_makeup_main.add_glasses(image, lmk_decode, glasses_path) - # Add hat on the face - if hat_path is not None: - image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax = face_makeup_main.add_hat( - image, lmk_decode, hat_path) - image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) - print('----------- Post Processing Success -----------') - return image, p_left, 
p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax - - -def trans_lmk(image, lmk_results, beard_file, glasses_file, hat_file): - p_left, p_right, p_up, p_bottom, h_xmax, h_ymax = [0] * 6 - h_xmin, h_ymin = 10000, 10000 - if lmk_results.shape[0] == 0: - return image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax, 0 - for lmk_decode in lmk_results: - - x1, y1, x2, y2 = lmk_decode[0], lmk_decode[1], lmk_decode[ - 2], lmk_decode[3] - x4, y4, x5, y5 = lmk_decode[6], lmk_decode[7], lmk_decode[ - 8], lmk_decode[9] - # Refine the order of keypoint - if x1 > x2: - lmk_decode[0], lmk_decode[1], lmk_decode[2], lmk_decode[ - 3] = lmk_decode[2], lmk_decode[3], lmk_decode[0], lmk_decode[1] - if x4 < x5: - lmk_decode[6], lmk_decode[7], lmk_decode[8], lmk_decode[ - 9] = lmk_decode[8], lmk_decode[9], lmk_decode[6], lmk_decode[7] - # Add decoration to the face - image, p_left_temp, p_right_temp, p_up_temp, p_bottom_temp, h_xmin_temp, h_ymin_temp, h_xmax_temp, h_ymax_temp = post_processing( - image, lmk_decode, hat_file, beard_file, glasses_file) - - p_left = max(p_left, p_left_temp) - p_right = max(p_right, p_right_temp) - p_up = max(p_up, p_up_temp) - p_bottom = max(p_bottom, p_bottom_temp) - h_xmin = min(h_xmin, h_xmin_temp) - h_ymin = min(h_ymin, h_ymin_temp) - h_xmax = max(h_xmax, h_xmax_temp) - h_ymax = max(h_ymax, h_ymax_temp) - - return image, p_left, p_right, p_up, p_bottom, h_xmin, h_ymin, h_xmax, h_ymax, 1 diff --git a/static/application/christmas/test_main.py b/static/application/christmas/test_main.py deleted file mode 100644 index ed8f2c375ad5df461063cfc572c88a1379cb681c..0000000000000000000000000000000000000000 --- a/static/application/christmas/test_main.py +++ /dev/null @@ -1,32 +0,0 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import paddlehub as hub -import cv2 - -img_file = 'demo_images/test.jpg' -background = 'element_source/background/1.png' -beard_file = 'element_source/beard/1.png' -glasses_file = 'element_source/glasses/4.png' -hat_file = 'element_source/hat/1.png' - -model = hub.Module(name='solov2_blazeface', use_gpu=True) -output = model.predict( - image=img_file, - background=background, - hat_file=hat_file, - beard_file=beard_file, - glasses_file=glasses_file, - visualization=True) -cv2.imwrite("chrismas_final.png", output) diff --git a/static/application/christmas/test_server.py b/static/application/christmas/test_server.py deleted file mode 100644 index 34963b871242199d1a48fff3dc2230ffbbf01db0..0000000000000000000000000000000000000000 --- a/static/application/christmas/test_server.py +++ /dev/null @@ -1,53 +0,0 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import requests -import json -import cv2 -import base64 -import time -import numpy as np - - -def cv2_to_base64(image): - data = cv2.imencode('.jpg', image)[1] - return base64.b64encode(data.tobytes()).decode('utf8') - - -def base64_to_cv2(b64str): - data = base64.b64decode(b64str.encode('utf8')) - data = np.frombuffer(data, np.uint8) - data = cv2.imdecode(data, cv2.IMREAD_COLOR) - return data - - -# Send HTTP request -org_im = cv2.cvtColor(cv2.imread('demo_images/test.jpg'), cv2.COLOR_BGR2RGB) -h, w, c = org_im.shape -hat_ids = 1 -data = { - 'images': [cv2_to_base64(org_im)], - 'background': 3, - "beard": 2, - "glasses": 3, - "hat": 3 -} -headers = {"Content-type": "application/json"} -url = "http://127.0.0.1:8880/predict/solov2_blazeface" -start = time.time() -r = requests.post(url=url, headers=headers, data=json.dumps(data)) -end = time.time() -print('cost:', end - start) -result = base64_to_cv2(r.json()["results"]['image']) -cv2.imwrite("chrismas_final.png", result) diff --git a/static/configs/acfpn/README.md b/static/configs/acfpn/README.md deleted file mode 100644 index 802d61478150af1b22e1e94fa53b30f2ed905cbc..0000000000000000000000000000000000000000 --- a/static/configs/acfpn/README.md +++ /dev/null @@ -1,16 +0,0 @@ -# Attention-guided Context Feature Pyramid Network for Object Detection - -## Introduction - -- Attention-guided Context Feature Pyramid Network for Object Detection: [https://arxiv.org/abs/2005.11475](https://arxiv.org/abs/2005.11475) - -``` -Cao J, Chen Q, Guo J, et al. Attention-guided Context Feature Pyramid Network for Object Detection[J]. arXiv preprint arXiv:2005.11475, 2020. 
-``` - - -## Model Zoo - -| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-ACFPN | Faster | 2 | 1x | 23.432 | 39.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_acfpn_1x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/configs/acfpn/faster_rcnn_r50_vd_acfpn_1x.yml) | diff --git a/static/configs/acfpn/faster_rcnn_r50_vd_acfpn_1x.yml b/static/configs/acfpn/faster_rcnn_r50_vd_acfpn_1x.yml deleted file mode 100644 index 2696be3f8b4a621ba185271d2c62f3460234991e..0000000000000000000000000000000000000000 --- a/static/configs/acfpn/faster_rcnn_r50_vd_acfpn_1x.yml +++ /dev/null @@ -1,107 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_acfpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: ACFPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -ACFPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - norm_groups: 32 - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - 
min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/anchor_free/README.md b/static/configs/anchor_free/README.md deleted file mode 100644 index 15b1717be3f156ca92b490fe038dd979a95a502e..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/README.md +++ /dev/null @@ -1,80 +0,0 @@ -# Anchor Free Models - -## Contents -- [Introduction](#introduction) -- [Model Zoo and Baselines](#model-zoo-and-baselines) -- [Algorithm Details](#algorithm-details) -- [Contributing](#contributing) - -## Introduction -Mainstream detection algorithms currently fall into two broad categories: single-stage and two-stage. Classic single-stage algorithms include SSD and YOLO, while two-stage methods include the RCNN family. Both categories are available in the [PaddleDetection Model Zoo](../../docs/MODEL_ZOO.md). What they share is that they first define a dense set of anchor regions of varying sizes and then perform classification and regression on these priors, which makes them heavily constrained by the design of the anchors themselves. Since CornerNet was proposed, a variety of anchor-free methods have emerged, and PaddleDetection integrates a series of anchor-free algorithms. - -## Model Zoo and Baselines -The table below shows the network architectures PaddleDetection currently supports; see [Algorithm Details](#algorithm-details) for specifics. - -| | ResNet50 | ResNet50-vd | Hourglass104 | DarkNet53 | -|:------------------------:|:--------:|:-------------:|:-------------:|:-------------:| -| [CornerNet-Squeeze](#CornerNet-Squeeze) | x | ✓ | ✓ |x | -| [FCOS](#FCOS) | ✓ | x | x | x | -| [TTFNet](#TTFNet) | x | x | x | ✓ | - - - -### Model Zoo - -#### 
mAP on the COCO dataset - -| Architecture | Backbone | Images/GPU | Pretrained model | mAP | FPS | Download | Config | -|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:----------:| -| CornerNet-Squeeze | Hourglass104 | 14 | None | 34.5 | 35.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_hg104.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_hg104.yml) | -| CornerNet-Squeeze | ResNet50-vd | 14 | [faster\_rcnn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar) | 32.7 | 47.01 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_r50_vd_fpn.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_r50_vd_fpn.yml) | -| CornerNet-Squeeze-dcn | ResNet50-vd | 14 | [faster\_rcnn\_dcn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar) | 34.9 | 40.43 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_dcn_r50_vd_fpn.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn.yml) | -| CornerNet-Squeeze-dcn-mixup-cosine* | ResNet50-vd | 14 | [faster\_rcnn\_dcn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar) | 38.2 | 39.70 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.yml) | -| FCOS | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 39.8 | 18.85 | [model](https://paddlemodels.bj.bcebos.com/object_detection/fcos_r50_fpn_1x.pdparams) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/fcos_r50_fpn_1x.yml) | -| FCOS+multiscale_train | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 42.0 | 19.05 | [model](https://paddlemodels.bj.bcebos.com/object_detection/fcos_r50_fpn_multiscale_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/fcos_r50_fpn_multiscale_2x.yml) | -| FCOS+DCN | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 44.4 | 13.66 | [model](https://paddlemodels.bj.bcebos.com/object_detection/fcos_dcn_r50_fpn_1x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/fcos_dcn_r50_fpn_1x.yml) | -| TTFNet | DarkNet53 | 12 | [DarkNet53_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar) | 32.9 | 85.92 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ttfnet_darknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/anchor_free/ttfnet_darknet.yml) | - -**Notes:** - -- Model FPS is measured with tools/eval.py on a single Tesla V100 GPU. -- CornerNet-Squeeze requires PaddlePaddle 1.8 or later, or a suitable develop build. -- When CornerNet-Squeeze uses a ResNet backbone, an FPN is added and the P3 output of the FPN is used as the backbone's output feature map. -- \*CornerNet-Squeeze-dcn-mixup-cosine is the best-performing model optimized from the original CornerNet-Squeeze; on top of the ResNet backbone it adds mixup preprocessing and uses cosine_decay. -- FCOS uses GIoU loss, predicts centerness in the location branch, normalizes the top-left and bottom-right corner offsets, and applies a ground-truth center matching strategy. -- The CornerNet-Squeeze model depends on the corner_pooling op, which is compiled in ```ppdet/ext_op```; for build instructions see [Compiling custom ops](../../ppdet/ext_op/README.md). - -## Algorithm Details - -### CornerNet-Squeeze - -**Introduction:** [CornerNet-Squeeze](https://arxiv.org/abs/1904.08900) 
improves on [CornerNet](https://arxiv.org/abs/1808.01244), predicting the positions of the top-left and bottom-right corners of the bounding box. Drawing on ideas from SqueezeNet and MobileNet, it optimizes CornerNet's Hourglass-104 backbone and greatly increases inference speed; compared with the original [YOLO-v3](https://arxiv.org/abs/1804.02767), it has some advantage in both accuracy and inference speed. - -**Features:** - -- Uses corner_pooling to locate the top-left and bottom-right corners of candidate boxes -- Replaces the residual blocks in Hourglass-104 with the fire module from SqueezeNet -- Replaces the second 3x3 convolution with a 3x3 depthwise separable convolution - - -### FCOS - -**Introduction:** [FCOS](https://arxiv.org/abs/1904.01355) is a dense-prediction, anchor-free detection algorithm. It uses the RetinaNet architecture, regresses the width and height of the target object directly on the feature map, and predicts the object class together with centerness (how far a feature-map pixel is offset from the object center). Centerness is ultimately used as a weight to adjust the object score. - -**Features:** - -- Uses the FPN structure to predict boxes of different scales at different levels, avoiding multiple overlapping boxes at the same feature-map pixel -- Uses a single-layer center-ness branch to predict whether the current point is an object center, suppressing low-quality false detections - - -### TTFNet - -**Introduction:** [TTFNet](https://arxiv.org/abs/1909.00700) is a training-time-friendly network for real-time object detection. It addresses the slow convergence of CenterNet by proposing a new method that generates training samples with Gaussian kernels, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight structure is also easy to extend to other tasks. - -**Features:** - -- Simple structure: only two heads are needed to detect object location and size, and time-consuming post-processing is removed -- Short training time: with a DarkNet53 backbone, about 2 hours of training on 8 V100 GPUs is enough to reach good results - -## Contributing -We warmly welcome code contributions to the anchor-free detection models in PaddleDetection; you can submit a PR for us to review. We also appreciate your feedback: please open an issue and we will respond promptly. diff --git a/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn.yml b/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn.yml deleted file mode 100644 index 75ea874020f77de5c24299a5d141dfba700b2e2c..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn.yml +++ /dev/null @@ -1,163 +0,0 @@ -architecture: CornerNetSqueeze -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar -weights: output/cornernet_squeeze_dcn_r50_vd_fpn/model_final -num_classes: 80 -stack: 1 - -CornerNetSqueeze: - backbone: ResNet - fpn: FPN - corner_head: CornerHead - -ResNet: - norm_type: bn - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 3 - max_level: 6 - num_chan: 256 
- spatial_scale: [0.03125, 0.0625, 0.125] - -CornerHead: - test_batch_size: 1 - ae_threshold: 0.5 - num_dets: 100 - top_k: 20 - -PostProcess: - use_soft_nms: true - detections_per_im: 100 - nms_thresh: 0.001 - sigma: 0.5 - -LearningRate: - base_lr: 0.0005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 511, 511] - fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks'] - output_size: 64 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: False - - !CornerCrop - input_size: 511 - - !Resize - target_dim: 511 - - !RandomFlipImage - prob: 0.5 - - !CornerRandColor - saturation: 0.4 - contrast: 0.4 - brightness: 0.4 - - !Lighting - eigval: [0.2141788, 0.01817699, 0.00341571] - eigvec: [[-0.58752847, -0.69563484, 0.41340352], - [-0.5832747, 0.00994535, -0.81221408], - [-0.56089297, 0.71832671, 0.41158938]] - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: False - is_channel_first: False - - !Permute - to_bgr: False - - !CornerTarget - output_size: [64, 64] - num_classes: 80 - batch_size: 14 - shuffle: true - drop_last: true - worker_num: 2 - use_process: true - drop_empty: false - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 
511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - use_process: true - batch_size: 1 - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - batch_size: 1 diff --git a/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.yml b/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.yml deleted file mode 100644 index 823d702a44f1a3dc082a79aeb107a5268304720e..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.yml +++ /dev/null @@ -1,167 +0,0 @@ -architecture: CornerNetSqueeze -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar -weights: output/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine/model_final -num_classes: 80 -stack: 1 - -CornerNetSqueeze: - backbone: ResNet - fpn: FPN - corner_head: CornerHead - -ResNet: - norm_type: bn - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 3 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125] - -CornerHead: - test_batch_size: 1 - ae_threshold: 0.5 - num_dets: 100 - top_k: 20 - -PostProcess: - use_soft_nms: true - detections_per_im: 100 - nms_thresh: 
0.001 - sigma: 0.5 - -LearningRate: - base_lr: 0.005 - schedulers: - - !CosineDecay - max_iters: 500000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 511, 511] - fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks'] - output_size: 64 - max_tag_len: 256 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: False - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !CornerCrop - input_size: 511 - - !Resize - target_dim: 511 - - !RandomFlipImage - prob: 0.5 - - !CornerRandColor - saturation: 0.4 - contrast: 0.4 - brightness: 0.4 - - !Lighting - eigval: [0.2141788, 0.01817699, 0.00341571] - eigvec: [[-0.58752847, -0.69563484, 0.41340352], - [-0.5832747, 0.00994535, -0.81221408], - [-0.56089297, 0.71832671, 0.41158938]] - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: False - is_channel_first: False - - !Permute - to_bgr: False - - !CornerTarget - output_size: [64, 64] - num_classes: 80 - max_tag_len: 256 - batch_size: 14 - shuffle: true - drop_last: true - worker_num: 2 - use_process: true - drop_empty: false - mixup_epoch: 200 - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - 
std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - use_process: true - batch_size: 1 - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - batch_size: 1 diff --git a/static/configs/anchor_free/cornernet_squeeze_hg104.yml b/static/configs/anchor_free/cornernet_squeeze_hg104.yml deleted file mode 100644 index 8e08dccadd71c920635398179320db5b7961403f..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/cornernet_squeeze_hg104.yml +++ /dev/null @@ -1,145 +0,0 @@ -architecture: CornerNetSqueeze -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: NULL -weights: output/cornernet_squeeze_hg104/model_final -num_classes: 80 -stack: 2 - -CornerNetSqueeze: - backbone: Hourglass - corner_head: CornerHead - -Hourglass: - dims: [256, 256, 384, 384, 512] - modules: [2, 2, 2, 2, 4] - -CornerHead: - test_batch_size: 1 - ae_threshold: 0.5 - num_dets: 100 - top_k: 20 - -PostProcess: - use_soft_nms: true - detections_per_im: 100 - nms_thresh: 0.001 - sigma: 0.5 - -LearningRate: - base_lr: 0.00025 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 450000 - -OptimizerBuilder: - optimizer: - type: Adam - regularizer: NULL - -TrainReader: - inputs_def: - image_shape: [3, 511, 511] - fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks'] - output_size: 64 - dataset: - !COCODataSet 
- image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: False - - !CornerCrop - input_size: 511 - - !Resize - target_dim: 511 - - !RandomFlipImage - prob: 0.5 - - !CornerRandColor - saturation: 0.4 - contrast: 0.4 - brightness: 0.4 - - !Lighting - eigval: [0.2141788, 0.01817699, 0.00341571] - eigvec: [[-0.58752847, -0.69563484, 0.41340352], - [-0.5832747, 0.00994535, -0.81221408], - [-0.56089297, 0.71832671, 0.41158938]] - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: False - is_channel_first: False - - !Permute - to_bgr: False - - !CornerTarget - output_size: [64, 64] - num_classes: 80 - batch_size: 14 - shuffle: true - drop_last: true - worker_num: 2 - use_process: true - drop_empty: false - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 2 - use_process: true - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 
0.2780983] - is_scale: True - is_channel_first: True - batch_size: 1 diff --git a/static/configs/anchor_free/cornernet_squeeze_r50_vd_fpn.yml b/static/configs/anchor_free/cornernet_squeeze_r50_vd_fpn.yml deleted file mode 100644 index 42a84271e85cd4e15a5e568e4553364dcb7e7ece..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/cornernet_squeeze_r50_vd_fpn.yml +++ /dev/null @@ -1,155 +0,0 @@ -architecture: CornerNetSqueeze -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar -weights: output/cornernet_squeeze_r50_vd_fpn/model_final -num_classes: 80 -stack: 1 - -CornerNetSqueeze: - backbone: ResNet - fpn: FPN - corner_head: CornerHead - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - variant: d - -FPN: - min_level: 3 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125] - -CornerHead: - test_batch_size: 1 - ae_threshold: 0.5 - num_dets: 100 - top_k: 20 - -PostProcess: - use_soft_nms: true - detections_per_im: 100 - nms_thresh: 0.001 - sigma: 0.5 - -LearningRate: - base_lr: 0.0005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 450000 - -OptimizerBuilder: - optimizer: - type: Adam - regularizer: NULL - -TrainReader: - inputs_def: - image_shape: [3, 511, 511] - fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks'] - output_size: 64 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: False - - !CornerCrop - input_size: 511 - - !Resize - target_dim: 511 - - !RandomFlipImage - prob: 0.5 - - !CornerRandColor - saturation: 0.4 - contrast: 0.4 - brightness: 0.4 - - !Lighting - eigval: 
[0.2141788, 0.01817699, 0.00341571] - eigvec: [[-0.58752847, -0.69563484, 0.41340352], - [-0.5832747, 0.00994535, -0.81221408], - [-0.56089297, 0.71832671, 0.41158938]] - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: False - is_channel_first: False - - !Permute - to_bgr: False - - !CornerTarget - output_size: [64, 64] - num_classes: 80 - batch_size: 14 - shuffle: true - drop_last: true - worker_num: 2 - use_process: true - drop_empty: false - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - use_process: true - batch_size: 1 - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'ratios', 'borders'] - output_size: 64 - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: false - - !CornerCrop - is_train: false - - !CornerRatio - input_size: 511 - output_size: 64 - - !Permute - to_bgr: False - - !NormalizeImage - mean: [0.40789654, 0.44719302, 0.47026115] - std: [0.28863828, 0.27408164, 0.2780983] - is_scale: True - is_channel_first: True - batch_size: 1 diff --git a/static/configs/anchor_free/fcos_dcn_r50_fpn_1x.yml b/static/configs/anchor_free/fcos_dcn_r50_fpn_1x.yml deleted file mode 100644 index af6f265a44b8482c1e52b1ab5e6d42f823f08cce..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/fcos_dcn_r50_fpn_1x.yml +++ /dev/null @@ -1,181 
+0,0 @@ -architecture: FCOS -max_iters: 90000 -use_gpu: true -snapshot_iter: 5000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/fcos_dcn_r50_fpn_1x/model_final -num_classes: 80 - -FCOS: - backbone: ResNet - fpn: FPN - fcos_head: FCOSHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 3 - max_level: 7 - num_chan: 256 - use_c5: false - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -FCOSHead: - num_classes: 80 - fpn_stride: [8, 16, 32, 64, 128] - num_convs: 4 - norm_type: "gn" - fcos_loss: FCOSLoss - norm_reg_targets: True - centerness_on_reg: True - use_dcn_in_tower: True - nms: MultiClassNMS - -MultiClassNMS: - score_threshold: 0.025 - nms_top_k: 1000 - keep_top_k: 100 - nms_threshold: 0.6 - background_label: -1 - -FCOSLoss: - loss_alpha: 0.25 - loss_gamma: 2.0 - iou_loss_type: "giou" - reg_weights: 1.0 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'fcos_target'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: false - - 
!Gt2FCOSTarget - object_sizes_boundary: [64, 128, 256, 512] - center_sampling_radius: 1.5 - downsample_ratios: [8, 16, 32, 64, 128] - norm_reg_targets: True - batch_size: 2 - shuffle: true - worker_num: 4 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false - worker_num: 1 - use_process: false - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/anchor_free/fcos_dcn_r50_fpn_1x_cutmix.yml b/static/configs/anchor_free/fcos_dcn_r50_fpn_1x_cutmix.yml deleted file mode 100644 index 8667a49cb48caf8810c4c191ceef5a42c9767602..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/fcos_dcn_r50_fpn_1x_cutmix.yml +++ /dev/null @@ -1,186 +0,0 @@ -architecture: FCOS -max_iters: 90000 -use_gpu: true -snapshot_iter: 5000 
-log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/fcos_dcn_r50_fpn_1x_cutmix/model_final -num_classes: 80 - -FCOS: - backbone: ResNet - fpn: FPN - fcos_head: FCOSHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 3 - max_level: 7 - num_chan: 256 - use_c5: false - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -FCOSHead: - num_classes: 80 - fpn_stride: [8, 16, 32, 64, 128] - num_convs: 4 - norm_type: "gn" - fcos_loss: FCOSLoss - norm_reg_targets: True - centerness_on_reg: True - use_dcn_in_tower: True - nms: MultiClassNMS - -MultiClassNMS: - score_threshold: 0.025 - nms_top_k: 1000 - keep_top_k: 100 - nms_threshold: 0.6 - background_label: -1 - -FCOSLoss: - loss_alpha: 0.25 - loss_gamma: 2.0 - iou_loss_type: "giou" - reg_weights: 1.0 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'fcos_target'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_cutmix: True - - !CutmixImage - alpha: 1.5 - beta: 1.5 - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: false - - !Gt2FCOSTarget - 
object_sizes_boundary: [64, 128, 256, 512] - center_sampling_radius: 1.5 - downsample_ratios: [8, 16, 32, 64, 128] - norm_reg_targets: True - batch_size: 2 - shuffle: true - worker_num: 4 - use_process: false - cutmix_epoch: 10 - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false - worker_num: 1 - use_process: false - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/anchor_free/fcos_r50_fpn_1x.yml b/static/configs/anchor_free/fcos_r50_fpn_1x.yml deleted file mode 100644 index 618914e9448a54acf230f1114dc2caf36e7151c9..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/fcos_r50_fpn_1x.yml +++ /dev/null @@ -1,180 +0,0 @@ -architecture: FCOS -max_iters: 90000 -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -save_dir: output 
-pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/fcos_r50_fpn_1x/model_final -num_classes: 80 - -FCOS: - backbone: ResNet - fpn: FPN - fcos_head: FCOSHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - -FPN: - min_level: 3 - max_level: 7 - num_chan: 256 - use_c5: false - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -FCOSHead: - num_classes: 80 - fpn_stride: [8, 16, 32, 64, 128] - num_convs: 4 - norm_type: "gn" - fcos_loss: FCOSLoss - norm_reg_targets: True - centerness_on_reg: True - use_dcn_in_tower: False - nms: MultiClassNMS - -MultiClassNMS: - score_threshold: 0.025 - nms_top_k: 1000 - keep_top_k: 100 - nms_threshold: 0.6 - background_label: -1 - -FCOSLoss: - loss_alpha: 0.25 - loss_gamma: 2.0 - iou_loss_type: "giou" - reg_weights: 1.0 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'fcos_target'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: false - - !Gt2FCOSTarget - object_sizes_boundary: [64, 128, 256, 512] - center_sampling_radius: 1.5 - downsample_ratios: [8, 16, 32, 64, 128] - 
norm_reg_targets: True - batch_size: 2 - shuffle: true - worker_num: 4 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false - worker_num: 2 - use_process: false - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/anchor_free/fcos_r50_fpn_multiscale_2x.yml b/static/configs/anchor_free/fcos_r50_fpn_multiscale_2x.yml deleted file mode 100644 index 7ce07dec59eee10afb85aea00b6d24add76af9c2..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/fcos_r50_fpn_multiscale_2x.yml +++ /dev/null @@ -1,180 +0,0 @@ -architecture: FCOS -max_iters: 180000 -use_gpu: true -snapshot_iter: 20000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar 
-metric: COCO -weights: output/fcos_r50_fpn_multiscale_2x/model_final -num_classes: 80 - -FCOS: - backbone: ResNet - fpn: FPN - fcos_head: FCOSHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - -FPN: - min_level: 3 - max_level: 7 - num_chan: 256 - use_c5: false - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -FCOSHead: - num_classes: 80 - fpn_stride: [8, 16, 32, 64, 128] - num_convs: 4 - norm_type: "gn" - fcos_loss: FCOSLoss - norm_reg_targets: True - centerness_on_reg: True - use_dcn_in_tower: False - nms: MultiClassNMS - -MultiClassNMS: - score_threshold: 0.025 - nms_top_k: 1000 - keep_top_k: 100 - nms_threshold: 0.6 - background_label: -1 - -FCOSLoss: - loss_alpha: 0.25 - loss_gamma: 2.0 - iou_loss_type: "giou" - reg_weights: 1.0 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'fcos_target'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800] - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: false - - !Gt2FCOSTarget - object_sizes_boundary: [64, 128, 256, 512] - center_sampling_radius: 1.5 - downsample_ratios: [8, 16, 32, 64, 128] - norm_reg_targets: True - batch_size: 2 - shuffle: true - worker_num: 4 
- use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false - worker_num: 2 - use_process: false - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_id', 'im_shape', 'im_info'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 128 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/anchor_free/pafnet_10x_coco.yml b/static/configs/anchor_free/pafnet_10x_coco.yml deleted file mode 100644 index 4c6728bcda05654e1e5c383e084e4aad1bbc6c6e..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/pafnet_10x_coco.yml +++ /dev/null @@ -1,170 +0,0 @@ -architecture: TTFNet -use_gpu: true -max_iters: 150000 -log_smooth_window: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/pafnet_10x_coco/model_final -num_classes: 80 -use_ema: true 
-ema_decay: 0.9998 - -TTFNet: - backbone: ResNet - ttf_head: TTFHead - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [2, 3, 4, 5] - variant: d - dcn_v2_stages: [3, 4, 5] - -TTFHead: - head_conv: 128 - wh_conv: 64 - hm_head_conv_num: 2 - wh_head_conv_num: 2 - wh_offset_base: 16 - wh_loss: GiouLoss - dcn_head: True - -GiouLoss: - loss_weight: 5. - do_average: false - use_class_weight: false - -LearningRate: - base_lr: 0.015 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 112500 - - 137500 - - !LinearWarmup - start_factor: 0.2 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0004 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_cutmix: True - - !CutmixImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort - hue: [-18., 18., 0.5] - saturation: [0.5, 1.5, 0.5] - contrast: [0.5, 1.5, 0.5] - brightness: [-32., 32., 0.5] - random_apply: False - hsv_format: True - random_channel: True - - !RandomExpand - ratio: 4 - prob: 0.5 - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - aspect_ratio: NULL - cover_all_box: True - - !RandomFlipImage - prob: 0.5 - batch_transforms: - - !RandomShape - sizes: [416, 448, 480, 512, 544, 576, 608, 640, 672] - random_inter: True - resize_box: True - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - - !Permute - to_bgr: false - channel_first: true - - !Gt2TTFTarget - num_classes: 80 - down_ratio: 4 - - !PadBatch - pad_to_stride: 32 - batch_size: 12 - shuffle: true - worker_num: 8 - bufsize: 2 - use_process: false - cutmix_epoch: 100 - -EvalReader: - 
inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'im_id', 'scale_factor'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !Resize - target_dim: 512 - - !NormalizeImage - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - is_scale: false - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'im_id', 'scale_factor'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !Resize - interp: 1 - target_dim: 512 - - !NormalizeImage - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - is_scale: false - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/anchor_free/pafnet_lite_mobilenet_v3_20x_coco.yml b/static/configs/anchor_free/pafnet_lite_mobilenet_v3_20x_coco.yml deleted file mode 100644 index 1b14238839e52a48707395de301c986c6f263334..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/pafnet_lite_mobilenet_v3_20x_coco.yml +++ /dev/null @@ -1,171 +0,0 @@ -architecture: TTFNet -use_gpu: true -max_iters: 300000 -log_smooth_window: 20 -save_dir: output -snapshot_iter: 50000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final -num_classes: 80 - -TTFNet: - backbone: MobileNetV3RCNN - ttf_head: TTFLiteHead - -MobileNetV3RCNN: - norm_type: sync_bn - norm_decay: 0.0 - model_name: large - scale: 1.0 - conv_decay: 0.00001 - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - freeze_norm: false - 
-TTFLiteHead: - head_conv: 48 - -GiouLoss: - loss_weight: 5. - do_average: false - use_class_weight: false - -LearningRate: - base_lr: 0.015 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 225000 - - 275000 - - !LinearWarmup - start_factor: 0.2 - steps: 1000 - -OptimizerBuilder: - clip_grad_by_norm: 35 - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0004 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_cutmix: True - - !ColorDistort - hue: [-18., 18., 0.5] - saturation: [0.5, 1.5, 0.5] - contrast: [0.5, 1.5, 0.5] - brightness: [-32., 32., 0.5] - random_apply: False - hsv_format: False - random_channel: True - - !RandomExpand - ratio: 4 - prob: 0.5 - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - aspect_ratio: NULL - cover_all_box: True - - !CutmixImage - alpha: 1.5 - beta: 1.5 - - !RandomFlipImage - prob: 0.5 - - !GridMaskOp - use_h: true - use_w: true - rotate: 1 - offset: false - ratio: 0.5 - mode: 1 - prob: 0.7 - upper_iter: 300000 - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512] - random_inter: True - resize_box: True - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - - !Permute - to_bgr: false - channel_first: true - - !Gt2TTFTarget - num_classes: 80 - down_ratio: 4 - - !PadBatch - pad_to_stride: 32 - batch_size: 12 - shuffle: true - worker_num: 8 - bufsize: 2 - use_process: false - cutmix_epoch: 200 - -EvalReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'im_id', 'scale_factor'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - 
with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !Resize - target_dim: 320 - - !NormalizeImage - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - is_scale: false - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 2 - bufsize: 2 - -TestReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'im_id', 'scale_factor'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !Resize - interp: 1 - target_dim: 320 - - !NormalizeImage - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - is_scale: false - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/anchor_free/ttfnet_darknet.yml b/static/configs/anchor_free/ttfnet_darknet.yml deleted file mode 100644 index d32224b6d24f6f758aa54d66f07b7c883da8a2ca..0000000000000000000000000000000000000000 --- a/static/configs/anchor_free/ttfnet_darknet.yml +++ /dev/null @@ -1,141 +0,0 @@ -architecture: TTFNet -use_gpu: true -max_iters: 15000 -log_iter: 20 -save_dir: output -snapshot_iter: 1000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar -weights: output/ttfnet_darknet/model_final -num_classes: 80 - -TTFNet: - backbone: DarkNet - ttf_head: TTFHead - -DarkNet: - norm_type: bn - norm_decay: 0.0004 - depth: 53 - freeze_at: 1 - -TTFHead: - head_conv: 128 - wh_conv: 64 - hm_head_conv_num: 2 - wh_head_conv_num: 2 - wh_offset_base: 16 - wh_loss: GiouLoss - -GiouLoss: - loss_weight: 5. 
- do_average: false - use_class_weight: false - -LearningRate: - base_lr: 0.015 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 11250 - - 13750 - - !LinearWarmup - start_factor: 0.2 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0004 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: true - - !Resize - target_dim: 512 - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !Gt2TTFTarget - num_classes: 80 - down_ratio: 4 - - !PadBatch - pad_to_stride: 32 - batch_size: 12 - shuffle: true - worker_num: 8 - bufsize: 2 - use_process: true - -EvalReader: - inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'im_id', 'scale_factor'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !Resize - target_dim: 512 - - !NormalizeImage - mean: [123.675, 116.28, 103.53] - std: [58.395, 57.12, 57.375] - is_scale: false - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'im_id', 'scale_factor'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !Resize - interp: 1 - target_dim: 512 - - !NormalizeImage - mean: [123.675, 116.28, 103.53] - std: 
[58.395, 57.12, 57.375] - is_scale: false - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1
diff --git a/static/configs/autoaugment/README.md b/static/configs/autoaugment/README.md
deleted file mode 100644
index 0f960db95325b7b27a7b2b5535f94d6325f19373..0000000000000000000000000000000000000000
--- a/static/configs/autoaugment/README.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Learning Data Augmentation Strategies for Object Detection
-
-## Introduction
-
-- Learning Data Augmentation Strategies for Object Detection: [https://arxiv.org/abs/1906.11172](https://arxiv.org/abs/1906.11172)
-
-```
-@article{Zoph2019LearningDA,
-  title={Learning Data Augmentation Strategies for Object Detection},
-  author={Barret Zoph and Ekin Dogus Cubuk and Golnaz Ghiasi and Tsung-Yi Lin and Jonathon Shlens and Quoc V. Le},
-  journal={ArXiv},
-  year={2019},
-  volume={abs/1906.11172}
-}
-```
-
-
-## Model Zoo
-
-| Backbone | Type | AutoAug policy | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
-| :---------------------- | :-------------:| :-------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
-| ResNet50-vd-FPN | Faster | v1 | 2 | 3x | 22.800 | 39.9 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_aa_3x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/autoaugment/faster_rcnn_r50_vd_fpn_aa_3x.yml) |
-| ResNet101-vd-FPN | Faster | v1 | 2 | 3x | 17.652 | 42.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_aa_3x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/autoaugment/faster_rcnn_r101_vd_fpn_aa_3x.yml) |
diff --git a/static/configs/autoaugment/faster_rcnn_r101_vd_fpn_aa_3x.yml b/static/configs/autoaugment/faster_rcnn_r101_vd_fpn_aa_3x.yml deleted file mode 100644 index
ba83c1ef623da24f6c002b84850b743f711c2a89..0000000000000000000000000000000000000000 --- a/static/configs/autoaugment/faster_rcnn_r101_vd_fpn_aa_3x.yml +++ /dev/null @@ -1,127 +0,0 @@ -architecture: FasterRCNN -max_iters: 270000 -snapshot_iter: 30000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/faster_rcnn_r101_vd_fpn_aa_3x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - 
-OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_size: 2 - use_process: true diff --git a/static/configs/autoaugment/faster_rcnn_r50_vd_fpn_aa_3x.yml b/static/configs/autoaugment/faster_rcnn_r50_vd_fpn_aa_3x.yml deleted file mode 100644 index 887aea5bc66c8179214925241837cb23f90ec275..0000000000000000000000000000000000000000 --- a/static/configs/autoaugment/faster_rcnn_r50_vd_fpn_aa_3x.yml +++ /dev/null @@ -1,127 +0,0 @@ -architecture: FasterRCNN -max_iters: 270000 -snapshot_iter: 30000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_aa_3x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - 
nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_size: 2 - use_process: true diff --git a/static/configs/cascade_mask_rcnn_r50_fpn_1x.yml b/static/configs/cascade_mask_rcnn_r50_fpn_1x.yml deleted file mode 100644 index cd1b466e9a41a78da2e74371b36fac2ae9a3335f..0000000000000000000000000000000000000000 --- a/static/configs/cascade_mask_rcnn_r50_fpn_1x.yml +++ /dev/null @@ -1,113 +0,0 @@ -architecture: CascadeMaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/cascade_mask_rcnn_r50_fpn_1x/model_final -num_classes: 81 - -CascadeMaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: 
FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - mask_assigner: MaskAssigner - mask_head: MaskHead - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_hi: [0.5, 0.6, 0.7] - bg_thresh_lo: [0.0, 0.0, 0.0] - fg_fraction: 0.25 - fg_thresh: [0.5, 0.6, 0.7] - -MaskAssigner: - resolution: 28 - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/cascade_rcnn_cls_aware_r101_vd_fpn_1x_softnms.yml b/static/configs/cascade_rcnn_cls_aware_r101_vd_fpn_1x_softnms.yml deleted file mode 100644 index 
958c0d363386c1919f188650ded3184540bce7cd..0000000000000000000000000000000000000000 --- a/static/configs/cascade_rcnn_cls_aware_r101_vd_fpn_1x_softnms.yml +++ /dev/null @@ -1,109 +0,0 @@ -architecture: CascadeRCNNClsAware -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/cascade_rcnn_cls_aware_r101_vd_fpn_1x_softnms/model_final -metric: COCO -num_classes: 81 - -CascadeRCNNClsAware: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - class_aware: True - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: MultiClassSoftNMS - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiClassSoftNMS: - score_threshold: 0.01 - keep_top_k: 300 - softnms_sigma: 
0.5 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.0 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/cascade_rcnn_cls_aware_r101_vd_fpn_ms_test.yml b/static/configs/cascade_rcnn_cls_aware_r101_vd_fpn_ms_test.yml deleted file mode 100644 index f3f893014e14479d887f0a19c513c64bd3149d90..0000000000000000000000000000000000000000 --- a/static/configs/cascade_rcnn_cls_aware_r101_vd_fpn_ms_test.yml +++ /dev/null @@ -1,162 +0,0 @@ -architecture: CascadeRCNNClsAware -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/cascade_rcnn_cls_aware_r101_vd_fpn_ms_test/model_final -metric: COCO -num_classes: 81 - -CascadeRCNNClsAware: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - 
canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - class_aware: True - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiScaleTEST: - score_thresh: 0.05 - nms_thresh: 0.5 - detections_per_im: 100 - enable_voting: true - vote_thresh: 0.9 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.0 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 - -EvalReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - multi_scale: true - num_scales: 18 - use_flip: true - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !MultiscaleTestResize - origin_target_size: 800 - origin_max_size: 1333 - target_size: - - 400 - - 500 - - 600 - - 700 - - 900 - - 1000 - - 1100 - - 1200 - max_size: 2000 - use_flip: true - - !Permute - channel_first: true - to_bgr: false - - !PadMultiScaleTest - pad_to_stride: 32 - worker_num: 2 diff --git a/static/configs/cascade_rcnn_r50_fpn_1x.yml b/static/configs/cascade_rcnn_r50_fpn_1x.yml deleted file mode 100644 index 7d255ba6f347ad91fb1b17ab10cfd2b413feb4dc..0000000000000000000000000000000000000000 --- a/static/configs/cascade_rcnn_r50_fpn_1x.yml +++ /dev/null @@ -1,106 
+0,0 @@ -architecture: CascadeRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/cascade_rcnn_r50_fpn_1x/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: b - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - 
factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/cascade_rcnn_r50_fpn_1x_ms_test.yml b/static/configs/cascade_rcnn_r50_fpn_1x_ms_test.yml deleted file mode 100644 index 8b04aeeda73f30b5923cd9874e969eb8e977861b..0000000000000000000000000000000000000000 --- a/static/configs/cascade_rcnn_r50_fpn_1x_ms_test.yml +++ /dev/null @@ -1,161 +0,0 @@ -architecture: CascadeRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/cascade_rcnn_r50_fpn_1x/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: b - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - 
-CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiScaleTEST: - score_thresh: 0.05 - nms_thresh: 0.5 - detections_per_im: 100 - enable_voting: true - vote_thresh: 0.9 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 - -EvalReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - multi_scale: true - num_scales: 18 - use_flip: true - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !MultiscaleTestResize - origin_target_size: 800 - origin_max_size: 1333 - target_size: - - 400 - - 500 - - 600 - - 700 - - 900 - - 1000 - - 1100 - - 1200 - max_size: 2000 - use_flip: true - - !Permute - channel_first: true - to_bgr: false - - !PadMultiScaleTest - pad_to_stride: 32 - worker_num: 2 diff --git a/static/configs/dcn/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.yml b/static/configs/dcn/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.yml deleted file mode 100755 index 06d59aaff2ca5d5360dfeb35ceb6eebd3cf98895..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.yml +++ /dev/null @@ -1,261 +0,0 @@ -architecture: CascadeMaskRCNN -max_iters: 300000 -snapshot_iter: 10 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_caffe_pretrained.tar -weights: 
output/cascade_mask_rcnn_dcn_se154_vd_fpn_gn_s1x/model_final -metric: COCO -num_classes: 81 - -CascadeMaskRCNN: - backbone: SENet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - mask_assigner: MaskAssigner - mask_head: MaskHead - -SENet: - depth: 152 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: bn - freeze_norm: True - variant: d - dcn_v2_stages: [3, 4, 5] - std_senet: True - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - freeze_norm: False - norm_type: gn - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - norm_type: gn - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_hi: [0.5, 0.6, 0.7] - bg_thresh_lo: [0.0, 0.0, 0.0] - fg_fraction: 0.25 - fg_thresh: [0.5, 0.6, 0.7] - -MaskAssigner: - resolution: 28 - -CascadeBBoxHead: - head: CascadeXConvNormHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeXConvNormHead: - norm_type: gn - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 280000] - - !LinearWarmup - start_factor: 0.01 - steps: 2000 - -OptimizerBuilder: - optimizer: - 
momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - # batch size per device - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - image_dir: train2017 - anno_path: annotations/instances_train2017.json - sample_transforms: - - !DecodeImage - to_rgb: false - - !RandomFlipImage - is_mask_flip: true - is_normalized: false - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !ResizeImage - interp: 1 - target_size: - - 416 - - 448 - - 480 - - 512 - - 544 - - 576 - - 608 - - 640 - - 672 - - 704 - - 736 - - 768 - - 800 - - 832 - - 864 - - 896 - - 928 - - 960 - - 992 - - 1024 - - 1056 - - 1088 - - 1120 - - 1152 - - 1184 - - 1216 - - 1248 - - 1280 - - 1312 - - 1344 - - 1376 - - 1408 - max_size: 1600 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 8 - shuffle: true - -EvalReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: False - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !ResizeImage - interp: 1 - target_size: - - 800 - max_size: 1333 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 2 - drop_empty: false - -TestReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: False - - 
!NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 2 diff --git a/static/configs/dcn/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x_ms_test.yml b/static/configs/dcn/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x_ms_test.yml deleted file mode 100644 index 6b3476ec5d81d1a56c0bf35aebe45f24f8e51b29..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x_ms_test.yml +++ /dev/null @@ -1,279 +0,0 @@ -architecture: CascadeMaskRCNN -max_iters: 300000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_caffe_pretrained.tar -weights: output/cascade_mask_rcnn_dcn_se154_vd_fpn_gn_s1x/model_final -metric: COCO -num_classes: 81 - -CascadeMaskRCNN: - backbone: SENet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - mask_assigner: MaskAssigner - mask_head: MaskHead - -SENet: - depth: 152 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: bn - freeze_norm: True - variant: d - dcn_v2_stages: [3, 4, 5] - std_senet: True - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - freeze_norm: False - norm_type: gn - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - 
pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - norm_type: gn - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_hi: [0.5, 0.6, 0.7] - bg_thresh_lo: [0.0, 0.0, 0.0] - fg_fraction: 0.25 - fg_thresh: [0.5, 0.6, 0.7] - -MaskAssigner: - resolution: 28 - -CascadeBBoxHead: - head: CascadeXConvNormHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeXConvNormHead: - norm_type: gn - -MultiScaleTEST: - score_thresh: 0.05 - nms_thresh: 0.5 - detections_per_im: 100 - enable_voting: true - vote_thresh: 0.9 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 280000] - - !LinearWarmup - start_factor: 0.01 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - # batch size per device - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - image_dir: train2017 - anno_path: annotations/instances_train2017.json - sample_transforms: - - !DecodeImage - to_rgb: False - with_mixup: False - - !RandomFlipImage - is_mask_flip: true - is_normalized: false - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !ResizeImage - interp: 1 - target_size: - - 416 - - 448 - - 480 - - 512 - - 544 - - 576 - - 608 - - 640 - - 672 - - 704 - - 736 - - 768 - - 800 - - 832 - - 864 - - 896 - - 928 - - 960 - - 992 - - 1024 - - 1056 - - 1088 - - 1120 - - 1152 - - 1184 - - 1216 - - 1248 - - 1280 - - 1312 - - 1344 - - 1376 - - 1408 - max_size: 1600 - use_cv2: true - 
- !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 8 - shuffle: true - -EvalReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - multi_scale: true - # num_scale = (len(target_size) + 1) * (1 + use_flip) - num_scales: 18 - use_flip: true - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: False - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !MultiscaleTestResize - origin_target_size: 800 - origin_max_size: 1333 - target_size: - - 400 - - 500 - - 600 - - 700 - - 900 - - 1000 - - 1100 - - 1200 - max_size: 2000 - use_flip: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadMultiScaleTest - pad_to_stride: 32 - worker_num: 2 - -TestReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: False - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 diff --git a/static/configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml b/static/configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml deleted file mode 100644 index 3330656919cbaa7ae4c1381d3db323764edb7e5a..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml +++ /dev/null @@ -1,212 +0,0 @@ -architecture: CascadeRCNN -max_iters: 460000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/CBResNet200_vd_pretrained.tar -weights: output/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: CBResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -CBResNet: - norm_type: bn - depth: 200 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - nonlocal_stages: [4] - repeat_num: 2 - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: MultiClassSoftNMS - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiClassSoftNMS: - score_threshold: 0.01 - keep_top_k: 300 - softnms_sigma: 0.5 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [340000, 440000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - 
-TrainReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152, 1184, 1216, 1248, 1280, 1312, 1344, 1376, 1408] - max_size: 1600 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 2 - shuffle: true - -EvalReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: False - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: - - 1200 - max_size: 2000 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - 
use_padded_im_info: true - batch_size: 1 - worker_num: 2 diff --git a/static/configs/dcn/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml b/static/configs/dcn/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml deleted file mode 100644 index 29b204cd733af120333c3e2e87fe97e8236f03d1..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml +++ /dev/null @@ -1,214 +0,0 @@ -architecture: CascadeRCNNClsAware -max_iters: 460000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar -weights: output/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/model_final -metric: COCO -num_classes: 81 - -CascadeRCNNClsAware: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 200 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - nonlocal_stages: [4] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - 
bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - class_aware: True - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: MultiClassSoftNMS - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiClassSoftNMS: - score_threshold: 0.01 - keep_top_k: 300 - softnms_sigma: 0.5 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [340000, 440000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152, 1184, 1216, 1248, 1280, 1312, 1344, 1376, 1408] - max_size: 1800 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - shuffle: true - drop_last: false - worker_num: 2 - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: False - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: - - 1200 - max_size: 2000 - use_cv2: true - - !Permute 
- to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - worker_num: 2 - drop_empty: false - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - worker_num: 2 diff --git a/static/configs/dcn/cascade_rcnn_dcn_r101_vd_fpn_1x.yml b/static/configs/dcn/cascade_rcnn_dcn_r101_vd_fpn_1x.yml deleted file mode 100644 index d4597b328a879885b62c30aaa90e82e8f17caa58..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_rcnn_dcn_r101_vd_fpn_1x.yml +++ /dev/null @@ -1,107 +0,0 @@ -architecture: CascadeRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/cascade_rcnn_dcn_r101_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - 
rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x.yml b/static/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x.yml deleted file mode 100644 index ec12ec0c132927a7439e59326f44020983f25465..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_rcnn_dcn_r50_fpn_1x.yml +++ /dev/null @@ -1,107 +0,0 @@ -architecture: CascadeRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/cascade_rcnn_dcn_r50_fpn_1x/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 50 - feature_maps: [2, 3, 4, 5] - 
freeze_at: 2 - variant: b - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml b/static/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml deleted file mode 100644 index 80f059961c2decf4f4a2bb8af86198a049e4de6d..0000000000000000000000000000000000000000 --- a/static/configs/dcn/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml +++ /dev/null @@ -1,109 +0,0 @@ -architecture: CascadeRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 
20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNeXt: - norm_type: bn - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 
'../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x.yml b/static/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x.yml deleted file mode 100644 index be99fa74b57dbb2d9114fd2392a114120fbdccec..0000000000000000000000000000000000000000 --- a/static/configs/dcn/faster_rcnn_dcn_r101_vd_fpn_1x.yml +++ /dev/null @@ -1,108 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/faster_rcnn_dcn_r101_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - 
nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - # batch size per device - batch_size: 2 diff --git a/static/configs/dcn/faster_rcnn_dcn_r50_fpn_1x.yml b/static/configs/dcn/faster_rcnn_dcn_r50_fpn_1x.yml deleted file mode 100644 index 828ec4d40714d66ed2e05fffd14436516bd231d4..0000000000000000000000000000000000000000 --- a/static/configs/dcn/faster_rcnn_dcn_r50_fpn_1x.yml +++ /dev/null @@ -1,108 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/faster_rcnn_dcn_r50_fpn_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - norm_type: bn - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - dcn_v2_stages: [3, 4, 5] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - 
-FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_lo: 0.0 - bg_thresh_hi: 0.5 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - # batch size per device - batch_size: 2 diff --git a/static/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x.yml b/static/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x.yml deleted file mode 100644 index 39d45b2476e358e3223d029e12e140b6bd1cd692..0000000000000000000000000000000000000000 --- a/static/configs/dcn/faster_rcnn_dcn_r50_vd_fpn_2x.yml +++ /dev/null @@ -1,108 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_dcn_r50_vd_fpn_2x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - 
min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - # batch size per device - batch_size: 2 diff --git a/static/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml b/static/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml deleted file mode 100644 index 4f537a05d7b4e5a94390a3874f4a50121c33ba46..0000000000000000000000000000000000000000 --- a/static/configs/dcn/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml +++ /dev/null @@ -1,108 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNeXt: - 
depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - - -_READER_: '../faster_fpn_reader.yml' diff --git a/static/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x.yml b/static/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x.yml deleted file mode 100644 index caee6d5ef0fee8cb4bb33b24a5b8576f7b53e914..0000000000000000000000000000000000000000 --- a/static/configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x.yml +++ /dev/null @@ -1,113 +0,0 @@ -architecture: MaskRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output 
-pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/mask_rcnn_dcn_r101_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' diff --git 
a/static/configs/dcn/mask_rcnn_dcn_r50_fpn_1x.yml b/static/configs/dcn/mask_rcnn_dcn_r50_fpn_1x.yml deleted file mode 100644 index d952967345bd6fb240aad84a51f2e545b61b9721..0000000000000000000000000000000000000000 --- a/static/configs/dcn/mask_rcnn_dcn_r50_fpn_1x.yml +++ /dev/null @@ -1,113 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/mask_rcnn_dcn_r50_fpn_1x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - 
-TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - - -_READER_: '../mask_fpn_reader.yml' diff --git a/static/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x.yml b/static/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x.yml deleted file mode 100644 index 4169c12d20308dc7994863035ba3130e38d20d18..0000000000000000000000000000000000000000 --- a/static/configs/dcn/mask_rcnn_dcn_r50_vd_fpn_2x.yml +++ /dev/null @@ -1,113 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 360000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -metric: COCO -weights: output/mask_rcnn_dcn_r50_vd_fpn_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 
14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' diff --git a/static/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml b/static/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml deleted file mode 100644 index d5a665cafcab391a4661db31090e29452b3546ec..0000000000000000000000000000000000000000 --- a/static/configs/dcn/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x.yml +++ /dev/null @@ -1,115 +0,0 @@ -architecture: MaskRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/mask_rcnn_dcn_x101_vd_64x4d_fpn_1x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNeXt: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - 
rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' diff --git a/static/configs/dcn/yolov3_enhance_reader.yml b/static/configs/dcn/yolov3_enhance_reader.yml deleted file mode 100644 index 0cb379d2542587d04fee3cd68d42adf22ec18031..0000000000000000000000000000000000000000 --- a/static/configs/dcn/yolov3_enhance_reader.yml +++ /dev/null @@ -1,106 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - use_fine_grained_loss: true - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 
416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: False - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 8 - shuffle: true - drop_last: true - worker_num: 8 - bufsize: 16 - use_process: true - -EvalReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 50 - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: false - - !ResizeImage - interp: 2 - target_size: 608 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: False - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - drop_empty: false - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: false - - !ResizeImage - interp: 2 - target_size: 608 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: False - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/dcn/yolov3_r50vd_dcn.yml b/static/configs/dcn/yolov3_r50vd_dcn.yml deleted file mode 100755 
index cdf8c23adb9fde7158423da6be0c5a37de5cd832..0000000000000000000000000000000000000000 --- a/static/configs/dcn/yolov3_r50vd_dcn.yml +++ /dev/null @@ -1,66 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 20000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/yolov3_r50vd_dcn/model_final -num_classes: 80 -use_fine_grained_loss: false - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: '../yolov3_reader.yml' diff --git a/static/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml b/static/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml deleted file mode 100755 index 287442f8362c8412fbf8652b05e9151152acf435..0000000000000000000000000000000000000000 --- a/static/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml +++ /dev/null @@ -1,83 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 85000 -log_iter: 1 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ResNet50_vd_dcn_db_obj365_pretrained.tar -weights: output/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco/model_final -num_classes: 80 -use_fine_grained_loss: true - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - iou_aware: true - iou_aware_factor: 0.4 - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 608 - max_width: 608 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 75000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_enhance_reader.yml' diff --git a/static/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml b/static/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml deleted file mode 100755 index 26e434a22cd2ed859f64c77691d3338de3758467..0000000000000000000000000000000000000000 --- a/static/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml +++ /dev/null @@ -1,75 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 85000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ResNet50_vd_dcn_db_obj365_pretrained.tar -weights: output/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco/model_final -num_classes: 80 -use_fine_grained_loss: true - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 75000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_enhance_reader.yml' diff --git a/static/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml b/static/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml deleted file mode 100755 index c274b64ab08aa2600f50642d64f525fda7e0b481..0000000000000000000000000000000000000000 --- a/static/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml +++ /dev/null @@ -1,70 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 85000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ResNet50_vd_dcn_db_obj365_pretrained.tar -weights: output/yolov3_r50vd_dcn_db_obj365_pretrained_coco/model_final -num_classes: 80 -use_fine_grained_loss: true - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - drop_block: true - keep_prob: 0.94 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - use_fine_grained_loss: true - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 75000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_enhance_reader.yml' diff --git a/static/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml b/static/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml deleted file mode 100755 index 31d3fcd01c4c4fe27cfd2cb2b923a2d586115d6e..0000000000000000000000000000000000000000 --- a/static/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml +++ /dev/null @@ -1,68 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 85000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ResNet50_vd_dcn_db_obj365_pretrained.tar -weights: output/yolov3_r50vd_dcn_db_obj365_pretrained_coco/model_final -num_classes: 80 -use_fine_grained_loss: true - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - use_fine_grained_loss: true - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 75000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_enhance_reader.yml' diff --git a/static/configs/efficientdet_d0.yml b/static/configs/efficientdet_d0.yml deleted file mode 100644 index 10e7625e24e5263d34743da41820c100ee8dea86..0000000000000000000000000000000000000000 --- a/static/configs/efficientdet_d0.yml +++ /dev/null @@ -1,157 +0,0 @@ -architecture: EfficientDet -max_iters: 281250 -use_gpu: true -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_pretrained.tar -weights: output/efficientdet_d0/model_final -log_iter: 20 -snapshot_iter: 10000 -metric: COCO -save_dir: output -num_classes: 81 -use_ema: true -ema_decay: 0.9998 - -EfficientDet: - backbone: EfficientNet - fpn: BiFPN - efficient_head: EfficientHead - anchor_grid: AnchorGrid - box_loss_weight: 50. - -EfficientNet: - # norm_type: sync_bn - # TODO - norm_type: bn - scale: b0 - use_se: true - -BiFPN: - num_chan: 64 - repeat: 3 - levels: 5 - -EfficientHead: - repeat: 3 - num_chan: 64 - prior_prob: 0.01 - num_anchors: 9 - gamma: 1.5 - alpha: 0.25 - delta: 0.1 - output_decoder: - score_thresh: 0.05 # originally 0. - nms_thresh: 0.5 - pre_nms_top_n: 1000 # originally 5000 - detections_per_im: 100 - nms_eta: 1.0 - -AnchorGrid: - anchor_base_scale: 4 - num_scales: 3 - aspect_ratios: [[1, 1], [1.4, 0.7], [0.7, 1.4]] - -LearningRate: - base_lr: 0.16 - schedulers: - - !CosineDecayWithSkip - total_steps: 281250 - skip_steps: 938 - - !LinearWarmup - start_factor: 0.05 - steps: 938 - -OptimizerBuilder: - clip_grad_by_norm: 10. 
- optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.00004 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_id', 'fg_num', 'gt_label', 'gt_target'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !RandomScaledCrop - target_dim: 512 - scale_range: [.1, 2.] - interp: 1 - - !Permute - to_bgr: false - channel_first: true - - !TargetAssign - image_size: 512 - batch_size: 16 - shuffle: true - worker_num: 32 - bufsize: 16 - use_process: true - drop_empty: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeAndPad - target_dim: 512 - interp: 1 - - !Permute - channel_first: true - to_bgr: false - drop_empty: false - batch_size: 16 - shuffle: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id'] - image_shape: [3, 512, 512] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeAndPad - target_dim: 512 - interp: 1 - - !Permute - channel_first: true - to_bgr: false - batch_size: 16 - shuffle: false diff --git a/static/configs/face_detection/README.md b/static/configs/face_detection/README.md deleted file mode 100644 index 
527aa708ce0e9a135e8c88cdc03f717921ab2a55..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/README.md +++ /dev/null @@ -1,2 +0,0 @@ -**For the documentation tutorial, please refer to:** [FACE_DETECTION.md](../../docs/featured_model/FACE_DETECTION.md)
    -**English document please refer:** [FACE_DETECTION_en.md](../../docs/featured_model/FACE_DETECTION_en.md) diff --git a/static/configs/face_detection/blazeface.yml b/static/configs/face_detection/blazeface.yml deleted file mode 100644 index c02b6e3c73980bff057747b3299b8b5552d0db21..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/blazeface.yml +++ /dev/null @@ -1,121 +0,0 @@ -architecture: BlazeFace -max_iters: 320000 -pretrain_weights: -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -metric: WIDERFACE -save_dir: output -weights: output/blazeface/model_final -# 1(label_class) + 1(background) -num_classes: 2 - -BlazeFace: - backbone: BlazeNet - output_decoder: - keep_top_k: 750 - nms_threshold: 0.3 - nms_top_k: 5000 - score_threshold: 0.01 - min_sizes: [[16.,24.], [32., 48., 64., 80., 96., 128.]] - use_density_prior_box: false - -BlazeNet: - with_extra_blocks: true - lite_edition: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 300000] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_train_bbx_gt.txt - image_dir: WIDER_train/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !ExpandImage - max_ratio: 4 - prob: 0.5 - - !CropImageWithDataAchorSampling - anchor_sampler: - - [1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0] - batch_sampler: - - [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 
1.0, 0.0] - target_size: 640 - - !RandomInterpImage - target_size: 640 - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - batch_size: 8 - use_process: true - worker_num: 8 - shuffle: true - -EvalReader: - inputs_def: - fields: ['image', 'im_id'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_val_bbx_gt.txt - image_dir: WIDER_val/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 diff --git a/static/configs/face_detection/blazeface_keypoint.yml b/static/configs/face_detection/blazeface_keypoint.yml deleted file mode 100644 index b20cc5bbb55dfda91ff8d5e77e41adfec4a960c0..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/blazeface_keypoint.yml +++ /dev/null @@ -1,130 +0,0 @@ -architecture: BlazeFace -max_iters: 160000 -pretrain_weights: -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -metric: WIDERFACE -save_dir: output -weights: output/blazeface_keypoint/model_final.pdparams -# 1(label_class) + 1(background) -num_classes: 2 -with_lmk: true - -BlazeFace: - backbone: BlazeNet - output_decoder: - keep_top_k: 750 - nms_threshold: 0.3 - nms_top_k: 5000 - score_threshold: 0.01 - min_sizes: [[16.,24.], [32., 48., 64., 80., 96., 128.]] - use_density_prior_box: false - lmk_loss: - overlap_threshold: 0.35 - neg_overlap: 0.35 - -BlazeNet: - 
with_extra_blocks: true - lite_edition: false - -LearningRate: - base_lr: 0.002 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 150000] - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'gt_bbox', 'gt_class', 'gt_keypoint', 'keypoint_ignore'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_train_bbx_lmk_gt.txt - image_dir: WIDER_train/images - with_lmk: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !ExpandImage - max_ratio: 4 - prob: 0.5 - - !CropImageWithDataAchorSampling - anchor_sampler: - - [1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0] - batch_sampler: - - [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - target_size: 640 - - !ResizeImage - target_size: 640 - interp: 1 - - !RandomInterpImage - target_size: 640 - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - batch_size: 16 - use_process: true - worker_num: 8 - shuffle: true - -EvalReader: - inputs_def: - fields: ['image', 'im_id'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_val_bbx_gt.txt - image_dir: WIDER_val/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - batch_size: 1 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 
'im_shape'] - dataset: - !ImageFolder - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - target_size: 640 - interp: 1 - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - batch_size: 1 diff --git a/static/configs/face_detection/blazeface_nas.yml b/static/configs/face_detection/blazeface_nas.yml deleted file mode 100644 index be645502fea725fdaea1d140e21adfbed5e2e4ca..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/blazeface_nas.yml +++ /dev/null @@ -1,123 +0,0 @@ -architecture: BlazeFace -max_iters: 320000 -pretrain_weights: -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -metric: WIDERFACE -save_dir: output -weights: output/blazeface_nas/model_final -# 1(label_class) + 1(background) -num_classes: 2 - -BlazeFace: - backbone: BlazeNet - output_decoder: - keep_top_k: 750 - nms_threshold: 0.3 - nms_top_k: 5000 - score_threshold: 0.01 - min_sizes: [[16.,24.], [32., 48., 64., 80., 96., 128.]] - use_density_prior_box: false - -BlazeNet: - blaze_filters: [[12, 12], [12, 12, 2], [12, 12]] - double_blaze_filters: [[12, 16, 24, 2], [24, 12, 24], [24, 16, 72, 2], [72, 12, 72]] - with_extra_blocks: true - lite_edition: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 300000] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_train_bbx_gt.txt - image_dir: WIDER_train/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !ExpandImage - max_ratio: 4 - prob: 0.5 - - 
!CropImageWithDataAchorSampling - anchor_sampler: - - [1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0] - batch_sampler: - - [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - target_size: 640 - - !RandomInterpImage - target_size: 640 - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - batch_size: 8 - use_process: true - worker_num: 8 - shuffle: true - -EvalReader: - inputs_def: - fields: ['image', 'im_id'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_val_bbx_gt.txt - image_dir: WIDER_val/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 diff --git a/static/configs/face_detection/blazeface_nas_v2.yml b/static/configs/face_detection/blazeface_nas_v2.yml deleted file mode 100644 index 1741e530b37e9c53c02742fdd23292d018fcbadc..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/blazeface_nas_v2.yml +++ /dev/null @@ -1,123 +0,0 @@ -architecture: BlazeFace -max_iters: 320000 -pretrain_weights: -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -metric: WIDERFACE -save_dir: output -weights: output/blazeface_nas_v2/model_final -# 1(label_class) + 
1(background) -num_classes: 2 - -BlazeFace: - backbone: BlazeNet - output_decoder: - keep_top_k: 750 - nms_threshold: 0.3 - nms_top_k: 5000 - score_threshold: 0.01 - min_sizes: [[16.,24.], [32., 48., 64., 80., 96., 128.]] - use_density_prior_box: false - -BlazeNet: - blaze_filters: [[12, 12], [12, 12, 2], [12, 12]] - double_blaze_filters: [[12, 16, 32, 2], [32, 32, 32], [32, 16, 72, 2], [72, 24, 72]] - with_extra_blocks: true - lite_edition: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 300000] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_train_bbx_gt.txt - image_dir: WIDER_train/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !ExpandImage - max_ratio: 4 - prob: 0.5 - - !CropImageWithDataAchorSampling - anchor_sampler: - - [1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0] - batch_sampler: - - [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - target_size: 640 - - !RandomInterpImage - target_size: 640 - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - batch_size: 8 - use_process: true - worker_num: 8 - shuffle: true - -EvalReader: - inputs_def: - fields: ['image', 'im_id'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: 
wider_face_split/wider_face_val_bbx_gt.txt - image_dir: WIDER_val/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 diff --git a/static/configs/face_detection/faceboxes.yml b/static/configs/face_detection/faceboxes.yml deleted file mode 100644 index 9f6fcfb79191536cd05acfa8a7202f8df6fc10c4..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/faceboxes.yml +++ /dev/null @@ -1,122 +0,0 @@ -architecture: FaceBoxes -pretrain_weights: -use_gpu: true -max_iters: 320000 -snapshot_iter: 10000 -log_iter: 20 -metric: WIDERFACE -save_dir: output -weights: output/faceboxes/model_final -# 1(label_class) + 1(background) -num_classes: 2 - -FaceBoxes: - backbone: FaceBoxNet - densities: [[4, 2, 1], [1], [1]] - fixed_sizes: [[32., 64., 128.], [256.], [512.]] - output_decoder: - keep_top_k: 750 - nms_threshold: 0.3 - nms_top_k: 5000 - score_threshold: 0.01 - -FaceBoxNet: - with_extra_blocks: true - lite_edition: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 300000] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - batch_size: 8 - use_process: True - worker_num: 8 - shuffle: true - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_train_bbx_gt.txt - 
image_dir: WIDER_train/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !ExpandImage - max_ratio: 4 - prob: 0.5 - - !CropImageWithDataAchorSampling - anchor_sampler: - - [1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0] - batch_sampler: - - [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - target_size: 640 - - !RandomInterpImage - target_size: 640 - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - -EvalReader: - batch_size: 1 - use_process: false - inputs_def: - fields: ['image', 'im_id'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_val_bbx_gt.txt - image_dir: WIDER_val/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 diff --git a/static/configs/face_detection/faceboxes_lite.yml b/static/configs/face_detection/faceboxes_lite.yml deleted file mode 100644 index ca8d37a5c68eccbb241eb61ec5756e0465306d89..0000000000000000000000000000000000000000 --- a/static/configs/face_detection/faceboxes_lite.yml +++ /dev/null @@ -1,123 +0,0 @@ -architecture: FaceBoxes 
-pretrain_weights: -use_gpu: true -max_iters: 320000 -snapshot_iter: 10000 -log_iter: 20 -metric: WIDERFACE -save_dir: output -weights: output/faceboxes_lite/model_final -# 1(label_class) + 1(background) -num_classes: 2 - -FaceBoxes: - backbone: FaceBoxNet - densities: [[2, 1, 1], [1, 1]] - fixed_sizes: [[16., 32., 64.], [96., 128.]] - output_decoder: - keep_top_k: 750 - nms_threshold: 0.3 - nms_top_k: 5000 - score_threshold: 0.01 - -FaceBoxNet: - with_extra_blocks: true - lite_edition: true - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 300000] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - batch_size: 8 - use_process: True - worker_num: 8 - shuffle: true - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !WIDERFaceDataSet - dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_train_bbx_gt.txt - image_dir: WIDER_train/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !ExpandImage - max_ratio: 4 - prob: 0.5 - - !CropImageWithDataAchorSampling - anchor_sampler: - - [1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0] - batch_sampler: - - [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - - [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] - target_size: 640 - - !RandomInterpImage - target_size: 640 - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [127.502231, 127.502231, 127.502231] - -EvalReader: - batch_size: 1 - use_process: false - inputs_def: - fields: ['image', 'im_id'] - dataset: - !WIDERFaceDataSet - 
dataset_dir: dataset/wider_face - anno_path: wider_face_split/wider_face_val_bbx_gt.txt - image_dir: WIDER_val/images - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - - -TestReader: - inputs_def: - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: false - mean: [123, 117, 104] - std: [127.502231, 127.502231, 127.502231] - - !Permute {} - batch_size: 1 diff --git a/static/configs/faster_fpn_reader.yml b/static/configs/faster_fpn_reader.yml deleted file mode 100644 index 1ddac93b4402c7fcc6488ade52f3a61cc7769d44..0000000000000000000000000000000000000000 --- a/static/configs/faster_fpn_reader.yml +++ /dev/null @@ -1,101 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - 
sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/faster_rcnn_cbr101_vd_dual_fpn_1x.yml b/static/configs/faster_rcnn_cbr101_vd_dual_fpn_1x.yml deleted file mode 100644 index 5005c8a36f707992f011db49ac5f9f38b2bb4c47..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_cbr101_vd_dual_fpn_1x.yml +++ /dev/null @@ -1,108 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/CBResNet101_vd_pretrained.tar -weights: output/faster_rcnn_cbr101_vd_dual_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: CBResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -CBResNet: - norm_type: bn - norm_decay: 0. 
- depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - repeat_num: 2 - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/faster_rcnn_cbr50_vd_dual_fpn_1x.yml b/static/configs/faster_rcnn_cbr50_vd_dual_fpn_1x.yml deleted file mode 100644 index 4b763af7279beec1c04aed023c9e98039fc0efc7..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_cbr50_vd_dual_fpn_1x.yml +++ /dev/null @@ -1,109 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/CBResNet50_vd_pretrained.tar -weights: output/faster_rcnn_cbr50_vd_dual_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: CBResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -CBResNet: - norm_type: bn - norm_decay: 0. - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - repeat_num: 2 - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - # batch size per device - batch_size: 2 diff --git a/static/configs/faster_rcnn_r101_1x.yml 
b/static/configs/faster_rcnn_r101_1x.yml deleted file mode 100644 index 06f460f0dc02d36e74b685db75094ebef5da80d2..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r101_1x.yml +++ /dev/null @@ -1,91 +0,0 @@ -architecture: FasterRCNN -use_gpu: true -max_iters: 180000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r101_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 101 - feature_maps: 4 - freeze_at: 2 - -ResNetC5: - depth: 101 - norm_type: affine_channel - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - use_random: true - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - sampling_ratio: 0 - spatial_scale: 0.0625 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_reader.yml' diff --git 
a/static/configs/faster_rcnn_r101_fpn_1x.yml b/static/configs/faster_rcnn_r101_fpn_1x.yml deleted file mode 100644 index 6bc0d3d10e569259304b863280b8c4271c3628bc..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r101_fpn_1x.yml +++ /dev/null @@ -1,103 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar -weights: output/faster_rcnn_r101_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - 
gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_rcnn_r101_fpn_2x.yml b/static/configs/faster_rcnn_r101_fpn_2x.yml deleted file mode 100644 index f9ce45b59525e96f6a75c61e2ad958fe3f067d2b..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r101_fpn_2x.yml +++ /dev/null @@ -1,103 +0,0 @@ -architecture: FasterRCNN -max_iters: 360000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar -weights: output/faster_rcnn_r101_fpn_2x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - 
bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_rcnn_r101_vd_fpn_1x.yml b/static/configs/faster_rcnn_r101_vd_fpn_1x.yml deleted file mode 100644 index 464a325dc220a7a87e6d05ec94b81504f621f025..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r101_vd_fpn_1x.yml +++ /dev/null @@ -1,104 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/faster_rcnn_r101_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - 
min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_rcnn_r101_vd_fpn_2x.yml b/static/configs/faster_rcnn_r101_vd_fpn_2x.yml deleted file mode 100644 index ee5ea586c68a3ff7bf91f3009eec08b552b14561..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r101_vd_fpn_2x.yml +++ /dev/null @@ -1,104 +0,0 @@ -architecture: FasterRCNN -max_iters: 360000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/faster_rcnn_r101_vd_fpn_2x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - 
min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_rcnn_r34_fpn_1x.yml b/static/configs/faster_rcnn_r34_fpn_1x.yml deleted file mode 100644 index 1ea388e088cfe4d4cc328a8fb42c916df97fdc22..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r34_fpn_1x.yml +++ /dev/null @@ -1,106 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r34_fpn_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: bn - norm_decay: 0. 
- depth: 34 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_lo: 0.0 - bg_thresh_hi: 0.5 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/faster_rcnn_r34_vd_fpn_1x.yml b/static/configs/faster_rcnn_r34_vd_fpn_1x.yml deleted file mode 100644 index b176aad69e0478ed81a50b66a5c625fcd109dc36..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r34_vd_fpn_1x.yml +++ /dev/null @@ -1,107 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r34_fpn_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: bn - norm_decay: 0. - depth: 34 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_lo: 0.0 - bg_thresh_hi: 0.5 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/faster_rcnn_r50_1x.yml b/static/configs/faster_rcnn_r50_1x.yml deleted file mode 100644 index 
26753124560365c818fc15ed04ee99f242b89145..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_1x.yml +++ /dev/null @@ -1,91 +0,0 @@ -architecture: FasterRCNN -use_gpu: true -max_iters: 180000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r50_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: 4 - freeze_at: 2 - -ResNetC5: - depth: 50 - norm_type: affine_channel - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - use_random: true - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - sampling_ratio: 0 - spatial_scale: 0.0625 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_reader.yml' diff --git a/static/configs/faster_rcnn_r50_2x.yml b/static/configs/faster_rcnn_r50_2x.yml deleted 
file mode 100644 index 9112d9616f9c3baaba2ae377add9729a848d2188..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_2x.yml +++ /dev/null @@ -1,91 +0,0 @@ -architecture: FasterRCNN -use_gpu: true -max_iters: 360000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r50_2x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: 4 - freeze_at: 2 - -ResNetC5: - depth: 50 - norm_type: affine_channel - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - use_random: true - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - sampling_ratio: 0 - spatial_scale: 0.0625 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_reader.yml' diff --git a/static/configs/faster_rcnn_r50_fpn_1x.yml 
b/static/configs/faster_rcnn_r50_fpn_1x.yml deleted file mode 100644 index 83dcc356d6136626568bdaee3e3f753a95be3c68..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_fpn_1x.yml +++ /dev/null @@ -1,105 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r50_fpn_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: bn - norm_decay: 0. - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_lo: 0.0 - bg_thresh_hi: 0.5 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - 
start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/faster_rcnn_r50_fpn_2x.yml b/static/configs/faster_rcnn_r50_fpn_2x.yml deleted file mode 100644 index 909f0fd1c12c1f39c9590089ff2348f22e426011..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_fpn_2x.yml +++ /dev/null @@ -1,106 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -use_gpu: true -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r50_fpn_2x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_lo: 0.0 - bg_thresh_hi: 0.5 - 
fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/faster_rcnn_r50_vd_1x.yml b/static/configs/faster_rcnn_r50_vd_1x.yml deleted file mode 100644 index 8a10886b5ff7bd9d3a476ea5819cf9067e9a9819..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_vd_1x.yml +++ /dev/null @@ -1,93 +0,0 @@ -architecture: FasterRCNN -use_gpu: true -max_iters: 180000 -log_iter: 20 -save_dir: output/faster-r50-vd-c4-1x -snapshot_iter: 10000 -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r50_vd_1x/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: 4 - freeze_at: 2 - variant: d - -ResNetC5: - depth: 50 - norm_type: affine_channel - variant: d - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - use_random: true - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - sampling_ratio: 0 - 
spatial_scale: 0.0625 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_reader.yml' diff --git a/static/configs/faster_rcnn_r50_vd_fpn_2x.yml b/static/configs/faster_rcnn_r50_vd_fpn_2x.yml deleted file mode 100644 index 51f8ed5502573ab59516e8694ed1d583dafa50ed..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_vd_fpn_2x.yml +++ /dev/null @@ -1,106 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_2x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - 
post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/faster_rcnn_r50_vd_fpn_roadsign_kunlun.yml b/static/configs/faster_rcnn_r50_vd_fpn_roadsign_kunlun.yml deleted file mode 100644 index 7401438562952ab5f0191e8f9ea9a74c30a98db9..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_r50_vd_fpn_roadsign_kunlun.yml +++ /dev/null @@ -1,239 +0,0 @@ -architecture: FasterRCNN -use_gpu: false -use_xpu: true -max_iters: 2000 -log_iter: 1 -save_dir: output -snapshot_iter: 500 -metric: VOC -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar -weights: output/faster_rcnn_r50_vd_fpn_roadsign_kunlun/model_final -num_classes: 5 -finetune_exclude_pretrained_params: ['cls_score, bbox_pred'] - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - norm_type: affine_channel - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - -ResNetC5: - depth: 50 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 
2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - use_random: true - - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 1300 - - 1800 - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 100 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: train.txt - with_background: true - - batch_size: 1 - bufsize: 2 - shuffle: true - drop_empty: true - drop_last: true - mixup_epoch: -1 - use_process: false - worker_num: 2 - - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - is_normalized: true - prob: 0.5 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - 
!ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - - -EvalReader: - batch_size: 1 - bufsize: 1 - shuffle: false - drop_empty: false - drop_last: false - use_process: false - worker_num: 1 - - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult'] - - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: valid.txt - with_background: true - - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - - -TestReader: - batch_size: 1 - drop_empty: false - drop_last: false - - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - - dataset: - !ImageFolder - anno_path: dataset/roadsign_voc/label_list.txt - with_background: true - - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true diff --git a/static/configs/faster_rcnn_se154_vd_fpn_s1x.yml b/static/configs/faster_rcnn_se154_vd_fpn_s1x.yml deleted file mode 100644 index 02d0469616037f1fc136a1bd7e77849afd599146..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_se154_vd_fpn_s1x.yml +++ /dev/null @@ -1,106 +0,0 @@ -architecture: FasterRCNN -max_iters: 260000 -snapshot_iter: 10000 
-use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_pretrained.tar -weights: output/faster_rcnn_se154_vd_fpn_s1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: SENet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -SENet: - depth: 152 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [200000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_rcnn_x101_vd_64x4d_fpn_1x.yml 
b/static/configs/faster_rcnn_x101_vd_64x4d_fpn_1x.yml deleted file mode 100644 index fc83b00b2e6222aa0b633f217d3e3e5aec7be08d..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_x101_vd_64x4d_fpn_1x.yml +++ /dev/null @@ -1,107 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/faster_rcnn_x101_vd_64x4d_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNeXt: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - 
schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - values: null - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_rcnn_x101_vd_64x4d_fpn_2x.yml b/static/configs/faster_rcnn_x101_vd_64x4d_fpn_2x.yml deleted file mode 100644 index e41022149c37efe5bb0cdc608b64d795bcf4bdc3..0000000000000000000000000000000000000000 --- a/static/configs/faster_rcnn_x101_vd_64x4d_fpn_2x.yml +++ /dev/null @@ -1,106 +0,0 @@ -architecture: FasterRCNN -max_iters: 360000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/faster_rcnn_x101_vd_64x4d_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNeXt: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - 
box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' diff --git a/static/configs/faster_reader.yml b/static/configs/faster_reader.yml deleted file mode 100644 index 3099bb656d641d99c71e12aa6f61fb625d23d6bd..0000000000000000000000000000000000000000 --- a/static/configs/faster_reader.yml +++ /dev/null @@ -1,91 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: -1 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - 
!NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: false diff --git a/static/configs/gcnet/README.md b/static/configs/gcnet/README.md deleted file mode 100644 index 5c25d34676f9bd6bcc66a8d90a4197f69c744d9c..0000000000000000000000000000000000000000 --- a/static/configs/gcnet/README.md +++ /dev/null @@ -1,34 +0,0 @@ -# GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond - -## Introduction - -- GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond -: [https://arxiv.org/abs/1904.11492](https://arxiv.org/abs/1904.11492) - -``` -@article{DBLP:journals/corr/abs-1904-11492, - author = {Yue Cao and - Jiarui Xu and - Stephen Lin and - Fangyun Wei and - Han Hu}, - title = {GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond}, - journal = {CoRR}, - volume = {abs/1904.11492}, - year = {2019}, - url = {http://arxiv.org/abs/1904.11492}, - archivePrefix = {arXiv}, - eprint = {1904.11492}, - timestamp = {Tue, 09 Jul 2019 16:48:55 +0200}, - biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-11492}, - bibsource = {dblp computer science bibliography, https://dblp.org} -} -``` - - -## Model Zoo - -| Backbone | Type | Context| Image/gpu | Lr schd | Inf time (fps) | Box AP 
| Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, add) | 2 | 2x | 15.31 | 41.4 | 36.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml) | -| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, mul) | 2 | 2x | 15.35 | 40.7 | 36.1 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml) | diff --git a/static/configs/gcnet/README_cn.md b/static/configs/gcnet/README_cn.md deleted file mode 100644 index 579bfe0ca9fca376e7645dd0a9b59b15d7e817a0..0000000000000000000000000000000000000000 --- a/static/configs/gcnet/README_cn.md +++ /dev/null @@ -1,69 +0,0 @@ -# GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond - -## Introduction - -Non-local networks use self-attention to capture long-range dependencies. Through visualization, however, the GCNet authors found that the attention maps computed for different query positions in the same image are almost identical, so much of the Non-local computation is redundant. SENet recalibrates the weights of different channels using global context at very low cost, but it cannot make full use of the global context information. The paper combines the strengths of Non-local and SENet and proposes the GC block, which fuses global context effectively while keeping the computational cost low. - -Because the attention maps differ so little across positions, the paper designs a simplified non-local (SNL) structure, shown in the figure below, in which a single global attention map is shared by all positions. - -
    - -
    - - -The output of the SNL block is computed as follows: - -
    - -
    - -To reduce the computation further, $W_v$ is moved outside the attention pooling, which gives - -
    - -
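For reference, the two expressions referred to above can be written as follows, in the notation of the GCNet paper (arXiv:1904.11492). Here $x_i$ and $z_i$ are the input and output features at position $i$, $N_p = H \cdot W$ is the number of positions in the feature map, and $W_k$ and $W_v$ are linear transforms (1x1 convolutions). The symbols other than $W_v$ come from the paper rather than from this README.

SNL block:

$$
z_i = x_i + \sum_{j=1}^{N_p} \frac{\exp(W_k x_j)}{\sum_{m=1}^{N_p} \exp(W_k x_m)} \left( W_v \cdot x_j \right)
$$

With $W_v$ moved outside the attention pooling:

$$
z_i = x_i + W_v \sum_{j=1}^{N_p} \frac{\exp(W_k x_j)}{\sum_{m=1}^{N_p} \exp(W_k x_m)} \, x_j
$$

In the second form the weighted sum does not depend on $i$, so it is computed once and $W_v$ is applied to a single pooled vector instead of to each of the $N_p = H \cdot W$ positions.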
    - -The corresponding structure is shown below. Because the attention map is shared, the computation is reduced to 1/WH of the original. - -
    - -
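    - -The SNL block can be abstracted into three parts: context modeling, feature transform, and feature fusion. The feature transform part has a large number of parameters, so it follows the SE design here; the final GC block structure is shown below. Two 1x1 convolutions forming a dimension-reducing bottleneck lower the computation. Because the two-layer convolution is harder to optimize, a layer normalization layer is added to ease optimization. - -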
    - -
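The three stages described above (context modeling, bottleneck transform with layer normalization, fusion) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not PaddleDetection's implementation: the function name `gc_block`, the weight names and their shapes are hypothetical, biases and the learnable layer-norm scale/shift are omitted, and only the `channel_add` fusion is shown.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def layer_norm(v, eps=1e-5):
    """Layer normalization of a 1-D vector (no learnable scale/shift)."""
    return (v - v.mean()) / np.sqrt(v.var() + eps)


def gc_block(x, w_k, w_v1, w_v2):
    """Global context block with channel-add fusion (illustrative sketch).

    x:    feature map of shape (C, H, W)
    w_k:  (C,)       1x1 conv producing one attention logit per position
    w_v1: (C // r, C) first 1x1 conv of the bottleneck (r = 16 for ratio 0.0625)
    w_v2: (C, C // r) second 1x1 conv of the bottleneck
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)           # (C, N) with N = H * W

    # 1. Context modeling: one attention map shared by all positions.
    attn = softmax(w_k @ flat)           # (N,)
    context = flat @ attn                # (C,) global context vector

    # 2. Feature transform: bottleneck with layer norm and ReLU.
    t = w_v1 @ context                   # (C // r,)
    t = np.maximum(layer_norm(t), 0.0)
    t = w_v2 @ t                         # (C,)

    # 3. Fusion: broadcast-add the transformed context to every position.
    return x + t[:, None, None]
```

Because the context vector is computed once per feature map, the added term is identical at every spatial position; only the per-channel offset changes.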
    - -The block can be easily inserted into a backbone network to strengthen the model's ability to represent global context, which improves performance on detection and segmentation tasks. - - -## Model Zoo - -| Backbone | Type | Context | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, add) | 2 | 2x | 15.31 | 41.4 | 36.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml) | -| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, mul) | 2 | 2x | 15.35 | 40.7 | 36.1 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml) | - - -## Citations - -``` -@article{DBLP:journals/corr/abs-1904-11492, - author = {Yue Cao and - Jiarui Xu and - Stephen Lin and - Fangyun Wei and - Han Hu}, - title = {GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond}, - journal = {CoRR}, - volume = {abs/1904.11492}, - year = {2019}, - url = {http://arxiv.org/abs/1904.11492}, - archivePrefix = {arXiv}, - eprint = {1904.11492}, - timestamp = {Tue, 09 Jul 2019 16:48:55 +0200}, - biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-11492}, - bibsource = {dblp computer science bibliography, https://dblp.org} -} -``` diff --git a/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml b/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml deleted file mode 100644 index 49dd68977407aedbce8a65271d9b269f7b97342a..0000000000000000000000000000000000000000 --- a/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml +++ /dev/null @@ -1,119 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 180000 
-snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - gcb_stages: [3, 4, 5] - gcb_params: - ratio: 0.0625 - pooling_type: att - fusion_types: [channel_add] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - 
regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml b/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml deleted file mode 100644 index 3b93547b491c371947ce9eec5d64448879bed01c..0000000000000000000000000000000000000000 --- a/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml +++ /dev/null @@ -1,119 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - gcb_stages: [3, 4, 5] - gcb_params: - ratio: 0.0625 - pooling_type: att - fusion_types: [channel_mul] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 
512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x.yml b/static/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x.yml deleted file mode 100644 index 566f4e86394542874df1aac2a01d5fee95900e78..0000000000000000000000000000000000000000 --- a/static/configs/gn/cascade_mask_rcnn_r50_fpn_gn_2x.yml +++ /dev/null @@ -1,117 +0,0 @@ -architecture: CascadeMaskRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/cascade_mask_rcnn_r50_fpn_gn_2x/model_final -metric: COCO -num_classes: 81 - -CascadeMaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - mask_head: MaskHead - mask_assigner: MaskAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - norm_type: gn - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - 
rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - norm_type: gn - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_hi: [0.5, 0.6, 0.7] - bg_thresh_lo: [0.0, 0.0, 0.0] - fg_fraction: 0.25 - fg_thresh: [0.5, 0.6, 0.7] - -MaskAssigner: - resolution: 28 - -CascadeBBoxHead: - head: CascadeXConvNormHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeXConvNormHead: - norm_type: gn - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/gn/faster_rcnn_r50_fpn_gn_2x.yml b/static/configs/gn/faster_rcnn_r50_fpn_gn_2x.yml deleted file mode 100644 index 2f95b4b784cf4323c8b1f7bab8c57f3fa962044f..0000000000000000000000000000000000000000 --- a/static/configs/gn/faster_rcnn_r50_fpn_gn_2x.yml +++ /dev/null @@ -1,106 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/faster_rcnn_r50_fpn_gn/model_final -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: 
[2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - norm_type: gn - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_lo: 0.0 - bg_thresh_hi: 0.5 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: XConvNormHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -XConvNormHead: - norm_type: gn - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/gn/mask_rcnn_r50_fpn_gn_2x.yml b/static/configs/gn/mask_rcnn_r50_fpn_gn_2x.yml deleted file mode 100644 index 644cf28f8718d8033ae9352497fbcb18830c90c9..0000000000000000000000000000000000000000 --- a/static/configs/gn/mask_rcnn_r50_fpn_gn_2x.yml +++ /dev/null @@ -1,113 +0,0 @@ -architecture: MaskRCNN -max_iters: 360000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/mask_rcnn_r50_fpn_gn_2x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - norm_type: gn - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - norm_type: gn - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: XConvNormHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -XConvNormHead: - norm_type: gn - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' diff --git a/static/configs/gridmask/README.md 
b/static/configs/gridmask/README.md deleted file mode 100644 index 3e39b8a7120b6522fe151ee3b378d584d5556d7d..0000000000000000000000000000000000000000 --- a/static/configs/gridmask/README.md +++ /dev/null @@ -1,22 +0,0 @@ -# GridMask Data Augmentation - -## Introduction - -- GridMask Data Augmentation -: [https://arxiv.org/abs/2001.04086](https://arxiv.org/abs/2001.04086) - -``` -@article{chen2020gridmask, - title={GridMask data augmentation}, - author={Chen, Pengguang}, - journal={arXiv preprint arXiv:2001.04086}, - year={2020} -} -``` - - -## Model Zoo - -| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-FPN | Faster | 2 | 4x | 21.847 | 39.1% | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_gridmask_4x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/gridmask/faster_rcnn_r50_vd_fpn_gridmask_4x.yml) | diff --git a/static/configs/gridmask/faster_rcnn_r50_vd_fpn_gridmask_4x.yml b/static/configs/gridmask/faster_rcnn_r50_vd_fpn_gridmask_4x.yml deleted file mode 100755 index 43bb740094a22440dc4b7b11c5385fabbe6cf208..0000000000000000000000000000000000000000 --- a/static/configs/gridmask/faster_rcnn_r50_vd_fpn_gridmask_4x.yml +++ /dev/null @@ -1,135 +0,0 @@ -architecture: FasterRCNN -max_iters: 360000 -snapshot_iter: 40000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_gridmask_4x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] 
- freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !GridMaskOp - use_h: true - use_w: true - rotate: 1 - offset: false - ratio: 0.5 - mode: 1 - prob: 0.7 - upper_iter: 360000 - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_size: 2 - worker_num: 2 - use_process: true diff 
--git a/static/configs/hrnet/README.md b/static/configs/hrnet/README.md deleted file mode 100644 index c18cb6d7d27ba1ac30d9fcb48803233dc6aa16e1..0000000000000000000000000000000000000000 --- a/static/configs/hrnet/README.md +++ /dev/null @@ -1,34 +0,0 @@ -# High-resolution networks (HRNets) for object detection - -## Introduction - -- Deep High-Resolution Representation Learning for Human Pose Estimation: [https://arxiv.org/abs/1902.09212](https://arxiv.org/abs/1902.09212) - -``` -@inproceedings{SunXLW19, - title={Deep High-Resolution Representation Learning for Human Pose Estimation}, - author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang}, - booktitle={CVPR}, - year={2019} -} -``` - -- High-Resolution Representations for Labeling Pixels and Regions: [https://arxiv.org/abs/1904.04514](https://arxiv.org/abs/1904.04514) - -``` -@article{SunZJCXLMWLW19, - title={High-Resolution Representations for Labeling Pixels and Regions}, - author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao - and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang}, - journal = {CoRR}, - volume = {abs/1904.04514}, - year={2019} -} -``` - -## Model Zoo - -| Backbone | Type | deformable Conv | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :------------- | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| HRNetV2p_W18 | Faster | False | 2 | 1x | 17.509 | 36.0 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.yml) | -| HRNetV2p_W18 | Faster | False | 2 | 2x | 17.509 | 38.0 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_2x.tar) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x.yml) | diff --git a/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.yml b/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.yml deleted file mode 100644 index 3108e9c60e18f0470d12fceb54c7223cb79c9c48..0000000000000000000000000000000000000000 --- a/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.yml +++ /dev/null @@ -1,103 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar -weights: output/faster_rcnn_hrnetv2p_w18_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: HRNet - fpn: HRFPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -HRNet: - feature_maps: [2, 3, 4, 5] - width: 18 - freeze_at: 0 - norm_type: bn - -HRFPN: - num_chan: 256 - share_conv: false - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 
0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x.yml b/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x.yml deleted file mode 100644 index ecc307e075e662ba21402bde6b3dd7e472135567..0000000000000000000000000000000000000000 --- a/static/configs/hrnet/faster_rcnn_hrnetv2p_w18_2x.yml +++ /dev/null @@ -1,103 +0,0 @@ -architecture: FasterRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar -weights: output/faster_rcnn_hrnetv2p_w18_2x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: HRNet - fpn: HRFPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -HRNet: - feature_maps: [2, 3, 4, 5] - width: 18 - freeze_at: 0 - norm_type: bn - -HRFPN: - num_chan: 256 - share_conv: false - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - 
-BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/htc/README.md b/static/configs/htc/README.md deleted file mode 100644 index 519c0cde318d128400a42e2a4bfba6891ac40f85..0000000000000000000000000000000000000000 --- a/static/configs/htc/README.md +++ /dev/null @@ -1,26 +0,0 @@ -# Hybrid Task Cascade for Instance Segmentation - -## Introduction - -We provide config files to reproduce the results in the CVPR 2019 paper for [Hybrid Task Cascade](https://arxiv.org/abs/1901.07518). - -``` -@inproceedings{chen2019hybrid, - title={Hybrid task cascade for instance segmentation}, - author={Chen, Kai and Pang, Jiangmiao and Wang, Jiaqi and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Shi, Jianping and Ouyang, Wanli and Chen Change Loy and Dahua Lin}, - booktitle={IEEE Conference on Computer Vision and Pattern Recognition}, - year={2019} -} -``` - -## Dataset - -HTC requires COCO and COCO-stuff dataset for training. - -## Results and Models - -The results on COCO 2017val are shown in the below table. 
(results on test-dev are usually slightly higher than val) - - | Backbone | Lr schd | Inf time (fps) | box AP | mask AP | Download | - |:---------:|:-------:|:--------------:|:------:|:-------:|:--------:| - | R-50-FPN | 1x | 11 | 42.9 | 37.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/htc_r50_fpn_1x.pdparams ) | diff --git a/static/configs/htc/htc_r50_fpn_1x.yml b/static/configs/htc/htc_r50_fpn_1x.yml deleted file mode 100644 index 348343ccf4dbc511cfe2a98e80d97dad33572743..0000000000000000000000000000000000000000 --- a/static/configs/htc/htc_r50_fpn_1x.yml +++ /dev/null @@ -1,225 +0,0 @@ -architecture: HybridTaskCascade -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 50 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/htc_r50_fpn_1x/model_final -num_classes: 81 - -HybridTaskCascade: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: HTCBBoxHead - bbox_assigner: CascadeBBoxAssigner - mask_assigner: MaskAssigner - mask_head: HTCMaskHead - fused_semantic_head: FusedSemanticHead - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 1000 - -# bbox roi extractor -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - 
sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -# semantic roi extractor -RoIAlign: - resolution: 14 - sampling_ratio: 2 - -HTCMaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - lr_ratio: 2.0 - -FusedSemanticHead: - semantic_num_class: 183 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_hi: [0.5, 0.6, 0.7] - bg_thresh_lo: [0.0, 0.0, 0.0] - fg_fraction: 0.25 - fg_thresh: [0.5, 0.6, 0.7] - -MaskAssigner: - resolution: 28 - -HTCBBoxHead: - head: CascadeTwoFCHead - nms: MultiClassSoftNMS - -MultiClassSoftNMS: - score_threshold: 0.01 - keep_top_k: 300 - softnms_sigma: 0.5 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - clip_grad_by_norm: 35.0 - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - batch_size: 1 - worker_num: 2 - shuffle: true - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - load_semantic: True - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask', 'semantic'] - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - is_mask_flip: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - 
!DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 1 - shuffle: false diff --git a/static/configs/iou_loss/README.md b/static/configs/iou_loss/README.md deleted file mode 100644 index 2082ece3892341f05954a83d51e6859ebeb6e4e5..0000000000000000000000000000000000000000 --- a/static/configs/iou_loss/README.md +++ /dev/null @@ -1,48 +0,0 @@ -# Improvements of IOU loss - -## Introduction - -- Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression: [https://arxiv.org/abs/1902.09630](https://arxiv.org/abs/1902.09630) - -``` -@article{DBLP:journals/corr/abs-1902-09630, - author = {Seyed Hamid Rezatofighi and - Nathan Tsoi and - JunYoung Gwak and - Amir Sadeghian and - Ian D. 
Reid and - Silvio Savarese}, - title = {Generalized Intersection over Union: {A} Metric and {A} Loss for Bounding - Box Regression}, - journal = {CoRR}, - volume = {abs/1902.09630}, - year = {2019}, - url = {http://arxiv.org/abs/1902.09630}, - archivePrefix = {arXiv}, - eprint = {1902.09630}, - timestamp = {Tue, 21 May 2019 18:03:36 +0200}, - biburl = {https://dblp.org/rec/bib/journals/corr/abs-1902-09630}, - bibsource = {dblp computer science bibliography, https://dblp.org} -} -``` - -- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression: [https://arxiv.org/abs/1911.08287](https://arxiv.org/abs/1911.08287) - -``` -@article{Zheng2019DistanceIoULF, - title={Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression}, - author={Zhaohui Zheng and Ping Wang and Wei Liu and Jinze Li and Rongguang Ye and Dongwei Ren}, - journal={ArXiv}, - year={2019}, - volume={abs/1911.08287} -} -``` - -## Model Zoo - - -| Backbone | Type | Loss Type | Loss Weight | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :------------- | :---: | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :---: | -| ResNet50-vd-FPN | Faster | GIOU | 10 | 2 | 1x | 22.94 | 39.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_giou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_giou_loss_1x.yml) | -| ResNet50-vd-FPN | Faster | DIOU | 12 | 2 | 1x | 22.94 | 39.2 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_diou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_diou_loss_1x.yml) | -| ResNet50-vd-FPN | Faster | CIOU | 12 | 2 | 1x | 22.95 | 39.6 | - | 
[model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_ciou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_ciou_loss_1x.yml) | diff --git a/static/configs/iou_loss/README_cn.md b/static/configs/iou_loss/README_cn.md deleted file mode 100644 index 3288e587dda9e787c7de265c5d94c2d39f764264..0000000000000000000000000000000000000000 --- a/static/configs/iou_loss/README_cn.md +++ /dev/null @@ -1,126 +0,0 @@ -# Improvements of IOU loss - -## Introduction - -### GIOU loss - -IOU is one of the most widely used metrics in the detection literature and is insensitive to object scale. Earlier detectors usually computed the box regression loss with smooth L1 loss, but that loss does not correspond directly to the final IOU metric and is also sensitive to the scale of the box, so IOU loss was proposed as the regression loss. However, when the IOU is 0 the IOU loss gives no gradient to optimize, and it does not account for cases where the objects are not aligned in orientation. The GIOU paper improves on these points; GIOU is computed as follows. - - -
    - -
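The formula image that followed this sentence is not preserved in the diff. As a hedged reconstruction from the cited GIoU paper (not from the deleted file itself), with A the predicted box, B the ground truth, and C the smallest enclosing box of A and B, the definition is usually written as:

```latex
\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, \qquad
\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad
\mathcal{L}_{GIoU} = 1 - \mathrm{GIoU}
```

GIoU lies in (-1, 1], so the loss lies in [0, 2).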
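    - - -The final GIOU loss is 1 - GIOU. Specifically, IOU directly reflects the overlap ratio between the predicted box and the ground truth, and C is the smallest enclosing convex shape that contains both A and B. Even when the IOU of A and B is 0, GIOU still changes with the relative distance between A and B, so the model parameters can continue to be optimized. With the widths and heights of A and B held fixed, the farther apart the two boxes are, the smaller the GIOU and the larger the GIOU loss. - -The flow chart for computing the box loss with GIOU loss is shown below. - - -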
    - -
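The flow chart image is not preserved in the diff. The steps it covers can be sketched in plain Python for a single pair of boxes. This is a minimal illustration that assumes axis-aligned boxes given as `(x1, y1, x2, y2)`; the function name `giou_loss` is hypothetical and is not the PaddleDetection implementation.

```python
def giou_loss(box_a, box_b):
    """GIoU loss for two axis-aligned boxes (x1, y1, x2, y2), x1 <= x2, y1 <= y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Areas of the two boxes.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero area when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box C of A and B.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # GIoU subtracts the fraction of C not covered by the union.
    giou = iou - (area_c - union) / area_c if area_c > 0 else iou
    return 1.0 - giou


if __name__ == "__main__":
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint, close together
    print(giou_loss((0, 0, 1, 1), (4, 0, 5, 1)))  # disjoint, farther apart
```

For the two disjoint pairs in the usage lines the IOU is 0 in both cases, yet the loss grows with the gap between the boxes, which is the property the text describes.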
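    - - -PaddleDetection also provides an open-source GIOU loss implementation based on Faster R-CNN. Replacing the traditional smooth L1 loss with GIOU loss in the Faster R-CNN ResNet50-vd-FPN 1x experiment raises COCO val mAP from 38.3% to 39.4%, with no increase in inference time. - - -### DIOU/CIOU loss - -GIOU loss solves the problem that IOU loss gives the model no optimization direction when the IOU between predicted box A and ground truth B is 0, but two cases remain hard to handle: -1. When box A and box B are in a containment relationship, GIOU loss degenerates to IOU loss and the model converges slowly. -2. When A and B intersect, if their x1 and x2 are both equal or their y1 and y2 are both equal, GIOU loss also degenerates to IOU loss and convergence is very slow. - -To address this, the paper proposes DIOU loss and CIOU loss, which tackle slow convergence and the failure to converge under some conditions. -To speed up convergence, the paper introduces the notion of distance into the improved loss. Specifically, the box loss can be defined in the following form: - - -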
    - -
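    - - -where the added term is the penalty term. When the distance between the predicted box and the ground truth is taken into account, the penalty term can be defined as - - -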
    - -
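    - - -where the numerator is the Euclidean distance between the center points of the predicted box and the ground-truth box, and c in the denominator is the diagonal length of the smallest box enclosing the predicted box and the ground-truth box. The DIOU loss can therefore be written as - -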
    - -
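The three formula images in this passage are not preserved in the diff. As a hedged reconstruction from the cited Distance-IoU paper, with b and b^{gt} the centers of the predicted and ground-truth boxes, \rho the Euclidean distance, and c the diagonal of the smallest enclosing box, the general form, the distance penalty, and the resulting loss are usually written as:

```latex
\mathcal{L} = 1 - \mathrm{IoU} + \mathcal{R}(B, B^{gt})

\mathcal{R}_{DIoU} = \frac{\rho^{2}(b, b^{gt})}{c^{2}}

\mathcal{L}_{DIoU} = 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}}
```

In the paper both the center distance and the diagonal are squared, so the penalty stays in [0, 1).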
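    - - -Compared with GIOU loss, DIOU loss considers not only the IOU but also the distance between the boxes, which speeds up convergence. However, DIOU loss as a box loss function only considers the overlap ratio and the center-point distance; it does not account for the difference in aspect ratio between the predicted box and the ground truth. The paper therefore proposes CIOU loss, whose penalty term adds a constraint on the aspect ratio. Specifically, the penalty term is defined as follows - -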
    - -
    - - -where v is the penalty term and α is the penalty coefficient, defined respectively as follows - -
    - -
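The CIOU formula images are likewise not preserved in the diff. A hedged reconstruction from the cited paper, using the same symbols as the DIOU penalty plus the widths and heights (w, h) and (w^{gt}, h^{gt}) of the predicted and ground-truth boxes:

```latex
\mathcal{R}_{CIoU} = \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v

v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}

\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```

When the two boxes have the same aspect ratio, v is 0 and the CIOU penalty reduces to the DIOU penalty.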
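    - - -CIOU loss lets box regression converge faster and more accurately when the predicted box overlaps, or is even contained in, the target box. -In the NMS stage the suppression criterion is normally the IOU; the paper uses a DIOU-corrected criterion instead, and the detection scores are updated as follows. - -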
    - -
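The score-update formula image is not preserved in the diff. In the cited paper, a box B_i keeps its score when IoU(M, B_i) minus the DIoU penalty between M and B_i is below the NMS threshold, and its score is set to 0 otherwise, where M is the highest-scoring box still in play. The sketch below illustrates that rule in plain Python under the assumption of `(x1, y1, x2, y2)` boxes; the names `diou` and `diou_nms` are hypothetical and do not reproduce PaddleDetection's `MultiClassDiouNMS`.

```python
def diou(box_a, box_b):
    """IoU minus the squared center distance over the squared enclosing diagonal."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest enclosing box.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - rho2 / c2 if c2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # A box survives only if its DIoU with the kept box is below the threshold.
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(diou_nms(boxes, [0.9, 0.8, 0.7]))
```

Because the distance penalty is subtracted from the IOU, two boxes with a high IOU but well-separated centers are less likely to suppress each other than under plain IOU-based NMS.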
    - - -This further improves model accuracy. - - -## Model Zoo - -| Backbone | Type | Loss Type | Loss Weight | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :------------- | :---: | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :---: | -| ResNet50-vd-FPN | Faster | GIOU | 10 | 2 | 1x | 22.94 | 39.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_giou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_giou_loss_1x.yml) | -| ResNet50-vd-FPN | Faster | DIOU | 12 | 2 | 1x | 22.94 | 39.2 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_diou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_diou_loss_1x.yml) | -| ResNet50-vd-FPN | Faster | CIOU | 12 | 2 | 1x | 22.95 | 39.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_ciou_loss_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_ciou_loss_1x.yml) | - - - -## Citations - -``` -@article{DBLP:journals/corr/abs-1902-09630, - author = {Seyed Hamid Rezatofighi and - Nathan Tsoi and - JunYoung Gwak and - Amir Sadeghian and - Ian D.
Reid and - Silvio Savarese}, - title = {Generalized Intersection over Union: {A} Metric and {A} Loss for Bounding - Box Regression}, - journal = {CoRR}, - volume = {abs/1902.09630}, - year = {2019}, - url = {http://arxiv.org/abs/1902.09630}, - archivePrefix = {arXiv}, - eprint = {1902.09630}, - timestamp = {Tue, 21 May 2019 18:03:36 +0200}, - biburl = {https://dblp.org/rec/bib/journals/corr/abs-1902-09630}, - bibsource = {dblp computer science bibliography, https://dblp.org} -} -``` - -- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression: [https://arxiv.org/abs/1911.08287](https://arxiv.org/abs/1911.08287) - -``` -@article{Zheng2019DistanceIoULF, - title={Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression}, - author={Zhaohui Zheng and Ping Wang and Wei Liu and Jinze Li and Rongguang Ye and Dongwei Ren}, - journal={ArXiv}, - year={2019}, - volume={abs/1911.08287} -} -``` diff --git a/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_ciou_loss_1x.yml b/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_ciou_loss_1x.yml deleted file mode 100644 index aa6c17b79ec1381af552ace8d31777a4673a1f0c..0000000000000000000000000000000000000000 --- a/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_ciou_loss_1x.yml +++ /dev/null @@ -1,114 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_diou_loss_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 
128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: MultiClassDiouNMS - bbox_loss: DiouLoss - -MultiClassDiouNMS: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -DiouLoss: - loss_weight: 10.0 - is_cls_agnostic: false - use_complete_iou_loss: true - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_diou_loss_1x.yml b/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_diou_loss_1x.yml deleted file mode 100644 index f780c919f203618df16d9e2b7fb1833ec2507713..0000000000000000000000000000000000000000 --- a/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_diou_loss_1x.yml +++ /dev/null @@ -1,112 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_diou_loss_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - bbox_loss: DiouLoss - -DiouLoss: - loss_weight: 12.0 - is_cls_agnostic: false - use_complete_iou_loss: false - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git 
a/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_giou_loss_1x.yml b/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_giou_loss_1x.yml deleted file mode 100644 index 66721a03fa2f25d0fcd90154f9cf7e723271fcc4..0000000000000000000000000000000000000000 --- a/static/configs/iou_loss/faster_rcnn_r50_vd_fpn_giou_loss_1x.yml +++ /dev/null @@ -1,111 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_giou_loss_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - bbox_loss: GiouLoss - 
-GiouLoss: - loss_weight: 10.0 - is_cls_agnostic: false - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/libra_rcnn/README.md b/static/configs/libra_rcnn/README.md deleted file mode 100644 index 3451390d34f969444503f4c5ebc90993be85b568..0000000000000000000000000000000000000000 --- a/static/configs/libra_rcnn/README.md +++ /dev/null @@ -1,23 +0,0 @@ -# Libra R-CNN: Towards Balanced Learning for Object Detection - -## Introduction - -- Libra R-CNN: Towards Balanced Learning for Object Detection -: [https://arxiv.org/abs/1904.02701](https://arxiv.org/abs/1904.02701) - -``` -@inproceedings{pang2019libra, - title={Libra R-CNN: Towards Balanced Learning for Object Detection}, - author={Pang, Jiangmiao and Chen, Kai and Shi, Jianping and Feng, Huajun and Ouyang, Wanli and Dahua Lin}, - booktitle={IEEE Conference on Computer Vision and Pattern Recognition}, - year={2019} -} -``` - - -## Model Zoo - -| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-BFP | Faster | 2 | 1x | 18.247 | 40.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/libra_rcnn_r50_vd_fpn_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/libra_rcnn/libra_rcnn_r50_vd_fpn_1x.yml) | -| ResNet101-vd-BFP | Faster | 2 | 1x | 14.865 | 42.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/libra_rcnn_r101_vd_fpn_1x.tar) | 
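[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/libra_rcnn/libra_rcnn_r101_vd_fpn_1x.yml) | diff --git a/static/configs/libra_rcnn/README_cn.md b/static/configs/libra_rcnn/README_cn.md deleted file mode 100644 index 7d71aae1bb977aac21dc28520633995d006442a8..0000000000000000000000000000000000000000 --- a/static/configs/libra_rcnn/README_cn.md +++ /dev/null @@ -1,75 +0,0 @@ -# Libra R-CNN: Towards Balanced Learning for Object Detection - -## Introduction - -Training a detection model mostly involves three steps: generating and selecting region proposals, extracting features, and training the classification and box-regression tasks jointly until they converge. - -The paper argues that imbalance at three levels limits detector performance: the sample level, the feature level, and the objective level. It proposes one remedy for each, described below. - -### IoU-balanced Sampling - -After Faster RCNN generates a large number of proposals, positive and negative samples are picked at random. This causes a problem: 70% of the negative proposals have an IoU with the ground truth between 0 and 0.05, as the distribution in the figure below shows. Online hard example mining (OHEM) alleviates this, but the number of samples drawn from different IoU intervals still differs widely, and the procedure is complicated. The authors propose a balanced negative sampling strategy: split the IoU range into K intervals and sample the same number of negatives from each (if an interval holds fewer than the per-interval quota, take all of its samples). The sampled negatives are then spread as evenly as possible over the IoU intervals. The idea is simple, and it also works somewhat better than OHEM. - - -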
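The sampling rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the PaddleDetection `LibraBBoxAssigner` code: the function name, the `num_bins` and `max_iou` parameters, and the final random top-up step are all assumptions made for the example.

```python
import random


def iou_balanced_negative_sample(ious, num_samples, num_bins=3, max_iou=0.5, seed=None):
    """Return up to `num_samples` negative indices, spread evenly over IoU bins.

    `ious[i]` is the max IoU of proposal i with any ground-truth box.
    Proposals with IoU in [0, max_iou) are treated as negatives (assumed threshold).
    """
    rng = random.Random(seed)
    negatives = [i for i, iou in enumerate(ious) if 0.0 <= iou < max_iou]
    if len(negatives) <= num_samples:
        return negatives  # not enough negatives to choose from, keep them all

    # Assign every negative to one of `num_bins` equal-width IoU intervals.
    width = max_iou / num_bins
    bins = [[] for _ in range(num_bins)]
    for i in negatives:
        k = min(int(ious[i] / width), num_bins - 1)
        bins[k].append(i)

    # Draw the same quota from each interval; a short interval gives all it has.
    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        if len(members) <= per_bin:
            chosen.extend(members)
        else:
            chosen.extend(rng.sample(members, per_bin))

    # Assumed top-up: fill any shortfall at random from the unused negatives.
    if len(chosen) < num_samples:
        used = set(chosen)
        rest = [i for i in negatives if i not in used]
        chosen.extend(rng.sample(rest, num_samples - len(chosen)))
    return chosen
```

With 70 proposals near IoU 0.01, 20 near 0.2 and 10 near 0.4, asking for 30 negatives over 3 bins yields 10 from each group, whereas uniform random sampling would mostly return the near-zero ones.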
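- - -### Balanced Feature Pyramid (BFP) - -The original FPN fuses backbone features through lateral connections. The paper proposes the structure shown in the figure below, which has four parts: rescaling, integrating, refining and strengthening. Feature maps from different levels are first rescaled to the same size and combined by a weighted average; a Nonlocal module then refines the result; finally the refined map is rescaled back to each level and added to that level's feature map as a residual, giving the output feature maps. Compared with the standard FPN, this balanced feature pyramid improves accuracy on the COCO dataset by about 0.8%. - -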
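A toy version of the four steps makes the data flow concrete. The sketch below works on 1-D Python lists instead of image tensors, uses a plain (unweighted) average for the integration step, and takes the refinement as a pluggable function that defaults to the identity rather than a real Nonlocal block. All names are hypothetical and none of this is the PaddleDetection `BFP` implementation.

```python
def resize_nearest(feat, size):
    """Nearest-neighbour resize of a 1-D feature list to `size` entries."""
    n = len(feat)
    return [feat[min(i * n // size, n - 1)] for i in range(size)]


def balanced_feature_pyramid(levels, refine_level=2, refine=lambda f: f):
    """Apply rescale -> integrate -> refine -> strengthen to a list of levels.

    `levels` is a list of 1-D feature lists, finest resolution first.
    `refine_level` picks the level whose size is used for integration.
    """
    target = len(levels[refine_level])
    # 1) Rescale every level to the size of the chosen level.
    rescaled = [resize_nearest(f, target) for f in levels]
    # 2) Integrate: element-wise average over levels (unweighted here).
    integrated = [sum(vals) / len(levels) for vals in zip(*rescaled)]
    # 3) Refine: stand-in for the Nonlocal module (identity by default).
    refined = refine(integrated)
    # 4) Strengthen: resize back and add to each level as a residual.
    return [
        [a + b for a, b in zip(f, resize_nearest(refined, len(f)))]
        for f in levels
    ]
```

Each output level keeps its own resolution and receives the same shared, averaged signal, which is the sense in which the pyramid is "balanced".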
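- - - -### Balanced L1 Loss - -Object detection optimizes a classification loss and a box regression loss at the same time. When the classification score is high, the model can still reach fairly high accuracy even if the regression is poor, so it is worth considering a larger weight for the regression loss. Call boxes with bbox loss <= 1 inliers (easy samples) and boxes with bbox loss > 1 outliers (hard samples). Directly scaling up the regression loss of all boxes makes the model more sensitive to outliers. The smooth l1 loss used for box regression also has a drawback: the gradient is small for inliers, while for outliers its magnitude is 1. The gradient of smooth l1 loss is defined as follows. - -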
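In standard notation, with $x$ denoting the regression error (a symbol assumed here), the smooth l1 gradient described above can be written as:

```latex
\frac{\partial L_{\mathrm{smooth}\,\ell_1}}{\partial x} =
\begin{cases}
x, & |x| < 1 \\
\operatorname{sign}(x), & \text{otherwise}
\end{cases}
```

For inliers the gradient shrinks linearly with the error, and for outliers its magnitude is fixed at 1, which matches the two cases named in the text.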
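- - -The paper therefore raises the gradient of inliers so that the loss gradients of inliers and outliers are as balanced as possible. The resulting gradient of the Libra loss is computed as follows. - -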
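As a rough sketch, the balanced L1 loss from the Libra R-CNN paper has gradient `alpha * ln(b * |x| + 1)` for `|x| < 1` and `gamma` otherwise, with `b` fixed by `alpha * ln(b + 1) = gamma` so the two pieces meet at `|x| = 1`. The Python below follows that form with the inlier threshold fixed at 1; the function names are illustrative, and the defaults `alpha=0.5`, `gamma=1.5` mirror the `BalancedL1Loss` settings in the configs in this diff. It is not the PaddleDetection implementation.

```python
import math


def balanced_l1_loss(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss for a single regression error `x` (threshold fixed at 1)."""
    ax = abs(x)
    b = math.exp(gamma / alpha) - 1.0  # ensures alpha * ln(b + 1) == gamma
    if ax < 1.0:
        # Inlier branch: the antiderivative of alpha * ln(b * ax + 1).
        return alpha / b * (b * ax + 1.0) * math.log(b * ax + 1.0) - alpha * ax
    # Outlier branch: linear, with a constant that keeps the loss continuous.
    return gamma * ax + gamma / b - alpha


def balanced_l1_grad(x, alpha=0.5, gamma=1.5):
    """Magnitude of the balanced L1 gradient with respect to |x|."""
    ax = abs(x)
    b = math.exp(gamma / alpha) - 1.0
    if ax < 1.0:
        return alpha * math.log(b * ax + 1.0)
    return gamma


def smooth_l1_grad(x):
    """Magnitude of the smooth L1 gradient, for comparison."""
    return min(abs(x), 1.0)
```

Comparing `balanced_l1_grad(0.5)` with `smooth_l1_grad(0.5)` shows the intended effect: for the same inlier error the balanced loss yields a larger gradient.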
- - -The figure below visualizes the gradient under different hyperparameters. - - -
- - -The plot shows that Libra loss and smooth l1 loss have the same gradient for outliers, while for inliers the gradient of Libra loss is larger. This increases the box regression loss across cases, balances the loss between easy and hard boxes, and gives box regression quality more influence on the detector's performance. - -Combining the three components yields an absolute accuracy gain of 1.1% to 2.5% on COCO two-stage object detection, a clear improvement. - - -## Model Zoo - - -| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-BFP | Faster | 2 | 1x | 18.247 | 40.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/libra_rcnn_r50_vd_fpn_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/libra_rcnn/libra_rcnn_r50_vd_fpn_1x.yml) | -| ResNet101-vd-BFP | Faster | 2 | 1x | 14.865 | 42.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/libra_rcnn_r101_vd_fpn_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/libra_rcnn/libra_rcnn_r101_vd_fpn_1x.yml) | - -## Citations - -``` -@inproceedings{pang2019libra, - title={Libra R-CNN: Towards Balanced Learning for Object Detection}, - author={Pang, Jiangmiao and Chen, Kai and Shi, Jianping and Feng, Huajun and Ouyang, Wanli and Dahua Lin}, - booktitle={IEEE Conference on Computer Vision and Pattern Recognition}, - year={2019} -} -``` diff --git a/static/configs/libra_rcnn/libra_rcnn_r101_vd_fpn_1x.yml b/static/configs/libra_rcnn/libra_rcnn_r101_vd_fpn_1x.yml deleted file mode 100644 index 2c425a58c91a47dfc3dff7b7df69249cf0b0d2f9..0000000000000000000000000000000000000000 --- a/static/configs/libra_rcnn/libra_rcnn_r101_vd_fpn_1x.yml +++ /dev/null @@ -1,117 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/libra_rcnn_r101_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - 
-FasterRCNN: - backbone: ResNet - fpn: BFP - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: LibraBBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -BFP: - base_neck: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - refine_level: 2 - refine_type: nonlocal - nonlocal_reduction: 1.0 - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -LibraBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - bbox_loss: BalancedL1Loss - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/libra_rcnn/libra_rcnn_r50_vd_fpn_1x.yml 
b/static/configs/libra_rcnn/libra_rcnn_r50_vd_fpn_1x.yml deleted file mode 100644 index 6208466ab72810da5f5e74dd4d4dbf299ac250ee..0000000000000000000000000000000000000000 --- a/static/configs/libra_rcnn/libra_rcnn_r50_vd_fpn_1x.yml +++ /dev/null @@ -1,117 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/libra_rcnn_r50_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: BFP - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: LibraBBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -BFP: - base_neck: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - refine_level: 2 - refine_type: nonlocal - nonlocal_reduction: 1.0 - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -LibraBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - bbox_loss: BalancedL1Loss - 
-BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/mask_fpn_reader.yml b/static/configs/mask_fpn_reader.yml deleted file mode 100644 index aca3e0ba902bc44fb6bd0b49f282d6a2457a086f..0000000000000000000000000000000000000000 --- a/static/configs/mask_fpn_reader.yml +++ /dev/null @@ -1,103 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - is_mask_flip: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - drop_last: false - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 
- target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_last: false diff --git a/static/configs/mask_rcnn_r101_fpn_1x.yml b/static/configs/mask_rcnn_r101_fpn_1x.yml deleted file mode 100644 index c1cd56230c2376792fb85afca6fb6988a69391b9..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r101_fpn_1x.yml +++ /dev/null @@ -1,111 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r101_fpn_1x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - 
rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_r101_vd_fpn_1x.yml b/static/configs/mask_rcnn_r101_vd_fpn_1x.yml deleted file mode 100644 index 1ba08f542fa85599887fe5d53ae60baf075fb9ce..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r101_vd_fpn_1x.yml +++ /dev/null @@ -1,112 +0,0 @@ -architecture: MaskRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -weights: output/mask_rcnn_r101_vd_fpn_1x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - 
freeze_at: 2 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_r50_1x.yml b/static/configs/mask_rcnn_r50_1x.yml deleted file mode 100644 index 127e783713a981b6d04e534ff129a86dd48a30a1..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r50_1x.yml +++ /dev/null @@ -1,102 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r50_1x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_assigner: BBoxAssigner - bbox_head: BBoxHead - mask_assigner: MaskAssigner - mask_head: MaskHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: 4 - freeze_at: 2 - -ResNetC5: - depth: 50 - norm_type: affine_channel - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - spatial_scale: 0.0625 - sampling_ratio: 0 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - normalized: false - score_threshold: 0.05 - -MaskHead: - dilation: 1 - conv_dim: 256 - resolution: 14 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 14 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_reader.yml' diff --git a/static/configs/mask_rcnn_r50_1x_cocome_kunlun.yml b/static/configs/mask_rcnn_r50_1x_cocome_kunlun.yml deleted file mode 100644 index 58517fd4a56811b8efbce57ab0248adc3dfe902e..0000000000000000000000000000000000000000 
--- a/static/configs/mask_rcnn_r50_1x_cocome_kunlun.yml +++ /dev/null @@ -1,104 +0,0 @@ -architecture: MaskRCNN -use_gpu: false -use_xpu: true -max_iters: 1200 -snapshot_iter: 100 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_2x.tar -metric: COCO -weights: output/mask_rcnn_r50_1x_cocome_kunlun/model_final -num_classes: 2 -finetune_exclude_pretrained_params: ['cls_score'] - -MaskRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_assigner: BBoxAssigner - bbox_head: BBoxHead - mask_assigner: MaskAssigner - mask_head: MaskHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: 4 - freeze_at: 2 - -ResNetC5: - depth: 50 - norm_type: affine_channel - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - spatial_scale: 0.0625 - sampling_ratio: 0 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - normalized: false - score_threshold: 0.05 - -MaskHead: - dilation: 1 - conv_dim: 256 - resolution: 14 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 14 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [900, 1100] - - !LinearWarmup - start_factor: 0.1 - steps: 300 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - 
type: L2 - -_READER_: 'mask_reader_cocome.yml' diff --git a/static/configs/mask_rcnn_r50_2x.yml b/static/configs/mask_rcnn_r50_2x.yml deleted file mode 100644 index 8a8e62bc0c20f83af23c1c1af3a8232d2a8fe534..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r50_2x.yml +++ /dev/null @@ -1,104 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 360000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r50_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - rpn_head: RPNHead - roi_extractor: RoIAlign - bbox_assigner: BBoxAssigner - bbox_head: BBoxHead - mask_assigner: MaskAssigner - mask_head: MaskHead - - -ResNet: - norm_type: affine_channel - norm_decay: 0. - depth: 50 - feature_maps: 4 - freeze_at: 2 - -ResNetC5: - depth: 50 - norm_type: affine_channel - -RPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 12000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 6000 - post_nms_top_n: 1000 - -RoIAlign: - resolution: 14 - spatial_scale: 0.0625 - sampling_ratio: 0 - -BBoxHead: - head: ResNetC5 - nms: - keep_top_k: 100 - nms_threshold: 0.5 - normalized: false - score_threshold: 0.05 - -MaskHead: - dilation: 1 - conv_dim: 256 - resolution: 14 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 14 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: 
[240000, 320000] - #start the warm up from base_lr * start_factor - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_reader.yml' diff --git a/static/configs/mask_rcnn_r50_fpn_1x.yml b/static/configs/mask_rcnn_r50_fpn_1x.yml deleted file mode 100644 index 35a5495f42fb3679fb7fe49543e11ea62182f779..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r50_fpn_1x.yml +++ /dev/null @@ -1,111 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r50_fpn_1x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 
0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_r50_fpn_2x.yml b/static/configs/mask_rcnn_r50_fpn_2x.yml deleted file mode 100644 index 9fffd92211bda8bed1c28d94f1fded93730805d5..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r50_fpn_2x.yml +++ /dev/null @@ -1,111 +0,0 @@ -architecture: MaskRCNN -max_iters: 360000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/mask_rcnn_r50_fpn_2x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - 
post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_r50_vd_fpn_2x.yml b/static/configs/mask_rcnn_r50_vd_fpn_2x.yml deleted file mode 100644 index 5f46b475b83aa2925783f70ccff63c745d23a16a..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_r50_vd_fpn_2x.yml +++ /dev/null @@ -1,112 +0,0 @@ -architecture: MaskRCNN -use_gpu: true -max_iters: 360000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -metric: COCO -weights: output/mask_rcnn_r50_vd_fpn_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - 
num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_se154_vd_fpn_s1x.yml b/static/configs/mask_rcnn_se154_vd_fpn_s1x.yml deleted file mode 100644 index fe973ececb8545bf05b1772bff910ca40ac3bc2c..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_se154_vd_fpn_s1x.yml +++ /dev/null @@ -1,114 +0,0 @@ -architecture: MaskRCNN -max_iters: 260000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_pretrained.tar -weights: output/mask_rcnn_se154_vd_fpn_s1x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: SENet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -SENet: - 
depth: 152 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [200000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_x101_vd_64x4d_fpn_1x.yml b/static/configs/mask_rcnn_x101_vd_64x4d_fpn_1x.yml deleted file mode 100644 index 315dc2daf13759cc0b168768fbd80e2aab99e969..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_x101_vd_64x4d_fpn_1x.yml +++ /dev/null @@ -1,114 +0,0 @@ -architecture: MaskRCNN -max_iters: 180000 -snapshot_iter: 10000 -use_gpu: 
true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/mask_rcnn_x101_vd_64x4d_fpn_1x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNeXt: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 
'mask_fpn_reader.yml' diff --git a/static/configs/mask_rcnn_x101_vd_64x4d_fpn_2x.yml b/static/configs/mask_rcnn_x101_vd_64x4d_fpn_2x.yml deleted file mode 100644 index 3630c269ca10a1d279228d500e16ccce266fd957..0000000000000000000000000000000000000000 --- a/static/configs/mask_rcnn_x101_vd_64x4d_fpn_2x.yml +++ /dev/null @@ -1,114 +0,0 @@ -architecture: MaskRCNN -max_iters: 360000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/mask_rcnn_x101_vd_64x4d_fpn_2x/model_final -metric: COCO -num_classes: 81 - -MaskRCNN: - backbone: ResNeXt - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNeXt: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: affine_channel - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: 
TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'mask_fpn_reader.yml' diff --git a/static/configs/mask_reader.yml b/static/configs/mask_reader.yml deleted file mode 100644 index 165a09b82bb448dede9273ae3b7da297e318c131..0000000000000000000000000000000000000000 --- a/static/configs/mask_reader.yml +++ /dev/null @@ -1,95 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - is_mask_flip: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: -1 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - drop_last: false - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - 
!Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: false - drop_last: false diff --git a/static/configs/mask_reader_cocome.yml b/static/configs/mask_reader_cocome.yml deleted file mode 100644 index 1b44491c5c3a7dfbc12138c682ac00b24946ef4a..0000000000000000000000000000000000000000 --- a/static/configs/mask_reader_cocome.yml +++ /dev/null @@ -1,95 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask'] - dataset: - !COCODataSet - image_dir: train - anno_path: annotations/instances_split_train.json - dataset_dir: dataset/cocome - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - is_mask_flip: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: -1 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - drop_last: false - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: train - anno_path: annotations/instances_split_val.json - dataset_dir: 
dataset/cocome - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: dataset/cocome/annotations/instances_split_val.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: false - drop_last: false diff --git a/static/configs/mobile/README.md b/static/configs/mobile/README.md deleted file mode 100755 index 24db1c98cd0c182d653e715919a7fcc7f9f60579..0000000000000000000000000000000000000000 --- a/static/configs/mobile/README.md +++ /dev/null @@ -1,97 +0,0 @@ -[English](README_en.md) | 简体中文 - -# 移动端模型库 - - -## 模型 - -PaddleDetection目前提供一系列针对移动应用进行优化的模型,主要支持以下结构: - -| 骨干网络 | 结构 | 输入大小 | 图片/gpu [1](#gpu) | 学习率策略 | Box AP | 下载 | PaddleLite模型下载 | -| :----------------------- | :------------------------ | :---: | :--------------------: | :------------ | :----: | :--- | :----------------- | -| MobileNetV3 Small | SSDLite | 320 | 64 | 400K (cosine) | 16.2 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_small.pdparams) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_small.tar) | -| MobileNetV3 Small | SSDLite Quant [2](#quant) | 320 | 64 | 400K (cosine) | 15.4 | 
[链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_small_quant.tar) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_small_quant.tar) | -| MobileNetV3 Large | SSDLite | 320 | 64 | 400K (cosine) | 23.3 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_large.pdparams) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_large.tar) | -| MobileNetV3 Large | SSDLite Quant [2](#quant) | 320 | 64 | 400K (cosine) | 22.6 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_large_quant.tar) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_large_quant.tar) | -| MobileNetV3 Large w/ FPN | Cascade RCNN | 320 | 2 | 500k (cosine) | 25.0 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/cascade_rcnn_mobilenetv3_fpn_320.tar) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/cascade_rcnn_mobilenetv3_fpn_320.tar) | -| MobileNetV3 Large w/ FPN | Cascade RCNN | 640 | 2 | 500k (cosine) | 30.2 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/cascade_rcnn_mobilenetv3_fpn_640.tar) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/cascade_rcnn_mobilenetv3_fpn_640.tar) | -| MobileNetV3 Large | YOLOv3 | 320 | 8 | 500K | 27.1 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams) | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/yolov3_mobilenet_v3.tar) | -| MobileNetV3 Large | YOLOv3 Prune [3](#prune) | 320 | 8 | - | 24.6 | [链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/yolov3_mobilenet_v3_prune75875_FPGM_distillby_r34.pdparams) | 
[链接](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/yolov3_mobilenet_v3_prune86_FPGM_320.tar) | - -**注意**: - -- [1] 模型统一使用8卡训练。 -- [2] 参考下面关于[SSDLite量化的说明](#SSDLite量化说明)。 -- [3] 参考下面关于[YOLO剪裁的说明](#YOLOv3剪裁说明)。 - - -## 评测结果 - -- 模型使用 [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) 2.6 (即将发布) 在下列平台上进行了测试 - - Qualcomm Snapdragon 625 - - Qualcomm Snapdragon 835 - - Qualcomm Snapdragon 845 - - Qualcomm Snapdragon 855 - - HiSilicon Kirin 970 - - HiSilicon Kirin 980 - -- 单CPU线程 (单位: ms) - -| | SD625 | SD835 | SD845 | SD855 | Kirin 970 | Kirin 980 | -|------------------|---------|---------|---------|---------|-----------|-----------| -| SSDLite Large | 289.071 | 134.408 | 91.933 | 48.2206 | 144.914 | 55.1186 | -| SSDLite Large Quant | | | | | | | -| SSDLite Small | 122.932 | 57.1914 | 41.003 | 22.0694 | 61.5468 | 25.2106 | -| SSDLite Small Quant | | | | | | | -| YOLOv3 baseline | 1082.5 | 435.77 | 317.189 | 155.948 | 536.987 | 178.999 | -| YOLOv3 prune | 253.98 | 131.279 | 89.4124 | 48.2856 | 122.732 | 55.8626 | -| Cascade RCNN 320 | 286.526 | 125.635 | 87.404 | 46.184 | 149.179 | 52.9994 | -| Cascade RCNN 640 | 1115.66 | 495.926 | 351.361 | 189.722 | 573.558 | 207.917 | - -- 4 CPU线程 (单位: ms) - -| | SD625 | SD835 | SD845 | SD855 | Kirin 970 | Kirin 980 | -|------------------|---------|---------|---------|---------|-----------|-----------| -| SSDLite Large | 107.535 | 51.1382 | 34.6392 | 20.4978 | 50.5598 | 24.5318 | -| SSDLite Large Quant | | | | | | | -| SSDLite Small | 51.5704 | 24.5156 | 18.5486 | 11.4218 | 24.9946 | 16.7158 | -| SSDLite Small Quant | | | | | | | -| YOLOv3 baseline | 413.486 | 184.248 | 133.624 | 75.7354 | 202.263 | 126.435 | -| YOLOv3 prune | 98.5472 | 53.6228 | 34.4306 | 21.3112 | 44.0722 | 31.201 | -| Cascade RCNN 320 | 131.515 | 59.6026 | 39.4338 | 23.5802 | 58.5046 | 36.9486 | -| Cascade RCNN 640 | 473.083 | 224.543 | 156.205 | 100.686 | 231.108 | 138.391 | - -## SSDLite量化说明 - 
-在SSDLite模型中我们采用完整量化训练的方式对模型进行训练,在8卡GPU下共训练40万轮,训练中将`res_conv1`与`se_block`固定不训练,执行指令为: - -```shell -python slim/quantization/train.py --not_quant_pattern res_conv1 se_block \ - -c configs/ssd/ssdlite_mobilenet_v3_large.yml \ - --eval -``` -更多量化教程请参考[模型量化压缩教程](../../docs/advanced_tutorials/slim/quantization/QUANTIZATION.md) - -## YOLOv3剪裁说明 - -首先对YOLO检测头进行剪裁,然后再使用 YOLOv3-ResNet34 作为teacher网络对剪裁后的模型进行蒸馏, teacher网络在COCO上的mAP为31.4 (输入大小320\*320). - -可以使用如下两种方式进行剪裁: - -- 固定比例剪裁, 整体剪裁率是86% - - ```shell - --pruned_params="yolo_block.0.0.0.conv.weights,yolo_block.0.0.1.conv.weights,yolo_block.0.1.0.conv.weights,yolo_block.0.1.1.conv.weights,yolo_block.0.2.conv.weights,yolo_block.0.tip.conv.weights,yolo_block.1.0.0.conv.weights,yolo_block.1.0.1.conv.weights,yolo_block.1.1.0.conv.weights,yolo_block.1.1.1.conv.weights,yolo_block.1.2.conv.weights,yolo_block.1.tip.conv.weights,yolo_block.2.0.0.conv.weights,yolo_block.2.0.1.conv.weights,yolo_block.2.1.0.conv.weights,yolo_block.2.1.1.conv.weights,yolo_block.2.2.conv.weights,yolo_block.2.tip.conv.weights" \ - --pruned_ratios="0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.875,0.875,0.875,0.875,0.875,0.875" - ``` -- 使用 [FPGM](https://arxiv.org/abs/1811.00250) 算法剪裁: - - ```shell - --prune_criterion=geometry_median - ``` - - -## 敬请关注后续发布 - -- [ ] 更多模型 -- [ ] 量化模型 diff --git a/static/configs/mobile/README_en.md b/static/configs/mobile/README_en.md deleted file mode 100644 index 133afffe91de71c19f39d92bffdbbf234d1b26b6..0000000000000000000000000000000000000000 --- a/static/configs/mobile/README_en.md +++ /dev/null @@ -1,99 +0,0 @@ -English | [简体中文](README.md) - -# Mobile Model Zoo - - -## Models - -This directory contains models optimized for mobile applications, at present the following models included: - -| Backbone | Architecture | Input | Image/gpu [1](#gpu) | Lr schd | Box AP | Download | PaddleLite Model Download | -| :----------------------- | :------------------------ | :---: | :--------------------: | 
:------------ | :----: | :------- | :------------------------ | -| MobileNetV3 Small | SSDLite | 320 | 64 | 400K (cosine) | 16.2 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_small.pdparam) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_small.tar) | -| MobileNetV3 Small | SSDLite Quant [2](#quant) | 320 | 64 | 400K (cosine) | 15.4 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_small_quant.tar) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_small_quant.tar) | -| MobileNetV3 Large | SSDLite | 320 | 64 | 400K (cosine) | 23.3 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_large.pdparam) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_large.tar) | -| MobileNetV3 Large | SSDLite Quant [2](#quant) | 320 | 64 | 400K (cosine) | 22.6 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/ssdlite_mobilenet_v3_large_quant.tar) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/ssdlite_mobilenet_v3_large_quant.tar) | -| MobileNetV3 Large w/ FPN | Cascade RCNN | 320 | 2 | 500k (cosine) | 25.0 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/cascade_rcnn_mobilenetv3_fpn_320.tar) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/cascade_rcnn_mobilenetv3_fpn_320.tar) | -| MobileNetV3 Large w/ FPN | Cascade RCNN | 640 | 2 | 500k (cosine) | 30.2 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/cascade_rcnn_mobilenetv3_fpn_640.tar) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/cascade_rcnn_mobilenetv3_fpn_640.tar) | -| MobileNetV3 Large | YOLOv3 | 320 | 8 | 500K | 27.1 | 
[Link](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/yolov3_mobilenet_v3.tar) | -| MobileNetV3 Large | YOLOv3 Prune [3](#prune) | 320 | 8 | - | 24.6 | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/yolov3_mobilenet_v3_prune75875_FPGM_distillby_r34.pdparams) | [Link](https://paddlemodels.bj.bcebos.com/object_detection/mobile_models/lite/yolov3_mobilenet_v3_prune86_FPGM_320.tar) | - -**Notes**: - -- [1] All models are trained on 8 GPUs. -- [2] See the note section on [SSDLite quantization](#Notes-on-SSDLite-quantization). -- [3] See the note section on [how YOLO head is pruned](#Notes-on-YOLOv3-pruning). - - -## Benchmark Results - -- Models are benchmarked on the following chipsets with [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) 2.6 (to be released) - - Qualcomm Snapdragon 625 - - Qualcomm Snapdragon 835 - - Qualcomm Snapdragon 845 - - Qualcomm Snapdragon 855 - - HiSilicon Kirin 970 - - HiSilicon Kirin 980 - -- With 1 CPU thread (latency numbers are in ms) - - | | SD625 | SD835 | SD845 | SD855 | Kirin 970 | Kirin 980 | - |------------------|---------|---------|---------|---------|-----------|-----------| - | SSDLite Large | 289.071 | 134.408 | 91.933 | 48.2206 | 144.914 | 55.1186 | - | SSDLite Large Quant | | | | | | | - | SSDLite Small | 122.932 | 57.1914 | 41.003 | 22.0694 | 61.5468 | 25.2106 | - | SSDLite Small Quant | | | | | | | - | YOLOv3 baseline | 1082.5 | 435.77 | 317.189 | 155.948 | 536.987 | 178.999 | - | YOLOv3 prune | 253.98 | 131.279 | 89.4124 | 48.2856 | 122.732 | 55.8626 | - | Cascade RCNN 320 | 286.526 | 125.635 | 87.404 | 46.184 | 149.179 | 52.9994 | - | Cascade RCNN 640 | 1115.66 | 495.926 | 351.361 | 189.722 | 573.558 | 207.917 | - -- With 4 CPU threads (latency numbers are in ms) - - | | SD625 | SD835 | SD845 | SD855 | Kirin 970 | Kirin 980 | - 
|------------------|---------|---------|---------|---------|-----------|-----------| - | SSDLite Large | 107.535 | 51.1382 | 34.6392 | 20.4978 | 50.5598 | 24.5318 | - | SSDLite Large Quant | | | | | | | - | SSDLite Small | 51.5704 | 24.5156 | 18.5486 | 11.4218 | 24.9946 | 16.7158 | - | SSDLite Small Quant | | | | | | | - | YOLOv3 baseline | 413.486 | 184.248 | 133.624 | 75.7354 | 202.263 | 126.435 | - | YOLOv3 prune | 98.5472 | 53.6228 | 34.4306 | 21.3112 | 44.0722 | 31.201 | - | Cascade RCNN 320 | 131.515 | 59.6026 | 39.4338 | 23.5802 | 58.5046 | 36.9486 | - | Cascade RCNN 640 | 473.083 | 224.543 | 156.205 | 100.686 | 231.108 | 138.391 | - - -## Notes on SSDLite quantization - -We train the SSDLite model with full quantization-aware training, for a total of 400,000 iterations on 8 GPUs. During training, `res_conv1` and `se_block` are frozen. The command used is listed below: - -```shell -python slim/quantization/train.py --not_quant_pattern res_conv1 se_block \ - -c configs/ssd/ssdlite_mobilenet_v3_large.yml \ - --eval -``` - -For more quantization tutorials, please refer to [Model Quantization Compression Tutorial](../../docs/advanced_tutorials/slim/quantization/QUANTIZATION.md). - -## Notes on YOLOv3 pruning - -We first prune the YOLO head and then distill the pruned model with YOLOv3-ResNet34 as the teacher, which has a higher mAP on COCO (31.4 with 320\*320 input). 
- -The following configurations can be used for pruning: - -- Prune with a fixed ratio; the overall prune ratio is 86% - - ```shell - --pruned_params="yolo_block.0.0.0.conv.weights,yolo_block.0.0.1.conv.weights,yolo_block.0.1.0.conv.weights,yolo_block.0.1.1.conv.weights,yolo_block.0.2.conv.weights,yolo_block.0.tip.conv.weights,yolo_block.1.0.0.conv.weights,yolo_block.1.0.1.conv.weights,yolo_block.1.1.0.conv.weights,yolo_block.1.1.1.conv.weights,yolo_block.1.2.conv.weights,yolo_block.1.tip.conv.weights,yolo_block.2.0.0.conv.weights,yolo_block.2.0.1.conv.weights,yolo_block.2.1.0.conv.weights,yolo_block.2.1.1.conv.weights,yolo_block.2.2.conv.weights,yolo_block.2.tip.conv.weights" \ - --pruned_ratios="0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.875,0.875,0.875,0.875,0.875,0.875" - ``` -- Prune filters using the [FPGM](https://arxiv.org/abs/1811.00250) algorithm: - - ```shell - --prune_criterion=geometry_median - ``` - - -## Upcoming - -- [ ] More model configurations -- [ ] Quantized models diff --git a/static/configs/mobile/cascade_rcnn_mobilenetv3_fpn_320.yml b/static/configs/mobile/cascade_rcnn_mobilenetv3_fpn_320.yml deleted file mode 100644 index e02b5ac6803c3570c400780c6c0a6ac594a5b048..0000000000000000000000000000000000000000 --- a/static/configs/mobile/cascade_rcnn_mobilenetv3_fpn_320.yml +++ /dev/null @@ -1,219 +0,0 @@ -architecture: CascadeRCNN -max_iters: 500000 -snapshot_iter: 50000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -weights: output/cascade_rcnn_mobilenetv3_fpn_320/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: MobileNetV3RCNN - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -MobileNetV3RCNN: - norm_type: bn - freeze_norm: true - norm_decay: 0.0 - feature_maps: [2, 3, 4] - conv_decay: 0.00001 - lr_mult_list: [0.25, 0.25, 
0.5, 0.5, 0.75] - scale: 1.0 - model_name: large - -FPN: - min_level: 2 - max_level: 6 - num_chan: 48 - has_extra_convs: true - spatial_scale: [0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 16 - min_level: 2 - max_level: 6 - num_chan: 48 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 300 - post_nms_top_n: 100 - -FPNRoIAlign: - canconical_level: 3 - canonical_size: 112 - min_level: 2 - max_level: 4 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 128 - -LearningRate: - base_lr: 0.02 - schedulers: - - !CosineDecay - max_iters: 500000 - - !LinearWarmup - start_factor: 0.1 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.00004 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - 
mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [224, 256, 288, 320, 352, 384] - max_size: 512 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false - - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 320 - target_size: 320 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - - - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 320 - target_size: 320 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 diff --git a/static/configs/mobile/cascade_rcnn_mobilenetv3_fpn_640.yml b/static/configs/mobile/cascade_rcnn_mobilenetv3_fpn_640.yml deleted file mode 100644 index 
5e2a486c0b693e80d955c718dd4e195405781d1b..0000000000000000000000000000000000000000 --- a/static/configs/mobile/cascade_rcnn_mobilenetv3_fpn_640.yml +++ /dev/null @@ -1,219 +0,0 @@ -architecture: CascadeRCNN -max_iters: 500000 -snapshot_iter: 50000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -weights: output/cascade_rcnn_mobilenetv3_fpn_640/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: MobileNetV3RCNN - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -MobileNetV3RCNN: - norm_type: bn - freeze_norm: true - norm_decay: 0.0 - feature_maps: [2, 3, 4] - conv_decay: 0.00001 - lr_mult_list: [1.0, 1.0, 1.0, 1.0, 1.0] - scale: 1.0 - model_name: large - -FPN: - min_level: 2 - max_level: 6 - num_chan: 48 - has_extra_convs: true - spatial_scale: [0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 24 - min_level: 2 - max_level: 6 - num_chan: 48 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 300 - post_nms_top_n: 100 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - 
nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 128 - -LearningRate: - base_lr: 0.02 - schedulers: - - !CosineDecay - max_iters: 500000 - - !LinearWarmup - start_factor: 0.1 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.00004 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672] - max_size: 1000 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false - - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 640 - target_size: 640 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - - - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 
'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 640 - target_size: 640 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 diff --git a/static/configs/mobile/ssdlite_mobilenet_v3_large.yml b/static/configs/mobile/ssdlite_mobilenet_v3_large.yml deleted file mode 120000 index 345d8f12b405262ca0d6bfb6d4110632b0335d5a..0000000000000000000000000000000000000000 --- a/static/configs/mobile/ssdlite_mobilenet_v3_large.yml +++ /dev/null @@ -1 +0,0 @@ -../ssd/ssdlite_mobilenet_v3_large.yml \ No newline at end of file diff --git a/static/configs/mobile/ssdlite_mobilenet_v3_small.yml b/static/configs/mobile/ssdlite_mobilenet_v3_small.yml deleted file mode 120000 index 63fb2a9f353e6791b8007346cfd06541711d0541..0000000000000000000000000000000000000000 --- a/static/configs/mobile/ssdlite_mobilenet_v3_small.yml +++ /dev/null @@ -1 +0,0 @@ -../ssd/ssdlite_mobilenet_v3_small.yml \ No newline at end of file diff --git a/static/configs/mobile/yolov3_mobilenet_v3.yml b/static/configs/mobile/yolov3_mobilenet_v3.yml deleted file mode 120000 index ea0525a3eca88cd99d3e09df1f665a7271957e1d..0000000000000000000000000000000000000000 --- a/static/configs/mobile/yolov3_mobilenet_v3.yml +++ /dev/null @@ -1 +0,0 @@ -../yolov3_mobilenet_v3.yml \ No newline at end of file diff --git a/static/configs/mobile/yolov3_reader.yml b/static/configs/mobile/yolov3_reader.yml deleted file mode 120000 index 0539e0d461ce322f4686fb0542b1f9a3227f8013..0000000000000000000000000000000000000000 --- 
a/static/configs/mobile/yolov3_reader.yml +++ /dev/null @@ -1 +0,0 @@ -../yolov3_reader.yml \ No newline at end of file diff --git a/static/configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml b/static/configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml deleted file mode 100644 index 48ab8d5ae95a15eb69e8be12f2f4a00e601376b5..0000000000000000000000000000000000000000 --- a/static/configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml +++ /dev/null @@ -1,215 +0,0 @@ -architecture: CascadeRCNNClsAware -max_iters: 800000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar -weights: output/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/model_final -# obj365 dataset format and its eval method are same as those for coco -metric: COCO -num_classes: 366 - -CascadeRCNNClsAware: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 200 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - nonlocal_stages: [4] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - 
canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - class_aware: True - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: MultiClassSoftNMS - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiClassSoftNMS: - score_threshold: 0.001 - keep_top_k: 300 - softnms_sigma: 0.15 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [520000, 740000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - dataset_dir: dataset/obj365 - anno_path: train.json - image_dir: train - sample_transforms: - - !DecodeImage - to_rgb: True - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152, 1184, 1216, 1248, 1280, 1312, 1344, 1376, 1408] - max_size: 1800 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - shuffle: true - drop_last: false - worker_num: 2 - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - dataset_dir: dataset/obj365 - anno_path: val.json - image_dir: val - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: False - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - 
!ResizeImage - interp: 1 - target_size: - - 1200 - max_size: 2000 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - worker_num: 2 - drop_empty: false - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: dataset/coco/objects365_label.txt - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - worker_num: 2 diff --git a/static/configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml b/static/configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml deleted file mode 100644 index aef0fe9664382bfb452ed9bd55f21530b33e0d3a..0000000000000000000000000000000000000000 --- a/static/configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml +++ /dev/null @@ -1,250 +0,0 @@ -architecture: CascadeRCNN -max_iters: 500000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_coco_pretrained.tar -weights: output/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas/model_final -metric: COCO -num_classes: 366 - -CascadeRCNN: - backbone: SENet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -SENet: - depth: 152 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - group_width: 4 - groups: 64 - norm_type: bn - freeze_norm: True - variant: d - dcn_v2_stages: [3, 4, 5] - std_senet: True - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - freeze_norm: False - 
norm_type: gn - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 1024 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeXConvNormHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -CascadeXConvNormHead: - norm_type: gn - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [400000, 460000] - - !LinearWarmup - start_factor: 0.01 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - dataset_dir: dataset/objects365 - anno_path: annotations/train.json - image_dir: train - sample_transforms: - - !DecodeImage - to_rgb: True - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !ResizeImage - interp: 1 - target_size: - - 416 - - 448 - - 480 - - 512 - - 544 - - 576 - - 608 - - 640 - - 672 - - 704 - - 736 - - 768 - - 800 - - 832 - 
- 864 - - 896 - - 928 - - 960 - - 992 - - 1024 - - 1056 - - 1088 - - 1120 - - 1152 - - 1184 - - 1216 - - 1248 - - 1280 - - 1312 - - 1344 - - 1376 - - 1408 - max_size: 1600 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - worker_num: 4 - shuffle: true - class_aware_sampling: true - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - dataset_dir: dataset/objects365 - anno_path: annotations/val.json - image_dir: val - sample_transforms: - - !DecodeImage - to_rgb: True - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - batch_size: 1 - dataset: - !ImageFolder - anno_path: dataset/coco/objects365_label.txt - sample_transforms: - - !DecodeImage - to_rgb: True - - !NormalizeImage - is_channel_first: false - is_scale: False - mean: - - 102.9801 - - 115.9465 - - 122.7717 - std: - - 1.0 - - 1.0 - - 1.0 - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 2 diff --git a/static/configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml b/static/configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml deleted file mode 100644 index cfc99c67c979dc4b70997c2b8068c68b7aee57e2..0000000000000000000000000000000000000000 --- a/static/configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml +++ /dev/null @@ -1,212 +0,0 @@ -architecture: CascadeRCNNClsAware -max_iters: 1500000 -snapshot_iter: 10000 -use_gpu: true 
-log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar -weights: output/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/model_final -metric: OID -num_classes: 501 - -CascadeRCNNClsAware: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 200 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - nonlocal_stages: [4] - -FPN: - min_level: 2 - max_level: 6 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - class_aware: True - -CascadeBBoxHead: - head: CascadeTwoFCHead - nms: MultiClassSoftNMS - -CascadeTwoFCHead: - mlp_dim: 1024 - -MultiClassSoftNMS: - score_threshold: 0.001 - keep_top_k: 300 - softnms_sigma: 0.15 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [1000000, 1400000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - 
momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - dataset_dir: dataset/oid - anno_path: train.json - image_dir: train - sample_transforms: - - !DecodeImage - to_rgb: True - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024, 1056, 1088, 1120, 1152, 1184, 1216, 1248, 1280, 1312, 1344, 1376, 1408] - max_size: 1800 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - drop_last: false - shuffle: true - worker_num: 2 - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !COCODataSet - dataset_dir: dataset/oidv5 - anno_path: val.json - image_dir: val - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: False - - !NormalizeImage - is_channel_first: false - is_scale: True - mean: - - 0.485 - - 0.456 - - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - - !ResizeImage - interp: 1 - target_size: - - 1200 - max_size: 2000 - use_cv2: true - - !Permute - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - batch_size: 1 - worker_num: 2 - drop_empty: false - -TestReader: - batch_size: 1 - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - 
batch_transforms: - - !PadBatch - pad_to_stride: 32 - worker_num: 2 diff --git a/static/configs/ppyolo/README.md b/static/configs/ppyolo/README.md deleted file mode 100644 index a993e119f025020e4414a8cf79895741fa1e6d96..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/README.md +++ /dev/null @@ -1,252 +0,0 @@ -English | [简体中文](README_cn.md) - -# PP-YOLO - -## Table of Contents -- [Introduction](#Introduction) -- [Model Zoo](#Model_Zoo) -- [Getting Start](#Getting_Start) -- [Future Work](#Future_Work) -- [Appendix](#Appendix) - -## Introduction - -[PP-YOLO](https://arxiv.org/abs/2007.12099) is an optimized model based on YOLOv3 in PaddleDetection, whose performance (mAP on COCO) and inference speed are better than those of [YOLOv4](https://arxiv.org/abs/2004.10934). PaddlePaddle 1.8.4 (available on pip now) or the [Daily Version](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev) is required to run PP-YOLO. - -PP-YOLO reached an mAP(IoU=0.5:0.95) of 45.9% on the COCO test-dev2017 dataset. Its FP32 inference speed on a single V100 is 72.9 FPS, and its FP16 inference speed with TensorRT on a single V100 is 155.6 FPS. - -
PP-YOLO and PP-YOLOv2 improve the performance and speed of YOLOv3 with the following methods: - -- Better backbone: ResNet50vd-DCN -- Larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU -- [Drop Block](https://arxiv.org/abs/1810.12890) -- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp) -- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf) -- [Grid Sensitive](https://arxiv.org/abs/2004.10934) -- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf) -- [CoordConv](https://arxiv.org/abs/1807.03247) -- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729) -- Better ImageNet pretrain weights -- [PAN](https://arxiv.org/abs/1803.01534) -- IoU aware loss -- Larger input size - -## Model Zoo - -### PP-YOLO - -| Model | GPU number | images/GPU | backbone | input shape | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | -|:------------------------:|:----------:|:----------:|:----------:| :----------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :-----: | -| YOLOv4(AlexeyAB) | - | - | CSPDarknet | 608 | - | 43.5 | 62 | 105.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| YOLOv4(AlexeyAB) | - | - | CSPDarknet | 512 | - | 43.0 | 83 | 138.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| YOLOv4(AlexeyAB) | - | - | CSPDarknet | 416 | - | 41.2 | 96 | 164.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| YOLOv4(AlexeyAB) | - | - | CSPDarknet | 320 | - | 38.0 | 123 | 
199.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.3 | 29.5 | 357.1 | 657.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) | -| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) | -| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) | -| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r50vd_dcn.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml) | -| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r101vd_dcn.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml) | - - -**Notes:** - -- PP-YOLO is trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets. Box APtest is the evaluation result of `mAP(IoU=0.5:0.95)`. 
-- PP-YOLO used 8 GPUs for training with a mini-batch size of 24 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](../../docs/FAQ.md). -- PP-YOLO inference speed is tested on a single Tesla V100 with a batch size of 1, CUDA 10.2, CUDNN 7.5.1, and, in TensorRT mode, TensorRT 5.1.2.2. -- PP-YOLO FP32 inference speed testing uses the inference model exported by `tools/export_model.py` and is benchmarked by running `deploy/python/infer.py` with `--run_benchmark`. None of the testing results include the time cost of data reading and post-processing (NMS), which is the same testing method as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet). -- TensorRT FP16 inference speed testing additionally excludes the time cost of bounding-box decoding (`yolo_box`) compared with the FP32 testing above, which means that data reading, bounding-box decoding and post-processing (NMS) are all excluded (again the same test method as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet)). -- YOLOv4(AlexeyAB) performance and inference speed are copied from the single Tesla V100 testing results in the [YOLOv4 github repo](https://github.com/AlexeyAB/darknet). Tesla V100 TensorRT FP16 inference speed is tested with the tkDNN configuration and TensorRT 5.1.2.2 on a single Tesla V100, based on the [AlexeyAB/darknet repo](https://github.com/AlexeyAB/darknet). -- The download and config links for YOLOv4(AlexeyAB) point to the YOLOv4 model reproduced in PaddleDetection, whose evaluation performance is the same as YOLOv4(AlexeyAB). Finetune training is currently supported in PaddleDetection; reproducing by training from backbone pretrain weights is in progress. See [PaddleDetection YOLOv4](../yolov4/README.md) for details. 
-- PP-YOLO is trained with `batch_size=24` on each GPU with 32G of memory. A configuration yaml with `batch_size=12`, which can be trained on GPUs with 16G of memory, is provided as `ppyolo_2x_bs12.yml`. Training with `batch_size=12` reached `mAP(IoU=0.5:0.95) = 45.1%` on the COCO val2017 dataset; download the weights from [ppyolo_2x_bs12 model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x_bs12.pdparams) - -### PP-YOLO for mobile - -| Model | GPU number | images/GPU | Model Size | input shape | Box APval | Box AP50val | Kirin 990 1xCore(FPS) | download | inference model download | config | -|:----------------------------:|:----------:|:----------:| :--------: | :----------:| :------------------: | :--------------------: | :-------------------: | :------: | :----------------------: | :-----: | -| PP-YOLO_MobileNetV3_large | 4 | 32 | 18MB | 320 | 23.2 | 42.6 | 15.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml) | -| PP-YOLO_MobileNetV3_small | 4 | 32 | 11MB | 320 | 17.2 | 33.8 | 28.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) | - -**Notes:** - -- PP-YOLO_MobileNetV3 is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the evaluation result of `mAP(IoU=0.5:0.95)`, and Box AP50val is the evaluation result of `mAP(IoU=0.5)`. 
-- PP-YOLO_MobileNetV3 used 4 GPUs for training with a mini-batch size of 32 on each GPU. If the GPU number or mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](../../docs/FAQ.md). -- PP-YOLO_MobileNetV3 inference speed is tested on Kirin 990 with 1 thread. - -### Slim PP-YOLO - -| Model | GPU number | images/GPU | Prune Ratio | Teacher Model | Model Size | input shape | Box APval | Kirin 990 1xCore(FPS) | download | inference model download | config | -|:----------------------------:|:----------:|:----------:| :---------: | :-----------------------: | :--------: | :----------:| :------------------: | :-------------------: | :------: | :----------------------: | :-----: | -| PP-YOLO_MobileNetV3_small | 4 | 32 | 75% | PP-YOLO_MobileNetV3_large | 4.2MB | 320 | 16.2 | 39.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) | - -- Slim PP-YOLO is trained with the slim training method from [Distill pruned model](../../slim/extensions/distill_pruned_model/README.md): the pruned PP-YOLO_MobileNetV3_small model is distilled with the PP-YOLO_MobileNetV3_large model as the teacher model -- The detection head of the PP-YOLO model is pruned with a ratio of 75%, using the arguments `--pruned_params="yolo_block.0.2.conv.weights,yolo_block.0.tip.conv.weights,yolo_block.1.2.conv.weights,yolo_block.1.tip.conv.weights" --pruned_ratios="0.75,0.75,0.75,0.75"` -- For Slim PP-YOLO training, evaluation, inference and model exporting, please see [Distill pruned model](../../slim/extensions/distill_pruned_model/README.md) - -### PP-YOLO tiny - -| Model | GPU number | images/GPU | Model Size | Post Quant Model Size | input shape | Box 
APval | Kirin 990 4xCore(FPS) | download | config | post quant model | -|:----------------------------:|:-------:|:-------------:|:----------:| :-------------------: | :----------:| :------------------: | :-------------------: | :------: | :----: | :--------------: | -| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3MB** | 320 | 20.6 | 92.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_tiny.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_tiny.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) | -| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3MB** | 416 | 22.7 | 65.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_tiny.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_tiny.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) | - -**Notes:** - -- PP-YOLO-tiny is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the evaluation result of `mAP(IoU=0.5:0.95)`, and Box AP50val is the evaluation result of `mAP(IoU=0.5)`. -- PP-YOLO-tiny is trained on 8 GPUs with a mini-batch size of 32 per GPU. If the number of GPUs or the mini-batch size is changed, the learning rate and the number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/docs/FAQ.md).
-- PP-YOLO-tiny inference speed is tested on Kirin 990 with 4 threads on the arm8 architecture. -- We also provide a post-quantization PP-YOLO-tiny inference model, which compresses the model to **1.3MB** with almost no impact on inference speed and accuracy. - -### PP-YOLO on Pascal VOC - -PP-YOLO models trained on the Pascal VOC dataset are as follows: - -| Model | GPU number | images/GPU | backbone | input shape | Box AP50val | download | config | -|:------------------:|:----------:|:----------:|:----------:| :----------:| :--------------------: | :------: | :-----: | -| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) | -| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) | -| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) | -| PP-YOLO_EB | 8 | 8 | ResNet34vd | 480 | 86.4 | [model](https://bj.bcebos.com/v1/paddlemodels/object_detection/ppyolo_eb_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_eb_voc.yml) | - -**Notes:** PP-YOLO-EB is specially designed for [EdgeBoard](https://ai.baidu.com/tech/hardware/deepkit) hardware. - -## Getting Start - -### 1. Training - -Train PP-YOLO on 8 GPUs with the following command (all commands are run from the PaddleDetection root directory by default). Use `--eval` to enable alternating evaluation during training.
- -```bash -CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval -``` - -Optional: run `tools/anchor_cluster.py` to get anchors suitable for your dataset, then modify the anchor settings in `configs/ppyolo/ppyolo.yml`. - -```bash -python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000 -``` - -### 2. Evaluation - -Evaluate PP-YOLO on the COCO val2017 dataset on a single GPU with the following commands: - -```bash -# use weights released in PaddleDetection model zoo -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams - -# use saved checkpoint in training -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo.yml -o weights=output/ppyolo/best_model -``` - -For evaluation on the COCO test-dev2017 dataset, use `configs/ppyolo/ppyolo_test.yml`. Download the COCO test-dev2017 dataset from [COCO dataset download](https://cocodataset.org/#download), decompress it to the paths configured by `EvalReader.dataset` in `configs/ppyolo/ppyolo_test.yml`, and run evaluation with the following commands: - -```bash -# use weights released in PaddleDetection model zoo -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams - -# use saved checkpoint in training -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=output/ppyolo/best_model -``` - -Evaluation results are saved in `bbox.json`. Compress it into a `zip` package and upload it to [COCO dataset evaluation](https://competitions.codalab.org/competitions/20794#participate) for evaluation. - -**NOTE:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev2017 dataset. It cannot be used for training or for evaluation on the COCO val2017 dataset. - -### 3.
Inference - -Run inference on images on a single GPU with the following commands. Use `--infer_img` to run inference on a single image and `--infer_dir` to run inference on all images in a directory. - -```bash -# inference single image -CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439_640x640.jpg - -# inference all images in the directory -CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_dir=demo -``` - -### 4. Inference deployment and benchmark - -For inference deployment or benchmarking, export the model with `tools/export_model.py` and run inference with the Paddle inference library using the following commands: - -```bash -# export model, model will be saved in output/ppyolo by default -python tools/export_model.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams - -# inference with Paddle Inference library -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output/ppyolo --image_file=demo/000000014439_640x640.jpg --use_gpu=True -``` - -Benchmark testing for PP-YOLO measures the model without data reading and post-processing (NMS). Export the model with `--exclude_nms` to prune NMS from the model, then run the benchmark with the following commands: - -```bash -# export model, --exclude_nms to prune NMS part, model will be saved in output/ppyolo by default -python tools/export_model.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --exclude_nms - -# FP32 benchmark -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output/ppyolo --image_file=demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True - -# TensorRT FP16 benchmark -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output/ppyolo
--image_file=demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True --run_mode=trt_fp16 -``` - -## Appendix - -Optimization methods and ablation experiments of PP-YOLO compared with YOLOv3. - -| NO. | Model | Box APval | Box APtest | Params(M) | FLOPs(G) | V100 FP32 FPS | -| :--: | :--------------------------- | :------------------: |:--------------------: | :-------: | :------: | :-----------: | -| A | YOLOv3-DarkNet53 | 38.9 | - | 59.13 | 65.52 | 58.2 | -| B | YOLOv3-ResNet50vd-DCN | 39.1 | - | 43.89 | 44.71 | 79.2 | -| C | B + LB + EMA + DropBlock | 41.4 | - | 43.89 | 44.71 | 79.2 | -| D | C + IoU Loss | 41.9 | - | 43.89 | 44.71 | 79.2 | -| E | D + IoU Aware | 42.5 | - | 43.90 | 44.71 | 74.9 | -| F | E + Grid Sensitive | 42.8 | - | 43.90 | 44.71 | 74.8 | -| G | F + Matrix NMS | 43.5 | - | 43.90 | 44.71 | 74.8 | -| H | G + CoordConv | 44.0 | - | 43.93 | 44.76 | 74.1 | -| I | H + SPP | 44.3 | 45.2 | 44.93 | 45.12 | 72.9 | -| J | I + Better ImageNet Pretrain | 44.8 | 45.2 | 44.93 | 45.12 | 72.9 | -| K | J + 2x Scheduler | 45.3 | 45.9 | 44.93 | 45.12 | 72.9 | - -**Notes:** - -- Accuracy and inference speed are measured with an input shape of 608. -- All models are trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets. `Box AP` is the evaluation result of `mAP(IoU=0.5:0.95)`. -- Inference speed is tested on a single Tesla V100 with a batch size of 1, following the test method and environment configuration of the benchmark above. -- [YOLOv3-DarkNet53](../yolov3_darknet.yml) with an mAP of 38.9 is the optimized YOLOv3 model in PaddleDetection; see [Model Zoo](../../docs/MODEL_ZOO.md) for details.
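The per-step contribution of each optimization in the ablation table can be read off by differencing consecutive rows. The following is a minimal sketch of that arithmetic, not part of the deleted README: the `ABLATION` list and the `step_gains` helper are illustrative names, and the numbers are copied from the Box APval and V100 FP32 FPS columns above.

```python
# Ablation rows as (id, description, Box AP val, V100 FP32 FPS),
# copied from the appendix table. Names here are illustrative only.
ABLATION = [
    ("A", "YOLOv3-DarkNet53", 38.9, 58.2),
    ("B", "YOLOv3-ResNet50vd-DCN", 39.1, 79.2),
    ("C", "B + LB + EMA + DropBlock", 41.4, 79.2),
    ("D", "C + IoU Loss", 41.9, 79.2),
    ("E", "D + IoU Aware", 42.5, 74.9),
    ("F", "E + Grid Sensitive", 42.8, 74.8),
    ("G", "F + Matrix NMS", 43.5, 74.8),
    ("H", "G + CoordConv", 44.0, 74.1),
    ("I", "H + SPP", 44.3, 72.9),
    ("J", "I + Better ImageNet Pretrain", 44.8, 72.9),
    ("K", "J + 2x Scheduler", 45.3, 72.9),
]


def step_gains(rows):
    """Return (id, AP change, FPS change) of each row relative to the row before it."""
    gains = []
    for prev, cur in zip(rows, rows[1:]):
        d_ap = round(cur[2] - prev[2], 1)   # change in Box AP (val)
        d_fps = round(cur[3] - prev[3], 1)  # change in V100 FP32 FPS
        gains.append((cur[0], d_ap, d_fps))
    return gains


if __name__ == "__main__":
    for step, d_ap, d_fps in step_gains(ABLATION):
        print(f"{step}: {d_ap:+.1f} AP, {d_fps:+.1f} FPS")
    # Cumulative gain of the training and head changes over the ResNet50vd-DCN baseline (row B).
    total = round(ABLATION[-1][2] - ABLATION[1][2], 1)
    print(f"B -> K: {total:+.1f} AP")
```

Read this way, the table suggests that most of the accuracy gain over row B comes from training-side changes that leave FPS unchanged, while IoU Aware, CoordConv and SPP each trade a small amount of speed for accuracy.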
- - -## Citation - -``` -@article{huang2021pp, - title={PP-YOLOv2: A Practical Object Detector}, - author={Huang, Xin and Wang, Xinxin and Lv, Wenyu and Bai, Xiaying and Long, Xiang and Deng, Kaipeng and Dang, Qingqing and Han, Shumin and Liu, Qiwen and Hu, Xiaoguang and others}, - journal={arXiv preprint arXiv:2104.10419}, - year={2021} -} -@misc{long2020ppyolo, -title={PP-YOLO: An Effective and Efficient Implementation of Object Detector}, -author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen}, -year={2020}, -eprint={2007.12099}, -archivePrefix={arXiv}, -primaryClass={cs.CV} -} -@misc{ppdet2019, -title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.}, -author={PaddlePaddle Authors}, -howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}}, -year={2019} -} -``` diff --git a/static/configs/ppyolo/README_cn.md b/static/configs/ppyolo/README_cn.md deleted file mode 100644 index 6af1912dbb8c68fd59c3ec49822e77ab65349371..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/README_cn.md +++ /dev/null @@ -1,247 +0,0 @@ -简体中文 | [English](README.md) - -# PP-YOLO Model - -## Contents -- [Introduction](#introduction) -- [Model Zoo and Baselines](#model-zoo-and-baselines) -- [Usage](#usage) -- [Future Work](#future-work) -- [Appendix](#appendix) - -## Introduction - -[PP-YOLO](https://arxiv.org/abs/2007.12099) is a YOLOv3 model optimized and improved by PaddleDetection. Its accuracy (mAP on the COCO dataset) and inference speed are both better than those of [YOLOv4](https://arxiv.org/abs/2004.10934). It requires PaddlePaddle 1.8.4 (installable with pip) or a suitable [develop version](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev). - -PP-YOLO reaches 45.9% mAP on the [COCO](http://cocodataset.org) test-dev2017 dataset, with an FP32 inference speed of 72.9 FPS on a single V100 and an FP16 inference speed of 155.6 FPS on V100 with TensorRT enabled. - -
- -PP-YOLO and PP-YOLOv2 improve the accuracy and speed of the YOLOv3 model in the following ways: - -- Better backbone: ResNet50vd-DCN -- Larger training batch size: 8 GPUs with batch_size=24 per GPU, with the learning rate and number of iterations adjusted accordingly -- [Drop Block](https://arxiv.org/abs/1810.12890) -- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp) -- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf) -- [Grid Sensitive](https://arxiv.org/abs/2004.10934) -- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf) -- [CoordConv](https://arxiv.org/abs/1807.03247) -- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729) -- Better pretrained model -- [PAN](https://arxiv.org/abs/1803.01534) -- Iou aware Loss -- Larger input size - -## Model Zoo - -### PP-YOLO models - -| Model | GPU number | images/GPU | backbone | input shape | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config | -|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: | -| YOLOv4(AlexyAB) | - | - | CSPDarknet | 608 | - | 43.5 | 62 | 105.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| YOLOv4(AlexyAB) | - | - | CSPDarknet | 512 | - | 43.0 | 83 | 138.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| YOLOv4(AlexyAB) | - | - | CSPDarknet | 416 | - | 41.2 | 96 | 164.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| YOLOv4(AlexyAB) | - | - | CSPDarknet | 320 | - | 38.0 | 123 | 199.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | 
[model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) | -| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.3 | 29.5 | 357.1 | 657.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) | -| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) | -| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) | -| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r50vd_dcn.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml) | -| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r101vd_dcn.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml) | - -**Notes:** - -- PP-YOLO is trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets. Box APtest is the evaluation result of `mAP(IoU=0.5:0.95)`. -- PP-YOLO is trained on 8 GPUs with a batch size of 24 per GPU. If the number of GPUs or the batch size differs from this configuration, adjust the learning rate and the number of iterations according to the [FAQ](../../docs/FAQ.md). -- PP-YOLO inference speed is tested on a single V100 with batch size=1, using CUDA 10.2 and CUDNN 7.5.1; TensorRT inference speed is tested with TensorRT 5.1.2.2. --
The FP32 inference speed of PP-YOLO is the benchmark result obtained by exporting the model with `tools/export_model.py` and then running the Paddle inference library with the `--run_benchmark` option of `deploy/python/infer.py`. The measurements exclude data preprocessing and post-processing of the model output (NMS), consistent with the test method of [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet). -- Compared with FP32, the TensorRT FP16 speed test additionally excludes the time of `yolo_box` (bbox decoding); that is, it excludes data preprocessing, bbox decoding and NMS (consistent with the test method of [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)). -- The accuracy and V100 FP32 inference speed of YOLOv4(AlexyAB) are the single-V100 results provided by the [YOLOv4 GitHub repo](https://github.com/AlexeyAB/darknet). The V100 TensorRT FP16 inference speed is measured on a single V100 with TensorRT 5.1.2.2, using the tkDNN configuration from the [AlexyAB/darknet](https://github.com/AlexeyAB/darknet) repo. -- The `download` and `config` links in the YOLOv4(AlexyAB) rows point to the YOLOv4 model reproduced in PaddleDetection. Its evaluation accuracy has been aligned and fine-tuning is supported; alignment of training accuracy is in progress. See [PaddleDetection YOLOv4 model](../yolov4/README.md). -- PP-YOLO is trained with `batch_size=24` per GPU, which requires GPUs with 32G of memory. A configuration file with `batch_size=12`, which can be trained on GPUs with 16G of memory, is provided as `ppyolo_2x_bs12.yml`. Training with this configuration reaches `mAP(IoU=0.5:0.95) = 45.1%` on the COCO val2017 dataset; download the weights from [ppyolo_2x_bs12 model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x_bs12.pdparams). - -### PP-YOLO for mobile - -| Model | GPU number | images/GPU | Model Size | input shape | Box APval | Box AP50val | Kirin 990 1xCore (FPS) | download | inference model download | config | -|:----------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :--------------------: | :--------------------: | :------: | :----------: | :------: | -| PP-YOLO_MobileNetV3_large | 4 | 32 | 18MB | 320 | 23.2 | 42.6 | 14.1 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml) | -| PP-YOLO_MobileNetV3_small | 4 | 32 | 11MB | 320 | 17.2 | 33.8 | 21.5 | 
[model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) | - -- PP-YOLO_MobileNetV3 is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the evaluation result of `mAP(IoU=0.5:0.95)`, and Box AP50val is the evaluation result of `mAP(IoU=0.5)`. -- PP-YOLO_MobileNetV3 is trained on 4 GPUs with a batch size of 32 per GPU. If the number of GPUs or the batch size differs from this configuration, adjust the learning rate and the number of iterations according to the [FAQ](../../docs/FAQ.md). -- PP-YOLO_MobileNetV3 inference speed is tested on a Kirin 990 chip with a single thread. - -### Slim PP-YOLO - -| Model | GPU number | images/GPU | Prune Ratio | Teacher Model | Model Size | input shape | Box APval | Kirin 990 1xCore (FPS) | download | inference model download | config | -|:----------------------------:|:----------:|:-------------:| :---------: | :-----------------------: | :--------: | :----------:| :------------------: | :--------------------: | :------: | :----------: | :------: | -| PP-YOLO_MobileNetV3_small | 4 | 32 | 75% | PP-YOLO_MobileNetV3_large | 4.2MB | 320 | 16.2 | 39.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) | - -- Slim PP-YOLO is trained with the method described in [Distill pruned model](../../slim/extensions/distill_pruned_model/README.md): the convolution channels of the head of PP-YOLO_MobileNetV3_small are pruned, and the pruned model is then trained by distillation with PP-YOLO_MobileNetV3_large as the teacher model. -- Channel pruning removes 75% of the channels in the head, using the arguments `--pruned_params="yolo_block.0.2.conv.weights,yolo_block.0.tip.conv.weights,yolo_block.1.2.conv.weights,yolo_block.1.tip.conv.weights" --pruned_ratios="0.75,0.75,0.75,0.75"`. -- For Slim PP-YOLO training, evaluation, inference and model exporting, see [Distill pruned model](../../slim/extensions/distill_pruned_model/README.md). - -### PP-YOLO tiny - -| Model | GPU number | 
images/GPU | Model Size | Post Quant Model Size | input shape | Box APval | Kirin 990 4xCore (FPS) | download | config | post quant model | -|:----------------------------:|:----------:|:-------------:| :--------: | :------------: | :----------:| :------------------: | :--------------------: | :------: | :------: | :--------: | -| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 320 | 20.6 | 92.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_tiny.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_tiny.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) | -| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 416 | 22.7 | 65.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_tiny.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_tiny.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) | - -- PP-YOLO-tiny is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the evaluation result of `mAP(IoU=0.5:0.95)`, and Box AP50val is the evaluation result of `mAP(IoU=0.5)`. -- PP-YOLO-tiny is trained on 8 GPUs with a batch size of 32 per GPU. If the number of GPUs or the batch size differs from this configuration, adjust the learning rate and the number of iterations according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/docs/FAQ.md). -- PP-YOLO-tiny inference speed is tested on a Kirin 990 chip with 4 threads on the arm8 architecture. -- We also provide a post-quantization PP-YOLO-tiny model, which compresses the model size to **1.3M** with almost no impact on accuracy and inference speed. - -### PP-YOLO on Pascal VOC - -PP-YOLO models trained on the Pascal VOC dataset are as follows: - -| Model | GPU number | images/GPU | backbone | input shape | Box AP50val | download | config | -|:------------------:|:-------:|:-------------:|:----------:| :----------:| :--------------------: | :------: | :-----: | -| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) | -| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) | -| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) | -| PP-YOLO_EB | 8 | 8 | ResNet34vd | 480 | 86.4 | [model](https://bj.bcebos.com/v1/paddlemodels/object_detection/ppyolo_eb_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_eb_voc.yml) | - -**Notes:** PP-YOLO-EB is a model specially designed for [EdgeBoard](https://ai.baidu.com/tech/hardware/deepkit) hardware. - - -## Usage - -### 1. Training - -Start training on 8 GPUs with the following command (all commands below are run from the PaddleDetection root directory by default). Use `--eval` to enable alternating evaluation during training. - -```bash -CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval -``` -Optional: before training, run `tools/anchor_cluster.py` to get anchors suitable for your dataset, then modify the anchor settings in `configs/ppyolo/ppyolo.yml`. -```bash -python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000 -``` - -### 2.
Evaluation - -Evaluate the model on the COCO val2017 dataset on a single GPU with the following commands: - -```bash -# use weights released by PaddleDetection -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams - -# use a checkpoint saved during training -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo.yml -o weights=output/ppyolo/best_model -``` - -`configs/ppyolo/ppyolo_test.yml` is provided for evaluation on the COCO test-dev2017 dataset. First download the test-dev2017 dataset from the [COCO dataset download page](https://cocodataset.org/#download), decompress it to the path configured by `EvalReader.dataset` in `configs/ppyolo/ppyolo_test.yml`, and then run evaluation with the following commands: - -```bash -# use weights released by PaddleDetection -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams - -# use a checkpoint saved during training -CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o weights=output/ppyolo/best_model -``` - -Evaluation results are saved in `bbox.json`. Compress it into a zip package and submit it through the [COCO dataset evaluation page](https://competitions.codalab.org/competitions/20794#participate). - -**Notes:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev dataset. It is not used for training or for evaluation on the COCO val2017 dataset. - -### 3. Inference - -Run inference on images on a single GPU with the following commands. Use `--infer_img` to specify an image path, or `--infer_dir` to specify a directory and run inference on all images in it. - -```bash -# inference on a single image -CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439_640x640.jpg - -# inference on all images in the directory -CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_dir=demo -``` - -### 4.
Inference deployment and benchmark - -For deployment and inference benchmarking, PP-YOLO must be exported with `tools/export_model.py` and then deployed and run with the Paddle inference library, using the following commands. - -```bash -# export model, saved in output/ppyolo by default -python tools/export_model.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams - -# inference with the Paddle inference library -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output/ppyolo --image_file=demo/000000014439_640x640.jpg --use_gpu=True -``` - -The PP-YOLO benchmark measures only the network itself, excluding data preprocessing and post-processing of the network output (NMS). When exporting the model, specify `--exclude_nms` to prune the NMS post-processing from the model. Export the model and run the benchmark with the following commands. - -```bash -# export model, --exclude_nms prunes the NMS part of the model, saved in output/ppyolo by default -python tools/export_model.py -c configs/ppyolo/ppyolo.yml -o weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --exclude_nms - -# FP32 benchmark -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output/ppyolo --image_file=demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True - -# TensorRT FP16 benchmark -CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output/ppyolo --image_file=demo/000000014439_640x640.jpg --use_gpu=True --run_benchmark=True --run_mode=trt_fp16 -``` - -## Appendix - -The ablation experiments for the optimizations of PP-YOLO relative to YOLOv3 are shown in the table below. - -| NO. | Model | Box APval | Box APtest | Params(M) | FLOPs(G) | V100 FP32 FPS | -| :--: | :--------------------------- | :------------------: | :-------------------: | :-------: | :------: | :-----------: | -| A | YOLOv3-DarkNet53 | 38.9 | - | 59.13 | 65.52 | 58.2 | -| B | YOLOv3-ResNet50vd-DCN | 39.1 | - | 43.89 | 44.71 | 79.2 | -| C | B + LB + EMA + DropBlock | 41.4 | - | 43.89 | 44.71 | 79.2 | -| D | C + IoU Loss | 41.9 | - | 43.89 | 44.71 | 79.2 | -| E | D + IoU Aware | 42.5 | - | 43.90 | 44.71 | 74.9 | -| F | E + Grid Sensitive | 42.8 | - | 43.90 | 44.71 | 74.8 | -| G | F + Matrix NMS | 43.5 | - | 43.90 | 44.71 | 74.8 | -| H | G + CoordConv | 44.0 | - | 43.93 | 44.76 | 74.1 | -| I | H + SPP | 44.3 | 45.2 | 44.93 | 45.12 | 72.9 | -| J | I + Better ImageNet
Pretrain | 44.8 | 45.2 | 44.93 | 45.12 | 72.9 | -| K | J + 2x Scheduler | 45.3 | 45.9 | 44.93 | 45.12 | 72.9 | - -**Notes:** - -- Accuracy and inference speed are measured with an input image size of 608. -- Box AP is the `mAP(IoU=0.5:0.95)` result of models trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets. -- Inference speed is measured on a single V100 with batch size=1, using the benchmark method described above; the test environment is CUDA 10.2 and CUDNN 7.5.1. -- [YOLOv3-DarkNet53](../yolov3_darknet.yml) with an accuracy of 38.9 is the YOLOv3 model optimized by PaddleDetection; see the [Model Zoo](../../docs/MODEL_ZOO_cn.md). - - -## Citation - -``` -@article{huang2021pp, - title={PP-YOLOv2: A Practical Object Detector}, - author={Huang, Xin and Wang, Xinxin and Lv, Wenyu and Bai, Xiaying and Long, Xiang and Deng, Kaipeng and Dang, Qingqing and Han, Shumin and Liu, Qiwen and Hu, Xiaoguang and others}, - journal={arXiv preprint arXiv:2104.10419}, - year={2021} -} -@misc{long2020ppyolo, -title={PP-YOLO: An Effective and Efficient Implementation of Object Detector}, -author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen}, -year={2020}, -eprint={2007.12099}, -archivePrefix={arXiv}, -primaryClass={cs.CV} -} -@misc{ppdet2019, -title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.}, -author={PaddlePaddle Authors}, -howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}}, -year={2019} -} -``` diff --git a/static/configs/ppyolo/ppyolo.yml b/static/configs/ppyolo/ppyolo.yml deleted file mode 100644 index 9fe271bd1cd24ffe73b81671fb19f18677cf0e78..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo.yml +++ /dev/null @@ -1,90 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 250000 -log_iter: 100 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/ppyolo/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3:
- backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - coord_conv: true - iou_aware: true - iou_aware_factor: 0.4 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 608 - max_width: 608 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 150000 - - 200000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' diff --git a/static/configs/ppyolo/ppyolo_2x.yml b/static/configs/ppyolo/ppyolo_2x.yml deleted file mode 100644 index d5d8a5b9c019d69eaf75983437054c69d59e3620..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_2x.yml +++ /dev/null @@ -1,90 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 100 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/ppyolo/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - coord_conv: true - iou_aware: true - iou_aware_factor: 0.4 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 608 - max_width: 608 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' diff --git a/static/configs/ppyolo/ppyolo_2x_bs12.yml b/static/configs/ppyolo/ppyolo_2x_bs12.yml deleted file mode 100644 index 128fb55207f8764b2df795712ab387a226dc2fb5..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_2x_bs12.yml +++ /dev/null @@ -1,92 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 100 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/ppyolo/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - coord_conv: true - iou_aware: true - iou_aware_factor: 0.4 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 608 - max_width: 608 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -TrainReader: - batch_size: 12 diff --git a/static/configs/ppyolo/ppyolo_eb.yml b/static/configs/ppyolo/ppyolo_eb.yml deleted file mode 100644 index f32634b4557024ef9ab7697c6d6874fbcb38ebf7..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_eb.yml +++ /dev/null @@ -1,74 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar -weights: output/ppyolo_eb/best_model -num_classes: 80 -use_fine_grained_loss: true -log_iter: 1000 -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet_EB - yolo_head: EBHead - -ResNet_EB: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 34 - variant: d - feature_maps: [3, 4, 5] - -EBHead: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 320000 - - 450000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' diff --git a/static/configs/ppyolo/ppyolo_eb_voc.yml b/static/configs/ppyolo/ppyolo_eb_voc.yml deleted file mode 100644 index 3d30c14afde86bb5bb12194afd630dea7e6caff5..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_eb_voc.yml +++ /dev/null @@ -1,103 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 70000 -log_smooth_window: 20 -save_dir: output -snapshot_iter: 3000 -metric: VOC -map_type: integral -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar -weights: output/ppyolo_eb_voc/best_model -num_classes: 20 -use_fine_grained_loss: true -log_iter: 1000 -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet_EB - yolo_head: EBHead - -ResNet_EB: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 34 - variant: d - feature_maps: [3, 4, 5] - -EBHead: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 35000 - - 60000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -TrainReader: - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: false - with_background: false - mixup_epoch: 200 - batch_size: 8 - -EvalReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: test.txt - use_default_label: false - with_background: false - -TestReader: - dataset: - !ImageFolder - use_default_label: false - with_background: false diff --git a/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml b/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml deleted file mode 100755 index 262c6c0b94032d45e9d544f25cf674047dc08048..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml +++ /dev/null @@ -1,192 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 250000 -log_smooth_window: 20 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -weights: output/ppyolo_tiny/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: MobileNetV3 - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -MobileNetV3: - norm_type: sync_bn - norm_decay: 0. - model_name: large - scale: 1. - extra_block_filters: [] - feature_maps: [1, 2, 3, 4, 6] - - -YOLOv3Head: - anchor_masks: [[3, 4, 5], [0, 1, 2]] - anchors: [[11, 18], [34, 47], [51, 126], - [115, 71], [120, 195], [254, 235]] - norm_decay: 0. 
- conv_block_num: 0 - coord_conv: true - scale_x_y: 1.05 - yolo_loss: YOLOv3Loss - spp: true - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.005 - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.5 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 512 - max_width: 512 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 150000 - - 200000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 90 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [224, 256, 288, 320, 352, 384, 416, 448, 480, 512] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[3, 4, 5], [0, 1, 2]] - anchors: [[11, 18], [34, 47], [51, 126], - [115, 71], [120, 195], [254, 235]] - downsample_ratios: [32, 16] - iou_thresh: 0.25 - num_classes: 80 - 
batch_size: 32 - shuffle: true - mixup_epoch: 200 - drop_last: true - worker_num: 8 - bufsize: 4 - use_process: true - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 320 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 90 - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 2 - bufsize: 4 - -TestReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 320 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml b/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml deleted file mode 100755 index 9c3e27976e285386331d8d52004f1a79ac11aec0..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml +++ /dev/null @@ -1,192 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 250000 -log_smooth_window: 20 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar -weights: output/ppyolo_tiny/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: MobileNetV3 - yolo_head: YOLOv3Head - use_fine_grained_loss: 
true - -MobileNetV3: - norm_type: sync_bn - norm_decay: 0. - model_name: small - scale: 1. - extra_block_filters: [] - feature_maps: [1, 2, 3, 4, 6] - - -YOLOv3Head: - anchor_masks: [[3, 4, 5], [0, 1, 2]] - anchors: [[11, 18], [34, 47], [51, 126], - [115, 71], [120, 195], [254, 235]] - norm_decay: 0. - conv_block_num: 0 - coord_conv: true - scale_x_y: 1.05 - yolo_loss: YOLOv3Loss - spp: true - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.005 - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.5 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 512 - max_width: 512 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 150000 - - 200000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 90 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [224, 256, 288, 320, 352, 384, 416, 448, 480, 512] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - 
# this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[3, 4, 5], [0, 1, 2]] - anchors: [[11, 18], [34, 47], [51, 126], - [115, 71], [120, 195], [254, 235]] - downsample_ratios: [32, 16] - iou_thresh: 0.25 - num_classes: 80 - batch_size: 32 - shuffle: true - mixup_epoch: 200 - drop_last: true - worker_num: 8 - bufsize: 4 - use_process: true - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 320 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 90 - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 2 - bufsize: 4 - -TestReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 320 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ppyolo/ppyolo_r18vd.yml b/static/configs/ppyolo/ppyolo_r18vd.yml deleted file mode 100755 index e65adecbb29e214772bc959d33b8f73578e5572d..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_r18vd.yml +++ /dev/null @@ -1,133 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 250000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar -weights: output/ppyolo_tiny/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 18 - feature_maps: [4, 5] - variant: d - -YOLOv3Head: - anchor_masks: [[3, 4, 5], [0, 1, 2]] - anchors: [[10, 14], [23, 27], [37, 58], - [81, 82], [135, 169], [344, 319]] - norm_decay: 0. - conv_block_num: 0 - scale_x_y: 1.05 - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.004 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 150000 - - 200000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: train_data/dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[3, 4, 5], [0, 1, 2]] - anchors: [[10, 14], [23, 27], [37, 58], - [81, 82], [135, 169], [344, 319]] - downsample_ratios: [32, 16] - batch_size: 32 - shuffle: true - mixup_epoch: 500 - drop_last: true - worker_num: 16 - bufsize: 8 - use_process: true diff --git a/static/configs/ppyolo/ppyolo_reader.yml b/static/configs/ppyolo/ppyolo_reader.yml deleted file mode 100644 index f03e47216f9ce2003e66c2032511acf05e6ec1d9..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_reader.yml +++ /dev/null @@ -1,111 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: 
dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - ratio: 2.0 - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 24 - shuffle: true - mixup_epoch: 25000 - drop_last: true - worker_num: 8 - bufsize: 4 - use_process: true - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 50 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - drop_empty: false - worker_num: 8 - bufsize: 4 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - 
- !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ppyolo/ppyolo_roadsign_kunlun.yml b/static/configs/ppyolo/ppyolo_roadsign_kunlun.yml deleted file mode 100644 index 46d6348416674e82e9a95f5356db86d995dd7d68..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_roadsign_kunlun.yml +++ /dev/null @@ -1,198 +0,0 @@ -architecture: YOLOv3 -use_gpu: false -use_xpu: true -max_iters: 5000 -log_iter: 1 -save_dir: output -snapshot_iter: 500 -metric: VOC -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams -weights: output/ppyolo_roadsign_kunlun/model_final -num_classes: 4 -finetune_exclude_pretrained_params: ['yolo_output'] -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: 'bn' - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- coord_conv: true - iou_aware: true - iou_aware_factor: 0.4 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 608 - max_width: 608 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 800 - - 110 - - !LinearWarmup - start_factor: 0 - steps: 100 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: train.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - ratio: 1.5 - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 8 - shuffle: true - mixup_epoch: 250 - drop_last: true - worker_num: 2 - bufsize: 2 - use_process: false #true - - -EvalReader: - 
inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: valid.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 4 - bufsize: 2 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: dataset/roadsign_voc/label_list.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ppyolo/ppyolo_tiny.yml b/static/configs/ppyolo/ppyolo_tiny.yml deleted file mode 100755 index aa80ebf41ee035863ad76417de7a01915b383598..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_tiny.yml +++ /dev/null @@ -1,193 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 300000 -log_smooth_window: 100 -log_iter: 100 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar -weights: output/ppyolo_tiny/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: MobileNetV3 - yolo_head: PPYOLOTinyHead - use_fine_grained_loss: true - -MobileNetV3: - norm_type: sync_bn - norm_decay: 0. 
- model_name: large - scale: .5 - extra_block_filters: [] - feature_maps: [1, 2, 3, 4, 6] - -PPYOLOTinyHead: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 15], [24, 36], [72, 42], - [35, 87], [102, 96], [60, 170], - [220, 125], [128, 222], [264, 266]] - detection_block_channels: [160, 128, 96] - norm_decay: 0. - scale_x_y: 1.05 - yolo_loss: YOLOv3Loss - spp: true - drop_block: true - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.5 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - -IouLoss: - loss_weight: 2.5 - max_height: 512 - max_width: 512 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 200000 - - 250000 - - 280000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.949 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 100 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: train_data/dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - ratio: 2 - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 100 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [192, 224, 256, 288, 320, 352, 384, 416, 448, 480, 512] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, 
- # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 15], [24, 36], [72, 42], - [35, 87], [102, 96], [60, 170], - [220, 125], [128, 222], [264, 266]] - downsample_ratios: [32, 16, 8] - iou_thresh: 0.25 - num_classes: 80 - batch_size: 32 - shuffle: true - mixup_epoch: 200 - drop_last: true - worker_num: 16 - bufsize: 4 - use_process: true - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 100 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: train_data/dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 320 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 100 - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 2 - bufsize: 4 - -TestReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 320 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ppyolo/ppyolo_voc.yml b/static/configs/ppyolo/ppyolo_voc.yml deleted file mode 100644 index cf138863d74fbf8406c0e78fae1c3e819c80bff1..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolo_voc.yml +++ /dev/null @@ -1,117 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 70000 -log_smooth_window: 20 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: VOC 
-pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/ppyolo/model_final -num_classes: 20 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - coord_conv: true - iou_aware: true - iou_aware_factor: 0.4 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 608 - max_width: 608 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 608 - max_width: 608 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.00333 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 56000 - - 62000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolo_reader.yml' -TrainReader: - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - with_background: false - mixup_epoch: 350 - batch_size: 12 - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: test.txt - use_default_label: true - with_background: false - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false diff --git a/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml b/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml deleted file mode 100644 index 9ba339912fe791ef3d9f48442f7e91bf1e2bec12..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml +++ /dev/null @@ -1,89 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 450000 -log_iter: 100 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar -weights: output/ppyolov2_r101vd_dcn/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3PANHead - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 101 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3PANHead: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- iou_aware: true - iou_aware_factor: 0.5 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 768 - max_width: 768 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 768 - max_width: 768 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 300000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - clip_grad_by_norm: 35. - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolov2_reader.yml' diff --git a/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml b/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml deleted file mode 100644 index 7ceb75833767b60c76528e6fb07786b40dbd6bdd..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml +++ /dev/null @@ -1,89 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 450000 -log_iter: 100 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/ppyolov2_r50vd_dcn/model_final -num_classes: 80 -use_fine_grained_loss: true -use_ema: true -ema_decay: 0.9998 - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3PANHead - use_fine_grained_loss: true - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 50 - feature_maps: [3, 4, 5] - variant: d - dcn_v2_stages: [5] - -YOLOv3PANHead: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- iou_aware: true - iou_aware_factor: 0.5 - scale_x_y: 1.05 - spp: true - yolo_loss: YOLOv3Loss - nms: MatrixNMS - drop_block: true - -YOLOv3Loss: - ignore_thresh: 0.7 - scale_x_y: 1.05 - label_smooth: false - use_fine_grained_loss: true - iou_loss: IouLoss - iou_aware_loss: IouAwareLoss - -IouLoss: - loss_weight: 2.5 - max_height: 768 - max_width: 768 - -IouAwareLoss: - loss_weight: 1.0 - max_height: 768 - max_width: 768 - -MatrixNMS: - background_label: -1 - keep_top_k: 100 - normalized: false - score_threshold: 0.01 - post_threshold: 0.01 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 300000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - clip_grad_by_norm: 35. - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'ppyolov2_reader.yml' diff --git a/static/configs/ppyolo/ppyolov2_reader.yml b/static/configs/ppyolo/ppyolov2_reader.yml deleted file mode 100644 index d291ab1f7b02b4cca3dba589e552376e4d4d12f1..0000000000000000000000000000000000000000 --- a/static/configs/ppyolo/ppyolov2_reader.yml +++ /dev/null @@ -1,110 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 100 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 100 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - 
is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 12 - shuffle: true - mixup_epoch: 25000 - drop_last: true - worker_num: 8 - bufsize: 4 - use_process: true - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 100 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 640 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - drop_empty: false - worker_num: 8 - bufsize: 4 - -TestReader: - inputs_def: - image_shape: [3, 640, 640] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 640 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/random_erasing/README.md b/static/configs/random_erasing/README.md deleted file mode 100644 index 2b101348e086a8115f5fe4e6a0a10a008e6228b3..0000000000000000000000000000000000000000 --- a/static/configs/random_erasing/README.md +++ /dev/null @@ -1,21 +0,0 @@ -# Random Erasing Data Augmentation - -## 
Introduction - -- Random Erasing Data Augmentation -: [https://arxiv.org/abs/1708.04896](https://arxiv.org/abs/1708.04896) - -``` -@article{zhong1708random, - title={Random erasing data augmentation. arXiv 2017}, - author={Zhong, Z and Zheng, L and Kang, G and Li, S and Yang, Y}, - journal={arXiv preprint arXiv:1708.04896} -} -``` - - -## Model Zoo - -| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-FPN | Faster | 2 | 4x | 21.847 | 39.0% | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_random_erasing_4x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/random_erasing/faster_rcnn_r50_vd_fpn_random_erasing_4x.yml) | diff --git a/static/configs/random_erasing/faster_rcnn_r50_vd_fpn_random_erasing_4x.yml b/static/configs/random_erasing/faster_rcnn_r50_vd_fpn_random_erasing_4x.yml deleted file mode 100644 index 049b34b1936c76dd07698bc32f414f66e2d4047a..0000000000000000000000000000000000000000 --- a/static/configs/random_erasing/faster_rcnn_r50_vd_fpn_random_erasing_4x.yml +++ /dev/null @@ -1,144 +0,0 @@ -architecture: FasterRCNN -max_iters: 360000 -snapshot_iter: 40000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -weights: output/faster_rcnn_r50_vd_fpn_random_erasing_4x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 
0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [240000, 320000] - - !LinearWarmup - start_factor: 0.1 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !RandomErasingImage - prob: 0.5 - sl: 0.02 - sh: 0.4 - r1: 0.3 - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - 
pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false diff --git a/static/configs/rcnn_enhance/README.md b/static/configs/rcnn_enhance/README.md deleted file mode 100644 index 7fc757fc159e67ed1266c04c067467b2d5da111b..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/README.md +++ /dev/null @@ -1,40 +0,0 @@ -## 服务器端实用目标检测方案 - -### 简介 - -* 近年来,学术界和工业界广泛关注图像中目标检测任务。基于[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)中SSLD蒸馏方案训练得到的ResNet50_vd预训练模型(ImageNet1k验证集上Top1 Acc为82.39%),结合PaddleDetection中的丰富算子,飞桨提供了一种面向服务器端实用的目标检测方案PSS-DET(Practical Server Side Detection)。基于COCO2017目标检测数据集,V100单卡预测速度为为61FPS时,COCO mAP可达41.6%;预测速度为20FPS时,COCO mAP可达47.8%。 - -* 以标准的Faster RCNN ResNet50_vd FPN为例,下表给出了PSS-DET不同的模块的速度与精度收益。 - -| Trick | Train scale | Test scale | COCO mAP | Infer speed/FPS | -|- |:-: |:-: | :-: | :-: | -| `baseline` | 640x640 | 640x640 | 36.4% | 43.589 | -| +`test proposal=pre/post topk 500/300` | 640x640 | 640x640 | 36.2% | 52.512 | -| +`fpn channel=64` | 640x640 | 640x640 | 35.1% | 67.450 | -| +`ssld pretrain` | 640x640 | 640x640 | 36.3% | 67.450 | -| +`ciou loss` | 640x640 | 640x640 | 37.1% | 67.450 | -| +`DCNv2` | 640x640 | 640x640 | 39.4% | 60.345 | -| +`3x, multi-scale training` | 640x640 | 640x640 | 41.0% | 60.345 | -| +`auto augment` | 640x640 | 640x640 | 41.4% | 60.345 | -| +`libra sampling` | 640x640 | 640x640 | 41.6% | 60.345 | - - -基于该实验结论,PaddleDetection结合Cascade RCNN,使用更大的训练与评估尺度(1000x1500),最终在单卡V100上速度为20FPS,COCO mAP达47.8%。下图给出了目前类似速度的目标检测方法的速度与精度指标。 - - -![pssdet](../../docs/images/pssdet.png) - -**注意** -> 这里为了更方便地对比,统一将V100的预测耗时乘以1.2倍,近似转化为Titan V的预测耗时。 - - -### 模型库 - -| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :-------------: | :-----: | -| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.6 | - | 
[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/rcnn_enhance/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) | -| ResNet50-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 20.001 | 47.8 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/rcnn_enhance/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) | -| ResNet101-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 19.523 | 49.4 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/rcnn_enhance/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml) | - - -**注**:generic文件夹下面的配置文件对应的预训练模型均只支持预测,不支持训练与评估。 diff --git a/static/configs/rcnn_enhance/README_en.md b/static/configs/rcnn_enhance/README_en.md deleted file mode 100644 index 2572376450f55d6851a53da647b7ab27d754953b..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/README_en.md +++ /dev/null @@ -1,44 +0,0 @@ -# Practical Server-side detection method base on RCNN - -## Introduction - - -* In recent years, object detection tasks have attracted widespread attention. [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) open-sourced the ResNet50_vd_SSLD pretrained model based on ImageNet(Top1 Acc 82.4%). And based on the pretrained model, PaddleDetection provided the PSS-DET (Practical Server-side detection) with the help of the rich operators in PaddleDetection. The inference speed can reach 61FPS on single V100 GPU when COCO mAP is 41.6%, and 20FPS when COCO mAP is 47.8%. - -* We take the standard `Faster RCNN ResNet50_vd FPN` as an example. The following table shows ablation study of PSS-DET. 
- -| Trick | Train scale | Test scale | COCO mAP | Infer speed/FPS | -|- |:-: |:-: | :-: | :-: | -| `baseline` | 640x640 | 640x640 | 36.4% | 43.589 | -| +`test proposal=pre/post topk 500/300` | 640x640 | 640x640 | 36.2% | 52.512 | -| +`fpn channel=64` | 640x640 | 640x640 | 35.1% | 67.450 | -| +`ssld pretrain` | 640x640 | 640x640 | 36.3% | 67.450 | -| +`ciou loss` | 640x640 | 640x640 | 37.1% | 67.450 | -| +`DCNv2` | 640x640 | 640x640 | 39.4% | 60.345 | -| +`3x, multi-scale training` | 640x640 | 640x640 | 41.0% | 60.345 | -| +`auto augment` | 640x640 | 640x640 | 41.4% | 60.345 | -| +`libra sampling` | 640x640 | 640x640 | 41.6% | 60.345 | - - -And the following figure shows `mAP-Speed` curves for some common detectors. - - -![pssdet](../../docs/images/pssdet.png) - - -**Note** -> For fair comparison, inference time for PSS-DET models on V100 GPU is transformed to Titan V GPU by multiplying by 1.2 times. - - -## Model Zoo - -#### COCO dataset - -| Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.6 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/rcnn_enhance/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) | -| ResNet50-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 20.001 | 47.8 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/rcnn_enhance/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml) | -| ResNet101-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 19.523 | 49.4 | - | 
[model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/rcnn_enhance/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml) | - - -**Attention**: Pretrained models whose congigurations are in the directory `generic` just support inference but do not support training and evaluation as now. diff --git a/static/configs/rcnn_enhance/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml b/static/configs/rcnn_enhance/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml deleted file mode 100644 index 517b56b85545204e1ed8c412a29f57f71cc719fa..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml +++ /dev/null @@ -1,215 +0,0 @@ -architecture: CascadeRCNN -max_iters: 270000 -snapshot_iter: 30000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar -weights: output/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 64 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 64 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - 
nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 500 - post_nms_top_n: 300 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024] - max_size: 1500 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 
'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/rcnn_enhance/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml b/static/configs/rcnn_enhance/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml deleted file mode 100644 index 140bcd12b42ddb951037e906134e5f0db7a585fb..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side.yml +++ /dev/null @@ -1,218 +0,0 @@ -architecture: CascadeRCNN -max_iters: 270000 -snapshot_iter: 30000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/cascade_rcnn_dcn_r50_vd_fpn_3x_server_side/model_final -metric: COCO -num_classes: 81 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - 
rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 64 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 64 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 500 - post_nms_top_n: 300 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: 
annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024] - max_size: 1500 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false - - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1500 - target_size: 1000 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - - - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1500 - target_size: 1000 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - 
batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 diff --git a/static/configs/rcnn_enhance/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml b/static/configs/rcnn_enhance/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml deleted file mode 100644 index 45c63d84afaa83a6da635fef6205dc088c44ddc6..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/faster_rcnn_dcn_r50_vd_fpn_3x_server_side.yml +++ /dev/null @@ -1,217 +0,0 @@ -architecture: FasterRCNN -max_iters: 270000 -snapshot_iter: 30000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -weights: output/faster_rcnn_dcn_r50_vd_fpn_3x_server_side/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: LibraBBoxAssigner - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 64 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 64 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 300 - pre_nms_top_n: 500 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -LibraBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - 
bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - bbox_loss: DiouLoss - -DiouLoss: - loss_weight: 10.0 - is_cls_agnostic: false - use_complete_iou_loss: true - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [384, 416, 448, 480, 512, 544, 576, 608, 640, 672] - max_size: 1000 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false - - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 640 - target_size: 640 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - 
batch_size: 1 - shuffle: false - - - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 640 - target_size: 640 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 diff --git a/static/configs/rcnn_enhance/generic/cascade_rcnn_cbr101_vd_fpn_server_side.yml b/static/configs/rcnn_enhance/generic/cascade_rcnn_cbr101_vd_fpn_server_side.yml deleted file mode 100644 index f659c1bd1b97294793797a8db13e1e66cbaa2901..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/generic/cascade_rcnn_cbr101_vd_fpn_server_side.yml +++ /dev/null @@ -1,219 +0,0 @@ -architecture: CascadeRCNN -max_iters: 1500000 -snapshot_iter: 100000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/CBResNet101_vd_ssld_pretrained.tar -weights: output/cascade_rcnn_cbr101_vd_fpn_server_side/model_final -metric: VOC -num_classes: 677 - -CascadeRCNN: - backbone: CBResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -CBResNet: - norm_type: bn - norm_decay: 0. 
- depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - repeat_num: 2 - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 500 - post_nms_top_n: 300 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 14 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [1000000, 1400000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - 
!AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024] - max_size: 1500 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1300 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 - - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: false - with_background: true - anno_path: ./dataset/voc/generic_det_label_list.txt - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git 
a/static/configs/rcnn_enhance/generic/cascade_rcnn_dcn_r101_vd_fpn_gen_server_side.yml b/static/configs/rcnn_enhance/generic/cascade_rcnn_dcn_r101_vd_fpn_gen_server_side.yml deleted file mode 100644 index 4d4f9ee262b6c64cc60cf5da61f0c4dddb369efd..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/generic/cascade_rcnn_dcn_r101_vd_fpn_gen_server_side.yml +++ /dev/null @@ -1,217 +0,0 @@ -architecture: CascadeRCNN -max_iters: 1500000 -snapshot_iter: 100000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar -weights: output/cascade_rcnn_dcn_r101_vd_fpn_gen_server_side/model_final -metric: VOC -num_classes: 677 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 64 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 64 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 500 - post_nms_top_n: 300 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - 
bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [1000000, 1400000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024] - max_size: 1500 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 1 - shuffle: true - worker_num: 2 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1300 - target_size: 800 - use_cv2: 
true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: false - with_background: true - anno_path: ./dataset/voc/generic_det_label_list.txt - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/rcnn_enhance/generic/cascade_rcnn_dcn_r50_vd_fpn_gen_server_side.yml b/static/configs/rcnn_enhance/generic/cascade_rcnn_dcn_r50_vd_fpn_gen_server_side.yml deleted file mode 100644 index 7eba812677a6477aa3c54a44a7a0604e921b35c4..0000000000000000000000000000000000000000 --- a/static/configs/rcnn_enhance/generic/cascade_rcnn_dcn_r50_vd_fpn_gen_server_side.yml +++ /dev/null @@ -1,217 +0,0 @@ -architecture: CascadeRCNN -max_iters: 750000 -snapshot_iter: 50000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar -weights: output/cascade_rcnn_dcn_r50_vd_fpn_gen_server_side/model_final -metric: VOC -num_classes: 677 - -CascadeRCNN: - backbone: ResNet - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: CascadeBBoxHead - bbox_assigner: CascadeBBoxAssigner - -ResNet: - norm_type: bn - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - variant: d - dcn_v2_stages: [3, 4, 5] - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - 
num_chan: 64 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - min_level: 2 - max_level: 6 - num_chan: 64 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_positive_overlap: 0.7 - rpn_negative_overlap: 0.3 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 500 - post_nms_top_n: 300 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - min_level: 2 - max_level: 5 - box_resolution: 7 - sampling_ratio: 2 - -CascadeBBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [10, 20, 30] - bg_thresh_lo: [0.0, 0.0, 0.0] - bg_thresh_hi: [0.5, 0.6, 0.7] - fg_thresh: [0.5, 0.6, 0.7] - fg_fraction: 0.25 - -CascadeBBoxHead: - head: CascadeTwoFCHead - bbox_loss: BalancedL1Loss - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -BalancedL1Loss: - alpha: 0.5 - gamma: 1.5 - beta: 1.0 - loss_weight: 1.0 - -CascadeTwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [500000, 700000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomFlipImage - prob: 0.5 - - !AutoAugmentImage - autoaug_type: v1 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - 
target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024] - max_size: 1500 - interp: 1 - use_cv2: true - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - batch_size: 2 - shuffle: true - worker_num: 2 - use_process: false - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - # for voc - #fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1500 - target_size: 1000 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - # set image_shape if needed - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - use_default_label: false - with_background: true - anno_path: ./dataset/voc/generic_det_label_list.txt - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !ResizeImage - interp: 1 - max_size: 1500 - target_size: 1000 - use_cv2: true - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: true - batch_size: 1 - shuffle: false diff --git a/static/configs/res2net/README.md b/static/configs/res2net/README.md deleted file mode 100644 index 4926264e1395540ec230a02e09892292cde99c8e..0000000000000000000000000000000000000000 --- 
a/static/configs/res2net/README.md +++ /dev/null @@ -1,36 +0,0 @@ -# Res2Net - -## Introduction - -- Res2Net: A New Multi-scale Backbone Architecture: [https://arxiv.org/abs/1904.01169](https://arxiv.org/abs/1904.01169) - -``` -@article{DBLP:journals/corr/abs-1904-01169, - author = {Shanghua Gao and - Ming{-}Ming Cheng and - Kai Zhao and - Xinyu Zhang and - Ming{-}Hsuan Yang and - Philip H. S. Torr}, - title = {Res2Net: {A} New Multi-scale Backbone Architecture}, - journal = {CoRR}, - volume = {abs/1904.01169}, - year = {2019}, - url = {http://arxiv.org/abs/1904.01169}, - archivePrefix = {arXiv}, - eprint = {1904.01169}, - timestamp = {Thu, 25 Apr 2019 10:24:54 +0200}, - biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-01169}, - bibsource = {dblp computer science bibliography, https://dblp.org} -} -``` - - -## Model Zoo - -| Backbone | Type | Deformable Conv | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs | -| :---------------------- | :------------- | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: | -| Res2Net50-FPN | Faster | False | 2 | 1x | 20.320 | 39.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.yml) | -| Res2Net50-FPN | Mask | False | 2 | 2x | 16.069 | 40.7 | 36.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.yml) | -| Res2Net50-vd-FPN | Mask | False | 2 | 2x | 15.816 | 40.9 | 36.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.tar) | 
[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.yml) | -| Res2Net50-vd-FPN | Mask | True | 2 | 2x | 14.478 | 43.5 | 38.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.yml) | diff --git a/static/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.yml b/static/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.yml deleted file mode 100644 index f91914bbeced7af01342fe424328918d05bae71d..0000000000000000000000000000000000000000 --- a/static/configs/res2net/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.yml +++ /dev/null @@ -1,108 +0,0 @@ -architecture: FasterRCNN -max_iters: 90000 -snapshot_iter: 10000 -use_gpu: true -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar -weights: output/faster_rcnn_res2net50_vb_26w_4s_fpn_1x/model_final -metric: COCO -num_classes: 81 - -FasterRCNN: - backbone: Res2Net - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -Res2Net: - depth: 50 - width: 26 - scales: 4 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: b - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - anchor_sizes: [32, 64, 128, 256, 512] - aspect_ratios: [0.5, 1.0, 2.0] - stride: [16.0, 16.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 2000 - pre_nms_top_n: 2000 - 
test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - post_nms_top_n: 1000 - pre_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - box_resolution: 7 - sampling_ratio: 2 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../faster_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.yml b/static/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.yml deleted file mode 100644 index 4f26fbd63a8fa2e2618b51e11138f7d0658d09e3..0000000000000000000000000000000000000000 --- a/static/configs/res2net/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.yml +++ /dev/null @@ -1,117 +0,0 @@ -architecture: MaskRCNN -trarchitecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar -metric: COCO -weights: output/mask_rcnn_res2net50_vb_26w_4s_fpn_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: Res2Net - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -Res2Net: - depth: 50 - width: 26 - scales: 4 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: b - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - 
variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.yml b/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.yml deleted file mode 100644 index 987e01ac10ac29f1ec8666234bd29dbcfa59cd96..0000000000000000000000000000000000000000 --- a/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.yml +++ /dev/null @@ -1,117 +0,0 @@ -architecture: MaskRCNN -trarchitecture: MaskRCNN -use_gpu: true -max_iters: 180000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar -metric: COCO -weights: 
output/mask_rcnn_res2net50_vd_26w_4s_fpn_2x/model_final -num_classes: 81 - -MaskRCNN: - backbone: Res2Net - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -Res2Net: - depth: 50 - width: 26 - scales: 4 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.yml 
b/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.yml deleted file mode 100644 index 986d672c95e85566f5ab9283381f8205e9b425e8..0000000000000000000000000000000000000000 --- a/static/configs/res2net/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.yml +++ /dev/null @@ -1,118 +0,0 @@ -architecture: MaskRCNN -trarchitecture: MaskRCNN -use_gpu: true -max_iters: 90000 -snapshot_iter: 10000 -log_iter: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar -metric: COCO -weights: output/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x/model_final -num_classes: 81 - -MaskRCNN: - backbone: Res2Net - fpn: FPN - rpn_head: FPNRPNHead - roi_extractor: FPNRoIAlign - bbox_head: BBoxHead - bbox_assigner: BBoxAssigner - -Res2Net: - depth: 50 - width: 26 - scales: 4 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - variant: d - dcn_v2_stages: [3, 4, 5] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - -FPNRPNHead: - anchor_generator: - aspect_ratios: [0.5, 1.0, 2.0] - variance: [1.0, 1.0, 1.0, 1.0] - anchor_start_size: 32 - max_level: 6 - min_level: 2 - num_chan: 256 - rpn_target_assign: - rpn_batch_size_per_im: 256 - rpn_fg_fraction: 0.5 - rpn_negative_overlap: 0.3 - rpn_positive_overlap: 0.7 - rpn_straddle_thresh: 0.0 - train_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 2000 - post_nms_top_n: 2000 - test_proposal: - min_size: 0.0 - nms_thresh: 0.7 - pre_nms_top_n: 1000 - post_nms_top_n: 1000 - -FPNRoIAlign: - canconical_level: 4 - canonical_size: 224 - max_level: 5 - min_level: 2 - sampling_ratio: 2 - box_resolution: 7 - mask_resolution: 14 - -MaskHead: - dilation: 1 - conv_dim: 256 - num_convs: 4 - resolution: 28 - -BBoxAssigner: - batch_size_per_im: 512 - bbox_reg_weights: [0.1, 0.1, 0.2, 0.2] - bg_thresh_hi: 0.5 - bg_thresh_lo: 0.0 - fg_fraction: 0.25 - fg_thresh: 0.5 - -MaskAssigner: - resolution: 28 - -BBoxHead: - head: 
TwoFCHead - nms: - keep_top_k: 100 - nms_threshold: 0.5 - score_threshold: 0.05 - -TwoFCHead: - mlp_dim: 1024 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: '../mask_fpn_reader.yml' -TrainReader: - batch_size: 2 diff --git a/static/configs/retinanet_r101_fpn_1x.yml b/static/configs/retinanet_r101_fpn_1x.yml deleted file mode 100644 index 8cbff0bbd358e5f0b3223310b30b3f4d8ba9b8ad..0000000000000000000000000000000000000000 --- a/static/configs/retinanet_r101_fpn_1x.yml +++ /dev/null @@ -1,90 +0,0 @@ -architecture: RetinaNet -max_iters: 90000 -use_gpu: true -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar -weights: output/retinanet_r101_fpn_1x/model_final -log_iter: 20 -snapshot_iter: 10000 -metric: COCO -save_dir: output -num_classes: 81 - -RetinaNet: - backbone: ResNet - fpn: FPN - retina_head: RetinaHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. 
- depth: 101 - feature_maps: [3, 4, 5] - freeze_at: 2 - -FPN: - max_level: 7 - min_level: 3 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -RetinaHead: - num_convs_per_octave: 4 - num_chan: 256 - max_level: 7 - min_level: 3 - prior_prob: 0.01 - base_scale: 4 - num_scales_per_octave: 3 - anchor_generator: - aspect_ratios: [1.0, 2.0, 0.5] - variance: [1.0, 1.0, 1.0, 1.0] - target_assign: - positive_overlap: 0.5 - negative_overlap: 0.4 - gamma: 2.0 - alpha: 0.25 - sigma: 3.0151134457776365 - output_decoder: - score_thresh: 0.05 - nms_thresh: 0.5 - pre_nms_top_n: 1000 - detections_per_im: 100 - nms_eta: 1.0 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 - batch_transforms: - - !PadBatch - pad_to_stride: 128 - -EvalReader: - batch_size: 2 - batch_transforms: - - !PadBatch - pad_to_stride: 128 - -TestReader: - batch_size: 1 - batch_transforms: - - !PadBatch - pad_to_stride: 128 diff --git a/static/configs/retinanet_r50_fpn_1x.yml b/static/configs/retinanet_r50_fpn_1x.yml deleted file mode 100644 index 6cdfcad29d979a92583cacf8c5e59e14f4ecea96..0000000000000000000000000000000000000000 --- a/static/configs/retinanet_r50_fpn_1x.yml +++ /dev/null @@ -1,90 +0,0 @@ -architecture: RetinaNet -max_iters: 90000 -use_gpu: true -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -weights: output/retinanet_r50_fpn_1x/model_final -log_iter: 20 -snapshot_iter: 10000 -metric: COCO -save_dir: output -num_classes: 81 - -RetinaNet: - backbone: ResNet - fpn: FPN - retina_head: RetinaHead - -ResNet: - norm_type: affine_channel - norm_decay: 0. 
- depth: 50 - feature_maps: [3, 4, 5] - freeze_at: 2 - -FPN: - max_level: 7 - min_level: 3 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -RetinaHead: - num_convs_per_octave: 4 - num_chan: 256 - max_level: 7 - min_level: 3 - prior_prob: 0.01 - base_scale: 4 - num_scales_per_octave: 3 - anchor_generator: - aspect_ratios: [1.0, 2.0, 0.5] - variance: [1.0, 1.0, 1.0, 1.0] - target_assign: - positive_overlap: 0.5 - negative_overlap: 0.4 - gamma: 2.0 - alpha: 0.25 - sigma: 3.0151134457776365 - output_decoder: - score_thresh: 0.05 - nms_thresh: 0.5 - pre_nms_top_n: 1000 - detections_per_im: 100 - nms_eta: 1.0 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_size: 2 - batch_transforms: - - !PadBatch - pad_to_stride: 128 - -EvalReader: - batch_size: 2 - batch_transforms: - - !PadBatch - pad_to_stride: 128 - -TestReader: - batch_size: 1 - batch_transforms: - - !PadBatch - pad_to_stride: 128 diff --git a/static/configs/retinanet_x101_vd_64x4d_fpn_1x.yml b/static/configs/retinanet_x101_vd_64x4d_fpn_1x.yml deleted file mode 100644 index acdad29f48b94436b5fd0d039fbc1e61a10a7d99..0000000000000000000000000000000000000000 --- a/static/configs/retinanet_x101_vd_64x4d_fpn_1x.yml +++ /dev/null @@ -1,89 +0,0 @@ -architecture: RetinaNet -max_iters: 180000 -use_gpu: true -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar -weights: output/retinanet_x101_vd_64x4d_fpn_1x/model_final -log_iter: 20 -snapshot_iter: 30000 -metric: COCO -save_dir: output -num_classes: 81 - -RetinaNet: - backbone: ResNeXt - fpn: FPN - retina_head: RetinaHead - -ResNeXt: - depth: 101 - feature_maps: [3, 4, 5] - freeze_at: 2 - 
- group_width: 4 - groups: 64 - norm_type: bn - variant: d - -FPN: - max_level: 7 - min_level: 3 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125] - has_extra_convs: true - -RetinaHead: - num_convs_per_octave: 4 - num_chan: 256 - max_level: 7 - min_level: 3 - prior_prob: 0.01 - base_scale: 4 - num_scales_per_octave: 3 - anchor_generator: - aspect_ratios: [1.0, 2.0, 0.5] - variance: [1.0, 1.0, 1.0, 1.0] - target_assign: - positive_overlap: 0.5 - negative_overlap: 0.4 - gamma: 2.0 - alpha: 0.25 - sigma: 3.0151134457776365 - output_decoder: - score_thresh: 0.05 - nms_thresh: 0.5 - pre_nms_top_n: 1000 - detections_per_im: 100 - nms_eta: 1.0 - -LearningRate: - base_lr: 0.005 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [120000, 160000] - - !LinearWarmup - start_factor: 0.1 - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'faster_fpn_reader.yml' -TrainReader: - batch_transforms: - - !PadBatch - pad_to_stride: 128 - -EvalReader: - batch_transforms: - - !PadBatch - pad_to_stride: 128 - -TestReader: - batch_transforms: - - !PadBatch - pad_to_stride: 128 diff --git a/static/configs/solov2/README.md b/static/configs/solov2/README.md deleted file mode 100644 index c7581da229a16b24de831a9861aec327d6f337e4..0000000000000000000000000000000000000000 --- a/static/configs/solov2/README.md +++ /dev/null @@ -1,50 +0,0 @@ -# SOLOv2 for instance segmentation - -## Introduction - -SOLOv2 (Segmenting Objects by Locations) is a fast instance segmentation framework with strong performance. We reproduced the model from the paper and further improved the accuracy and speed of SOLOv2. - -**Highlights:** - -- Performance: the `Light-R50-VD-DCN-FPN` model reaches 38.6 FPS on a single Tesla V100 and a mask AP of 38.8 on the COCO val set, a 24% increase in inference speed and a gain of 2.4 percentage points in mAP. 
-- Training time: training `solov2_r50_fpn_1x` on 8 Tesla V100 GPUs takes only 10 hours. - -
- - -## Model Zoo - -| Detector | Backbone | Multi-scale training | Lr schd | Mask AP (val) | V100 FP32(FPS) | GPU | Download | Configs | -| :-------: | :---------------------: | :-------------------: | :-----: | :--------------------: | :-------------: | :-----: | :---------: | :------------------------: | -| YOLACT++ | R50-FPN | False | 800k iter | 34.1 (test-dev) | 33.5 | Xp | - | - | -| CenterMask | R50-FPN | True | 2x | 36.4 | 13.9 | Xp | - | - | -| CenterMask | V2-99-FPN | True | 3x | 40.2 | 8.9 | Xp | - | - | -| PolarMask | R50-FPN | True | 2x | 30.5 | 9.4 | V100 | - | - | -| BlendMask | R50-FPN | True | 3x | 37.8 | 13.5 | V100 | - | - | -| SOLOv2 (Paper) | R50-FPN | False | 1x | 34.8 | 18.5 | V100 | - | - | -| SOLOv2 (Paper) | X101-DCN-FPN | True | 3x | 42.4 | 5.9 | V100 | - | - | -| SOLOv2 | MobileNetV3-FPN | True | 3x | 30.0 | 50 | V100 | [model](https://paddlemodels.bj.bcebos.com/object_detection/solov2_mobilenetv3_fpn_448_3x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/solov2/solov2_mobilenetv3_fpn_448_3x.yml) | -| SOLOv2 | R50-FPN | False | 1x | 35.6 | 21.9 | V100 | [model](https://paddlemodels.bj.bcebos.com/object_detection/solov2_r50_fpn_1x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/solov2/solov2_r50_fpn_1x.yml) | -| SOLOv2 | R50-FPN | True | 3x | 37.9 | 21.9 | V100 | [model](https://paddlemodels.bj.bcebos.com/object_detection/solov2_r50_fpn_3x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/solov2/solov2_r50_fpn_3x.yml) | -| SOLOv2 | R101-VD-FPN | True | 3x | 42.6 | 12.1 | V100 | [model](https://paddlemodels.bj.bcebos.com/object_detection/solov2_r101_vd_fpn_3x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/solov2/solov2_r101_vd_fpn_3x.yml) | - -## Enhanced model -| Backbone | Input size | Lr schd | V100 FP32(FPS) | Mask AP (val) | Download | Configs | 
-| :---------------------: | :-------------------: | :-----: | :------------: | :-----: | :---------: | :------------------------: | -| Light-R50-VD-DCN-FPN | 512 | 3x | 38.6 | 38.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/solov2_light_r50_vd_fpn_dcn_512_3x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/solov2/solov2_light_r50_vd_fpn_dcn_512_3x.yml) | - -**Notes:** - -- SOLOv2 is trained on COCO train2017 dataset and evaluated on val2017 results of `mAP(IoU=0.5:0.95)`. - -## Citations -``` -@article{wang2020solov2, - title={SOLOv2: Dynamic, Faster and Stronger}, - author={Wang, Xinlong and Zhang, Rufeng and Kong, Tao and Li, Lei and Shen, Chunhua}, - journal={arXiv preprint arXiv:2003.10152}, - year={2020} -} -``` diff --git a/static/configs/solov2/solov2_light_448_reader.yml b/static/configs/solov2/solov2_light_448_reader.yml deleted file mode 100644 index 07b2a484104702c1f3615a69ed47abde5a639eb5..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_light_448_reader.yml +++ /dev/null @@ -1,103 +0,0 @@ -TrainReader: - batch_size: 4 - worker_num: 2 - inputs_def: - fields: ['image', 'im_id', 'gt_segm'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !Poly2Mask {} - - !ColorDistort {} - - !RandomCrop - is_mask_crop: True - - !ResizeImage - target_size: [352, 384, 416, 448, 480, 512] - max_size: 768 - interp: 1 - use_cv2: true - resize_box: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - - !Gt2Solov2Target - num_grids: [40, 36, 24, 16, 12] - scale_ranges: [[1, 56], [28, 112], [56, 224], [112, 448], [224, 896]] - coord_sigma: 0.2 - 
shuffle: True - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 768 - target_size: 448 - use_cv2: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - # only support batch_size=1 when evaluation - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: dataset/coco/annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 768 - target_size: 448 - use_cv2: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false diff --git a/static/configs/solov2/solov2_light_r50_vd_fpn_dcn_512_3x.yml b/static/configs/solov2/solov2_light_r50_vd_fpn_dcn_512_3x.yml deleted file mode 100644 index b45581bdd04c7c3687f062097381c014e7fa7b29..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_light_r50_vd_fpn_dcn_512_3x.yml +++ /dev/null @@ -1,80 +0,0 @@ -architecture: SOLOv2 -use_gpu: true -max_iters: 270000 -snapshot_iter: 30000 -log_smooth_window: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar -metric: COCO -weights: output/solov2_light_r50_vd_fpn_dcn_512_3x/model_final -num_classes: 81 -use_ema: true -ema_decay: 0.9998 - -SOLOv2: - backbone: 
ResNet - fpn: FPN - bbox_head: SOLOv2Head - mask_head: SOLOv2MaskHead - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 0 - freeze_norm: false - norm_type: sync_bn - dcn_v2_stages: [3, 4, 5] - variant: d - lr_mult_list: [0.05, 0.05, 0.1, 0.15] - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - reverse_out: True - -SOLOv2Head: - seg_feat_channels: 256 - stacked_convs: 3 - num_grids: [40, 36, 24, 16, 12] - kernel_out_channels: 128 - solov2_loss: SOLOv2Loss - mask_nms: MaskMatrixNMS - dcn_v2_stages: [2,] - drop_block: True - -SOLOv2MaskHead: - in_channels: 128 - out_channels: 128 - start_level: 0 - end_level: 3 - -SOLOv2Loss: - ins_loss_weight: 3.0 - focal_loss_gamma: 2.0 - focal_loss_alpha: 0.25 - -MaskMatrixNMS: - pre_nms_top_n: 500 - post_nms_top_n: 100 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0. - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'solov2_light_reader.yml' diff --git a/static/configs/solov2/solov2_light_reader.yml b/static/configs/solov2/solov2_light_reader.yml deleted file mode 100644 index 26228f6c565a6d1ab4e2d250bd878b777691884c..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_light_reader.yml +++ /dev/null @@ -1,103 +0,0 @@ -TrainReader: - batch_size: 2 - worker_num: 2 - inputs_def: - fields: ['image', 'im_id', 'gt_segm'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !Poly2Mask {} - - !ColorDistort {} - - !RandomCrop - is_mask_crop: True - - !ResizeImage - target_size: [352, 384, 416, 448, 480, 512] - max_size: 852 - interp: 1 - use_cv2: true - resize_box: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - 
is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - - !Gt2Solov2Target - num_grids: [40, 36, 24, 16, 12] - scale_ranges: [[1, 64], [32, 128], [64, 256], [128, 512], [256, 2048]] - coord_sigma: 0.2 - shuffle: True - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 852 - target_size: 512 - use_cv2: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - # only support batch_size=1 when evaluation - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: dataset/coco/annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 852 - target_size: 512 - use_cv2: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false diff --git a/static/configs/solov2/solov2_mobilenetv3_fpn_448_3x.yml b/static/configs/solov2/solov2_mobilenetv3_fpn_448_3x.yml deleted file mode 100644 index 11e005d26cfce67fad4a7723c9d181715f3618ed..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_mobilenetv3_fpn_448_3x.yml +++ /dev/null @@ -1,79 +0,0 @@ -architecture: SOLOv2 -use_gpu: true 
-max_iters: 135000 -snapshot_iter: 20000 -log_smooth_window: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -metric: COCO -weights: output/solov2/solov2_mobilenetv3_fpn_448_3x/model_final -num_classes: 81 -use_ema: true -ema_decay: 0.9998 - -SOLOv2: - backbone: MobileNetV3RCNN - fpn: FPN - bbox_head: SOLOv2Head - mask_head: SOLOv2MaskHead - -MobileNetV3RCNN: - norm_type: bn - freeze_norm: true - norm_decay: 0.0 - feature_maps: [2, 3, 4, 6] - conv_decay: 0.00001 - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - scale: 1.0 - model_name: large - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - reverse_out: True - -SOLOv2Head: - seg_feat_channels: 256 - stacked_convs: 2 - num_grids: [40, 36, 24, 16, 12] - kernel_out_channels: 128 - solov2_loss: SOLOv2Loss - mask_nms: MaskMatrixNMS - drop_block: True - -SOLOv2MaskHead: - in_channels: 128 - out_channels: 128 - start_level: 0 - end_level: 3 - -SOLOv2Loss: - ins_loss_weight: 3.0 - focal_loss_gamma: 2.0 - focal_loss_alpha: 0.25 - -MaskMatrixNMS: - pre_nms_top_n: 500 - post_nms_top_n: 100 - -LearningRate: - base_lr: 0.02 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [90000, 120000] - - !LinearWarmup - start_factor: 0. 
- steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'solov2_light_448_reader.yml' diff --git a/static/configs/solov2/solov2_r101_vd_fpn_3x.yml b/static/configs/solov2/solov2_r101_vd_fpn_3x.yml deleted file mode 100644 index 09f1e265bfa076c5637150121b688efd6980fe01..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_r101_vd_fpn_3x.yml +++ /dev/null @@ -1,107 +0,0 @@ -architecture: SOLOv2 -use_gpu: true -max_iters: 270000 -snapshot_iter: 30000 -log_smooth_window: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar -metric: COCO -weights: output/solov2_r101_vd_fpn_3x/model_final -num_classes: 81 -use_ema: true -ema_decay: 0.9998 - -SOLOv2: - backbone: ResNet - fpn: FPN - bbox_head: SOLOv2Head - mask_head: SOLOv2MaskHead - -ResNet: - depth: 101 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - dcn_v2_stages: [3, 4, 5] - variant: d - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - reverse_out: True - -SOLOv2Head: - seg_feat_channels: 512 - stacked_convs: 4 - num_grids: [40, 36, 24, 16, 12] - kernel_out_channels: 256 - solov2_loss: SOLOv2Loss - mask_nms: MaskMatrixNMS - dcn_v2_stages: [0, 1, 2, 3] - -SOLOv2MaskHead: - in_channels: 128 - out_channels: 256 - start_level: 0 - end_level: 3 - use_dcn_in_tower: True - -SOLOv2Loss: - ins_loss_weight: 3.0 - focal_loss_gamma: 2.0 - focal_loss_alpha: 0.25 - -MaskMatrixNMS: - pre_nms_top_n: 500 - post_nms_top_n: 100 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0. 
- steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'solov2_reader.yml' -TrainReader: - batch_size: 2 - sample_transforms: - - !DecodeImage - to_rgb: true - - !Poly2Mask {} - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800] - max_size: 1333 - interp: 1 - use_cv2: true - resize_box: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - - !Gt2Solov2Target - num_grids: [40, 36, 24, 16, 12] - scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]] - coord_sigma: 0.2 diff --git a/static/configs/solov2/solov2_r50_fpn_1x.yml b/static/configs/solov2/solov2_r50_fpn_1x.yml deleted file mode 100644 index 587661b29956c96d69e3cf18421a41587701a648..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_r50_fpn_1x.yml +++ /dev/null @@ -1,72 +0,0 @@ -architecture: SOLOv2 -use_gpu: true -max_iters: 90000 -snapshot_iter: 10000 -log_smooth_window: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/solov2_r50_fpn_1x/model_final -num_classes: 81 - -SOLOv2: - backbone: ResNet - fpn: FPN - bbox_head: SOLOv2Head - mask_head: SOLOv2MaskHead - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - reverse_out: True - -SOLOv2Head: - seg_feat_channels: 512 - stacked_convs: 4 - num_grids: [40, 36, 24, 16, 12] - kernel_out_channels: 256 - solov2_loss: SOLOv2Loss - mask_nms: MaskMatrixNMS - -SOLOv2MaskHead: - in_channels: 128 - out_channels: 256 - start_level: 0 - end_level: 3 - -SOLOv2Loss: - ins_loss_weight: 3.0 - focal_loss_gamma: 
2.0 - focal_loss_alpha: 0.25 - -MaskMatrixNMS: - pre_nms_top_n: 500 - post_nms_top_n: 100 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [60000, 80000] - - !LinearWarmup - start_factor: 0. - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'solov2_reader.yml' diff --git a/static/configs/solov2/solov2_r50_fpn_3x.yml b/static/configs/solov2/solov2_r50_fpn_3x.yml deleted file mode 100644 index ecaeb7bbfa9045aaf29b83e396444c62f37cabce..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_r50_fpn_3x.yml +++ /dev/null @@ -1,101 +0,0 @@ -architecture: SOLOv2 -use_gpu: true -max_iters: 270000 -snapshot_iter: 30000 -log_smooth_window: 20 -save_dir: output -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar -metric: COCO -weights: output/solov2/solov2_r50_fpn_3x/model_final -num_classes: 81 - -SOLOv2: - backbone: ResNet - fpn: FPN - bbox_head: SOLOv2Head - mask_head: SOLOv2MaskHead - -ResNet: - depth: 50 - feature_maps: [2, 3, 4, 5] - freeze_at: 2 - norm_type: bn - -FPN: - max_level: 6 - min_level: 2 - num_chan: 256 - spatial_scale: [0.03125, 0.0625, 0.125, 0.25] - reverse_out: True - -SOLOv2Head: - seg_feat_channels: 512 - stacked_convs: 4 - num_grids: [40, 36, 24, 16, 12] - kernel_out_channels: 256 - solov2_loss: SOLOv2Loss - mask_nms: MaskMatrixNMS - -SOLOv2MaskHead: - in_channels: 128 - out_channels: 256 - start_level: 0 - end_level: 3 - -SOLOv2Loss: - ins_loss_weight: 3.0 - focal_loss_gamma: 2.0 - focal_loss_alpha: 0.25 - -MaskMatrixNMS: - pre_nms_top_n: 500 - post_nms_top_n: 100 - -LearningRate: - base_lr: 0.01 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [180000, 240000] - - !LinearWarmup - start_factor: 0. 
- steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0001 - type: L2 - -_READER_: 'solov2_reader.yml' -TrainReader: - batch_size: 2 - sample_transforms: - - !DecodeImage - to_rgb: true - - !Poly2Mask {} - - !ResizeImage - target_size: [640, 672, 704, 736, 768, 800] - max_size: 1333 - interp: 1 - use_cv2: true - resize_box: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - - !Gt2Solov2Target - num_grids: [40, 36, 24, 16, 12] - scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]] - coord_sigma: 0.2 diff --git a/static/configs/solov2/solov2_reader.yml b/static/configs/solov2/solov2_reader.yml deleted file mode 100644 index 46f81958ea96708a55769665903b345afb2346ff..0000000000000000000000000000000000000000 --- a/static/configs/solov2/solov2_reader.yml +++ /dev/null @@ -1,100 +0,0 @@ -TrainReader: - batch_size: 2 - worker_num: 2 - inputs_def: - fields: ['image', 'im_id', 'gt_segm'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !Poly2Mask {} - - !ResizeImage - target_size: 800 - max_size: 1333 - interp: 1 - use_cv2: true - resize_box: true - - !RandomFlipImage - prob: 0.5 - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - to_bgr: false - channel_first: true - batch_transforms: - - !PadBatch - pad_to_stride: 32 - - !Gt2Solov2Target - num_grids: [40, 36, 24, 16, 12] - scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]] - coord_sigma: 0.2 - shuffle: True - -EvalReader: - inputs_def: - fields: ['image', 'im_info', 'im_id'] - dataset: - !COCODataSet - 
image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false - # only support batch_size=1 when evaluation - batch_size: 1 - shuffle: false - drop_last: false - drop_empty: false - worker_num: 2 - -TestReader: - inputs_def: - fields: ['image', 'im_info', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: dataset/coco/annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 1333 - target_size: 800 - use_cv2: true - - !NormalizeImage - is_channel_first: false - is_scale: true - mean: [0.485,0.456,0.406] - std: [0.229, 0.224,0.225] - - !Permute - channel_first: true - to_bgr: false - batch_transforms: - - !PadBatch - pad_to_stride: 32 - use_padded_im_info: false diff --git a/static/configs/ssd/ssd_mobilenet_v1_roadsign_kunlun.yml b/static/configs/ssd/ssd_mobilenet_v1_roadsign_kunlun.yml deleted file mode 100644 index bd6e24e1ed8933bd5acaa3441800f350c043f8fb..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssd_mobilenet_v1_roadsign_kunlun.yml +++ /dev/null @@ -1,143 +0,0 @@ -architecture: SSD -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ssd_mobilenet_v1_voc.tar -use_gpu: false -use_xpu: true -max_iters: 3000 -snapshot_iter: 500 -log_iter: 1 -metric: VOC -map_type: 11point -save_dir: output -weights: output/ssd_mobilenet_v1_roadsign_kunlun/model_final -num_classes: 5 - -SSD: - backbone: MobileNet - multi_box_head: MultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 
400 - score_threshold: 0.01 - -MobileNet: - norm_decay: 0. - conv_group_scale: 1 - conv_learning_rate: 0.1 - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - with_extra_blocks: true - -MultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 300 - flip: true - max_ratio: 90 - max_sizes: [[], 150.0, 195.0, 240.0, 285.0, 300.0] - min_ratio: 20 - min_sizes: [60.0, 105.0, 150.0, 195.0, 240.0, 285.0] - offset: 0.5 - -LearningRate: - schedulers: - - !PiecewiseDecay - milestones: [2000, 3000, 4000, 5000] - values: [0.0001, 0.00005, 0.000025, 0.00001, 0.000001] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.00005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !VOCDataSet - anno_path: train.txt - dataset_dir: dataset/roadsign_voc - #use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [127.5, 127.5, 127.5] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [127.5, 127.5, 127.5] - std: [127.502231, 127.502231, 127.502231] - batch_size: 32 - shuffle: true - drop_last: true - worker_num: 8 - bufsize: 16 - use_process: false - -EvalReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id', 'is_difficult'] - dataset: - !VOCDataSet - anno_path: valid.txt - dataset_dir: dataset/roadsign_voc - #use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !Permute {} - - !NormalizeImage - is_scale: false - 
mean: [127.5, 127.5, 127.5] - std: [127.502231, 127.502231, 127.502231] - batch_size: 32 - worker_num: 8 - bufsize: 16 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,300,300] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: test.txt - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 300 - use_cv2: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [127.5, 127.5, 127.5] - std: [127.502231, 127.502231, 127.502231] - batch_size: 1 diff --git a/static/configs/ssd/ssd_mobilenet_v1_voc.yml b/static/configs/ssd/ssd_mobilenet_v1_voc.yml deleted file mode 100644 index de5bae42f859390ca233a4b30532da267f1105fa..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssd_mobilenet_v1_voc.yml +++ /dev/null @@ -1,143 +0,0 @@ -architecture: SSD -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ssd_mobilenet_v1_coco_pretrained.tar -use_gpu: true -max_iters: 28000 -snapshot_iter: 2000 -log_iter: 1 -metric: VOC -map_type: 11point -save_dir: output -weights: output/ssd_mobilenet_v1_voc/model_final -# 20(label_class) + 1(background) -num_classes: 21 - -SSD: - backbone: MobileNet - multi_box_head: MultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -MobileNet: - norm_decay: 0. 
- conv_group_scale: 1 - conv_learning_rate: 0.1 - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - with_extra_blocks: true - -MultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 300 - flip: true - max_ratio: 90 - max_sizes: [[], 150.0, 195.0, 240.0, 285.0, 300.0] - min_ratio: 20 - min_sizes: [60.0, 105.0, 150.0, 195.0, 240.0, 285.0] - offset: 0.5 - -LearningRate: - schedulers: - - !PiecewiseDecay - milestones: [10000, 15000, 20000, 25000] - values: [0.001, 0.0005, 0.00025, 0.0001, 0.00001] - -OptimizerBuilder: - optimizer: - momentum: 0.0 - type: RMSPropOptimizer - regularizer: - factor: 0.00005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !VOCDataSet - anno_path: trainval.txt - dataset_dir: dataset/voc - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [127.5, 127.5, 127.5] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !RandomFlipImage - is_normalized: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [127.5, 127.5, 127.5] - std: [127.502231, 127.502231, 127.502231] - batch_size: 32 - shuffle: true - drop_last: true - worker_num: 8 - bufsize: 16 - use_process: true - -EvalReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id', 'is_difficult'] - dataset: - !VOCDataSet - anno_path: test.txt - dataset_dir: dataset/voc - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [127.5, 127.5, 127.5] - std: [127.502231, 127.502231, 127.502231] - 
batch_size: 32 - worker_num: 8 - bufsize: 16 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,300,300] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: test.txt - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 300 - use_cv2: true - - !Permute {} - - !NormalizeImage - is_scale: false - mean: [127.5, 127.5, 127.5] - std: [127.502231, 127.502231, 127.502231] - batch_size: 1 diff --git a/static/configs/ssd/ssd_vgg16_300.yml b/static/configs/ssd/ssd_vgg16_300.yml deleted file mode 100644 index 24100301041b17464d927fdf9a21bbc1959829f7..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssd_vgg16_300.yml +++ /dev/null @@ -1,149 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 10000 -log_iter: 20 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_caffe_pretrained.tar -save_dir: output -weights: output/ssd_vgg16_300/model_final -num_classes: 81 - -SSD: - backbone: VGG - multi_box_head: MultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -VGG: - depth: 16 - with_extra_blocks: true - normalizations: [20., -1, -1, -1, -1, -1] - -MultiBoxHead: - base_size: 300 - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]] - min_ratio: 15 - max_ratio: 90 - min_sizes: [30.0, 60.0, 111.0, 162.0, 213.0, 264.0] - max_sizes: [60.0, 111.0, 162.0, 213.0, 264.0, 315.0] - steps: [8, 16, 32, 64, 100, 300] - offset: 0.5 - flip: true - kernel_size: 3 - pad: 1 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [280000, 360000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - 
image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [104, 117, 123] - - !RandomCrop - allow_no_crop: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !RandomFlipImage - is_normalized: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 8 - shuffle: true - worker_num: 8 - bufsize: 16 - use_process: true - drop_empty: true - -EvalReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 16 - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3,300,300] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 300 - use_cv2: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 1 diff --git a/static/configs/ssd/ssd_vgg16_300_voc.yml b/static/configs/ssd/ssd_vgg16_300_voc.yml deleted file mode 100644 index 37d834780ef9070039b031f18abd5496643d0257..0000000000000000000000000000000000000000 --- 
a/static/configs/ssd/ssd_vgg16_300_voc.yml +++ /dev/null @@ -1,149 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 120001 -snapshot_iter: 10000 -log_iter: 20 -metric: VOC -map_type: 11point -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_caffe_pretrained.tar -save_dir: output -weights: output/ssd_vgg16_300_voc/model_final -# 20(label_class) + 1(background) -num_classes: 21 - -SSD: - backbone: VGG - multi_box_head: MultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -VGG: - depth: 16 - with_extra_blocks: true - normalizations: [20., -1, -1, -1, -1, -1] - -MultiBoxHead: - base_size: 300 - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]] - min_ratio: 20 - max_ratio: 90 - min_sizes: [30.0, 60.0, 111.0, 162.0, 213.0, 264.0] - max_sizes: [60.0, 111.0, 162.0, 213.0, 264.0, 315.0] - steps: [8, 16, 32, 64, 100, 300] - offset: 0.5 - flip: true - min_max_aspect_ratios_order: true - kernel_size: 3 - pad: 1 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [80000, 100000] - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [104, 117, 123] - - !RandomCrop - allow_no_crop: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !RandomFlipImage - is_normalized: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 8 - shuffle: true - worker_num: 8 - 
bufsize: 16 - use_process: true - -EvalReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id', 'is_difficult'] - dataset: - !VOCDataSet - anno_path: test.txt - dataset_dir: dataset/voc - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 32 - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3,300,300] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: test.txt - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 300 - use_cv2: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 1 diff --git a/static/configs/ssd/ssd_vgg16_512.yml b/static/configs/ssd/ssd_vgg16_512.yml deleted file mode 100644 index 0587fc626f4f72110bb7dc8eda7fb1bf2fe96ed7..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssd_vgg16_512.yml +++ /dev/null @@ -1,151 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 10000 -log_iter: 20 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_caffe_pretrained.tar -save_dir: output -weights: output/ssd_vgg16_512/model_final -num_classes: 81 - -SSD: - backbone: VGG - multi_box_head: MultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -VGG: - depth: 16 - with_extra_blocks: true - normalizations: [20., -1, -1, -1, -1, -1, -1] - extra_block_filters: [[256, 512, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 1, 4]] 
- - -MultiBoxHead: - base_size: 512 - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]] - min_ratio: 15 - max_ratio: 90 - min_sizes: [20.0, 51.0, 133.0, 215.0, 296.0, 378.0, 460.0] - max_sizes: [51.0, 133.0, 215.0, 296.0, 378.0, 460.0, 542.0] - steps: [8, 16, 32, 64, 128, 256, 512] - offset: 0.5 - flip: true - kernel_size: 3 - pad: 1 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [280000, 360000] - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [104, 117, 123] - - !RandomCrop - allow_no_crop: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 512 - use_cv2: false - - !RandomFlipImage - is_normalized: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 8 - shuffle: true - worker_num: 8 - bufsize: 16 - use_process: true - -EvalReader: - inputs_def: - image_shape: [3,512,512] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !ResizeImage - interp: 1 - target_size: 512 - use_cv2: false - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 8 - worker_num: 8 - bufsize: 16 - drop_empty: false - 
-TestReader: - inputs_def: - image_shape: [3,512,512] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 512 - use_cv2: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [104, 117, 123] - std: [1, 1, 1] - batch_size: 1 diff --git a/static/configs/ssd/ssd_vgg16_512_voc.yml b/static/configs/ssd/ssd_vgg16_512_voc.yml deleted file mode 100644 index e9dc59beb8026ae47a06ae508895925589080e82..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssd_vgg16_512_voc.yml +++ /dev/null @@ -1,153 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 120000 -snapshot_iter: 10000 -log_iter: 20 -metric: VOC -map_type: 11point -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_caffe_pretrained.tar -save_dir: output -weights: output/ssd_vgg16_512_voc/model_final -# 20(label_class) + 1(background) -num_classes: 21 - -SSD: - backbone: VGG - multi_box_head: MultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -VGG: - depth: 16 - with_extra_blocks: true - normalizations: [20., -1, -1, -1, -1, -1, -1] - extra_block_filters: [[256, 512, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 1, 4]] - - -MultiBoxHead: - base_size: 512 - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]] - min_ratio: 20 - max_ratio: 90 - min_sizes: [20.0, 51.0, 133.0, 215.0, 296.0, 378.0, 460.0] - max_sizes: [51.0, 133.0, 215.0, 296.0, 378.0, 460.0, 542.0] - steps: [8, 16, 32, 64, 128, 256, 512] - offset: 0.5 - flip: true - kernel_size: 3 - pad: 1 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: [80000, 100000] - - !LinearWarmup - start_factor: 
0.3333333333333333 - steps: 500 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123, 117, 104] - - !RandomCrop - allow_no_crop: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 512 - use_cv2: false - - !RandomFlipImage - is_normalized: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [123, 117, 104] - std: [1, 1, 1] - batch_size: 8 - shuffle: true - worker_num: 8 - bufsize: 16 - use_process: true - -EvalReader: - inputs_def: - image_shape: [3, 512, 512] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id', 'is_difficult'] - dataset: - !VOCDataSet - anno_path: test.txt - dataset_dir: dataset/voc - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 512 - use_cv2: false - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [123, 117, 104] - std: [1, 1, 1] - batch_size: 32 - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3,512,512] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: test.txt - use_default_label: true - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 512 - use_cv2: true - - !Permute - to_bgr: false - - !NormalizeImage - is_scale: false - mean: [123, 117, 104] - std: [1, 1, 1] - batch_size: 1 diff --git a/static/configs/ssd/ssdlite_ghostnet.yml b/static/configs/ssd/ssdlite_ghostnet.yml deleted file 
mode 100644 index c76eb87833f08bc4a3dbf2637deeedb1be192789..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssdlite_ghostnet.yml +++ /dev/null @@ -1,161 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 20000 -log_iter: 20 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_3_ssld_pretrained.tar -save_dir: output -weights: output/ssdlite_ghostnet/model_final -# 80(label_class) + 1(background) -num_classes: 81 - -SSD: - backbone: GhostNet - multi_box_head: SSDLiteMultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - - -GhostNet: - scale: 1.3 - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - feature_maps: [5, 7, 8, 9, 10, 11] - conv_decay: 0.00004 - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - -SSDLiteMultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 320 - steps: [16, 32, 64, 107, 160, 320] - flip: true - clip: true - max_ratio: 95 - min_ratio: 20 - offset: 0.5 - conv_decay: 0.00004 - -LearningRate: - base_lr: 0.2 - schedulers: - - !CosineDecay - max_iters: 400000 - - !LinearWarmup - start_factor: 0.33333 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !RandomFlipImage - is_normalized: false 
- - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: true - batch_size: 64 - shuffle: true - drop_last: true - # Number of working threads/processes. To speed up, can be set to 16 or 32 etc. - worker_num: 8 - # Size of shared memory used in result queue. After increasing `worker_num`, need expand `memsize`. - memsize: 8G - # Buffer size for multi threads/processes.one instance in buffer is one batch data. - # To speed up, can be set to 64 or 128 etc. - bufsize: 32 - use_process: true - - -EvalReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - worker_num: 8 - bufsize: 32 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,320,320] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 320 - use_cv2: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ssd/ssdlite_mobilenet_v1.yml b/static/configs/ssd/ssdlite_mobilenet_v1.yml deleted file mode 100644 index c118bc2e489e2808fea798ebc33a04f70afb4c91..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssdlite_mobilenet_v1.yml +++ 
/dev/null @@ -1,158 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 20000 -log_iter: 20 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_ssld_pretrained.tar -save_dir: output -weights: output/ssdlite_mobilenet_v1/model_final -num_classes: 81 - -SSD: - backbone: MobileNet - multi_box_head: SSDLiteMultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -MobileNet: - conv_decay: 0.00004 - conv_group_scale: 1 - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - with_extra_blocks: true - -SSDLiteMultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 300 - steps: [16, 32, 64, 100, 150, 300] - flip: true - clip: true - max_ratio: 95 - min_ratio: 20 - offset: 0.5 - conv_decay: 0.00004 - -LearningRate: - base_lr: 0.4 - schedulers: - - !CosineDecay - max_iters: 400000 - - !LinearWarmup - start_factor: 0.33333 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !RandomFlipImage - is_normalized: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: true - batch_size: 64 - shuffle: true - drop_last: true - # 
Number of working threads/processes. To speed up, can be set to 16 or 32 etc. - worker_num: 8 - # Size of shared memory used in result queue. After increasing `worker_num`, need expand `memsize`. - memsize: 8G - # Buffer size for multi threads/processes.one instance in buffer is one batch data. - # To speed up, can be set to 64 or 128 etc. - bufsize: 32 - use_process: true - - -EvalReader: - inputs_def: - image_shape: [3, 300, 300] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 300 - use_cv2: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - worker_num: 8 - bufsize: 32 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,300,300] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 300 - use_cv2: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ssd/ssdlite_mobilenet_v3_large.yml b/static/configs/ssd/ssdlite_mobilenet_v3_large.yml deleted file mode 100644 index d75693575867c872496881efde9b829562d920e5..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssdlite_mobilenet_v3_large.yml +++ /dev/null @@ -1,162 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 20000 -log_iter: 20 -metric: COCO -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -save_dir: output -weights: output/ssdlite_mobilenet_v3_large/model_final -# 80(label_class) + 1(background) -num_classes: 81 - -SSD: - backbone: MobileNetV3 - multi_box_head: SSDLiteMultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -MobileNetV3: - scale: 1.0 - model_name: large - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - feature_maps: [5, 7, 8, 9, 10, 11] - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - conv_decay: 0.00004 - multiplier: 0.5 - -SSDLiteMultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 320 - steps: [16, 32, 64, 107, 160, 320] - flip: true - clip: true - max_ratio: 95 - min_ratio: 20 - offset: 0.5 - conv_decay: 0.00004 - -LearningRate: - base_lr: 0.4 - schedulers: - - !CosineDecay - max_iters: 400000 - - !LinearWarmup - start_factor: 0.33333 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !RandomFlipImage - is_normalized: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: true - batch_size: 64 - shuffle: true - drop_last: true - # Number of 
working threads/processes. To speed up, can be set to 16 or 32 etc. - worker_num: 8 - # Size of shared memory used in result queue. After increasing `worker_num`, need expand `memsize`. - memsize: 8G - # Buffer size for multi threads/processes.one instance in buffer is one batch data. - # To speed up, can be set to 64 or 128 etc. - bufsize: 32 - use_process: true - - -EvalReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - worker_num: 8 - bufsize: 32 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,320,320] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 320 - use_cv2: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ssd/ssdlite_mobilenet_v3_large_fpn.yml b/static/configs/ssd/ssdlite_mobilenet_v3_large_fpn.yml deleted file mode 100644 index 02abac3aa43bbd502a0f483c969710e0f50a458f..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssdlite_mobilenet_v3_large_fpn.yml +++ /dev/null @@ -1,169 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 20000 -log_iter: 20 -metric: COCO -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar -save_dir: output -weights: output/ssdlite_mobilenet_v3_large_fpn/model_final -# 80(label_class) + 1(background) -num_classes: 81 - -SSD: - backbone: MobileNetV3 - fpn: FPN - multi_box_head: SSDLiteMultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -FPN: - num_chan: 256 - max_level: 7 - norm_type: bn - norm_decay: 0.00004 - reverse_out: true - -MobileNetV3: - scale: 1.0 - model_name: large - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - feature_maps: [5, 7, 8, 9, 10, 11] - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - conv_decay: 0.00004 - -SSDLiteMultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 320 - steps: [16, 32, 64, 107, 160, 320] - flip: true - clip: true - max_ratio: 95 - min_ratio: 20 - offset: 0.5 - conv_decay: 0.00004 - -LearningRate: - base_lr: 0.4 - schedulers: - - !CosineDecay - max_iters: 400000 - - !LinearWarmup - start_factor: 0.33333 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !RandomFlipImage - is_normalized: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: 
false - channel_first: true - batch_size: 64 - shuffle: true - drop_last: true - # Number of working threads/processes. To speed up, can be set to 16 or 32 etc. - worker_num: 8 - # Size of shared memory used in result queue. After increasing `worker_num`, need expand `memsize`. - memsize: 8G - # Buffer size for multi threads/processes.one instance in buffer is one batch data. - # To speed up, can be set to 64 or 128 etc. - bufsize: 32 - use_process: true - - -EvalReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - worker_num: 8 - bufsize: 32 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,320,320] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 320 - use_cv2: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ssd/ssdlite_mobilenet_v3_small.yml b/static/configs/ssd/ssdlite_mobilenet_v3_small.yml deleted file mode 100644 index 09dc73f368eae46d06b3f8c18ab79aa008bf38b8..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssdlite_mobilenet_v3_small.yml +++ /dev/null @@ -1,162 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 20000 -log_iter: 20 -metric: COCO 
-pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar -save_dir: output -weights: output/ssd_mobilenet_v3_small/model_final -# 80(label_class) + 1(background) -num_classes: 81 - -SSD: - backbone: MobileNetV3 - multi_box_head: SSDLiteMultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -MobileNetV3: - scale: 1.0 - model_name: small - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - feature_maps: [5, 7, 8, 9, 10, 11] - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - conv_decay: 0.00004 - multiplier: 0.5 - -SSDLiteMultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 320 - steps: [16, 32, 64, 107, 160, 320] - flip: true - clip: true - max_ratio: 95 - min_ratio: 20 - offset: 0.5 - conv_decay: 0.00004 - -LearningRate: - base_lr: 0.4 - schedulers: - - !CosineDecay - max_iters: 400000 - - !LinearWarmup - start_factor: 0.33333 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !RandomFlipImage - is_normalized: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: true - batch_size: 64 - shuffle: true - drop_last: true - 
# Number of working threads/processes. To speed up, can be set to 16 or 32 etc. - worker_num: 8 - # Size of shared memory used in result queue. After increasing `worker_num`, need expand `memsize`. - memsize: 8G - # Buffer size for multi threads/processes.one instance in buffer is one batch data. - # To speed up, can be set to 64 or 128 etc. - bufsize: 32 - use_process: true - - -EvalReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - worker_num: 8 - bufsize: 32 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,320,320] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 320 - use_cv2: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/ssd/ssdlite_mobilenet_v3_small_fpn.yml b/static/configs/ssd/ssdlite_mobilenet_v3_small_fpn.yml deleted file mode 100644 index a8270ca220577b18cb68324fc5695818b8ecaad4..0000000000000000000000000000000000000000 --- a/static/configs/ssd/ssdlite_mobilenet_v3_small_fpn.yml +++ /dev/null @@ -1,169 +0,0 @@ -architecture: SSD -use_gpu: true -max_iters: 400000 -snapshot_iter: 20000 -log_iter: 20 -metric: COCO -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar -save_dir: output -weights: output/ssdlite_mobilenet_v3_small_fpn/model_final -# 80(label_class) + 1(background) -num_classes: 81 - -SSD: - backbone: MobileNetV3 - fpn: FPN - multi_box_head: SSDLiteMultiBoxHead - output_decoder: - background_label: 0 - keep_top_k: 200 - nms_eta: 1.0 - nms_threshold: 0.45 - nms_top_k: 400 - score_threshold: 0.01 - -FPN: - num_chan: 256 - max_level: 7 - norm_type: bn - norm_decay: 0.00004 - reverse_out: true - -MobileNetV3: - scale: 1.0 - model_name: small - extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]] - feature_maps: [5, 7, 8, 9, 10, 11] - lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75] - conv_decay: 0.00004 - -SSDLiteMultiBoxHead: - aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]] - base_size: 320 - steps: [16, 32, 64, 107, 160, 320] - flip: true - clip: true - max_ratio: 95 - min_ratio: 20 - offset: 0.5 - conv_decay: 0.00004 - -LearningRate: - base_lr: 0.4 - schedulers: - - !CosineDecay - max_iters: 400000 - - !LinearWarmup - start_factor: 0.33333 - steps: 2000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_train2017.json - image_dir: train2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !RandomDistort - brightness_lower: 0.875 - brightness_upper: 1.125 - is_order: true - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop - allow_no_crop: false - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !RandomFlipImage - is_normalized: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: 
false - channel_first: true - batch_size: 64 - shuffle: true - drop_last: true - # Number of working threads/processes. To speed up, can be set to 16 or 32 etc. - worker_num: 8 - # Size of shared memory used in result queue. After increasing `worker_num`, need expand `memsize`. - memsize: 8G - # Buffer size for multi threads/processes.one instance in buffer is one batch data. - # To speed up, can be set to 64 or 128 etc. - bufsize: 32 - use_process: true - - -EvalReader: - inputs_def: - image_shape: [3, 320, 320] - fields: ['image', 'gt_bbox', 'gt_class', 'im_shape', 'im_id'] - dataset: - !COCODataSet - dataset_dir: dataset/coco - anno_path: annotations/instances_val2017.json - image_dir: val2017 - sample_transforms: - - !DecodeImage - to_rgb: true - - !NormalizeBox {} - - !ResizeImage - interp: 1 - target_size: 320 - use_cv2: false - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - worker_num: 8 - bufsize: 32 - use_process: false - -TestReader: - inputs_def: - image_shape: [3,320,320] - fields: ['image', 'im_id', 'im_shape'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - sample_transforms: - - !DecodeImage - to_rgb: true - - !ResizeImage - interp: 1 - max_size: 0 - target_size: 320 - use_cv2: true - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/yolov3_darknet.yml b/static/configs/yolov3_darknet.yml deleted file mode 100644 index 0aa2fcac288f7ba06302e266e5eed8ef6a7aa542..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_darknet.yml +++ /dev/null @@ -1,61 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: 
https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar -weights: output/yolov3_darknet/model_final -num_classes: 80 -use_fine_grained_loss: false - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' diff --git a/static/configs/yolov3_darknet_roadsign.yml b/static/configs/yolov3_darknet_roadsign.yml deleted file mode 100644 index 16fa37a3c9a779184ee3457f0d45a74fa5890a8a..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_darknet_roadsign.yml +++ /dev/null @@ -1,172 +0,0 @@ -architecture: YOLOv3 -use_gpu: false -max_iters: 1200 -log_iter: 1 -save_dir: output -snapshot_iter: 200 -metric: VOC -map_type: integral -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar -weights: output/yolov3_darknet_roadsign/model_final -num_classes: 4 -finetune_exclude_pretrained_params: ['yolo_output'] -use_fine_grained_loss: false - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 800 - - 1100 - - !LinearWarmup - start_factor: 0. - steps: 100 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: train.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - ratio: 1.5 - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 4 - shuffle: true - mixup_epoch: 250 - drop_last: true - worker_num: 2 - bufsize: 2 - use_process: false #true - - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - 
dataset_dir: dataset/roadsign_voc - anno_path: valid.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 4 - drop_empty: false - worker_num: 4 - bufsize: 2 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: dataset/roadsign_voc/label_list.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/yolov3_darknet_roadsign_kunlun.yml b/static/configs/yolov3_darknet_roadsign_kunlun.yml deleted file mode 100644 index 3def79eb187e5968f9589a9d681a9d46b7b00951..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_darknet_roadsign_kunlun.yml +++ /dev/null @@ -1,173 +0,0 @@ -architecture: YOLOv3 -use_gpu: false -use_xpu: true -max_iters: 1200 -log_iter: 1 -save_dir: output -snapshot_iter: 200 -metric: VOC -map_type: integral -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar -weights: output/yolov3_darknet_roadsign_xpu/model_final -num_classes: 4 -finetune_exclude_pretrained_params: ['yolo_output'] -use_fine_grained_loss: false - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: bn - norm_decay: 0. - depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.000125 #0.00025 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 800 #400 - - 1100 #550 - - !LinearWarmup - start_factor: 0. - steps: 200 #200 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: train.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - ratio: 1.5 - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 2 - shuffle: true - mixup_epoch: 250 - drop_last: true - worker_num: 2 - bufsize: 2 - use_process: false #true - - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - 
dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: valid.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 4 - drop_empty: false - worker_num: 4 - bufsize: 2 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: dataset/roadsign_voc/label_list.txt - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/yolov3_darknet_voc.yml b/static/configs/yolov3_darknet_voc.yml deleted file mode 100644 index f8435a9b89a2fc629c805fcaf95e005af2200771..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_darknet_voc.yml +++ /dev/null @@ -1,89 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 70000 -log_iter: 20 -save_dir: output -snapshot_iter: 2000 -metric: VOC -map_type: 11point -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar -weights: output/yolov3_darknet_voc/model_final -num_classes: 20 -use_fine_grained_loss: false - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 62000 - - !LinearWarmup - start_factor: 0. - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - with_background: false - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: test.txt - use_default_label: true - with_background: false - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false diff --git a/static/configs/yolov3_darknet_voc_diouloss.yml b/static/configs/yolov3_darknet_voc_diouloss.yml deleted file mode 100644 index 979c825e18ac9cc315c45c209202f8586249cec6..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_darknet_voc_diouloss.yml +++ /dev/null @@ -1,93 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 70000 -log_iter: 20 -save_dir: output -snapshot_iter: 2000 -metric: VOC -map_type: 11point -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar -weights: output/yolov3_darknet_voc/model_final -num_classes: 20 -use_fine_grained_loss: true - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: sync_bn - norm_decay: 0. 
- depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - iou_loss: DiouLossYolo - -DiouLossYolo: - loss_weight: 5 - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 62000 - - !LinearWarmup - start_factor: 0. - steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - with_background: false - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: test.txt - use_default_label: true - with_background: false - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false diff --git a/static/configs/yolov3_mobilenet_v1.yml b/static/configs/yolov3_mobilenet_v1.yml deleted file mode 100644 index d39530341d2e7cfdaa302c7a3ff1136fa67b2080..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_mobilenet_v1.yml +++ /dev/null @@ -1,62 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar -weights: output/yolov3_mobilenet_v1/model_final -num_classes: 80 -use_fine_grained_loss: false - -YOLOv3: - backbone: MobileNet - 
yolo_head: YOLOv3Head - -MobileNet: - norm_type: sync_bn - norm_decay: 0. - conv_group_scale: 1 - with_extra_blocks: false - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' diff --git a/static/configs/yolov3_mobilenet_v1_fruit.yml b/static/configs/yolov3_mobilenet_v1_fruit.yml deleted file mode 100644 index 757c106373ac0c1b0f85fd28a16ec7cdda2f7734..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_mobilenet_v1_fruit.yml +++ /dev/null @@ -1,129 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 20000 -log_iter: 20 -save_dir: output -snapshot_iter: 200 -metric: VOC -map_type: 11point -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar -weights: output/yolov3_mobilenet_v1_fruit/best_model -num_classes: 3 -finetune_exclude_pretrained_params: ['yolo_output'] -use_fine_grained_loss: false - -YOLOv3: - backbone: MobileNet - yolo_head: YOLOv3Head - -MobileNet: - norm_type: sync_bn - norm_decay: 0. - conv_group_scale: 1 - with_extra_blocks: false - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.00001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 15000 - - 18000 - - !LinearWarmup - start_factor: 0. - steps: 100 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' -# will merge TrainReader into yolov3_reader.yml -TrainReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/fruit - anno_path: train.txt - with_background: false - use_default_label: false - sample_transforms: - - !DecodeImage - to_rgb: true - with_mixup: false - - !NormalizeBox {} - - !ExpandImage - max_ratio: 4.0 - mean: [123.675, 116.28, 103.53] - prob: 0.5 - - !RandomInterpImage - max_size: 0 - target_size: 608 - - !RandomFlipImage - is_normalized: true - prob: 0.5 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: true - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [608] - - !Permute - channel_first: true - to_bgr: false - batch_size: 1 - shuffle: true - mixup_epoch: -1 - -EvalReader: - batch_size: 1 - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/fruit - anno_path: val.txt - use_default_label: false - with_background: false - -TestReader: - batch_size: 1 - dataset: - !ImageFolder - anno_path: dataset/fruit/label_list.txt - use_default_label: false - with_background: false diff --git a/static/configs/yolov3_mobilenet_v1_roadsign.yml 
b/static/configs/yolov3_mobilenet_v1_roadsign.yml deleted file mode 100644 index 89fd3e7cb8705864a0d4e1ff95043e1c740caf48..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_mobilenet_v1_roadsign.yml +++ /dev/null @@ -1,175 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 3600 -log_smooth_window: 20 -save_dir: output -snapshot_iter: 200 -metric: VOC -map_type: integral -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar -weights: output/yolov3_mobilenet_v1_roadsign/best_model -num_classes: 4 -finetune_exclude_pretrained_params: ['yolo_output'] -use_fine_grained_loss: false - -YOLOv3: - backbone: MobileNet - yolo_head: YOLOv3Head - -MobileNet: - norm_decay: 0. - conv_group_scale: 1 - with_extra_blocks: false - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 2400 - - 3300 - - !LinearWarmup - start_factor: 0.3333333333333333 - steps: 100 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -# _READER_: 'yolov3_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: train.txt - with_background: false - use_default_label: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - ratio: 1.5 - - !RandomCrop {} - - !RandomFlipImage - 
is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 8 - shuffle: true - mixup_epoch: 250 - drop_last: true - worker_num: 4 - bufsize: 2 - use_process: true - - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/roadsign_voc - anno_path: valid.txt - with_background: false - use_default_label: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - drop_empty: false - worker_num: 4 - bufsize: 2 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: dataset/roadsign_voc/label_list.txt - with_background: false - use_default_label: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - 
channel_first: True - batch_size: 1 diff --git a/static/configs/yolov3_mobilenet_v1_voc.yml b/static/configs/yolov3_mobilenet_v1_voc.yml deleted file mode 100644 index d5d324dc46c70bb6f7c510160aa77e75a5877ad3..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_mobilenet_v1_voc.yml +++ /dev/null @@ -1,87 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 70000 -log_iter: 20 -save_dir: output -snapshot_iter: 2000 -metric: VOC -map_type: 11point -pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar -weights: output/yolov3_mobilenet_v1_voc/model_final -num_classes: 20 -use_fine_grained_loss: false - -YOLOv3: - backbone: MobileNet - yolo_head: YOLOv3Head - -MobileNet: - norm_type: sync_bn - norm_decay: 0. - conv_group_scale: 1 - with_extra_blocks: false - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 62000 - - !LinearWarmup - start_factor: 0. 
- steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' -TrainReader: - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - with_background: false - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: test.txt - use_default_label: true - with_background: false - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false diff --git a/static/configs/yolov3_mobilenet_v3.yml b/static/configs/yolov3_mobilenet_v3.yml deleted file mode 100644 index bc2b6b3a55e64c1f30fb98ab1ab220b02a2dce9c..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_mobilenet_v3.yml +++ /dev/null @@ -1,64 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar -weights: output/yolov3_mobilenet_v3/model_final -num_classes: 80 -use_fine_grained_loss: false - -YOLOv3: - backbone: MobileNetV3 - yolo_head: YOLOv3Head - -MobileNetV3: - norm_type: sync_bn - norm_decay: 0. - model_name: large - scale: 1. - extra_block_filters: [] - feature_maps: [1, 2, 3, 4, 6] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. 
- yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' diff --git a/static/configs/yolov3_r34.yml b/static/configs/yolov3_r34.yml deleted file mode 100644 index 76edc41d0d178605956593ad7df0cfbfb8e2d5b4..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_r34.yml +++ /dev/null @@ -1,64 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 500000 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar -weights: output/yolov3_r34/model_final -num_classes: 80 -use_fine_grained_loss: false - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 34 - feature_maps: [3, 4, 5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' diff --git a/static/configs/yolov3_r34_voc.yml b/static/configs/yolov3_r34_voc.yml deleted file mode 100644 index 0e48498e4ccc88c251884137afb31974a7a83aa9..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_r34_voc.yml +++ /dev/null @@ -1,89 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 70000 -log_iter: 20 -save_dir: output -snapshot_iter: 2000 -metric: VOC -map_type: 11point -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar -weights: output/yolov3_r34_voc/model_final -num_classes: 20 -use_fine_grained_loss: false - -YOLOv3: - backbone: ResNet - yolo_head: YOLOv3Head - -ResNet: - norm_type: sync_bn - freeze_at: 0 - freeze_norm: false - norm_decay: 0. - depth: 34 - feature_maps: [3, 4, 5] - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 55000 - - 62000 - - !LinearWarmup - start_factor: 0. 
- steps: 1000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: 'yolov3_reader.yml' -TrainReader: - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: trainval.txt - use_default_label: true - with_background: false - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 50 - dataset: - !VOCDataSet - dataset_dir: dataset/voc - anno_path: test.txt - use_default_label: true - with_background: false - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false diff --git a/static/configs/yolov3_reader.yml b/static/configs/yolov3_reader.yml deleted file mode 100644 index 2a8463f1e6c2cb598ea4a55c6289f5b04b290d4a..0000000000000000000000000000000000000000 --- a/static/configs/yolov3_reader.yml +++ /dev/null @@ -1,111 +0,0 @@ -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score'] - num_max_boxes: 50 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - with_mixup: True - - !MixupImage - alpha: 1.5 - beta: 1.5 - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[6, 7, 8], 
[3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - downsample_ratios: [32, 16, 8] - batch_size: 8 - shuffle: true - mixup_epoch: 250 - drop_last: true - worker_num: 8 - bufsize: 16 - use_process: true - - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 50 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 50 - - !Permute - to_bgr: false - channel_first: True - batch_size: 8 - drop_empty: false - worker_num: 8 - bufsize: 16 - -TestReader: - inputs_def: - image_shape: [3, 608, 608] - fields: ['image', 'im_size', 'im_id'] - dataset: - !ImageFolder - anno_path: annotations/instances_val2017.json - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 2 - - !NormalizeImage - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 diff --git a/static/configs/yolov4/README.md b/static/configs/yolov4/README.md deleted file mode 100644 index 55e8a050dbbb43737c1792e8501ee37d1f724ff0..0000000000000000000000000000000000000000 --- a/static/configs/yolov4/README.md +++ /dev/null @@ -1,60 +0,0 @@ -# YOLO v4 Model - -## Contents -- [Introduction](#introduction) -- [Model Zoo and Baselines](#model-zoo) -- [Future Work](#future-work) -- [How to Contribute Code](#how-to-contribute-code) - -## Introduction - -A Paddle implementation of [YOLO v4](https://arxiv.org/abs/2004.10934). It requires PaddlePaddle 2.0.0 or later, or a suitable develop version. - -The YOLO v4 weights from [darknet](https://github.com/AlexeyAB/darknet) have been converted
and can be used directly for image prediction, reaching 43.5% accuracy on [test-dev2019](http://cocodataset.org/#detection-2019). Fine-tuning on the VOC dataset is also supported and reaches 85.5% accuracy. - -The following YOLO v4 modules are currently supported: - -- mish activation function -- PAN module -- SPP module -- ciou loss -- label_smooth -- grid_sensitive - -An anchor clustering algorithm for the YOLO series is currently supported: -``` bash -python tools/anchor_cluster.py -c ${config} -m ${method} -s ${size} -``` -The main parameters are listed in the table below. -| Parameter | Purpose | Default | Notes | -|:------:|:------:|:------:|:------:| -| -c/--config | Model configuration file | No default | Required | -| -n/--n | Number of clusters | 9 | Number of anchors | -| -s/--size | Input image size | None | If specified, the given size is used; otherwise the image size is read from the configuration file if possible | -| -m/--method | Anchor clustering method | v2 | Only the yolov2 clustering algorithm is currently supported | -| -i/--iters | Number of kmeans iterations | 1000 | Stops when kmeans converges or the iteration limit is reached | - -## Model Zoo -The table below lists the currently supported network architectures. - -| | GPUs | Test set | Backbone | Accuracy | Model download | Config file | -|:------------------------:|:-------:|:------:|:--------------------------:|:------------------------:| :---------:| :-----: | -| YOLO v4 | - |test-dev2019 | CSPDarkNet53 | 43.5 |[Download](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [Config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_cspdarknet.yml) | -| YOLO v4 VOC | 2 | VOC2007 | CSPDarkNet53 | 85.5 | [Download](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet_voc.pdparams) | [Config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_cspdarknet_voc.yml) | - -**Notes:** - -- The original YOLO v4 was trained on coco trainval2014, so its training samples include some evaluation samples and evaluating on the val set would inflate accuracy. The model is therefore evaluated on the coco test set. -- The YOLO v4 model only supports evaluation on the coco test set and image prediction. Because the test set contains no ground-truth box annotations, evaluation saves the predictions to a json file; submit the results to [cocodataset](http://cocodataset.org/#detection-2019) to see the final accuracy metrics. -- The coco test set is test2017; see [coco2017](http://cocodataset.org/#download) for download instructions. - - -## Future Work - -1. Optimize the mish activation function -2.
Implement mosaic data preprocessing - - - -## How to Contribute Code -We warmly welcome code contributions to PaddleDetection; you can submit a PR for us to review. We also greatly appreciate your feedback: open an issue and we will respond promptly. diff --git a/static/configs/yolov4/yolov4_cspdarknet.yml b/static/configs/yolov4/yolov4_cspdarknet.yml deleted file mode 100644 index e2299feee7b5dda667329c22f87ef0924442645d..0000000000000000000000000000000000000000 --- a/static/configs/yolov4/yolov4_cspdarknet.yml +++ /dev/null @@ -1,118 +0,0 @@ -architecture: YOLOv4 -use_gpu: true -max_iters: 500200 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams -weights: output/yolov4_cspdarknet/model_final -num_classes: 80 -use_fine_grained_loss: true -save_prediction_only: True - -YOLOv4: - backbone: CSPDarkNet - yolo_head: YOLOv4Head - -CSPDarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv4Head: - anchors: [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55], - [72, 146], [142, 110], [192, 243], [459, 401]] - anchor_masks: [[0, 1, 2], [3, 4, 5], [6, 7, 8]] - nms: - background_label: -1 - keep_top_k: -1 - nms_threshold: 0.45 - nms_top_k: -1 - normalized: true - score_threshold: 0.001 - downsample: [8,16,32] - scale_x_y: [1.2, 1.1, 1.05] - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - downsample: [8,16,32] - scale_x_y: [1.2, 1.1, 1.05] - iou_loss: IouLoss - match_score: true - -IouLoss: - loss_weight: 0.07 - max_height: 608 - max_width: 608 - ciou_term: true - loss_square: false - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 1000 - -OptimizerBuilder: - clip_grad_by_norm: 10.
- optimizer: - momentum: 0.949 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: '../yolov3_reader.yml' -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: test2017 - anno_path: annotations/image_info_test-dev2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0., 0., 0.] - std: [1., 1., 1.] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - batch_size: 1 - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0., 0., 0.] - std: [1., 1., 1.] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True diff --git a/static/configs/yolov4/yolov4_cspdarknet_coco.yml b/static/configs/yolov4/yolov4_cspdarknet_coco.yml deleted file mode 100644 index 4cb44c0777f5b0bd026e63d13ca45ee5b4e497e8..0000000000000000000000000000000000000000 --- a/static/configs/yolov4/yolov4_cspdarknet_coco.yml +++ /dev/null @@ -1,174 +0,0 @@ -architecture: YOLOv4 -use_gpu: true -max_iters: 500200 -log_iter: 20 -save_dir: output -snapshot_iter: 10000 -metric: COCO -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/CSPDarkNet53_pretrained.pdparams -weights: output/yolov4_cspdarknet_coco/model_final -num_classes: 80 -use_fine_grained_loss: true - -YOLOv4: - backbone: CSPDarkNet - yolo_head: YOLOv4Head - -CSPDarkNet: - norm_type: sync_bn - norm_decay: 0. 
- depth: 53 - -YOLOv4Head: - anchors: [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55], - [72, 146], [142, 110], [192, 243], [459, 401]] - anchor_masks: [[0, 1, 2], [3, 4, 5], [6, 7, 8]] - nms: - background_label: -1 - keep_top_k: -1 - nms_threshold: 0.45 - nms_top_k: -1 - normalized: true - score_threshold: 0.001 - downsample: [8,16,32] - scale_x_y: [1.2, 1.1, 1.05] - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - downsample: [8,16,32] - scale_x_y: [1.2, 1.1, 1.05] - iou_loss: IouLoss - match_score: true - -IouLoss: - loss_weight: 0.07 - max_height: 608 - max_width: 608 - ciou_term: true - loss_square: true - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 400000 - - 450000 - - !LinearWarmup - start_factor: 0. - steps: 4000 - -OptimizerBuilder: - clip_grad_by_norm: 10. - optimizer: - momentum: 0.949 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: '../yolov3_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score', 'im_id'] - num_max_boxes: 50 - dataset: - !COCODataSet - image_dir: train2017 - anno_path: annotations/instances_train2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.,0.,0.] - std: [1.,1.,1.] 
- is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[0, 1, 2], [3, 4, 5], [6, 7, 8]] - anchors: [[12, 16], [19, 36], [40, 28], - [36, 75], [76, 55], [72, 146], - [142, 110], [192, 243], [459, 401]] - downsample_ratios: [8, 16, 32] - batch_size: 8 - shuffle: true - drop_last: true - worker_num: 8 - bufsize: 16 - use_process: true - drop_empty: false - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id'] - num_max_boxes: 90 - dataset: - !COCODataSet - image_dir: val2017 - anno_path: annotations/instances_val2017.json - dataset_dir: dataset/coco - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0., 0., 0.] - std: [1., 1., 1.] - is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 90 - - !Permute - to_bgr: false - channel_first: True - batch_size: 4 - drop_empty: false - worker_num: 8 - bufsize: 16 - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0., 0., 0.] - std: [1., 1., 1.] 
- is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True diff --git a/static/configs/yolov4/yolov4_cspdarknet_voc.yml b/static/configs/yolov4/yolov4_cspdarknet_voc.yml deleted file mode 100644 index 4bdc7669a6807fffaf1a2d4d340a1cfca70f4ede..0000000000000000000000000000000000000000 --- a/static/configs/yolov4/yolov4_cspdarknet_voc.yml +++ /dev/null @@ -1,173 +0,0 @@ -architecture: YOLOv4 -use_gpu: true -max_iters: 140000 -log_iter: 20 -save_dir: output -snapshot_iter: 1000 -metric: VOC -pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams -weights: output/yolov4_cspdarknet_voc/model_final -num_classes: 20 -use_fine_grained_loss: true - -YOLOv4: - backbone: CSPDarkNet - yolo_head: YOLOv4Head - -CSPDarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv4Head: - anchors: [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55], - [72, 146], [142, 110], [192, 243], [459, 401]] - anchor_masks: [[0, 1, 2], [3, 4, 5], [6, 7, 8]] - nms: - background_label: -1 - keep_top_k: -1 - nms_threshold: 0.45 - nms_top_k: -1 - normalized: true - score_threshold: 0.001 - downsample: [8,16,32] - scale_x_y: [1.2, 1.1, 1.05] - -YOLOv3Loss: - ignore_thresh: 0.7 - label_smooth: true - downsample: [8,16,32] - scale_x_y: [1.2, 1.1, 1.05] - iou_loss: IouLoss - match_score: true - -IouLoss: - loss_weight: 0.07 - max_height: 608 - max_width: 608 - ciou_term: true - loss_square: true - -LearningRate: - base_lr: 0.0001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 110000 - - 130000 - - !LinearWarmup - start_factor: 0. - steps: 1000 - -OptimizerBuilder: - clip_grad_by_norm: 10. 
- optimizer: - momentum: 0.949 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: '../yolov3_reader.yml' -TrainReader: - inputs_def: - fields: ['image', 'gt_bbox', 'gt_class', 'gt_score', 'im_id'] - num_max_boxes: 50 - dataset: - !VOCDataSet - anno_path: trainval.txt - dataset_dir: dataset/voc - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ColorDistort {} - - !RandomExpand - fill_value: [123.675, 116.28, 103.53] - - !RandomCrop {} - - !RandomFlipImage - is_normalized: false - - !NormalizeBox {} - - !PadBox - num_max_boxes: 50 - - !BboxXYXY2XYWH {} - batch_transforms: - - !RandomShape - sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608] - random_inter: True - - !NormalizeImage - mean: [0.,0.,0.] - std: [1.,1.,1.] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True - # Gt2YoloTarget is only used when use_fine_grained_loss set as true, - # this operator will be deleted automatically if use_fine_grained_loss - # is set as false - - !Gt2YoloTarget - anchor_masks: [[0, 1, 2], [3, 4, 5], [6, 7, 8]] - anchors: [[12, 16], [19, 36], [40, 28], - [36, 75], [76, 55], [72, 146], - [142, 110], [192, 243], [459, 401]] - downsample_ratios: [8, 16, 32] - batch_size: 4 - shuffle: true - drop_last: true - worker_num: 8 - bufsize: 16 - use_process: true - drop_empty: false - -EvalReader: - inputs_def: - fields: ['image', 'im_size', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult'] - num_max_boxes: 90 - dataset: - !VOCDataSet - anno_path: test.txt - dataset_dir: dataset/voc - use_default_label: true - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0., 0., 0.] - std: [1., 1., 1.] 
- is_scale: True - is_channel_first: false - - !PadBox - num_max_boxes: 90 - - !Permute - to_bgr: false - channel_first: True - batch_size: 4 - drop_empty: false - worker_num: 8 - bufsize: 16 - -TestReader: - dataset: - !ImageFolder - use_default_label: true - with_background: false - sample_transforms: - - !DecodeImage - to_rgb: True - - !ResizeImage - target_size: 608 - interp: 1 - - !NormalizeImage - mean: [0., 0., 0.] - std: [1., 1., 1.] - is_scale: True - is_channel_first: false - - !Permute - to_bgr: false - channel_first: True diff --git a/static/contrib/PedestrianDetection/demo/001.png b/static/contrib/PedestrianDetection/demo/001.png deleted file mode 100644 index 63ae9167fd03e8a95756fe5f6195fc8d741b9cfa..0000000000000000000000000000000000000000 Binary files a/static/contrib/PedestrianDetection/demo/001.png and /dev/null differ diff --git a/static/contrib/PedestrianDetection/demo/002.png b/static/contrib/PedestrianDetection/demo/002.png deleted file mode 100644 index 0de905cf55e6b02487ee1b8220810df8eaa24c2c..0000000000000000000000000000000000000000 Binary files a/static/contrib/PedestrianDetection/demo/002.png and /dev/null differ diff --git a/static/contrib/PedestrianDetection/demo/003.png b/static/contrib/PedestrianDetection/demo/003.png deleted file mode 100644 index e9026e099df42d4267be07a71401eb5426b47745..0000000000000000000000000000000000000000 Binary files a/static/contrib/PedestrianDetection/demo/003.png and /dev/null differ diff --git a/static/contrib/PedestrianDetection/demo/004.png b/static/contrib/PedestrianDetection/demo/004.png deleted file mode 100644 index d8118ec3e0ef63bc74e825b5e7638a1886580604..0000000000000000000000000000000000000000 Binary files a/static/contrib/PedestrianDetection/demo/004.png and /dev/null differ diff --git a/static/contrib/PedestrianDetection/pedestrian.json b/static/contrib/PedestrianDetection/pedestrian.json deleted file mode 100644 index 
f72fe6dc65209ab3506d18556fb8b83b6ec832a9..0000000000000000000000000000000000000000 --- a/static/contrib/PedestrianDetection/pedestrian.json +++ /dev/null @@ -1,11 +0,0 @@ -{ - "images": [], - "annotations": [], - "categories": [ - { - "supercategory": "component", - "id": 1, - "name": "pedestrian" - } - ] -} diff --git a/static/contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml b/static/contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml deleted file mode 100644 index 379ece6820bc3e5152795a36a27401a7baee025c..0000000000000000000000000000000000000000 --- a/static/contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml +++ /dev/null @@ -1,86 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 200000 -log_iter: 20 -save_dir: output -snapshot_iter: 5000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar -weights: https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar -num_classes: 1 -use_fine_grained_loss: false - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[10, 13], [16, 30], [33, 23], - [30, 61], [62, 45], [59, 119], - [116, 90], [156, 198], [373, 326]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 1000 - normalized: false - score_threshold: 0.01 - -YOLOv3Loss: - batch_size: 8 - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 150000 - - 180000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: '../../configs/yolov3_reader.yml' -TrainReader: - batch_size: 8 - dataset: - !COCODataSet - dataset_dir: dataset/pedestrian - anno_path: annotations/instances_train2017.json - image_dir: train2017 - with_background: false - -EvalReader: - batch_size: 8 - dataset: - !COCODataSet - dataset_dir: dataset/pedestrian - anno_path: annotations/instances_val2017.json - image_dir: val2017 - with_background: false - -TestReader: - batch_size: 1 - dataset: - !ImageFolder - anno_path: contrib/PedestrianDetection/pedestrian.json - with_background: false diff --git a/static/contrib/README.md b/static/contrib/README.md deleted file mode 100644 index 51f8b57e396ba892d401f29ec7ded657521b1065..0000000000000000000000000000000000000000 --- a/static/contrib/README.md +++ /dev/null @@ -1,2 +0,0 @@ -**For the Chinese documentation and tutorial, please refer to:** [CONTRIB_cn.md](../docs/featured_model/CONTRIB_cn.md)
    -**For the English documentation, please refer to:** [CONTRIB.md](../docs/featured_model/CONTRIB.md) diff --git a/static/contrib/VehicleDetection/demo/001.jpeg b/static/contrib/VehicleDetection/demo/001.jpeg deleted file mode 100644 index 8786db5eb6773931c363358bb39462b33db55369..0000000000000000000000000000000000000000 Binary files a/static/contrib/VehicleDetection/demo/001.jpeg and /dev/null differ diff --git a/static/contrib/VehicleDetection/demo/003.png b/static/contrib/VehicleDetection/demo/003.png deleted file mode 100644 index c01ab4ce769fb3b1c8863093a35d27da0ab10efd..0000000000000000000000000000000000000000 Binary files a/static/contrib/VehicleDetection/demo/003.png and /dev/null differ diff --git a/static/contrib/VehicleDetection/demo/004.png b/static/contrib/VehicleDetection/demo/004.png deleted file mode 100644 index 8907eb8d4d9b82e08ca214509c9fb41ca889db2a..0000000000000000000000000000000000000000 Binary files a/static/contrib/VehicleDetection/demo/004.png and /dev/null differ diff --git a/static/contrib/VehicleDetection/demo/005.png b/static/contrib/VehicleDetection/demo/005.png deleted file mode 100644 index bf17712809c2fe6fa8e7d4f093ec4ac94523537c..0000000000000000000000000000000000000000 Binary files a/static/contrib/VehicleDetection/demo/005.png and /dev/null differ diff --git a/static/contrib/VehicleDetection/vehicle.json b/static/contrib/VehicleDetection/vehicle.json deleted file mode 100644 index 5863a9a8c9e0d8b4daeff31e7fe7869e084d3fb4..0000000000000000000000000000000000000000 --- a/static/contrib/VehicleDetection/vehicle.json +++ /dev/null @@ -1,36 +0,0 @@ -{ - "images": [], - "annotations": [], - "categories": [ - { - "supercategory": "component", - "id": 1, - "name": "car" - }, - { - "supercategory": "component", - "id": 2, - "name": "truck" - }, - { - "supercategory": "component", - "id": 3, - "name": "bus" - }, - { - "supercategory": "component", - "id": 4, - "name": "motorbike" - }, - { - "supercategory": "component", - "id": 5, - "name": "tricycle" -
}, - { - "supercategory": "component", - "id": 6, - "name": "carplate" - } - ] -} diff --git a/static/contrib/VehicleDetection/vehicle_yolov3_darknet.yml b/static/contrib/VehicleDetection/vehicle_yolov3_darknet.yml deleted file mode 100644 index 7c2ddbd1efcff7d5413168e4b9d9b62be1b1aa6f..0000000000000000000000000000000000000000 --- a/static/contrib/VehicleDetection/vehicle_yolov3_darknet.yml +++ /dev/null @@ -1,85 +0,0 @@ -architecture: YOLOv3 -use_gpu: true -max_iters: 120000 -log_iter: 20 -save_dir: output -snapshot_iter: 2000 -metric: COCO -pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar -weights: https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar -num_classes: 6 - -YOLOv3: - backbone: DarkNet - yolo_head: YOLOv3Head - -DarkNet: - norm_type: sync_bn - norm_decay: 0. - depth: 53 - -YOLOv3Head: - anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] - anchors: [[8, 9], [10, 23], [19, 15], - [23, 33], [40, 25], [54, 50], - [101, 80], [139, 145], [253, 224]] - norm_decay: 0. - yolo_loss: YOLOv3Loss - nms: - background_label: -1 - keep_top_k: 100 - nms_threshold: 0.45 - nms_top_k: 400 - normalized: false - score_threshold: 0.005 - -YOLOv3Loss: - batch_size: 8 - ignore_thresh: 0.7 - label_smooth: false - -LearningRate: - base_lr: 0.001 - schedulers: - - !PiecewiseDecay - gamma: 0.1 - milestones: - - 60000 - - 80000 - - !LinearWarmup - start_factor: 0. 
- steps: 4000 - -OptimizerBuilder: - optimizer: - momentum: 0.9 - type: Momentum - regularizer: - factor: 0.0005 - type: L2 - -_READER_: '../../configs/yolov3_reader.yml' -TrainReader: - batch_size: 8 - dataset: - !COCODataSet - dataset_dir: dataset/vehicle - anno_path: annotations/instances_train2017.json - image_dir: train2017 - with_background: false - -EvalReader: - batch_size: 8 - dataset: - !COCODataSet - dataset_dir: dataset/vehicle - anno_path: annotations/instances_val2017.json - image_dir: val2017 - with_background: false - -TestReader: - batch_size: 1 - dataset: - !ImageFolder - anno_path: contrib/VehicleDetection/vehicle.json - with_background: false diff --git a/static/dataset/coco/download_coco.py b/static/dataset/coco/download_coco.py deleted file mode 100644 index 47659fa76dd2c1183404667efac3a48de9b099c2..0000000000000000000000000000000000000000 --- a/static/dataset/coco/download_coco.py +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import sys -import os.path as osp -import logging -# add python path of PaddleDetection to sys.path -parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) -if parent_path not in sys.path: - sys.path.append(parent_path) - -from ppdet.utils.download import download_dataset - -logging.basicConfig(level=logging.INFO) - -download_path = osp.split(osp.realpath(sys.argv[0]))[0] -download_dataset(download_path, 'coco') diff --git a/static/dataset/coco/objects365_label.txt b/static/dataset/coco/objects365_label.txt deleted file mode 100644 index f71ffd8c01dc7b4b148cf7bb28da4ea038fbb64b..0000000000000000000000000000000000000000 --- a/static/dataset/coco/objects365_label.txt +++ /dev/null @@ -1,365 +0,0 @@ -person -sneakers -chair -hat -lamp -bottle -cabinet/shelf -cup -car -glasses -picture/frame -desk -handbag -street lights -book -plate -helmet -leather shoes -pillow -glove -potted plant -bracelet -flower -tv -storage box -vase -bench -wine glass -boots -bowl -dining table -umbrella -boat -flag -speaker -trash bin/can -stool -backpack -couch -belt -carpet -basket -towel/napkin -slippers -barrel/bucket -coffee table -suv -toy -tie -bed -traffic light -pen/pencil -microphone -sandals -canned -necklace -mirror -faucet -bicycle -bread -high heels -ring -van -watch -sink -horse -fish -apple -camera -candle -teddy bear -cake -motorcycle -wild bird -laptop -knife -traffic sign -cell phone -paddle -truck -cow -power outlet -clock -drum -fork -bus -hanger -nightstand -pot/pan -sheep -guitar -traffic cone -tea pot -keyboard -tripod -hockey -fan -dog -spoon -blackboard/whiteboard -balloon -air conditioner -cymbal -mouse -telephone -pickup truck -orange -banana -airplane -luggage -skis -soccer -trolley -oven -remote -baseball glove -paper towel -refrigerator -train -tomato -machinery vehicle -tent -shampoo/shower gel -head phone -lantern -donut -cleaning products -sailboat -tangerine -pizza -kite -computer box -elephant -toiletries -gas stove -broccoli -toilet -stroller
-shovel -baseball bat -microwave -skateboard -surfboard -surveillance camera -gun -life saver -cat -lemon -liquid soap -zebra -duck -sports car -giraffe -pumpkin -piano -stop sign -radiator -converter -tissue -carrot -washing machine -vent -cookies -cutting/chopping board -tennis racket -candy -skating and skiing shoes -scissors -folder -baseball -strawberry -bow tie -pigeon -pepper -coffee machine -bathtub -snowboard -suitcase -grapes -ladder -pear -american football -basketball -potato -paint brush -printer -billiards -fire hydrant -goose -projector -sausage -fire extinguisher -extension cord -facial mask -tennis ball -chopsticks -electronic stove and gas stove -pie -frisbee -kettle -hamburger -golf club -cucumber -clutch -blender -tong -slide -hot dog -toothbrush -facial cleanser -mango -deer -egg -violin -marker -ship -chicken -onion -ice cream -tape -wheelchair -plum -bar soap -scale -watermelon -cabbage -router/modem -golf ball -pine apple -crane -fire truck -peach -cello -notepaper -tricycle -toaster -helicopter -green beans -brush -carriage -cigar -earphone -penguin -hurdle -swing -radio -CD -parking meter -swan -garlic -french fries -horn -avocado -saxophone -trumpet -sandwich -cue -kiwi fruit -bear -fishing rod -cherry -tablet -green vegetables -nuts -corn -key -screwdriver -globe -broom -pliers -volleyball -hammer -eggplant -trophy -dates -board eraser -rice -tape measure/ruler -dumbbell -hamimelon -stapler -camel -lettuce -goldfish -meat balls -medal -toothpaste -antelope -shrimp -rickshaw -trombone -pomegranate -coconut -jellyfish -mushroom -calculator -treadmill -butterfly -egg tart -cheese -pig -pomelo -race car -rice cooker -tuba -crosswalk sign -papaya -hair drier -green onion -chips -dolphin -sushi -urinal -donkey -electric drill -spring rolls -tortoise/turtle -parrot -flute -measuring cup -shark -steak -poker card -binoculars -llama -radish -noodles -yak -mop -crab -microscope -barbell -bread/bun -baozi -lion -red cabbage -polar bear -lighter 
-seal -mangosteen -comb -eraser -pitaya -scallop -pencil case -saw -table tennis paddle -okra -starfish -eagle -monkey -durian -game board -rabbit -french horn -ambulance -asparagus -hoverboard -pasta -target -hotair balloon -chainsaw -lobster -iron -flashlight \ No newline at end of file diff --git a/static/dataset/fddb/download.sh b/static/dataset/fddb/download.sh deleted file mode 100755 index 7a40c8b0511f9f7bf45ef6b10bc2a8725145f381..0000000000000000000000000000000000000000 --- a/static/dataset/fddb/download.sh +++ /dev/null @@ -1,31 +0,0 @@ -# All rights `PaddleDetection` reserved -# References: -# @TechReport{fddbTech, -# author = {Vidit Jain and Erik Learned-Miller}, -# title = {FDDB: A Benchmark for Face Detection in Unconstrained Settings}, -# institution = {University of Massachusetts, Amherst}, -# year = {2010}, -# number = {UM-CS-2010-009} -# } - -DIR="$( cd "$(dirname "$0")" ; pwd -P )" -cd "$DIR" - -# Download the data. -echo "Downloading..." -# external link to the Faces in the Wild dataset and annotations file -wget http://tamaraberg.com/faceDataset/originalPics.tar.gz -wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz -wget http://vis-www.cs.umass.edu/fddb/evaluation.tgz - -# Extract the data. -echo "Extracting..." -tar -zxf originalPics.tar.gz -tar -zxf FDDB-folds.tgz -tar -zxf evaluation.tgz - -# Generate full image path list and groundtruth in FDDB-folds: -cd FDDB-folds -cat `ls|grep -v"ellipse"` > filePath.txt && cat *ellipse* > fddb_annotFile.txt -cd .. -echo "------------- All done! --------------" diff --git a/static/dataset/fruit/download_fruit.py b/static/dataset/fruit/download_fruit.py deleted file mode 100644 index 2db2e207210c4bab39e8dfdb3abe91a51c49af1f..0000000000000000000000000000000000000000 --- a/static/dataset/fruit/download_fruit.py +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import sys -import os.path as osp -import logging -# add python path of PaddleDetection to sys.path -parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) -if parent_path not in sys.path: - sys.path.append(parent_path) - -from ppdet.utils.download import download_dataset - -logging.basicConfig(level=logging.INFO) - -download_path = osp.split(osp.realpath(sys.argv[0]))[0] -download_dataset(download_path, 'fruit') diff --git a/static/dataset/fruit/label_list.txt b/static/dataset/fruit/label_list.txt deleted file mode 100644 index 1f60d62c399939cd92e667c1fb938764b3ec2901..0000000000000000000000000000000000000000 --- a/static/dataset/fruit/label_list.txt +++ /dev/null @@ -1,3 +0,0 @@ -apple -banana -orange diff --git a/static/dataset/roadsign_voc/download_roadsign_voc.py b/static/dataset/roadsign_voc/download_roadsign_voc.py deleted file mode 100644 index 3cb517d3cf362e3ad2ec7b4ebf3bff54acb244d4..0000000000000000000000000000000000000000 --- a/static/dataset/roadsign_voc/download_roadsign_voc.py +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import sys -import os.path as osp -import logging -# add python path of PaddleDetection to sys.path -parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) -if parent_path not in sys.path: - sys.path.append(parent_path) - -from ppdet.utils.download import download_dataset - -logging.basicConfig(level=logging.INFO) - -download_path = osp.split(osp.realpath(sys.argv[0]))[0] -download_dataset(download_path, 'roadsign_voc') diff --git a/static/dataset/roadsign_voc/label_list.txt b/static/dataset/roadsign_voc/label_list.txt deleted file mode 100644 index 1be460f457a2fdbec91d3a69377c232ae4a6beb0..0000000000000000000000000000000000000000 --- a/static/dataset/roadsign_voc/label_list.txt +++ /dev/null @@ -1,4 +0,0 @@ -speedlimit -crosswalk -trafficlight -stop \ No newline at end of file diff --git a/static/dataset/voc/create_list.py b/static/dataset/voc/create_list.py deleted file mode 100644 index a137bd38caf713f5930768f0018f16dfaaf6feea..0000000000000000000000000000000000000000 --- a/static/dataset/voc/create_list.py +++ /dev/null @@ -1,45 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and -# limitations under the License. - -import sys -import os.path as osp -import logging -import argparse - -# add python path of PaddleDetection to sys.path -parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) -if parent_path not in sys.path: - sys.path.append(parent_path) - -from ppdet.utils.download import create_voc_list -logging.basicConfig(level=logging.INFO) - - -def main(config): - voc_path = config.dataset_dir - create_voc_list(voc_path) - - -if __name__ == '__main__': - parser = argparse.ArgumentParser() - default_voc_path = osp.split(osp.realpath(sys.argv[0]))[0] - parser.add_argument( - "-d", - "--dataset_dir", - default=default_voc_path, - type=str, - help="VOC dataset directory, default is current directory.") - config = parser.parse_args() - - main(config) diff --git a/static/dataset/voc/download_voc.py b/static/dataset/voc/download_voc.py deleted file mode 100644 index 080226ee94ffbcae59c4caf509fcb2d6c67f7161..0000000000000000000000000000000000000000 --- a/static/dataset/voc/download_voc.py +++ /dev/null @@ -1,29 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License.
- -import sys -import os.path as osp -import logging -# add python path of PaddleDetection to sys.path -parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) -if parent_path not in sys.path: - sys.path.append(parent_path) - -from ppdet.utils.download import download_dataset, create_voc_list - -logging.basicConfig(level=logging.INFO) - -download_path = osp.split(osp.realpath(sys.argv[0]))[0] -download_dataset(download_path, 'voc') -create_voc_list(download_path) diff --git a/static/dataset/voc/generic_det_label_list.txt b/static/dataset/voc/generic_det_label_list.txt deleted file mode 100644 index 410f9ae593ba501be091bc267491f6158c339a44..0000000000000000000000000000000000000000 --- a/static/dataset/voc/generic_det_label_list.txt +++ /dev/null @@ -1,676 +0,0 @@ -Infant bed -Rose -Flag -Flashlight -Sea turtle -Camera -Animal -Glove -Crocodile -Cattle -House -Guacamole -Penguin -Vehicle registration plate -Bench -Ladybug -Human nose -Watermelon -Flute -Butterfly -Washing machine -Raccoon -Segway -Taco -Jellyfish -Cake -Pen -Cannon -Bread -Tree -Shellfish -Bed -Hamster -Hat -Toaster -Sombrero -Tiara -Bowl -Dragonfly -Moths and butterflies -Antelope -Vegetable -Torch -Building -Power plugs and sockets -Blender -Billiard table -Cutting board -Bronze sculpture -Turtle -Broccoli -Tiger -Mirror -Bear -Zucchini -Dress -Volleyball -Guitar -Reptile -Golf cart -Tart -Fedora -Carnivore -Car -Lighthouse -Coffeemaker -Food processor -Truck -Bookcase -Surfboard -Footwear -Bench -Necklace -Flower -Radish -Marine mammal -Frying pan -Tap -Peach -Knife -Handbag -Laptop -Tent -Ambulance -Christmas tree -Eagle -Limousine -Kitchen & dining room table -Polar bear -Tower -Football -Willow -Human head -Stop sign -Banana -Mixer -Binoculars -Dessert -Bee -Chair -Wood-burning stove -Flowerpot -Beaker -Oyster -Woodpecker -Harp -Bathtub -Wall clock -Sports uniform -Rhinoceros -Beehive -Cupboard -Chicken -Man -Blue jay -Cucumber -Balloon -Kite -Fireplace -Lantern -Missile -Book -Spoon
-Grapefruit -Squirrel -Orange -Coat -Punching bag -Zebra -Billboard -Bicycle -Door handle -Mechanical fan -Ring binder -Table -Parrot -Sock -Vase -Weapon -Shotgun -Glasses -Seahorse -Belt -Watercraft -Window -Giraffe -Lion -Tire -Vehicle -Canoe -Tie -Shelf -Picture frame -Printer -Human leg -Boat -Slow cooker -Croissant -Candle -Pancake -Pillow -Coin -Stretcher -Sandal -Woman -Stairs -Harpsichord -Stool -Bus -Suitcase -Human mouth -Juice -Skull -Door -Violin -Chopsticks -Digital clock -Sunflower -Leopard -Bell pepper -Harbor seal -Snake -Sewing machine -Goose -Helicopter -Seat belt -Coffee cup -Microwave oven -Hot dog -Countertop -Serving tray -Dog bed -Beer -Sunglasses -Golf ball -Waffle -Palm tree -Trumpet -Ruler -Helmet -Ladder -Office building -Tablet computer -Toilet paper -Pomegranate -Skirt -Gas stove -Cookie -Cart -Raven -Egg -Burrito -Goat -Kitchen knife -Skateboard -Salt and pepper shakers -Lynx -Boot -Platter -Ski -Swimwear -Swimming pool -Drinking straw -Wrench -Drum -Ant -Human ear -Headphones -Fountain -Bird -Jeans -Television -Crab -Microphone -Home appliance -Snowplow -Beetle -Artichoke -Jet ski -Stationary bicycle -Human hair -Brown bear -Starfish -Fork -Lobster -Corded phone -Drink -Saucer -Carrot -Insect -Clock -Castle -Tennis racket -Ceiling fan -Asparagus -Jaguar -Musical instrument -Train -Cat -Rifle -Dumbbell -Mobile phone -Taxi -Shower -Pitcher -Lemon -Invertebrate -Turkey -High heels -Bust -Elephant -Scarf -Barrel -Trombone -Pumpkin -Box -Tomato -Frog -Bidet -Human face -Houseplant -Van -Shark -Ice cream -Swim cap -Falcon -Ostrich -Handgun -Whiteboard -Lizard -Pasta -Snowmobile -Light bulb -Window blind -Muffin -Pretzel -Computer monitor -Horn -Furniture -Sandwich -Fox -Convenience store -Fish -Fruit -Earrings -Curtain -Grape -Sofa bed -Horse -Luggage and bags -Desk -Crutch -Bicycle helmet -Tick -Airplane -Canary -Spatula -Watch -Lily -Kitchen appliance -Filing cabinet -Aircraft -Cake stand -Candy -Sink -Mouse -Wine -Wheelchair -Goldfish 
-Refrigerator -French fries -Drawer -Treadmill -Picnic basket -Dice -Cabbage -Football helmet -Pig -Person -Shorts -Gondola -Honeycomb -Doughnut -Chest of drawers -Land vehicle -Bat -Monkey -Dagger -Tableware -Human foot -Mug -Alarm clock -Pressure cooker -Human hand -Tortoise -Baseball glove -Sword -Pear -Miniskirt -Traffic sign -Girl -Roller skates -Dinosaur -Porch -Human beard -Submarine sandwich -Screwdriver -Strawberry -Wine glass -Seafood -Racket -Wheel -Sea lion -Toy -Tea -Tennis ball -Waste container -Mule -Cricket ball -Pineapple -Coconut -Doll -Coffee table -Snowman -Lavender -Shrimp -Maple -Cowboy hat -Goggles -Rugby ball -Caterpillar -Poster -Rocket -Organ -Saxophone -Traffic light -Cocktail -Plastic bag -Squash -Mushroom -Hamburger -Light switch -Parachute -Teddy bear -Winter melon -Deer -Musical keyboard -Plumbing fixture -Scoreboard -Baseball bat -Envelope -Adhesive tape -Briefcase -Paddle -Bow and arrow -Telephone -Sheep -Jacket -Boy -Pizza -Otter -Office supplies -Couch -Cello -Bull -Camel -Ball -Duck -Whale -Shirt -Tank -Motorcycle -Accordion -Owl -Porcupine -Sun hat -Nail -Scissors -Swan -Lamp -Crown -Piano -Sculpture -Cheetah -Oboe -Tin can -Mango -Tripod -Oven -Mouse -Barge -Coffee -Snowboard -Common fig -Salad -Marine invertebrates -Umbrella -Kangaroo -Human arm -Measuring cup -Snail -Loveseat -Suit -Teapot -Bottle -Alpaca -Kettle -Trousers -Popcorn -Centipede -Spider -Sparrow -Plate -Bagel -Personal care -Apple -Brassiere -Bathroom cabinet -studio couch -Computer keyboard -Table tennis racket -Sushi -Cabinetry -Street light -Towel -Nightstand -Rabbit -Dolphin -Dog -Jug -Wok -Fire hydrant -Human eye -Skyscraper -Backpack -Potato -Paper towel -Lifejacket -Bicycle wheel -Toilet -tuba -carpet -trolley -tv -fan -llama -stapler -tricycle -head_phone -air_conditioner -cookies -towel/napkin -boots -sausage -suv -bar_soap -baseball -luggage -poker_card -shovel -marker -earphone -projector -pencil_case -french_horn -tangerine -router/modem -folder 
-donut -durian -sailboat -nuts -coffee_machine -meat_balls -basket -extension_cord -green_beans -avocado -soccer -egg_tart -clutch -slide -fishing_rod -hanger -bread/bun -surveillance_camera -globe -blackboard/whiteboard -life_saver -pigeon -red_cabbage -cymbal -faucet -steak -swing -mangosteen -cheese -urinal -lettuce -hurdle -ring -basketball -potted_plant -rickshaw -target -race_car -bow_tie -iron -toiletries -donkey -saw -hammer -billiards -cutting/chopping_board -power_outlet -hair_drier -baozi -medal -liquid_soap -wild_bird -leather_shoes -dining_table -game_board -barbell -radio -street_lights -tape -hockey -spring_rolls -rice -golf_club -lighter -chips -microscope -cell_phone -fire_truck -noodles -cabinet/shelf -electronic_stove_and_gas_stove -key -comb -trash_bin/can -toothbrush -dates -electric_drill -cow -eggplant -broom -vent -tong -green_onion -scallop -facial_cleanser -toothpaste -hamimelon -eraser -shampoo/shower_gel -CD -skating_and_skiing_shoes -american_football -slippers -pitaya -pot/pan -calculator -tissue -table_tennis_paddle -board_eraser -speaker -papaya -cigar -notepaper -garlic -rice_cooker -canned -parking_meter -flashlight -paint_brush -cup -cue -crosswalk_sign -kiwi_fruit -radiator -mop -chainsaw -sandals -storage_box -onion -bracelet -fire_extinguisher -scale -okra -microwave -sneakers -pepper -corn -pomelo -computer_box -pliers -trophy -plum -brush -machinery_vehicle -yak -crane -converter -facial_mask -carriage -pickup_truck -traffic_cone -pie -pen/pencil -sports_car -frisbee -cleaning_products -remote -stroller diff --git a/static/dataset/voc/generic_det_label_list_zh.txt b/static/dataset/voc/generic_det_label_list_zh.txt deleted file mode 100644 index 0012d759df820f99d6fd814215a78453274b26fa..0000000000000000000000000000000000000000 --- a/static/dataset/voc/generic_det_label_list_zh.txt +++ /dev/null @@ -1,676 +0,0 @@ -婴儿床 -玫瑰 -旗 -手电筒 -海龟 -照相机 -动物 -手套 -鳄鱼 -牛 -房子 -鳄梨酱 -企鹅 -车辆牌照 -凳子 -瓢虫 -人鼻 -西瓜 -长笛 -蝴蝶 -洗衣机 -浣熊 -赛格威 -墨西哥玉米薄饼卷 -海蜇 -蛋糕 
-笔 -加农炮 -面包 -树 -贝类 -床 -仓鼠 -帽子 -烤面包机 -帽帽 -冠状头饰 -碗 -蜻蜓 -飞蛾和蝴蝶 -羚羊 -蔬菜 -火炬 -建筑物 -电源插头和插座 -搅拌机 -台球桌 -切割板 -青铜雕塑 -乌龟 -西兰花 -老虎 -镜子 -熊 -西葫芦 -礼服 -排球 -吉他 -爬行动物 -高尔夫球车 -蛋挞 -费多拉 -食肉动物 -小型车 -灯塔 -咖啡壶 -食品加工厂 -卡车 -书柜 -冲浪板 -鞋类 -凳子 -项链 -花 -萝卜 -海洋哺乳动物 -煎锅 -水龙头 -桃 -刀 -手提包 -笔记本电脑 -帐篷 -救护车 -圣诞树 -鹰 -豪华轿车 -厨房和餐桌 -北极熊 -塔楼 -足球 -柳树 -人头 -停车标志 -香蕉 -搅拌机 -双筒望远镜 -甜点 -蜜蜂 -椅子 -烧柴炉 -花盆 -烧杯 -牡蛎 -啄木鸟 -竖琴 -浴缸 -挂钟 -运动服 -犀牛 -蜂箱 -橱柜 -鸡 -人 -冠蓝鸦 -黄瓜 -气球 -风筝 -壁炉 -灯笼 -导弹 -书 -勺子 -葡萄柚 -松鼠 -橙色 -外套 -打孔袋 -斑马 -广告牌 -自行车 -门把手 -机械风扇 -环形粘结剂 -桌子 -鹦鹉 -袜子 -花瓶 -武器 -猎枪 -玻璃杯 -海马 -腰带 -船舶 -窗口 -长颈鹿 -狮子 -轮胎 -车辆 -独木舟 -领带 -架子 -相框 -打印机 -人腿 -小船 -慢炖锅 -牛角包 -蜡烛 -煎饼 -枕头 -硬币 -担架 -凉鞋 -女人 -楼梯 -拨弦键琴 -凳子 -公共汽车 -手提箱 -人口学 -果汁 -颅骨 -门 -小提琴 -筷子 -数字时钟 -向日葵 -豹 -甜椒 -海港海豹 -蛇 -缝纫机 -鹅 -直升机 -座椅安全带 -咖啡杯 -微波炉 -热狗 -台面 -服务托盘 -狗床 -啤酒 -太阳镜 -高尔夫球 -华夫饼干 -棕榈树 -小号 -尺子 -头盔 -梯子 -办公楼 -平板电脑 -厕纸 -石榴 -裙子 -煤气炉 -曲奇饼干 -大车 -掠夺 -鸡蛋 -墨西哥煎饼 -山羊 -菜刀 -滑板 -盐和胡椒瓶 -猞猁 -靴子 -大浅盘 -滑雪板 -泳装 -游泳池 -吸管 -扳手 -鼓 -蚂蚁 -人耳 -耳机 -喷泉 -鸟 -牛仔裤 -电视机 -蟹 -话筒 -家用电器 -除雪机 -甲虫 -朝鲜蓟 -喷气式滑雪板 -固定自行车 -人发 -棕熊 -海星 -叉子 -龙虾 -有线电话 -饮料 -碟 -胡萝卜 -昆虫 -时钟 -城堡 -网球拍 -吊扇 -芦笋 -美洲虎 -乐器 -火车 -猫 -来复枪 -哑铃 -手机 -出租车 -淋浴 -投掷者 -柠檬 -无脊椎动物 -火鸡 -高跟鞋 -打破 -大象 -围巾 -枪管 -长号 -南瓜 -盒子 -番茄 -蛙 -坐浴盆 -人脸 -室内植物 -厢式货车 -鲨鱼 -冰淇淋 -游泳帽 -隼 -鸵鸟 -手枪 -白板 -蜥蜴 -面食 -雪车 -灯泡 -窗盲 -松饼 -椒盐脆饼 -计算机显示器 -喇叭 -家具 -三明治 -福克斯 -便利店 -鱼 -水果 -耳环 -帷幕 -葡萄 -沙发床 -马 -行李和行李 -书桌 -拐杖 -自行车头盔 -滴答声 -飞机 -金丝雀 -铲 -手表 -莉莉 -厨房用具 -文件柜 -飞机 -蛋糕架 -糖果 -水槽 -鼠标 -葡萄酒 -轮椅 -金鱼 -冰箱 -炸薯条 -抽屉 -单调的工作 -野餐篮子 -骰子 -甘蓝 -足球头盔 -猪 -人 -短裤 -贡多拉 -蜂巢 -炸圈饼 -抽屉柜 -陆地车辆 -蝙蝠 -猴子 -匕首 -餐具 -人足 -马克杯 -闹钟 -高压锅 -人手 -乌龟 -棒球手套 -剑 -梨 -迷你裙 -交通标志 -女孩 -旱冰鞋 -恐龙 -门廊 -胡须 -潜艇三明治 -螺丝起子 -草莓 -酒杯 -海鲜 -球拍 -车轮 -海狮 -玩具 -茶叶 -网球 -废物容器 -骡子 -板球 -菠萝 -椰子 -娃娃 -咖啡桌 -雪人 -薰衣草 -小虾 -枫树 -牛仔帽 -护目镜 -橄榄球 -毛虫 -海报 -火箭 -器官 -萨克斯 -交通灯 -鸡尾酒 -塑料袋 -壁球 -蘑菇 -汉堡包 -电灯开关 -降落伞 -泰迪熊 -冬瓜 -鹿 -音乐键盘 -卫生器具 -记分牌 -棒球棒 -包络线 -胶带 -公文包 -桨 -弓箭 -电话 -羊 -夹克 -男孩 -披萨 -水獭 -办公用品 -沙发 -大提琴 -公牛 -骆驼 -球 -鸭子 -鲸鱼 -衬衫 -坦克 -摩托车 -手风琴 -猫头鹰 -豪猪 -太阳帽 -钉子 -剪刀 -天鹅 -灯 -皇冠 -钢琴 -雕塑 -猎豹 -双簧管 -罐头罐 -芒果 -三脚架 -烤箱 -鼠标 -驳船 -咖啡 -滑雪板 -普通无花果 -沙拉 -无脊椎动物 -雨伞 -袋鼠 -人手臂 -量杯 -蜗牛 -相思 -西服 -茶壶 -瓶 -羊驼 -水壶 -裤子 -爆米花 -蜈蚣 -蜘蛛 -麻雀 -盘子 
-百吉饼 -个人护理 -苹果 -胸罩 -浴室柜 -演播室沙发 -电脑键盘 -乒乓球拍 -寿司 -橱柜 -路灯 -毛巾 -床头柜 -兔 -海豚 -狗 -大罐 -炒锅 -消火栓 -人眼 -摩天大楼 -背包 -马铃薯 -纸巾 -小精灵 -自行车车轮 -卫生间 -大号 -地毯 -手推车 -电视 -风扇 -美洲驼 -订书机 -三轮车 -耳机 -空调器 -饼干 -毛巾/餐巾 -靴子 -香肠 -运动型多用途汽车 -肥皂 -棒球 -行李 -扑克牌 -铲子 -标记笔 -耳机 -投影机 -铅笔盒 -法国圆号 -橘子 -路由器 -文件夹 -甜甜圈 -榴莲 -帆船 -坚果 -咖啡机 -肉丸 -篮子 -插线板 -青豆 -鳄梨 -英式足球 -蛋挞 -离合器 -滑梯 -鱼竿 -衣架 -面包 -监控摄像头 -地球仪 -黑板/白板 -救生员 -鸽子 -红卷心菜 -铜钹 -水龙头 -牛排 -秋千 -山竹 -奶酪 -小便池 -生菜 -跨栏 -戒指 -篮球 -盆栽植物 -人力车 -目标 -赛车 -蝴蝶结 -熨斗 -化妆品 -驴 -锯 -铁锤 -台球 -切割/砧板 -电源插座 -吹风机 -包子 -奖章/奖牌 -液体肥皂 -野鸟 -皮鞋 -餐桌 -游戏板 -杠铃 -收音机 -路灯 -磁带 -曲棍球 -春卷 -大米 -高尔夫俱乐部 -打火机 -炸薯条 -显微镜 -手机 -消防车 -面条 -橱柜/架子 -电磁炉和煤气炉 -钥匙 -梳子 -垃圾箱/罐 -牙刷 -枣子 -电钻 -奶牛 -茄子 -扫帚 -抽油烟机 -钳子 -大葱 -扇贝 -洁面乳 -牙膏 -哈密瓜 -橡皮擦 -洗发水/沐浴露 -光盘 -溜冰鞋和滑雪鞋 -美式足球 -拖鞋 -火龙果 -锅/平底锅 -计算器 -纸巾 -乒乓球拍 -板擦 -扬声器 -木瓜 -雪茄 -信纸 -大蒜 -电饭锅 -罐装的 -停车计时器 -手电筒 -画笔 -杯子 -球杆 -人行横道标志 -奇异果/猕猴桃 -散热器 -拖把 -电锯 -凉鞋拖鞋 -储物箱 -洋葱 -手镯 -灭火器 -秤 -秋葵 -微波炉 -运动鞋 -胡椒 -玉米 -柚子 -主机 -钳子 -奖杯 -李子/梅子 -刷子/画笔 -机械车辆 -牦牛 -起重机 -转换器 -面膜 -马车 -皮卡车 -交通锥 -馅饼 -钢笔/铅笔 -跑车 -飞盘 -清洁用品/洗涤剂/洗衣液 -遥控器 -婴儿车/手推车 diff --git a/static/dataset/voc/label_list.txt b/static/dataset/voc/label_list.txt deleted file mode 100644 index 8420ab35ede7400974f25836a6bb543024686a0e..0000000000000000000000000000000000000000 --- a/static/dataset/voc/label_list.txt +++ /dev/null @@ -1,20 +0,0 @@ -aeroplane -bicycle -bird -boat -bottle -bus -car -cat -chair -cow -diningtable -dog -horse -motorbike -person -pottedplant -sheep -sofa -train -tvmonitor diff --git a/static/dataset/wider_face/download.sh b/static/dataset/wider_face/download.sh deleted file mode 100755 index 59a2054def3dfa7e27a2ac7ba84b779800a32933..0000000000000000000000000000000000000000 --- a/static/dataset/wider_face/download.sh +++ /dev/null @@ -1,21 +0,0 @@ -# All rights `PaddleDetection` reserved -# References: -# @inproceedings{yang2016wider, -# Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou}, -# Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, -# Title = {WIDER FACE: A Face Detection Benchmark}, -# Year 
= {2016}} - -DIR="$( cd "$(dirname "$0")" ; pwd -P )" -cd "$DIR" - -# Download the data. -echo "Downloading..." -wget https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip -wget https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip -wget https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip -# Extract the data. -echo "Extracting..." -unzip -q WIDER_train.zip -unzip -q WIDER_val.zip -unzip -q wider_face_split.zip diff --git a/static/demo/000000014439.jpg b/static/demo/000000014439.jpg deleted file mode 100644 index 0abbdab06eb5950b93908cc91adfa640e8a3ac78..0000000000000000000000000000000000000000 Binary files a/static/demo/000000014439.jpg and /dev/null differ diff --git a/static/demo/000000014439_640x640.jpg b/static/demo/000000014439_640x640.jpg deleted file mode 100644 index 58e9d3e228af43c9b55d8d0cb385ce82ebb8b996..0000000000000000000000000000000000000000 Binary files a/static/demo/000000014439_640x640.jpg and /dev/null differ diff --git a/static/demo/000000087038.jpg b/static/demo/000000087038.jpg deleted file mode 100644 index 9f77f5d5f057b6f92dc096da704ecb8dee99bdf5..0000000000000000000000000000000000000000 Binary files a/static/demo/000000087038.jpg and /dev/null differ diff --git a/static/demo/000000570688.jpg b/static/demo/000000570688.jpg deleted file mode 100644 index cb304bd56c4010c08611a30dcca58ea9140cea54..0000000000000000000000000000000000000000 Binary files a/static/demo/000000570688.jpg and /dev/null differ diff --git a/static/demo/infer_cfg.yml b/static/demo/infer_cfg.yml deleted file mode 100644 index 99f1d63fa6b3159924ac781f488e9d12ee1b2192..0000000000000000000000000000000000000000 --- a/static/demo/infer_cfg.yml +++ /dev/null @@ -1,48 +0,0 @@ -draw_threshold: 0.5 -use_python_inference: false -mode: fluid -metric: VOC -arch: YOLO -min_subgraph_size: 3 -with_background: false -Preprocess: -- interp: 2 - max_size: 0 - target_size: 608 - type: Resize - use_cv2: true -- is_channel_first: false - is_scale: true - mean: - - 0.485 - - 0.456 
- - 0.406 - std: - - 0.229 - - 0.224 - - 0.225 - type: Normalize -- channel_first: true - to_bgr: false - type: Permute -label_list: -- aeroplane -- bicycle -- bird -- boat -- bottle -- bus -- car -- cat -- chair -- cow -- diningtable -- dog -- horse -- motorbike -- person -- pottedplant -- sheep -- sofa -- train -- tvmonitor diff --git a/static/demo/mask_rcnn_demo.ipynb b/static/demo/mask_rcnn_demo.ipynb deleted file mode 100644 index 5027ce1e4518254d1ef30ea512df2a8319a6e9ff..0000000000000000000000000000000000000000 --- a/static/demo/mask_rcnn_demo.ipynb +++ /dev/null @@ -1,413 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Change working directory to the project root" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/home/yang/PaddleDetection\n" - ] - } - ], - "source": [ - "%cd .." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Now let's take a look at the input image." 
- ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from PIL import Image\n", - "\n", - "image_path = 'demo/000000570688.jpg'\n", - "img = Image.open(image_path)\n", - "img" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "For inference, preprocess only involves decoding, normalization and transposing to CHW.\n", - "\n", - "**NOTE:** in most cases, one should use the configuration based [data feed](../docs/DATA.md) API which greatly simplifies the data pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [], - "source": [ - "from ppdet.data.transform.operators import DecodeImage, NormalizeImage, Permute\n", - "\n", - "sample = {'im_file': image_path}\n", - "decode = DecodeImage(to_rgb=True)\n", - "normalize = NormalizeImage(\n", - " mean=[0.485, 0.456, 0.406],\n", - " std=[0.229, 0.224, 0.225],\n", - " is_scale=True,\n", - " is_channel_first=False)\n", - "permute = Permute(to_bgr=False, channel_first=True)\n", - "\n", - "sample = permute(normalize(decode(sample)))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Some extra effort is needed to massage data into the desired format. \n", - "\n", - "**NOTE:** Again, if the data feed API is used, these are handled automatically." 
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "h = sample['h']\n", - "w = sample['w']\n", - "im_info = np.array((h, w, 1), dtype=np.float32)\n", - "\n", - "sample['im_info'] = im_info\n", - "sample['im_shape'] = im_info\n", - "\n", - "# we don't need these\n", - "for key in ['im_file', 'h', 'w']:\n", - " del sample[key]\n", - "\n", - "# batch of a single sample\n", - "sample = {k: v[np.newaxis, ...] for k, v in sample.items()}\n", - "\n", - "feed_var_def = [\n", - " {'name': 'image', 'shape': (h, w, 3)},\n", - " {'name': 'im_info', 'shape': [3]},\n", - " {'name': 'im_shape', 'shape': [3]},\n", - "]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Next, build the [Mask R-CNN](https://arxiv.org/abs/1703.06870) model and associated fluid programs" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [], - "source": [ - "from paddle import fluid\n", - "from ppdet.modeling import (MaskRCNN, ResNet, ResNetC5, RPNHead, RoIAlign,\n", - " BBoxHead, MaskHead, BBoxAssigner, MaskAssigner)\n", - "\n", - "roi_size = 14\n", - "\n", - "model = MaskRCNN(\n", - " ResNet(feature_maps=4),\n", - " RPNHead(),\n", - " BBoxHead(ResNetC5()),\n", - " BBoxAssigner(),\n", - " RoIAlign(resolution=roi_size),\n", - " MaskAssigner(),\n", - " MaskHead())\n", - "\n", - "startup_prog = fluid.Program()\n", - "infer_prog = fluid.Program()\n", - "with fluid.program_guard(infer_prog, startup_prog):\n", - " with fluid.unique_name.guard():\n", - " feed_vars = {\n", - " var['name']: fluid.data(\n", - " name=var['name'],\n", - " 
shape=var['shape'],\n", - " dtype='float32',\n", - " lod_level=0) for var in feed_var_def\n", - " }\n", - " test_fetches = model.test(feed_vars)\n", - "infer_prog = infer_prog.clone(for_test=True)\n", - "\n", - "# use GPU if available\n", - "if fluid.core.get_cuda_device_count() > 0:\n", - " place = fluid.CUDAPlace(0)\n", - "else:\n", - " place = fluid.CPUPlace()\n", - "\n", - "exe = fluid.Executor(place)\n", - "_ = exe.run(startup_prog)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Load the checkpoint weights, just wait a couple of minutes for it to be downloaded." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 140690/140690 [00:12<00:00, 10843.70KB/s]\n" - ] - } - ], - "source": [ - "from ppdet.utils import checkpoint\n", - "\n", - "ckpt_url = 'https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar'\n", - "checkpoint.load_checkpoint(exe, infer_prog, ckpt_url)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Run the program and fetch the result. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [], - "source": [ - "output = exe.run(infer_prog, feed=sample,\n", - " fetch_list=[t.name for t in test_fetches.values()],\n", - " return_numpy=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Again, we need to massage the result a bit for visualization." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [], - "source": [ - "res = {\n", - " k: (np.array(v), v.recursive_sequence_lengths())\n", - " for k, v in zip(test_fetches.keys(), output)\n", - "}\n", - "# fake image id\n", - "res['im_id'] = [[[0] for _ in range(res['bbox'][1][0][0])]]\n", - "res['im_shape'] = [[im_info]]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "Now overlay the bboxes and masks onto the image...\n", - "\n", - "And voila, we've successfully built and run the Mask R-CNN inference pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "autoscroll": false, - "ein.hycell": false, - "ein.tags": "worksheet-0", - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from ppdet.utils.coco_eval import bbox2out, mask2out, coco17_category_info\n", - "from ppdet.utils.visualizer import visualize_results\n", - "\n", - "cls2cat, cat2name = coco17_category_info()\n", - "bboxes = bbox2out([res], cls2cat)\n", - "masks = mask2out([res], cls2cat, roi_size)\n", - "\n", - "visualize_results(img, 0, cat2name, 0.5, bboxes, masks)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python", - "name": "python2" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.16" - }, - "name": "mask_rcnn_demo.ipynb" - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/static/demo/orange_71.jpg b/static/demo/orange_71.jpg deleted file mode 100644 index da7974a1a1371298f1ca5f4ef9c82bd3824d7ac3..0000000000000000000000000000000000000000 Binary files a/static/demo/orange_71.jpg and /dev/null differ diff --git a/static/demo/road554.png b/static/demo/road554.png deleted file mode 100644 index 7733e57f922b0fee893775da4f698c202804966f..0000000000000000000000000000000000000000 Binary files a/static/demo/road554.png and /dev/null differ diff --git a/static/deploy/README.md b/static/deploy/README.md deleted file mode 100644 index 39595a7ff4cdbd99a4c1b6043212b40a542f23cc..0000000000000000000000000000000000000000 --- a/static/deploy/README.md +++ /dev/null @@ -1,28 +0,0 @@ -# PaddleDetection 预测部署 - -`PaddleDetection`目前支持: -- 
使用`Python`和`C++`部署在`Windows` 和`Linux` 上运行 -- [在线服务化部署](./serving/README.md) -- [移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo) - -## 模型导出 -训练得到一个满足要求的模型后,如果想要将该模型接入到C++服务器端预测库或移动端预测库,需要通过`tools/export_model.py`导出该模型。 - -- [导出教程](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/docs/advanced_tutorials/deploy/EXPORT_MODEL.md) - -模型导出后, 目录结构如下(以`yolov3_darknet`为例): -``` -yolov3_darknet # 模型目录 -├── infer_cfg.yml # 模型配置信息 -├── __model__ # 模型文件 -└── __params__ # 参数文件 -``` - -预测时,该目录所在的路径会作为程序的输入参数。 - -## 预测部署 -- [1. Python预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/python) -- [2. C++预测(支持 Linux 和 Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/cpp) -- [3. 在线服务化部署](./serving/README.md) -- [4. 移动端部署](https://github.com/PaddlePaddle/Paddle-Lite-Demo) -- [5. Jetson设备部署](./cpp/docs/Jetson_build.md) diff --git a/static/deploy/android_demo/README.md b/static/deploy/android_demo/README.md deleted file mode 100644 index 5d158baef396661f4783c56654af5f67f2a2e5fe..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/README.md +++ /dev/null @@ -1,38 +0,0 @@ -# PaddleDetection安卓端demo - -### 下载试用 -可通过[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/lite/paddledetection_app.apk)直接下载,或直接使用手机浏览器扫描二维码下载安装: - -
    - -
    - -### 环境搭建与代码运行 -- 安装最新版本的Android Studio,可以从https://developer.android.com/studio 下载。本demo使用是4.0版本Android Studio编写。 -- 下载NDK 20 以上版本,NDK 20版本以上均可以编译成功。可以用以下方式安装和测试NDK编译环境:点击 File -> New ->New Project,新建 "Native C++" project。 -- 导入项目:点击 File->New->Import Project..., 跟随Android Studio的引导导入项目即可。 -- 首先打开`app/build.gradle`文件,运行`downloadAndExtractArchives`函数,完成PaddleLite预测库与模型的下载与压缩。 -- 连接并选择设备,编译app并运行。 - -### 效果展示 -
    - -
    - -### 更新预测库与模型 - -#### 更新预测库 - -- 参考[ Paddle-Lite文档](https://github.com/PaddlePaddle/Paddle-Lite/wiki),编译Android等预测库,或直接下载最新[Paddle Lite预编译库](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html)。 -- 更新`app/libs`下`PaddlePredictor.jar`包,更新`app/src/main/jniLibs/arm64-v8a/libpaddle_lite_jni.so`包,更新`app/src/main/jniLibs/armeabi-v7a/libpaddle_lite_jni.so`包。 - -#### 更新模型 - -- 本demo中支持SSD与YOLO系列模型,如想更新模型,请替换`app/src/main/assets/models`下相关`model.nb`权重文件。 -- 如果想要加入新的算法模型,如人脸检测、实例分割等,需要在`app/src/main/assets/models`下放入新模型,并修改`app/src/main/cpp`下的数据预处理代码以适配新的模型算法。 -- 如更新的模型是非COCO数据集模型,请更新`app/src/main/assets/labels`下的类别标签文件。 - -### 获取更多支持 -- 本demo依赖[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite),Android工程开发可参考[Paddle-Lite Android 工程示例教程](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/android_app_demo.html#android-demo)。 -- 更多Paddle-Lite的demo请参考[Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo)。 -- 前往[端计算模型生成平台EasyEdge](https://ai.baidu.com/easyedge/app/open_source_demo?referrerUrl=paddlelite),获得更多开发支持。 diff --git a/static/deploy/android_demo/app/app.iml b/static/deploy/android_demo/app/app.iml deleted file mode 100644 index 11aa7180158079e82ad75ace06073d9b379dd7fb..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/app.iml +++ /dev/null @@ -1,181 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/build.gradle b/static/deploy/android_demo/app/build.gradle deleted file mode 100644 index 759a09aba0315c13dea04cb3971dff11063ab415..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/build.gradle +++ /dev/null @@ 
-1,132 +0,0 @@ -import java.security.MessageDigest - -apply plugin: 'com.android.application' - -android { - compileSdkVersion 30 - buildToolsVersion "30.0.2" - - defaultConfig { - applicationId "com.baidu.paddledetection.detection" - minSdkVersion 23 - targetSdkVersion 30 - versionCode 1 - versionName "1.0" - testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner" - externalNativeBuild { - cmake { - arguments '-DANDROID_PLATFORM=android-23', '-DANDROID_STL=c++_shared', "-DANDROID_TOOLCHAIN=" - abiFilters 'arm64-v8a' - cppFlags "-std=c++11" - } - } - } - buildTypes { - release { - minifyEnabled false - proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro' - } - } - externalNativeBuild { - cmake { - path "src/main/cpp/CMakeLists.txt" - version "3.10.2" - } - } -} - -dependencies { - implementation fileTree(dir: "libs", include: ["*.jar"]) - implementation 'androidx.appcompat:appcompat:1.1.0' - implementation 'com.google.android.material:material:1.0.0' - implementation 'androidx.constraintlayout:constraintlayout:1.1.3' - implementation 'androidx.navigation:navigation-fragment:2.1.0' - implementation 'androidx.navigation:navigation-ui:2.1.0' - testImplementation 'junit:junit:4.12' - androidTestImplementation 'androidx.test.ext:junit:1.1.1' - androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0' -} - -def archives = [ - [ - 'src' : 'https://paddlelite-demo.bj.bcebos.com/libs/android/paddle_lite_libs_v2_6_1.tar.gz', - 'dest': 'PaddleLite' - ], - [ - 'src' : 'https://paddlelite-demo.bj.bcebos.com/libs/android/opencv-4.2.0-android-sdk.tar.gz', - 'dest': 'OpenCV' - ], - // yolov3_mobilenet_v3 - [ - 'src' : 'https://paddlelite-demo.bj.bcebos.com/models/yolov3_mobilenet_v3_prune86_FPGM_320_fp32_for_cpu_v2_6_1.tar.gz', - 'dest' : 'src/main/assets/models/yolov3_mobilenet_v3_for_cpu' - ], - [ - 'src' : 
'https://paddlelite-demo.bj.bcebos.com/models/yolov3_mobilenet_v3_prune86_FPGM_320_fp32_for_hybrid_cpu_npu_v2_6_1.tar.gz', - 'dest' : 'src/main/assets/models/yolov3_mobilenet_v3_for_hybrid_cpu_npu' - ], - // pp-yolo tiny coming soon - // ssd_mobilenet_v1 voc - [ - 'src' : 'https://paddlelite-demo.bj.bcebos.com/models/ssdlite_mobilenet_v3_large_for_cpu_nb.tar.gz', - 'dest' : 'src/main/assets/models/ssdlite_mobilenet_v3_large_for_cpu_nb' - ], -] - - -task downloadAndExtractArchives(type: DefaultTask) { - doFirst { - println "Downloading and extracting archives including libs and models" - } - doLast { - // Prepare cache folder for archives - String cachePath = "cache" - if (!file("${cachePath}").exists()) { - mkdir "${cachePath}" - } - archives.eachWithIndex { archive, index -> - MessageDigest messageDigest = MessageDigest.getInstance('MD5') - messageDigest.update(archive.src.bytes) - String cacheName = new BigInteger(1, messageDigest.digest()).toString(32) - // Download the target archive if it does not exist in the cache - boolean copyFiles = !file("${archive.dest}").exists() - if (!file("${cachePath}/${cacheName}.tar.gz").exists()) { - ant.get(src: archive.src, dest: file("${cachePath}/${cacheName}.tar.gz")) - copyFiles = true; // force copying files from the newly downloaded archive - } - // Extract the target archive if its dest path does not exist - if (copyFiles) { - copy { - from tarTree("${cachePath}/${cacheName}.tar.gz") - into "${archive.dest}" - } - } - // Unpack libs - copy { - from tarTree("cache/${cacheName}.tar.gz") - into "cache/${cacheName}" - } - // Copy PaddlePredictor.jar - if (!file("libs/PaddlePredictor.jar").exists()) { - copy { - from "cache/${cacheName}/java/PaddlePredictor.jar" - into "libs" - } - } - // Copy libpaddle_lite_jni.so for armeabi-v7a and arm64-v8a - if (!file("src/main/jniLibs/armeabi-v7a/libpaddle_lite_jni.so").exists()) { - copy { - from "cache/${cacheName}/java/libs/armeabi-v7a/libpaddle_lite_jni.so" - into "src/main/jniLibs/armeabi-v7a" - } - } 
- if (!file("src/main/jniLibs/arm64-v8a/libpaddle_lite_jni.so").exists()) { - copy { - from "cache/${cacheName}/java/libs/arm64-v8a/libpaddle_lite_jni.so" - into "src/main/jniLibs/arm64-v8a" - } - } - } - } -} -preBuild.dependsOn downloadAndExtractArchives diff --git a/static/deploy/android_demo/app/gradlew b/static/deploy/android_demo/app/gradlew deleted file mode 100644 index cccdd3d517fc5249beaefa600691cf150f2fa3e6..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/gradlew +++ /dev/null @@ -1,172 +0,0 @@ -#!/usr/bin/env sh - -############################################################################## -## -## Gradle start up script for UN*X -## -############################################################################## - -# Attempt to set APP_HOME -# Resolve links: $0 may be a link -PRG="$0" -# Need this for relative symlinks. -while [ -h "$PRG" ] ; do - ls=`ls -ld "$PRG"` - link=`expr "$ls" : '.*-> \(.*\)$'` - if expr "$link" : '/.*' > /dev/null; then - PRG="$link" - else - PRG=`dirname "$PRG"`"/$link" - fi -done -SAVED="`pwd`" -cd "`dirname \"$PRG\"`/" >/dev/null -APP_HOME="`pwd -P`" -cd "$SAVED" >/dev/null - -APP_NAME="Gradle" -APP_BASE_NAME=`basename "$0"` - -# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. -DEFAULT_JVM_OPTS="" - -# Use the maximum available, or set MAX_FD != -1 to use that value. -MAX_FD="maximum" - -warn () { - echo "$*" -} - -die () { - echo - echo "$*" - echo - exit 1 -} - -# OS specific support (must be 'true' or 'false'). -cygwin=false -msys=false -darwin=false -nonstop=false -case "`uname`" in - CYGWIN* ) - cygwin=true - ;; - Darwin* ) - darwin=true - ;; - MINGW* ) - msys=true - ;; - NONSTOP* ) - nonstop=true - ;; -esac - -CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar - -# Determine the Java command to use to start the JVM. 
-if [ -n "$JAVA_HOME" ] ; then - if [ -x "$JAVA_HOME/jre/sh/java" ] ; then - # IBM's JDK on AIX uses strange locations for the executables - JAVACMD="$JAVA_HOME/jre/sh/java" - else - JAVACMD="$JAVA_HOME/bin/java" - fi - if [ ! -x "$JAVACMD" ] ; then - die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME - -Please set the JAVA_HOME variable in your environment to match the -location of your Java installation." - fi -else - JAVACMD="java" - which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. - -Please set the JAVA_HOME variable in your environment to match the -location of your Java installation." -fi - -# Increase the maximum file descriptors if we can. -if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then - MAX_FD_LIMIT=`ulimit -H -n` - if [ $? -eq 0 ] ; then - if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then - MAX_FD="$MAX_FD_LIMIT" - fi - ulimit -n $MAX_FD - if [ $? -ne 0 ] ; then - warn "Could not set maximum file descriptor limit: $MAX_FD" - fi - else - warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT" - fi -fi - -# For Darwin, add options to specify how the application appears in the dock -if $darwin; then - GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\"" -fi - -# For Cygwin, switch paths to Windows format before running java -if $cygwin ; then - APP_HOME=`cygpath --path --mixed "$APP_HOME"` - CLASSPATH=`cygpath --path --mixed "$CLASSPATH"` - JAVACMD=`cygpath --unix "$JAVACMD"` - - # We build the pattern for arguments to be converted via cygpath - ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null` - SEP="" - for dir in $ROOTDIRSRAW ; do - ROOTDIRS="$ROOTDIRS$SEP$dir" - SEP="|" - done - OURCYGPATTERN="(^($ROOTDIRS))" - # Add a user-defined pattern to the cygpath arguments - if [ "$GRADLE_CYGPATTERN" != "" ] ; then - OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)" - 
fi - # Now convert the arguments - kludge to limit ourselves to /bin/sh - i=0 - for arg in "$@" ; do - CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -` - CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option - - if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition - eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"` - else - eval `echo args$i`="\"$arg\"" - fi - i=$((i+1)) - done - case $i in - (0) set -- ;; - (1) set -- "$args0" ;; - (2) set -- "$args0" "$args1" ;; - (3) set -- "$args0" "$args1" "$args2" ;; - (4) set -- "$args0" "$args1" "$args2" "$args3" ;; - (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;; - (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;; - (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;; - (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;; - (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;; - esac -fi - -# Escape application args -save () { - for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done - echo " " -} -APP_ARGS=$(save "$@") - -# Collect all arguments for the java command, following the shell quoting and substitution rules -eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS" - -# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong -if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then - cd "$(dirname "$0")" -fi - -exec "$JAVACMD" "$@" diff --git a/static/deploy/android_demo/app/gradlew.bat b/static/deploy/android_demo/app/gradlew.bat deleted file mode 100644 index e95643d6a2ca62258464e83c72f5156dc941c609..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/gradlew.bat +++ /dev/null @@ -1,84 +0,0 @@ -@if "%DEBUG%" == "" @echo off -@rem 
########################################################################## -@rem -@rem Gradle startup script for Windows -@rem -@rem ########################################################################## - -@rem Set local scope for the variables with windows NT shell -if "%OS%"=="Windows_NT" setlocal - -set DIRNAME=%~dp0 -if "%DIRNAME%" == "" set DIRNAME=. -set APP_BASE_NAME=%~n0 -set APP_HOME=%DIRNAME% - -@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. -set DEFAULT_JVM_OPTS= - -@rem Find java.exe -if defined JAVA_HOME goto findJavaFromJavaHome - -set JAVA_EXE=java.exe -%JAVA_EXE% -version >NUL 2>&1 -if "%ERRORLEVEL%" == "0" goto init - -echo. -echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. -echo. -echo Please set the JAVA_HOME variable in your environment to match the -echo location of your Java installation. - -goto fail - -:findJavaFromJavaHome -set JAVA_HOME=%JAVA_HOME:"=% -set JAVA_EXE=%JAVA_HOME%/bin/java.exe - -if exist "%JAVA_EXE%" goto init - -echo. -echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% -echo. -echo Please set the JAVA_HOME variable in your environment to match the -echo location of your Java installation. - -goto fail - -:init -@rem Get command-line arguments, handling Windows variants - -if not "%OS%" == "Windows_NT" goto win9xME_args - -:win9xME_args -@rem Slurp the command line arguments. 
-set CMD_LINE_ARGS= -set _SKIP=2 - -:win9xME_args_slurp -if "x%~1" == "x" goto execute - -set CMD_LINE_ARGS=%* - -:execute -@rem Setup the command line - -set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar - -@rem Execute Gradle -"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS% - -:end -@rem End local scope for the variables with windows NT shell -if "%ERRORLEVEL%"=="0" goto mainEnd - -:fail -rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of -rem the _cmd.exe /c_ return code! -if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1 -exit /b 1 - -:mainEnd -if "%OS%"=="Windows_NT" endlocal - -:omega diff --git a/static/deploy/android_demo/app/local.properties b/static/deploy/android_demo/app/local.properties deleted file mode 100644 index ae629f3566e49f127c50cb20f2449e3b5ea9fe52..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/local.properties +++ /dev/null @@ -1,8 +0,0 @@ -## This file must *NOT* be checked into Version Control Systems, -# as it contains information specific to your local configuration. -# -# Location of the SDK. This is only used by Gradle. -# For customization when using a Version Control System, please read the -# header note. -#Wed Sep 16 11:31:42 CST 2020 -sdk.dir=/Users/yuguanghua02/Library/Android/sdk diff --git a/static/deploy/android_demo/app/proguard-rules.pro b/static/deploy/android_demo/app/proguard-rules.pro deleted file mode 100644 index f1b424510da51fd82143bc74a0a801ae5a1e2fcd..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/proguard-rules.pro +++ /dev/null @@ -1,21 +0,0 @@ -# Add project specific ProGuard rules here. -# You can control the set of applied configuration files using the -# proguardFiles setting in build.gradle. 
-# -# For more details, see -# http://developer.android.com/guide/developing/tools/proguard.html - -# If your project uses WebView with JS, uncomment the following -# and specify the fully qualified class name to the JavaScript interface -# class: -#-keepclassmembers class fqcn.of.javascript.interface.for.webview { -# public *; -#} - -# Uncomment this to preserve the line number information for -# debugging stack traces. -#-keepattributes SourceFile,LineNumberTable - -# If you keep the line number information, uncomment this to -# hide the original source file name. -#-renamesourcefileattribute SourceFile diff --git a/static/deploy/android_demo/app/src/androidTest/java/com/baidu/paddledetection/detection/ExampleInstrumentedTest.java b/static/deploy/android_demo/app/src/androidTest/java/com/baidu/paddledetection/detection/ExampleInstrumentedTest.java deleted file mode 100644 index 2e9b169375455c89031b96a9c39574f1f10a6d17..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/androidTest/java/com/baidu/paddledetection/detection/ExampleInstrumentedTest.java +++ /dev/null @@ -1,26 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.content.Context; -import android.support.test.InstrumentationRegistry; -import android.support.test.runner.AndroidJUnit4; - -import org.junit.Test; -import org.junit.runner.RunWith; - -import static org.junit.Assert.*; - -/** - * Instrumented test, which will execute on an Android device. - * - * @see Testing documentation - */ -@RunWith(AndroidJUnit4.class) -public class ExampleInstrumentedTest { - @Test - public void useAppContext() { - // Context of the app under test. 
- Context appContext = InstrumentationRegistry.getTargetContext(); - - assertEquals("com.baidu.paddle.lite.demo", appContext.getPackageName()); - } -} diff --git a/static/deploy/android_demo/app/src/main/AndroidManifest.xml b/static/deploy/android_demo/app/src/main/AndroidManifest.xml deleted file mode 100644 index 1f33eb0e6c55fe006d791791f9b8db7f1f6b40b7..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/AndroidManifest.xml +++ /dev/null @@ -1,34 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/assets/images/home.jpg b/static/deploy/android_demo/app/src/main/assets/images/home.jpg deleted file mode 100644 index 19023f718333c56c70776c79201dc03d742c1ed3..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/assets/images/home.jpg and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/assets/images/kite.jpg b/static/deploy/android_demo/app/src/main/assets/images/kite.jpg deleted file mode 100644 index cb304bd56c4010c08611a30dcca58ea9140cea54..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/assets/images/kite.jpg and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/assets/labels/coco-labels-background.txt b/static/deploy/android_demo/app/src/main/assets/labels/coco-labels-background.txt deleted file mode 100644 index e290149963116c873fe6c54d28337a78bc500d8a..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/assets/labels/coco-labels-background.txt +++ /dev/null @@ -1,81 +0,0 @@ -background -person -bicycle -car -motorcycle -airplane -bus -train -truck -boat -traffic light -fire hydrant -stop sign -parking meter -bench -bird -cat -dog -horse -sheep -cow -elephant -bear -zebra -giraffe -backpack -umbrella -handbag -tie -suitcase -frisbee -skis -snowboard -sports ball -kite -baseball bat 
-baseball glove -skateboard -surfboard -tennis racket -bottle -wine glass -cup -fork -knife -spoon -bowl -banana -apple -sandwich -orange -broccoli -carrot -hot dog -pizza -donut -cake -chair -couch -potted plant -bed -dining table -toilet -tv -laptop -mouse -remote -keyboard -cell phone -microwave -oven -toaster -sink -refrigerator -book -clock -vase -scissors -teddy bear -hair drier -toothbrush \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/cpp/CMakeLists.txt b/static/deploy/android_demo/app/src/main/cpp/CMakeLists.txt deleted file mode 100644 index 90903a3687e76736d056d382149c1fabee4f760b..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/CMakeLists.txt +++ /dev/null @@ -1,170 +0,0 @@ -# For more information about using CMake with Android Studio, read the -# documentation: https://d.android.com/studio/projects/add-native-code.html - -# Sets the minimum version of CMake required to build the native library. - -cmake_minimum_required(VERSION 3.4.1) - -# Creates and names a library, sets it as either STATIC or SHARED, and provides -# the relative paths to its source code. You can define multiple libraries, and -# CMake builds them for you. Gradle automatically packages shared libraries with -# your APK. 
- -set(PaddleLite_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../PaddleLite") -include_directories(${PaddleLite_DIR}/cxx/include) - -set(OpenCV_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../OpenCV/sdk/native/jni") -find_package(OpenCV REQUIRED) -message(STATUS "OpenCV libraries: ${OpenCV_LIBS}") -include_directories(${OpenCV_INCLUDE_DIRS}) - -set(CMAKE_CXX_FLAGS - "${CMAKE_CXX_FLAGS} -ffast-math -Ofast -Os -DNDEBUG -fno-exceptions -fomit-frame-pointer -fno-asynchronous-unwind-tables -fno-unwind-tables" -) -set(CMAKE_CXX_FLAGS - "${CMAKE_CXX_FLAGS} -fvisibility=hidden -fvisibility-inlines-hidden -fdata-sections -ffunction-sections" -) -set(CMAKE_SHARED_LINKER_FLAGS - "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gc-sections -Wl,-z,nocopyreloc") - -add_library( - # Sets the name of the library. - Native - # Sets the library as a shared library. - SHARED - # Provides a relative path to your source file(s). - Native.cc Pipeline.cc Utils.cc) - -find_library( - # Sets the name of the path variable. - log-lib - # Specifies the name of the NDK library that you want CMake to locate. - log) - -add_library( - # Sets the name of the library. - paddle_light_api_shared - # Sets the library as a shared library. - SHARED - # Provides a relative path to your source file(s). - IMPORTED) - -set_target_properties( - # Specifies the target library. - paddle_light_api_shared - # Specifies the parameter you want to define. - PROPERTIES - IMPORTED_LOCATION - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libpaddle_light_api_shared.so - # Provides the path to the library you want to import. -) - -# Comment the followings if libpaddle_light_api_shared.so is not fit for NPU -add_library( - # Sets the name of the library. - hiai - # Sets the library as a shared library. - SHARED - # Provides a relative path to your source file(s). - IMPORTED) - -set_target_properties( - # Specifies the target library. - hiai - # Specifies the parameter you want to define. 
- PROPERTIES IMPORTED_LOCATION - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libhiai.so - # Provides the path to the library you want to import. -) - -add_library( - # Sets the name of the library. - hiai_ir - # Sets the library as a shared library. - SHARED - # Provides a relative path to your source file(s). - IMPORTED) - -set_target_properties( - # Specifies the target library. - hiai_ir - # Specifies the parameter you want to define. - PROPERTIES IMPORTED_LOCATION - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libhiai_ir.so - # Provides the path to the library you want to import. -) - -add_library( - # Sets the name of the library. - hiai_ir_build - # Sets the library as a shared library. - SHARED - # Provides a relative path to your source file(s). - IMPORTED) - -set_target_properties( - # Specifies the target library. - hiai_ir_build - # Specifies the parameter you want to define. - PROPERTIES IMPORTED_LOCATION - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libhiai_ir_build.so - # Provides the path to the library you want to import. -) - -# Specifies libraries CMake should link to your target library. You can link -# multiple libraries, such as libraries you define in this build script, -# prebuilt third-party libraries, or system libraries. - -target_link_libraries( - # Specifies the target library. 
- Native - paddle_light_api_shared - ${OpenCV_LIBS} - GLESv2 - EGL - ${log-lib} - hiai - hiai_ir - hiai_ir_build - ) - -add_custom_command( - TARGET Native - POST_BUILD - COMMAND - ${CMAKE_COMMAND} -E copy - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libc++_shared.so - ${CMAKE_LIBRARY_OUTPUT_DIRECTORY}/libc++_shared.so) - -add_custom_command( - TARGET Native - POST_BUILD - COMMAND - ${CMAKE_COMMAND} -E copy - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libpaddle_light_api_shared.so - ${CMAKE_LIBRARY_OUTPUT_DIRECTORY}/libpaddle_light_api_shared.so) - -# Comment the followings if libpaddle_light_api_shared.so is not fit for NPU -add_custom_command( - TARGET Native - POST_BUILD - COMMAND - ${CMAKE_COMMAND} -E copy - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libhiai.so - ${CMAKE_LIBRARY_OUTPUT_DIRECTORY}/libhiai.so) - -add_custom_command( - TARGET Native - POST_BUILD - COMMAND - ${CMAKE_COMMAND} -E copy - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libhiai_ir.so - ${CMAKE_LIBRARY_OUTPUT_DIRECTORY}/libhiai_ir.so) - -add_custom_command( - TARGET Native - POST_BUILD - COMMAND - ${CMAKE_COMMAND} -E copy - ${PaddleLite_DIR}/cxx/libs/${ANDROID_ABI}/libhiai_ir_build.so - ${CMAKE_LIBRARY_OUTPUT_DIRECTORY}/libhiai_ir_build.so) diff --git a/static/deploy/android_demo/app/src/main/cpp/Native.cc b/static/deploy/android_demo/app/src/main/cpp/Native.cc deleted file mode 100644 index 1b2700a91c8b9bd2b6a186378b6bdc068e8927a9..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/Native.cc +++ /dev/null @@ -1,78 +0,0 @@ -// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -#include "Native.h" -#include "Pipeline.h" - -#ifdef __cplusplus -extern "C" { -#endif -/* - * Class: com_baidu_paddle_lite_demo_yolo_detection_Native - * Method: nativeInit - * Signature: - * (Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;II[F[FF)J - */ -JNIEXPORT jlong JNICALL -Java_com_baidu_paddledetection_detection_Native_nativeInit( - JNIEnv *env, jclass thiz, jstring jModelDir, jstring jLabelPath, - jint cpuThreadNum, jstring jCPUPowerMode, jint inputWidth, jint inputHeight, - jfloatArray jInputMean, jfloatArray jInputStd, jfloat scoreThreshold) { - std::string modelDir = jstring_to_cpp_string(env, jModelDir); - std::string labelPath = jstring_to_cpp_string(env, jLabelPath); - std::string cpuPowerMode = jstring_to_cpp_string(env, jCPUPowerMode); - std::vector inputMean = jfloatarray_to_float_vector(env, jInputMean); - std::vector inputStd = jfloatarray_to_float_vector(env, jInputStd); - return reinterpret_cast( - new Pipeline(modelDir, labelPath, cpuThreadNum, cpuPowerMode, inputWidth, - inputHeight, inputMean, inputStd, scoreThreshold)); -} - -/* - * Class: com_baidu_paddle_lite_demo_yolo_detection_Native - * Method: nativeRelease - * Signature: (J)Z - */ -JNIEXPORT jboolean JNICALL -Java_com_baidu_paddledetection_detection_Native_nativeRelease( - JNIEnv *env, jclass thiz, jlong ctx) { - if (ctx == 0) { - return JNI_FALSE; - } - Pipeline *pipeline = reinterpret_cast(ctx); - delete pipeline; - return JNI_TRUE; -} - -/* - * Class: com_baidu_paddle_lite_demo_yolo_detection_Native - * Method: nativeProcess - * Signature: 
(JIIIILjava/lang/String;)Z - */ -JNIEXPORT jboolean JNICALL -Java_com_baidu_paddledetection_detection_Native_nativeProcess( - JNIEnv *env, jclass thiz, jlong ctx, jint inTextureId, jint outTextureId, - jint textureWidth, jint textureHeight, jstring jsavedImagePath) { - if (ctx == 0) { - return JNI_FALSE; - } - std::string savedImagePath = jstring_to_cpp_string(env, jsavedImagePath); - Pipeline *pipeline = reinterpret_cast<Pipeline *>(ctx); - return pipeline->Process(inTextureId, outTextureId, textureWidth, - textureHeight, savedImagePath); -} - -#ifdef __cplusplus -} -#endif diff --git a/static/deploy/android_demo/app/src/main/cpp/Native.h b/static/deploy/android_demo/app/src/main/cpp/Native.h deleted file mode 100644 index d595b8ea2c11c1b0a9219119abcc5678551c318f..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/Native.h +++ /dev/null @@ -1,116 +0,0 @@ -// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -#pragma once - -#include <jni.h> -#include <string> -#include <vector> - -inline std::string jstring_to_cpp_string(JNIEnv *env, jstring jstr) { - // In java, a unicode char will be encoded using 2 bytes (utf16). - // so jstring will contain characters utf16. std::string in c++ is - // essentially a string of bytes, not characters, so if we want to - // pass jstring from JNI to c++, we have convert utf16 to bytes.
- if (!jstr) { - return ""; - } - const jclass stringClass = env->GetObjectClass(jstr); - const jmethodID getBytes = - env->GetMethodID(stringClass, "getBytes", "(Ljava/lang/String;)[B"); - const jbyteArray stringJbytes = (jbyteArray)env->CallObjectMethod( - jstr, getBytes, env->NewStringUTF("UTF-8")); - - size_t length = (size_t)env->GetArrayLength(stringJbytes); - jbyte *pBytes = env->GetByteArrayElements(stringJbytes, NULL); - - std::string ret = std::string(reinterpret_cast<char *>(pBytes), length); - env->ReleaseByteArrayElements(stringJbytes, pBytes, JNI_ABORT); - - env->DeleteLocalRef(stringJbytes); - env->DeleteLocalRef(stringClass); - return ret; -} - -inline jstring cpp_string_to_jstring(JNIEnv *env, std::string str) { - auto *data = str.c_str(); - jclass strClass = env->FindClass("java/lang/String"); - jmethodID strClassInitMethodID = - env->GetMethodID(strClass, "<init>", "([BLjava/lang/String;)V"); - - jbyteArray bytes = env->NewByteArray(strlen(data)); - env->SetByteArrayRegion(bytes, 0, strlen(data), - reinterpret_cast<const jbyte *>(data)); - - jstring encoding = env->NewStringUTF("UTF-8"); - jstring res = (jstring)( - env->NewObject(strClass, strClassInitMethodID, bytes, encoding)); - - env->DeleteLocalRef(strClass); - env->DeleteLocalRef(encoding); - env->DeleteLocalRef(bytes); - - return res; -} - -inline jfloatArray cpp_array_to_jfloatarray(JNIEnv *env, const float *buf, - int64_t len) { - jfloatArray result = env->NewFloatArray(len); - env->SetFloatArrayRegion(result, 0, len, buf); - return result; -} - -inline jintArray cpp_array_to_jintarray(JNIEnv *env, const int *buf, - int64_t len) { - jintArray result = env->NewIntArray(len); - env->SetIntArrayRegion(result, 0, len, buf); - return result; -} - -inline jbyteArray cpp_array_to_jbytearray(JNIEnv *env, const int8_t *buf, - int64_t len) { - jbyteArray result = env->NewByteArray(len); - env->SetByteArrayRegion(result, 0, len, buf); - return result; -} - -inline jlongArray int64_vector_to_jlongarray(JNIEnv *env, - const
std::vector<int64_t> &vec) { - jlongArray result = env->NewLongArray(vec.size()); - jlong *buf = new jlong[vec.size()]; - for (size_t i = 0; i < vec.size(); ++i) { - buf[i] = (jlong)vec[i]; - } - env->SetLongArrayRegion(result, 0, vec.size(), buf); - delete[] buf; - return result; -} - -inline std::vector<int64_t> jlongarray_to_int64_vector(JNIEnv *env, - jlongArray data) { - int data_size = env->GetArrayLength(data); - jlong *data_ptr = env->GetLongArrayElements(data, nullptr); - std::vector<int64_t> data_vec(data_ptr, data_ptr + data_size); - env->ReleaseLongArrayElements(data, data_ptr, 0); - return data_vec; -} - -inline std::vector<float> jfloatarray_to_float_vector(JNIEnv *env, - jfloatArray data) { - int data_size = env->GetArrayLength(data); - jfloat *data_ptr = env->GetFloatArrayElements(data, nullptr); - std::vector<float> data_vec(data_ptr, data_ptr + data_size); - env->ReleaseFloatArrayElements(data, data_ptr, 0); - return data_vec; -} diff --git a/static/deploy/android_demo/app/src/main/cpp/Pipeline.cc b/static/deploy/android_demo/app/src/main/cpp/Pipeline.cc deleted file mode 100644 index b3e3476d961cfd608186f9672786770b39da3268..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/Pipeline.cc +++ /dev/null @@ -1,243 +0,0 @@ -// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License.
- -#include "Pipeline.h" - -Detector::Detector(const std::string &modelDir, const std::string &labelPath, - const int cpuThreadNum, const std::string &cpuPowerMode, - int inputWidth, int inputHeight, - const std::vector<float> &inputMean, - const std::vector<float> &inputStd, float scoreThreshold) - : inputWidth_(inputWidth), inputHeight_(inputHeight), inputMean_(inputMean), - inputStd_(inputStd), scoreThreshold_(scoreThreshold) { - paddle::lite_api::MobileConfig config; - config.set_model_from_file(modelDir + "/model.nb"); - config.set_threads(cpuThreadNum); - config.set_power_mode(ParsePowerMode(cpuPowerMode)); - predictor_ = - paddle::lite_api::CreatePaddlePredictor<paddle::lite_api::MobileConfig>( - config); - labelList_ = LoadLabelList(labelPath); - colorMap_ = GenerateColorMap(labelList_.size()); -} - -std::vector<std::string> Detector::LoadLabelList(const std::string &labelPath) { - std::ifstream file; - std::vector<std::string> labels; - file.open(labelPath); - while (file) { - std::string line; - std::getline(file, line); - labels.push_back(line); - } - file.clear(); - file.close(); - return labels; -} - -std::vector<cv::Scalar> Detector::GenerateColorMap(int numOfClasses) { - std::vector<cv::Scalar> colorMap = std::vector<cv::Scalar>(numOfClasses); - for (int i = 0; i < numOfClasses; i++) { - int j = 0; - int label = i; - int R = 0, G = 0, B = 0; - while (label) { - R |= (((label >> 0) & 1) << (7 - j)); - G |= (((label >> 1) & 1) << (7 - j)); - B |= (((label >> 2) & 1) << (7 - j)); - j++; - label >>= 3; - } - colorMap[i] = cv::Scalar(R, G, B); - } - return colorMap; -} - -void Detector::Preprocess(const cv::Mat &rgbaImage) { - // Set the data of input image - auto inputTensor = predictor_->GetInput(0); - std::vector<int64_t> inputShape = {1, 3, inputHeight_, inputWidth_}; - inputTensor->Resize(inputShape); - auto inputData = inputTensor->mutable_data<float>(); - cv::Mat resizedRGBAImage; - cv::resize(rgbaImage, resizedRGBAImage, - cv::Size(inputShape[3], inputShape[2])); - cv::Mat resizedRGBImage; - cv::cvtColor(resizedRGBAImage, resizedRGBImage, cv::COLOR_BGRA2RGB); -
resizedRGBImage.convertTo(resizedRGBImage, CV_32FC3, 1.0 / 255.0f); - NHWC3ToNC3HW(reinterpret_cast(resizedRGBImage.data), inputData, - inputMean_.data(), inputStd_.data(), inputShape[3], - inputShape[2]); - // Set the size of input image - auto sizeTensor = predictor_->GetInput(1); - sizeTensor->Resize({1, 2}); - auto sizeData = sizeTensor->mutable_data(); - sizeData[0] = inputShape[3]; - sizeData[1] = inputShape[2]; -} - -void Detector::Postprocess(std::vector *results) { - auto outputTensor = predictor_->GetOutput(0); - auto outputData = outputTensor->data(); - auto outputShape = outputTensor->shape(); - int outputSize = ShapeProduction(outputShape); - for (int i = 0; i < outputSize; i += 6) { - // Class id - auto class_id = static_cast(round(outputData[i])); - // Confidence score - auto score = outputData[i + 1]; - if (score < scoreThreshold_) - continue; - RESULT object; - object.class_name = class_id >= 0 && class_id < labelList_.size() - ? labelList_[class_id] - : "Unknow"; - object.fill_color = class_id >= 0 && class_id < colorMap_.size() - ? 
colorMap_[class_id] - : cv::Scalar(0, 0, 0); - object.score = score; - object.x = outputData[i + 2] / inputWidth_; - object.y = outputData[i + 3] / inputHeight_; - object.w = (outputData[i + 4] - outputData[i + 2] + 1) / inputWidth_; - object.h = (outputData[i + 5] - outputData[i + 3] + 1) / inputHeight_; - results->push_back(object); - } -} - -void Detector::Predict(const cv::Mat &rgbaImage, std::vector *results, - double *preprocessTime, double *predictTime, - double *postprocessTime) { - auto t = GetCurrentTime(); - - t = GetCurrentTime(); - Preprocess(rgbaImage); - *preprocessTime = GetElapsedTime(t); - LOGD("Detector postprocess costs %f ms", *preprocessTime); - - t = GetCurrentTime(); - predictor_->Run(); - *predictTime = GetElapsedTime(t); - LOGD("Detector predict costs %f ms", *predictTime); - - t = GetCurrentTime(); - Postprocess(results); - *postprocessTime = GetElapsedTime(t); - LOGD("Detector postprocess costs %f ms", *postprocessTime); -} - -Pipeline::Pipeline(const std::string &modelDir, const std::string &labelPath, - const int cpuThreadNum, const std::string &cpuPowerMode, - int inputWidth, int inputHeight, - const std::vector &inputMean, - const std::vector &inputStd, float scoreThreshold) { - detector_.reset(new Detector(modelDir, labelPath, cpuThreadNum, cpuPowerMode, - inputWidth, inputHeight, inputMean, inputStd, - scoreThreshold)); -} - -void Pipeline::VisualizeResults(const std::vector &results, - cv::Mat *rgbaImage) { - int w = rgbaImage->cols; - int h = rgbaImage->rows; - for (int i = 0; i < results.size(); i++) { - RESULT object = results[i]; - cv::Rect boundingBox = - cv::Rect(object.x * w, object.y * h, object.w * w, object.h * h) & - cv::Rect(0, 0, w - 1, h - 1); - // Configure text size - std::string text = object.class_name + " "; - text += std::to_string(static_cast(object.score * 100)) + "%"; - int fontFace = cv::FONT_HERSHEY_PLAIN; - double fontScale = 1.5f; - float fontThickness = 1.0f; - cv::Size textSize = - 
cv::getTextSize(text, fontFace, fontScale, fontThickness, nullptr); - // Draw roi object, text, and background - cv::rectangle(*rgbaImage, boundingBox, object.fill_color, 2); - cv::rectangle(*rgbaImage, - cv::Point2d(boundingBox.x, - boundingBox.y - round(textSize.height * 1.25f)), - cv::Point2d(boundingBox.x + boundingBox.width, boundingBox.y), - object.fill_color, -1); - cv::putText(*rgbaImage, text, cv::Point2d(boundingBox.x, boundingBox.y), - fontFace, fontScale, cv::Scalar(255, 255, 255), fontThickness); - } -} - -void Pipeline::VisualizeStatus(double readGLFBOTime, double writeGLTextureTime, - double preprocessTime, double predictTime, - double postprocessTime, cv::Mat *rgbaImage) { - char text[255]; - cv::Scalar fontColor = cv::Scalar(255, 255, 255); - int fontFace = cv::FONT_HERSHEY_PLAIN; - double fontScale = 1.f; - float fontThickness = 1; - sprintf(text, "Read GLFBO time: %.1f ms", readGLFBOTime); - cv::Size textSize = - cv::getTextSize(text, fontFace, fontScale, fontThickness, nullptr); - textSize.height *= 1.25f; - cv::Point2d offset(10, textSize.height + 15); - cv::putText(*rgbaImage, text, offset, fontFace, fontScale, fontColor, - fontThickness); - sprintf(text, "Write GLTexture time: %.1f ms", writeGLTextureTime); - offset.y += textSize.height; - cv::putText(*rgbaImage, text, offset, fontFace, fontScale, fontColor, - fontThickness); - sprintf(text, "Preprocess time: %.1f ms", preprocessTime); - offset.y += textSize.height; - cv::putText(*rgbaImage, text, offset, fontFace, fontScale, fontColor, - fontThickness); - sprintf(text, "Predict time: %.1f ms", predictTime); - offset.y += textSize.height; - cv::putText(*rgbaImage, text, offset, fontFace, fontScale, fontColor, - fontThickness); - sprintf(text, "Postprocess time: %.1f ms", postprocessTime); - offset.y += textSize.height; - cv::putText(*rgbaImage, text, offset, fontFace, fontScale, fontColor, - fontThickness); -} - -bool Pipeline::Process(int inTexureId, int outTextureId, int textureWidth, - int 
textureHeight, std::string savedImagePath) { - static double readGLFBOTime = 0, writeGLTextureTime = 0; - double preprocessTime = 0, predictTime = 0, postprocessTime = 0; - - // Read pixels from FBO texture to CV image - cv::Mat rgbaImage; - CreateRGBAImageFromGLFBOTexture(textureWidth, textureHeight, &rgbaImage, - &readGLFBOTime); - - // Feed the image, run inference and parse the results - std::vector results; - detector_->Predict(rgbaImage, &results, &preprocessTime, &predictTime, - &postprocessTime); - - // Visualize the objects to the origin image - VisualizeResults(results, &rgbaImage); - - // Visualize the status(performance data) to the origin image - VisualizeStatus(readGLFBOTime, writeGLTextureTime, preprocessTime, - predictTime, postprocessTime, &rgbaImage); - - // Dump modified image if savedImagePath is set - if (!savedImagePath.empty()) { - cv::Mat bgrImage; - cv::cvtColor(rgbaImage, bgrImage, cv::COLOR_RGBA2BGR); - imwrite(savedImagePath, bgrImage); - } - - // Write back to texture2D - WriteRGBAImageBackToGLTexture(rgbaImage, outTextureId, &writeGLTextureTime); - return true; -} diff --git a/static/deploy/android_demo/app/src/main/cpp/Pipeline.h b/static/deploy/android_demo/app/src/main/cpp/Pipeline.h deleted file mode 100644 index 91177d0417814cd60b01112674baf6387675f9a0..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/Pipeline.h +++ /dev/null @@ -1,112 +0,0 @@ -// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. - -#pragma once - -#include "Utils.h" -#include "paddle_api.h" -#include -#include -#include -#include -#include -#include -#include -#include - -struct RESULT { - std::string class_name; - cv::Scalar fill_color; - float score; - float x; - float y; - float w; - float h; -}; - -class Detector { -public: - explicit Detector(const std::string &modelDir, const std::string &labelPath, - const int cpuThreadNum, const std::string &cpuPowerMode, - int inputWidth, int inputHeight, - const std::vector &inputMean, - const std::vector &inputStd, float scoreThreshold); - - void Predict(const cv::Mat &rgbImage, std::vector *results, - double *preprocessTime, double *predictTime, - double *postprocessTime); - -private: - std::vector LoadLabelList(const std::string &path); - std::vector GenerateColorMap(int numOfClasses); - void Preprocess(const cv::Mat &rgbaImage); - void Postprocess(std::vector *results); - -private: - int inputWidth_; - int inputHeight_; - std::vector inputMean_; - std::vector inputStd_; - float scoreThreshold_; - std::vector labelList_; - std::vector colorMap_; - std::shared_ptr predictor_; -}; - -class Pipeline { -public: - Pipeline(const std::string &modelDir, const std::string &labelPath, - const int cpuThreadNum, const std::string &cpuPowerMode, - int inputWidth, int inputHeight, const std::vector &inputMean, - const std::vector &inputStd, float scoreThreshold); - - bool Process(int inTextureId, int outTextureId, int textureWidth, - int textureHeight, std::string savedImagePath); - -private: - // Read pixels from FBO texture to CV image - void CreateRGBAImageFromGLFBOTexture(int textureWidth, int textureHeight, - cv::Mat *rgbaImage, - double *readGLFBOTime) { - auto t = GetCurrentTime(); - rgbaImage->create(textureHeight, textureWidth, CV_8UC4); - glReadPixels(0, 0, textureWidth, textureHeight, GL_RGBA, GL_UNSIGNED_BYTE, - rgbaImage->data); - 
*readGLFBOTime = GetElapsedTime(t); - LOGD("Read from FBO texture costs %f ms", *readGLFBOTime); - } - - // Write back to texture2D - void WriteRGBAImageBackToGLTexture(const cv::Mat &rgbaImage, int textureId, - double *writeGLTextureTime) { - auto t = GetCurrentTime(); - glActiveTexture(GL_TEXTURE0); - glBindTexture(GL_TEXTURE_2D, textureId); - glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, rgbaImage.cols, rgbaImage.rows, - GL_RGBA, GL_UNSIGNED_BYTE, rgbaImage.data); - *writeGLTextureTime = GetElapsedTime(t); - LOGD("Write back to texture2D costs %f ms", *writeGLTextureTime); - } - - // Visualize the results to origin image - void VisualizeResults(const std::vector &results, cv::Mat *rgbaImage); - - // Visualize the status(performace data) to origin image - void VisualizeStatus(double readGLFBOTime, double writeGLTextureTime, - double preprocessTime, double predictTime, - double postprocessTime, cv::Mat *rgbaImage); - -private: - std::shared_ptr detector_; -}; diff --git a/static/deploy/android_demo/app/src/main/cpp/Utils.cc b/static/deploy/android_demo/app/src/main/cpp/Utils.cc deleted file mode 100644 index 63ea54fd4b2de0e9822f5c392606c3e7dca1da0b..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/Utils.cc +++ /dev/null @@ -1,78 +0,0 @@ -// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -#include "Utils.h" -#include - -int64_t ShapeProduction(const std::vector &shape) { - int64_t res = 1; - for (auto i : shape) - res *= i; - return res; -} - -void NHWC3ToNC3HW(const float *src, float *dst, const float *mean, - const float *std, int width, int height) { - int size = height * width; - float32x4_t vmean0 = vdupq_n_f32(mean ? mean[0] : 0.0f); - float32x4_t vmean1 = vdupq_n_f32(mean ? mean[1] : 0.0f); - float32x4_t vmean2 = vdupq_n_f32(mean ? mean[2] : 0.0f); - float32x4_t vscale0 = vdupq_n_f32(std ? (1.0f / std[0]) : 1.0f); - float32x4_t vscale1 = vdupq_n_f32(std ? (1.0f / std[1]) : 1.0f); - float32x4_t vscale2 = vdupq_n_f32(std ? (1.0f / std[2]) : 1.0f); - float *dst_c0 = dst; - float *dst_c1 = dst + size; - float *dst_c2 = dst + size * 2; - int i = 0; - for (; i < size - 3; i += 4) { - float32x4x3_t vin3 = vld3q_f32(src); - float32x4_t vsub0 = vsubq_f32(vin3.val[0], vmean0); - float32x4_t vsub1 = vsubq_f32(vin3.val[1], vmean1); - float32x4_t vsub2 = vsubq_f32(vin3.val[2], vmean2); - float32x4_t vs0 = vmulq_f32(vsub0, vscale0); - float32x4_t vs1 = vmulq_f32(vsub1, vscale1); - float32x4_t vs2 = vmulq_f32(vsub2, vscale2); - vst1q_f32(dst_c0, vs0); - vst1q_f32(dst_c1, vs1); - vst1q_f32(dst_c2, vs2); - src += 12; - dst_c0 += 4; - dst_c1 += 4; - dst_c2 += 4; - } - for (; i < size; i++) { - *(dst_c0++) = (*(src++) - mean[0]) / std[0]; - *(dst_c1++) = (*(src++) - mean[1]) / std[1]; - *(dst_c2++) = (*(src++) - mean[2]) / std[2]; - } -} - -void NHWC1ToNC1HW(const float *src, float *dst, const float *mean, - const float *std, int width, int height) { - int size = height * width; - float32x4_t vmean = vdupq_n_f32(mean ? mean[0] : 0.0f); - float32x4_t vscale = vdupq_n_f32(std ? 
(1.0f / std[0]) : 1.0f); - int i = 0; - for (; i < size - 3; i += 4) { - float32x4_t vin = vld1q_f32(src); - float32x4_t vsub = vsubq_f32(vin, vmean); - float32x4_t vs = vmulq_f32(vsub, vscale); - vst1q_f32(dst, vs); - src += 4; - dst += 4; - } - for (; i < size; i++) { - *(dst++) = (*(src++) - mean[0]) / std[0]; - } -} diff --git a/static/deploy/android_demo/app/src/main/cpp/Utils.h b/static/deploy/android_demo/app/src/main/cpp/Utils.h deleted file mode 100644 index 74fa82a6423dd600f3aa711da4ca244d0d4c34eb..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/cpp/Utils.h +++ /dev/null @@ -1,92 +0,0 @@ -// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -#pragma once - -#include "paddle_api.h" -#include -#include -#include -#include - -#define TAG "JNI" -#define LOGD(...) __android_log_print(ANDROID_LOG_DEBUG, TAG, __VA_ARGS__) -#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, TAG, __VA_ARGS__) -#define LOGW(...) __android_log_print(ANDROID_LOG_WARN, TAG, __VA_ARGS__) -#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, TAG, __VA_ARGS__) -#define LOGF(...) 
__android_log_print(ANDROID_LOG_FATAL, TAG, __VA_ARGS__) - -int64_t ShapeProduction(const std::vector &shape); - -template -bool ReadFile(const std::string &path, std::vector *data) { - std::ifstream file(path, std::ifstream::binary); - if (file) { - file.seekg(0, file.end); - int size = file.tellg(); - LOGD("file size=%lld\n", size); - data->resize(size / sizeof(T)); - file.seekg(0, file.beg); - file.read(reinterpret_cast(data->data()), size); - file.close(); - return true; - } else { - LOGE("Can't read file from %s\n", path.c_str()); - } - return false; -} - -template -bool WriteFile(const std::string &path, const std::vector &data) { - std::ofstream file{path, std::ios::binary}; - if (!file.is_open()) { - LOGE("Can't write file to %s\n", path.c_str()); - return false; - } - file.write(reinterpret_cast(data.data()), - data.size() * sizeof(T)); - file.close(); - return true; -} - -inline int64_t GetCurrentTime() { - struct timeval time; - gettimeofday(&time, NULL); - return 1000000LL * (int64_t)time.tv_sec + (int64_t)time.tv_usec; -} - -inline double GetElapsedTime(int64_t time) { - return (GetCurrentTime() - time) / 1000.0f; -} - -inline paddle::lite_api::PowerMode ParsePowerMode(std::string mode) { - if (mode == "LITE_POWER_HIGH") { - return paddle::lite_api::LITE_POWER_HIGH; - } else if (mode == "LITE_POWER_LOW") { - return paddle::lite_api::LITE_POWER_LOW; - } else if (mode == "LITE_POWER_FULL") { - return paddle::lite_api::LITE_POWER_FULL; - } else if (mode == "LITE_POWER_RAND_HIGH") { - return paddle::lite_api::LITE_POWER_RAND_HIGH; - } else if (mode == "LITE_POWER_RAND_LOW") { - return paddle::lite_api::LITE_POWER_RAND_LOW; - } - return paddle::lite_api::LITE_POWER_NO_BIND; -} - -void NHWC3ToNC3HW(const float *src, float *dst, const float *mean, - const float *std, int width, int height); - -void NHWC1ToNC1HW(const float *src, float *dst, const float *mean, - const float *std, int width, int height); diff --git 
a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/AppCompatPreferenceActivity.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/AppCompatPreferenceActivity.java deleted file mode 100644 index 6ebdb43334fd412cf3f9a3e9de2ffedd9ba574cf..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/AppCompatPreferenceActivity.java +++ /dev/null @@ -1,127 +0,0 @@ -/* - * Copyright (C) 2014 The Android Open Source Project - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.baidu.paddledetection.common; - -import android.content.res.Configuration; -import android.os.Bundle; -import android.preference.PreferenceActivity; -import androidx.annotation.LayoutRes; -import androidx.annotation.Nullable; -import androidx.appcompat.app.ActionBar; -import androidx.appcompat.app.AppCompatDelegate; -import androidx.appcompat.widget.Toolbar; -import android.view.MenuInflater; -import android.view.View; -import android.view.ViewGroup; - -/** - * A {@link PreferenceActivity} which implements and proxies the necessary calls - * to be used with AppCompat. - *
    - * This technique can be used with an {@link android.app.Activity} class, not just - * {@link PreferenceActivity}. - */ -public abstract class AppCompatPreferenceActivity extends PreferenceActivity { - private AppCompatDelegate mDelegate; - - @Override - protected void onCreate(Bundle savedInstanceState) { - getDelegate().installViewFactory(); - getDelegate().onCreate(savedInstanceState); - super.onCreate(savedInstanceState); - } - - @Override - protected void onPostCreate(Bundle savedInstanceState) { - super.onPostCreate(savedInstanceState); - getDelegate().onPostCreate(savedInstanceState); - } - - public ActionBar getSupportActionBar() { - return getDelegate().getSupportActionBar(); - } - - public void setSupportActionBar(@Nullable Toolbar toolbar) { - getDelegate().setSupportActionBar(toolbar); - } - - @Override - public MenuInflater getMenuInflater() { - return getDelegate().getMenuInflater(); - } - - @Override - public void setContentView(@LayoutRes int layoutResID) { - getDelegate().setContentView(layoutResID); - } - - @Override - public void setContentView(View view) { - getDelegate().setContentView(view); - } - - @Override - public void setContentView(View view, ViewGroup.LayoutParams params) { - getDelegate().setContentView(view, params); - } - - @Override - public void addContentView(View view, ViewGroup.LayoutParams params) { - getDelegate().addContentView(view, params); - } - - @Override - protected void onPostResume() { - super.onPostResume(); - getDelegate().onPostResume(); - } - - @Override - protected void onTitleChanged(CharSequence title, int color) { - super.onTitleChanged(title, color); - getDelegate().setTitle(title); - } - - @Override - public void onConfigurationChanged(Configuration newConfig) { - super.onConfigurationChanged(newConfig); - getDelegate().onConfigurationChanged(newConfig); - } - - @Override - protected void onStop() { - super.onStop(); - getDelegate().onStop(); - } - - @Override - protected void onDestroy() { - 
super.onDestroy(); - getDelegate().onDestroy(); - } - - public void invalidateOptionsMenu() { - getDelegate().invalidateOptionsMenu(); - } - - private AppCompatDelegate getDelegate() { - if (mDelegate == null) { - mDelegate = AppCompatDelegate.create(this, null); - } - return mDelegate; - } -} diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/CameraSurfaceView.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/CameraSurfaceView.java deleted file mode 100644 index dc074841c63e1796398cce26be980444faafdd1f..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/CameraSurfaceView.java +++ /dev/null @@ -1,317 +0,0 @@ -package com.baidu.paddledetection.common; - -import android.content.Context; -import android.graphics.SurfaceTexture; -import android.hardware.Camera; -import android.hardware.Camera.CameraInfo; -import android.hardware.Camera.Size; -import android.opengl.GLES11Ext; -import android.opengl.GLES20; -import android.opengl.GLSurfaceView; -import android.opengl.GLSurfaceView.Renderer; -import android.opengl.Matrix; -import android.util.AttributeSet; -import android.util.Log; - -import javax.microedition.khronos.egl.EGLConfig; -import javax.microedition.khronos.opengles.GL10; - -import java.io.IOException; -import java.nio.ByteBuffer; -import java.nio.ByteOrder; -import java.nio.FloatBuffer; -import java.util.List; - -public class CameraSurfaceView extends GLSurfaceView implements Renderer, - SurfaceTexture.OnFrameAvailableListener { - private static final String TAG = CameraSurfaceView.class.getSimpleName(); - - public static final int EXPECTED_PREVIEW_WIDTH = 1280; - public static final int EXPECTED_PREVIEW_HEIGHT = 720; - - - protected int numberOfCameras; - protected int selectedCameraId; - protected boolean disableCamera = false; - protected Camera camera; - - protected Context context; - protected SurfaceTexture 
surfaceTexture; - protected int surfaceWidth = 0; - protected int surfaceHeight = 0; - protected int textureWidth = 0; - protected int textureHeight = 0; - - // In order to manipulate the camera preview data and render the modified one - // to the screen, three textures are created and the data flow is shown as following: - // previewdata->camTextureId->fboTexureId->drawTexureId->framebuffer - protected int[] fbo = {0}; - protected int[] camTextureId = {0}; - protected int[] fboTexureId = {0}; - protected int[] drawTexureId = {0}; - - private final String vss = "" - + "attribute vec2 vPosition;\n" - + "attribute vec2 vTexCoord;\n" + "varying vec2 texCoord;\n" - + "void main() {\n" + " texCoord = vTexCoord;\n" - + " gl_Position = vec4 (vPosition.x, vPosition.y, 0.0, 1.0);\n" - + "}"; - - private final String fssCam2FBO = "" - + "#extension GL_OES_EGL_image_external : require\n" - + "precision mediump float;\n" - + "uniform samplerExternalOES sTexture;\n" - + "varying vec2 texCoord;\n" - + "void main() {\n" - + " gl_FragColor = texture2D(sTexture,texCoord);\n" + "}"; - - private final String fssTex2Screen = "" - + "precision mediump float;\n" - + "uniform sampler2D sTexture;\n" - + "varying vec2 texCoord;\n" - + "void main() {\n" - + " gl_FragColor = texture2D(sTexture,texCoord);\n" + "}"; - - private final float vertexCoords[] = { - -1, -1, - -1, 1, - 1, -1, - 1, 1}; - private float textureCoords[] = { - 0, 1, - 0, 0, - 1, 1, - 1, 0}; - - private FloatBuffer vertexCoordsBuffer; - private FloatBuffer textureCoordsBuffer; - - private int progCam2FBO = -1; - private int progTex2Screen = -1; - private int vcCam2FBO; - private int tcCam2FBO; - private int vcTex2Screen; - private int tcTex2Screen; - - public interface OnTextureChangedListener { - public boolean onTextureChanged(int inTextureId, int outTextureId, int textureWidth, int textureHeight); - } - - private OnTextureChangedListener onTextureChangedListener = null; - - public void 
setOnTextureChangedListener(OnTextureChangedListener listener) { - onTextureChangedListener = listener; - } - - public CameraSurfaceView(Context ctx, AttributeSet attrs) { - super(ctx, attrs); - context = ctx; - setEGLContextClientVersion(2); - setRenderer(this); - setRenderMode(RENDERMODE_WHEN_DIRTY); - - // Find the total number of available cameras and the ID of the default camera - numberOfCameras = Camera.getNumberOfCameras(); - CameraInfo cameraInfo = new CameraInfo(); - for (int i = 0; i < numberOfCameras; i++) { - Camera.getCameraInfo(i, cameraInfo); - if (cameraInfo.facing == CameraInfo.CAMERA_FACING_BACK) { - selectedCameraId = i; - } - } - } - - @Override - public void onSurfaceCreated(GL10 gl, EGLConfig config) { - // Create OES texture for storing camera preview data(YUV format) - GLES20.glGenTextures(1, camTextureId, 0); - GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, camTextureId[0]); - GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE); - GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE); - GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST); - GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_NEAREST); - surfaceTexture = new SurfaceTexture(camTextureId[0]); - surfaceTexture.setOnFrameAvailableListener(this); - - // Prepare vertex and texture coordinates - int bytes = vertexCoords.length * Float.SIZE / Byte.SIZE; - vertexCoordsBuffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder()).asFloatBuffer(); - textureCoordsBuffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder()).asFloatBuffer(); - vertexCoordsBuffer.put(vertexCoords).position(0); - textureCoordsBuffer.put(textureCoords).position(0); - - // Create vertex and fragment shaders - // camTextureId->fboTexureId - progCam2FBO = 
Utils.createShaderProgram(vss, fssCam2FBO); - vcCam2FBO = GLES20.glGetAttribLocation(progCam2FBO, "vPosition"); - tcCam2FBO = GLES20.glGetAttribLocation(progCam2FBO, "vTexCoord"); - GLES20.glEnableVertexAttribArray(vcCam2FBO); - GLES20.glEnableVertexAttribArray(tcCam2FBO); - // fboTexureId/drawTexureId -> screen - progTex2Screen = Utils.createShaderProgram(vss, fssTex2Screen); - vcTex2Screen = GLES20.glGetAttribLocation(progTex2Screen, "vPosition"); - tcTex2Screen = GLES20.glGetAttribLocation(progTex2Screen, "vTexCoord"); - GLES20.glEnableVertexAttribArray(vcTex2Screen); - GLES20.glEnableVertexAttribArray(tcTex2Screen); - } - - @Override - public void onSurfaceChanged(GL10 gl, int width, int height) { - surfaceWidth = width; - surfaceHeight = height; - openCamera(); - } - - @Override - public void onDrawFrame(GL10 gl) { - if (surfaceTexture == null) return; - - GLES20.glClearColor(0.0f, 0.0f, 0.0f, 1.0f); - GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT); - surfaceTexture.updateTexImage(); - float matrix[] = new float[16]; - surfaceTexture.getTransformMatrix(matrix); - - // camTextureId->fboTexureId - GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fbo[0]); - GLES20.glViewport(0, 0, textureWidth, textureHeight); - GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT); - GLES20.glUseProgram(progCam2FBO); - GLES20.glVertexAttribPointer(vcCam2FBO, 2, GLES20.GL_FLOAT, false, 4 * 2, vertexCoordsBuffer); - textureCoordsBuffer.clear(); - textureCoordsBuffer.put(transformTextureCoordinates(textureCoords, matrix)); - textureCoordsBuffer.position(0); - GLES20.glVertexAttribPointer(tcCam2FBO, 2, GLES20.GL_FLOAT, false, 4 * 2, textureCoordsBuffer); - GLES20.glActiveTexture(GLES20.GL_TEXTURE0); - GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, camTextureId[0]); - GLES20.glUniform1i(GLES20.glGetUniformLocation(progCam2FBO, "sTexture"), 0); - GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4); - GLES20.glFlush(); - - // Check if the draw texture is set - int 
targetTexureId = fboTexureId[0]; - if (onTextureChangedListener != null) { - boolean modified = onTextureChangedListener.onTextureChanged(fboTexureId[0], drawTexureId[0], - textureWidth, textureHeight); - if (modified) { - targetTexureId = drawTexureId[0]; - } - } - - // fboTexureId/drawTexureId->Screen - GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0); - GLES20.glViewport(0, 0, surfaceWidth, surfaceHeight); - GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT); - GLES20.glUseProgram(progTex2Screen); - GLES20.glVertexAttribPointer(vcTex2Screen, 2, GLES20.GL_FLOAT, false, 4 * 2, vertexCoordsBuffer); - textureCoordsBuffer.clear(); - textureCoordsBuffer.put(textureCoords); - textureCoordsBuffer.position(0); - GLES20.glVertexAttribPointer(tcTex2Screen, 2, GLES20.GL_FLOAT, false, 4 * 2, textureCoordsBuffer); - GLES20.glActiveTexture(GLES20.GL_TEXTURE0); - GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, targetTexureId); - GLES20.glUniform1i(GLES20.glGetUniformLocation(progTex2Screen, "sTexture"), 0); - GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4); - GLES20.glFlush(); - } - - private float[] transformTextureCoordinates(float[] coords, float[] matrix) { - float[] result = new float[coords.length]; - float[] vt = new float[4]; - for (int i = 0; i < coords.length; i += 2) { - float[] v = {coords[i], coords[i + 1], 0, 1}; - Matrix.multiplyMV(vt, 0, matrix, 0, v, 0); - result[i] = vt[0]; - result[i + 1] = vt[1]; - } - return result; - } - - @Override - public void onResume() { - super.onResume(); - } - - @Override - public void onPause() { - super.onPause(); - releaseCamera(); - } - - @Override - public void onFrameAvailable(SurfaceTexture surfaceTexture) { - requestRender(); - } - - public void disableCamera() { - disableCamera = true; - } - - public void switchCamera() { - releaseCamera(); - selectedCameraId = (selectedCameraId + 1) % numberOfCameras; - openCamera(); - } - - public void openCamera() { - if (disableCamera) return; - camera = Camera.open(selectedCameraId); - List 
supportedPreviewSizes = camera.getParameters().getSupportedPreviewSizes(); - Size previewSize = Utils.getOptimalPreviewSize(supportedPreviewSizes, EXPECTED_PREVIEW_WIDTH, - EXPECTED_PREVIEW_HEIGHT); - Camera.Parameters parameters = camera.getParameters(); - parameters.setPreviewSize(previewSize.width, previewSize.height); - if (parameters.getSupportedFocusModes().contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO)) { - parameters.setFocusMode(Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO); - } - camera.setParameters(parameters); - int degree = Utils.getCameraDisplayOrientation(context, selectedCameraId); - camera.setDisplayOrientation(degree); - boolean rotate = degree == 90 || degree == 270; - textureWidth = rotate ? previewSize.height : previewSize.width; - textureHeight = rotate ? previewSize.width : previewSize.height; - // Destroy FBO and draw textures - GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0); - GLES20.glDeleteFramebuffers(1, fbo, 0); - GLES20.glDeleteTextures(1, drawTexureId, 0); - GLES20.glDeleteTextures(1, fboTexureId, 0); - // Normal texture for storing modified camera preview data(RGBA format) - GLES20.glGenTextures(1, drawTexureId, 0); - GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, drawTexureId[0]); - GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA, textureWidth, textureHeight, 0, - GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, null); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_NEAREST); - // FBO texture for storing camera preview data(RGBA format) - GLES20.glGenTextures(1, fboTexureId, 0); - GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, fboTexureId[0]); - GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, 
GLES20.GL_RGBA, textureWidth, textureHeight, 0, - GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, null); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST); - GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_NEAREST); - // Generate FBO and bind to FBO texture - GLES20.glGenFramebuffers(1, fbo, 0); - GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fbo[0]); - GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER, GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D, - fboTexureId[0], 0); - try { - camera.setPreviewTexture(surfaceTexture); - } catch (IOException exception) { - Log.e(TAG, "IOException caused by setPreviewDisplay()", exception); - } - camera.startPreview(); - } - - public void releaseCamera() { - if (camera != null) { - camera.setPreviewCallback(null); - camera.stopPreview(); - camera.release(); - camera = null; - } - } -} diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/Utils.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/Utils.java deleted file mode 100644 index eb7bb743b5a1857d15d9db3f397932d381cc1909..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/common/Utils.java +++ /dev/null @@ -1,237 +0,0 @@ -package com.baidu.paddledetection.common; - -import android.content.Context; -import android.content.res.Resources; -import android.hardware.Camera; -import android.opengl.GLES20; -import android.os.Environment; -import android.util.Log; -import android.view.Surface; -import android.view.WindowManager; - -import java.io.*; -import java.util.List; - -public class Utils { - private static final String TAG = Utils.class.getSimpleName(); - - public 
static void RecursiveCreateDirectories(String fileDir) { - String[] fileDirs = fileDir.split("\\/"); - String topPath = ""; - for (int i = 0; i < fileDirs.length; i++) { - topPath += "/" + fileDirs[i]; - File file = new File(topPath); - if (file.exists()) { - continue; - } else { - file.mkdir(); - } - } - } - - public static void copyFileFromAssets(Context appCtx, String srcPath, String dstPath) { - if (srcPath.isEmpty() || dstPath.isEmpty()) { - return; - } - String dstDir = dstPath.substring(0, dstPath.lastIndexOf('/')); - if (dstDir.length() > 0) { - RecursiveCreateDirectories(dstDir); - } - InputStream is = null; - OutputStream os = null; - try { - is = new BufferedInputStream(appCtx.getAssets().open(srcPath)); - os = new BufferedOutputStream(new FileOutputStream(new File(dstPath))); - byte[] buffer = new byte[1024]; - int length = 0; - while ((length = is.read(buffer)) != -1) { - os.write(buffer, 0, length); - } - } catch (FileNotFoundException e) { - e.printStackTrace(); - } catch (IOException e) { - e.printStackTrace(); - } finally { - try { - os.close(); - is.close(); - } catch (IOException e) { - e.printStackTrace(); - } - } - } - - public static void copyDirectoryFromAssets(Context appCtx, String srcDir, String dstDir) { - if (srcDir.isEmpty() || dstDir.isEmpty()) { - return; - } - try { - if (!new File(dstDir).exists()) { - new File(dstDir).mkdirs(); - } - for (String fileName : appCtx.getAssets().list(srcDir)) { - String srcSubPath = srcDir + File.separator + fileName; - String dstSubPath = dstDir + File.separator + fileName; - if (new File(srcSubPath).isDirectory()) { - copyDirectoryFromAssets(appCtx, srcSubPath, dstSubPath); - } else { - copyFileFromAssets(appCtx, srcSubPath, dstSubPath); - } - } - } catch (Exception e) { - e.printStackTrace(); - } - } - - public static float[] parseFloatsFromString(String string, String delimiter) { - String[] pieces = string.trim().toLowerCase().split(delimiter); - float[] floats = new float[pieces.length]; - for 
(int i = 0; i < pieces.length; i++) { - floats[i] = Float.parseFloat(pieces[i].trim()); - } - return floats; - } - - public static long[] parseLongsFromString(String string, String delimiter) { - String[] pieces = string.trim().toLowerCase().split(delimiter); - long[] longs = new long[pieces.length]; - for (int i = 0; i < pieces.length; i++) { - longs[i] = Long.parseLong(pieces[i].trim()); - } - return longs; - } - - public static String getSDCardDirectory() { - return Environment.getExternalStorageDirectory().getAbsolutePath(); - } - - public static String getDCIMDirectory() { - return Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DCIM).getAbsolutePath(); - } - - public static Camera.Size getOptimalPreviewSize(List sizes, int w, int h) { - final double ASPECT_TOLERANCE = 0.1; - double targetRatio = (double) w / h; - if (sizes == null) return null; - - Camera.Size optimalSize = null; - double minDiff = Double.MAX_VALUE; - - int targetHeight = h; - - // Try to find an size match aspect ratio and size - for (Camera.Size size : sizes) { - double ratio = (double) size.width / size.height; - if (Math.abs(ratio - targetRatio) > ASPECT_TOLERANCE) continue; - if (Math.abs(size.height - targetHeight) < minDiff) { - optimalSize = size; - minDiff = Math.abs(size.height - targetHeight); - } - } - - // Cannot find the one match the aspect ratio, ignore the requirement - if (optimalSize == null) { - minDiff = Double.MAX_VALUE; - for (Camera.Size size : sizes) { - if (Math.abs(size.height - targetHeight) < minDiff) { - optimalSize = size; - minDiff = Math.abs(size.height - targetHeight); - } - } - } - return optimalSize; - } - - public static int getScreenWidth() { - return Resources.getSystem().getDisplayMetrics().widthPixels; - } - - public static int getScreenHeight() { - return Resources.getSystem().getDisplayMetrics().heightPixels; - } - - public static int getCameraDisplayOrientation(Context context, int cameraId) { - android.hardware.Camera.CameraInfo 
info = new android.hardware.Camera.CameraInfo(); - android.hardware.Camera.getCameraInfo(cameraId, info); - WindowManager wm = (WindowManager) context.getSystemService(Context.WINDOW_SERVICE); - int rotation = wm.getDefaultDisplay().getRotation(); - int degrees = 0; - switch (rotation) { - case Surface.ROTATION_0: - degrees = 0; - break; - case Surface.ROTATION_90: - degrees = 90; - break; - case Surface.ROTATION_180: - degrees = 180; - break; - case Surface.ROTATION_270: - degrees = 270; - break; - } - int result; - if (info.facing == Camera.CameraInfo.CAMERA_FACING_FRONT) { - result = (info.orientation + degrees) % 360; - result = (360 - result) % 360; // compensate the mirror - } else { - // back-facing - result = (info.orientation - degrees + 360) % 360; - } - return result; - } - - public static int createShaderProgram(String vss, String fss) { - int vshader = GLES20.glCreateShader(GLES20.GL_VERTEX_SHADER); - GLES20.glShaderSource(vshader, vss); - GLES20.glCompileShader(vshader); - int[] status = new int[1]; - GLES20.glGetShaderiv(vshader, GLES20.GL_COMPILE_STATUS, status, 0); - if (status[0] == 0) { - Log.e(TAG, GLES20.glGetShaderInfoLog(vshader)); - GLES20.glDeleteShader(vshader); - vshader = 0; - return 0; - } - - int fshader = GLES20.glCreateShader(GLES20.GL_FRAGMENT_SHADER); - GLES20.glShaderSource(fshader, fss); - GLES20.glCompileShader(fshader); - GLES20.glGetShaderiv(fshader, GLES20.GL_COMPILE_STATUS, status, 0); - if (status[0] == 0) { - Log.e(TAG, GLES20.glGetShaderInfoLog(fshader)); - GLES20.glDeleteShader(vshader); - GLES20.glDeleteShader(fshader); - fshader = 0; - return 0; - } - - int program = GLES20.glCreateProgram(); - GLES20.glAttachShader(program, vshader); - GLES20.glAttachShader(program, fshader); - GLES20.glLinkProgram(program); - GLES20.glDeleteShader(vshader); - GLES20.glDeleteShader(fshader); - GLES20.glGetProgramiv(program, GLES20.GL_LINK_STATUS, status, 0); - if (status[0] == 0) { - Log.e(TAG, GLES20.glGetProgramInfoLog(program)); - 
program = 0; - return 0; - } - GLES20.glValidateProgram(program); - GLES20.glGetProgramiv(program, GLES20.GL_VALIDATE_STATUS, status, 0); - if (status[0] == 0) { - Log.e(TAG, GLES20.glGetProgramInfoLog(program)); - GLES20.glDeleteProgram(program); - program = 0; - return 0; - } - - return program; - } - - public static boolean isSupportedNPU() { - String hardware = android.os.Build.HARDWARE; - return hardware.equalsIgnoreCase("kirin810") || hardware.equalsIgnoreCase("kirin990"); - } -} diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/CameraFragment.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/CameraFragment.java deleted file mode 100644 index 238e5378c8216a581e136421820f56f41dc82a08..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/CameraFragment.java +++ /dev/null @@ -1,206 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.Manifest; -import android.app.Activity; -import android.app.AlertDialog; -import android.content.DialogInterface; -import android.content.Intent; -import android.content.SharedPreferences; -import android.content.pm.PackageManager; -import android.os.Bundle; -import android.preference.PreferenceManager; -import android.view.LayoutInflater; -import android.view.View; -import android.view.ViewGroup; -import android.view.Window; -import android.view.WindowManager; -import android.widget.ImageButton; -import android.widget.TextView; -import android.widget.Toast; - -import androidx.annotation.NonNull; -import androidx.core.app.ActivityCompat; -import androidx.core.content.ContextCompat; -import androidx.fragment.app.Fragment; -import androidx.navigation.fragment.NavHostFragment; - -import com.baidu.paddledetection.common.CameraSurfaceView; -import com.baidu.paddledetection.common.Utils; - -import java.io.File; -import java.text.SimpleDateFormat; -import 
java.util.Date; - -public class CameraFragment extends Fragment implements View.OnClickListener, CameraSurfaceView.OnTextureChangedListener { - private static final String TAG = CameraFragment.class.getSimpleName(); - - CameraSurfaceView svPreview; - TextView tvStatus; - ImageButton btnSwitch; - ImageButton btnShutter; - ImageButton btnSettings; - - String savedImagePath = ""; - int lastFrameIndex = 0; - long lastFrameTime; - - Native predictor = new Native(); - - @Override - public View onCreateView(LayoutInflater inflater, ViewGroup container, - Bundle savedInstanceState) { - // Inflate the layout for this fragment - return inflater.inflate(R.layout.fragment_camera, container, false); - } - - public void onViewCreated(@NonNull View view, Bundle savedInstanceState) { - super.onViewCreated(view, savedInstanceState); - // Clear all setting items to avoid app crashing due to the incorrect settings - initSettings(); - // Init the camera preview and UI components - initView(view); - // Check and request CAMERA and WRITE_EXTERNAL_STORAGE permissions - if (!checkAllPermissions()) { - requestAllPermissions(); - } - } - - @Override - public void onClick(View v) { - switch (v.getId()) { - case R.id.btn_switch: - svPreview.switchCamera(); - break; - case R.id.btn_shutter: - SimpleDateFormat date = new SimpleDateFormat("yyyy_MM_dd_HH_mm_ss"); - synchronized (this) { - savedImagePath = Utils.getDCIMDirectory() + File.separator + date.format(new Date()).toString() + ".png"; - } - Toast.makeText(getActivity(), "Save snapshot to " + savedImagePath, Toast.LENGTH_SHORT).show(); - break; - case R.id.btn_settings: - startActivity(new Intent(getActivity(), SettingsActivity.class)); - break; - } - } - - @Override - public boolean onTextureChanged(int inTextureId, int outTextureId, int textureWidth, int textureHeight) { - String savedImagePath = ""; - synchronized (this) { - savedImagePath = CameraFragment.this.savedImagePath; - } - boolean modified = predictor.process(inTextureId, 
outTextureId, textureWidth, textureHeight, savedImagePath); - if (!savedImagePath.isEmpty()) { - synchronized (this) { - CameraFragment.this.savedImagePath = ""; - } - } - lastFrameIndex++; - if (lastFrameIndex >= 30) { - final int fps = (int) (lastFrameIndex * 1e9 / (System.nanoTime() - lastFrameTime)); - getActivity().runOnUiThread(new Runnable() { - public void run() { - tvStatus.setText(Integer.toString(fps) + "fps"); - } - }); - lastFrameIndex = 0; - lastFrameTime = System.nanoTime(); - } - return modified; - } - - @Override - public void onResume() { - super.onResume(); - // Reload settings and re-initialize the predictor - checkAndUpdateSettings(); - // Open camera until the permissions have been granted - if (!checkAllPermissions()) { - svPreview.disableCamera(); - } - svPreview.onResume(); - } - - @Override - public void onPause() { - super.onPause(); - svPreview.onPause(); - } - - @Override - public void onDestroy() { - if (predictor != null) { - predictor.release(); - } - super.onDestroy(); - } - - public void initView(@NonNull View view) { - svPreview = (CameraSurfaceView) view.findViewById(R.id.sv_preview); - svPreview.setOnTextureChangedListener(this); - tvStatus = (TextView) view.findViewById(R.id.tv_status); - btnSwitch = (ImageButton) view.findViewById(R.id.btn_switch); - btnSwitch.setOnClickListener(this); - btnShutter = (ImageButton) view.findViewById(R.id.btn_shutter); - btnShutter.setOnClickListener(this); - btnSettings = (ImageButton) view.findViewById(R.id.btn_settings); - btnSettings.setOnClickListener(this); - } - - public void initSettings() { - SharedPreferences sharedPreferences = PreferenceManager.getDefaultSharedPreferences(getActivity()); - SharedPreferences.Editor editor = sharedPreferences.edit(); - editor.clear(); - editor.commit(); - SettingsActivity.resetSettings(); - } - - public void checkAndUpdateSettings() { - if (SettingsActivity.checkAndUpdateSettings(getActivity())) { - String realModelDir = getActivity().getCacheDir() + 
"/" + SettingsActivity.modelDir; - Utils.copyDirectoryFromAssets(getActivity(), SettingsActivity.modelDir, realModelDir); - String realLabelPath = getActivity().getCacheDir() + "/" + SettingsActivity.labelPath; - Utils.copyFileFromAssets(getActivity(), SettingsActivity.labelPath, realLabelPath); - predictor.init( - realModelDir, - realLabelPath, - SettingsActivity.cpuThreadNum, - SettingsActivity.cpuPowerMode, - SettingsActivity.inputWidth, - SettingsActivity.inputHeight, - SettingsActivity.inputMean, - SettingsActivity.inputStd, - SettingsActivity.scoreThreshold); - } - } - - @Override - public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, - @NonNull int[] grantResults) { - super.onRequestPermissionsResult(requestCode, permissions, grantResults); - if (grantResults[0] != PackageManager.PERMISSION_GRANTED || grantResults[1] != PackageManager.PERMISSION_GRANTED) { - new AlertDialog.Builder(getActivity()) - .setTitle("Permission denied") - .setMessage("Click to force quit the app, then open Settings->Apps & notifications->Target " + - "App->Permissions to grant all of the permissions.") - .setCancelable(false) - .setPositiveButton("Exit", new DialogInterface.OnClickListener() { - @Override - public void onClick(DialogInterface dialog, int which) { - getActivity().finish(); - } - }).show(); - } - } - - private void requestAllPermissions() { - ActivityCompat.requestPermissions(getActivity(), new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE, - Manifest.permission.CAMERA}, 0); - } - - private boolean checkAllPermissions() { - return ContextCompat.checkSelfPermission(getActivity(), Manifest.permission.WRITE_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED - && ContextCompat.checkSelfPermission(getActivity(), Manifest.permission.CAMERA) == PackageManager.PERMISSION_GRANTED; - } -} diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/ContentFragment.java 
b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/ContentFragment.java deleted file mode 100644 index ed75d168685f47e5fba5ca2ca26d3c38e4d7af70..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/ContentFragment.java +++ /dev/null @@ -1,43 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.os.Bundle; -import android.view.LayoutInflater; -import android.view.View; -import android.view.ViewGroup; - -import androidx.annotation.NonNull; -import androidx.fragment.app.Fragment; -import androidx.navigation.fragment.NavHostFragment; - -public class ContentFragment extends Fragment { - - @Override - public View onCreateView( - LayoutInflater inflater, ViewGroup container, - Bundle savedInstanceState - ) { - // Inflate the layout for this fragment - return inflater.inflate(R.layout.fragment_content, container, false); - } - - public void onViewCreated(@NonNull View view, Bundle savedInstanceState) { - super.onViewCreated(view, savedInstanceState); - - view.findViewById(R.id.camera).setOnClickListener(new View.OnClickListener() { - @Override - public void onClick(View view) { - NavHostFragment.findNavController(ContentFragment.this) - .navigate(R.id.action_ContentFragment_to_CameraFragment); - - } - }); - - view.findViewById(R.id.photo).setOnClickListener(new View.OnClickListener() { - @Override - public void onClick(View view) { - NavHostFragment.findNavController(ContentFragment.this) - .navigate(R.id.action_ContentFragment_to_PhotoFragment); - } - }); - } -} \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/MainActivity.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/MainActivity.java deleted file mode 100644 index 5100c87cb0a68ef0871af1970444c19cf6f1e0af..0000000000000000000000000000000000000000 --- 
a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/MainActivity.java +++ /dev/null @@ -1,50 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.os.Bundle; - -import com.google.android.material.floatingactionbutton.FloatingActionButton; -import com.google.android.material.navigation.NavigationView; -import com.google.android.material.snackbar.Snackbar; - -import androidx.appcompat.app.AppCompatActivity; -import androidx.appcompat.widget.Toolbar; - -import android.view.View; - -import android.view.Menu; -import android.view.MenuItem; - -public class MainActivity extends AppCompatActivity { - private NavigationView navigationView; - - @Override - protected void onCreate(Bundle savedInstanceState) { - super.onCreate(savedInstanceState); - setContentView(R.layout.activity_main); - Toolbar toolbar = findViewById(R.id.toolbar); - setSupportActionBar(toolbar); - - } - - @Override - public boolean onCreateOptionsMenu(Menu menu) { - // Inflate the menu; this adds items to the action bar if it is present. - getMenuInflater().inflate(R.menu.menu_main, menu); - return true; - } - - @Override - public boolean onOptionsItemSelected(MenuItem item) { - // Handle action bar item clicks here. The action bar will - // automatically handle clicks on the Home/Up button, so long - // as you specify a parent activity in AndroidManifest.xml. 
- int id = item.getItemId(); - - //noinspection SimplifiableIfStatement - if (id == R.id.action_settings) { - return true; - } - - return super.onOptionsItemSelected(item); - } -} \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/Native.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/Native.java deleted file mode 100644 index d55be1034358ddb2357b7dfa8c38a285beb159af..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/Native.java +++ /dev/null @@ -1,59 +0,0 @@ -package com.baidu.paddledetection.detection; - -public class Native { - static { - System.loadLibrary("Native"); - } - - private long ctx = 0; - - public boolean init(String modelDir, - String labelPath, - int cpuThreadNum, - String cpuPowerMode, - int inputWidth, - int inputHeight, - float[] inputMean, - float[] inputStd, - float scoreThreshold) { - ctx = nativeInit( - modelDir, - labelPath, - cpuThreadNum, - cpuPowerMode, - inputWidth, - inputHeight, - inputMean, - inputStd, - scoreThreshold); - return ctx == 0; - } - - public boolean release() { - if (ctx == 0) { - return false; - } - return nativeRelease(ctx); - } - - public boolean process(int inTextureId, int outTextureId, int textureWidth, int textureHeight, String savedImagePath) { - if (ctx == 0) { - return false; - } - return nativeProcess(ctx, inTextureId, outTextureId, textureWidth, textureHeight, savedImagePath); - } - - public static native long nativeInit(String modelDir, - String labelPath, - int cpuThreadNum, - String cpuPowerMode, - int inputWidth, - int inputHeight, - float[] inputMean, - float[] inputStd, - float scoreThreshold); - - public static native boolean nativeRelease(long ctx); - - public static native boolean nativeProcess(long ctx, int inTextureId, int outTextureId, int textureWidth, int textureHeight, String savedImagePath); -} diff --git 
a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/PhotoFragment.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/PhotoFragment.java deleted file mode 100644 index f37d13f27a778cfe984099a065ee0604b6483918..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/PhotoFragment.java +++ /dev/null @@ -1,374 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.app.ProgressDialog; -import android.content.ContentResolver; -import android.content.Intent; -import android.content.SharedPreferences; -import android.database.Cursor; -import android.graphics.Bitmap; -import android.graphics.BitmapFactory; -import android.net.Uri; -import android.os.Bundle; -import android.os.Handler; -import android.os.HandlerThread; -import android.os.Message; -import android.preference.PreferenceManager; -import android.provider.MediaStore; -import android.text.method.ScrollingMovementMethod; -import android.util.Log; -import android.view.LayoutInflater; -import android.view.View; -import android.view.ViewGroup; -import android.widget.ImageView; -import android.widget.TextView; -import android.widget.Toast; - -import androidx.annotation.NonNull; -import androidx.fragment.app.Fragment; - -import com.baidu.paddledetection.common.Utils; -import com.google.android.material.floatingactionbutton.FloatingActionButton; - -import java.io.File; -import java.io.IOException; -import java.io.InputStream; - -import static android.app.Activity.RESULT_OK; - -public class PhotoFragment extends Fragment implements View.OnClickListener{ - private static final String TAG = PhotoFragment.class.getSimpleName(); - public static final int OPEN_GALLERY_REQUEST_CODE = 0; - public static final int TAKE_PHOTO_REQUEST_CODE = 1; - - public static final int REQUEST_LOAD_MODEL = 0; - public static final int REQUEST_RUN_MODEL = 1; - public static final int 
RESPONSE_LOAD_MODEL_SUCCESSED = 0; - public static final int RESPONSE_LOAD_MODEL_FAILED = 1; - public static final int RESPONSE_RUN_MODEL_SUCCESSED = 2; - public static final int RESPONSE_RUN_MODEL_FAILED = 3; - - protected ProgressDialog pbLoadModel = null; - protected ProgressDialog pbRunModel = null; - - public Handler receiver = null; // Receive messages from worker thread - protected Handler sender = null; // Send command to worker thread - protected HandlerThread worker = null; // Worker thread to load&run model - - // UI components of object detection - protected TextView tvInputSetting; - protected ImageView ivInputImage; - protected TextView tvOutputResult; - protected TextView tvInferenceTime; - - // Model settings of object detection - protected String modelPath = ""; - protected String labelPath = ""; - protected String imagePath = ""; - protected int cpuThreadNum = 1; - protected String cpuPowerMode = ""; - protected String inputColorFormat = ""; - protected long[] inputShape = new long[]{}; - protected float[] inputMean = new float[]{}; - protected float[] inputStd = new float[]{}; - protected float scoreThreshold = 0.5f; - - protected Predictor predictor = new Predictor(); - - @Override - public View onCreateView( - LayoutInflater inflater, ViewGroup container, - Bundle savedInstanceState - ) { - // Inflate the layout for this fragment - return inflater.inflate(R.layout.fragment_photo, container, false); - } - - public void onViewCreated(@NonNull View view, Bundle savedInstanceState) { - super.onViewCreated(view, savedInstanceState); - // Prepare the worker thread for mode loading and inference - receiver = new Handler() { - @Override - public void handleMessage(Message msg) { - switch (msg.what) { - case RESPONSE_LOAD_MODEL_SUCCESSED: - pbLoadModel.dismiss(); - onLoadModelSuccessed(); - break; - case RESPONSE_LOAD_MODEL_FAILED: - pbLoadModel.dismiss(); - Toast.makeText(getActivity(), "Load model failed!", Toast.LENGTH_SHORT).show(); - 
onLoadModelFailed(); - break; - case RESPONSE_RUN_MODEL_SUCCESSED: - pbRunModel.dismiss(); - onRunModelSuccessed(); - break; - case RESPONSE_RUN_MODEL_FAILED: - pbRunModel.dismiss(); - Toast.makeText(getActivity(), "Run model failed!", Toast.LENGTH_SHORT).show(); - onRunModelFailed(); - break; - default: - break; - } - } - }; - - worker = new HandlerThread("Predictor Worker"); - worker.start(); - sender = new Handler(worker.getLooper()) { - public void handleMessage(Message msg) { - switch (msg.what) { - case REQUEST_LOAD_MODEL: - // Load model and reload test image - if (onLoadModel()) { - receiver.sendEmptyMessage(RESPONSE_LOAD_MODEL_SUCCESSED); - } else { - receiver.sendEmptyMessage(RESPONSE_LOAD_MODEL_FAILED); - } - break; - case REQUEST_RUN_MODEL: - // Run model if model is loaded - if (onRunModel()) { - receiver.sendEmptyMessage(RESPONSE_RUN_MODEL_SUCCESSED); - } else { - receiver.sendEmptyMessage(RESPONSE_RUN_MODEL_FAILED); - } - break; - default: - break; - } - } - }; - initView(view); - } - - - @Override - public void onClick(View v) { - switch (v.getId()) { - case R.id.iv_input_image: - - break; - } - } - - public void initView(@NonNull View view) { - // Setup the UI components - tvInputSetting = view.findViewById(R.id.tv_input_setting); - ivInputImage = view.findViewById(R.id.iv_input_image); - ivInputImage.setOnClickListener(this); - tvInferenceTime = view.findViewById(R.id.tv_inference_time); - tvOutputResult = view.findViewById(R.id.tv_output_result); - tvInputSetting.setMovementMethod(ScrollingMovementMethod.getInstance()); - tvOutputResult.setMovementMethod(ScrollingMovementMethod.getInstance()); - FloatingActionButton fab = view.findViewById(R.id.fab); - fab.setOnClickListener(new View.OnClickListener() { - @Override - public void onClick(View view) { - openGallery(); - // You can take photo - // takePhoto(); - } - }); - } - - - public void loadModel() { - pbLoadModel = ProgressDialog.show(getActivity(), "", "Loading model...", false, false); - 
sender.sendEmptyMessage(REQUEST_LOAD_MODEL); - } - - public void runModel() { - pbRunModel = ProgressDialog.show(getActivity(), "", "Running model...", false, false); - sender.sendEmptyMessage(REQUEST_RUN_MODEL); - Log.i(TAG, "开始运行模型5555555555555"); - } - - public boolean onLoadModel() { - return predictor.init(getActivity(), modelPath, labelPath, cpuThreadNum, - cpuPowerMode, - inputColorFormat, - inputShape, inputMean, - inputStd, scoreThreshold); - } - - public boolean onRunModel() { - return predictor.isLoaded() && predictor.runModel(); - } - - public void onLoadModelSuccessed() { - // Load test image from path and run model - try { - if (imagePath.isEmpty()) { - return; - } - Bitmap image = null; - // Read test image file from custom path if the first character of mode path is '/', otherwise read test - // image file from assets - if (!imagePath.substring(0, 1).equals("/")) { - InputStream imageStream = getActivity().getAssets().open(imagePath); - image = BitmapFactory.decodeStream(imageStream); - } else { - if (!new File(imagePath).exists()) { - return; - } - image = BitmapFactory.decodeFile(imagePath); - } - if (image != null && predictor.isLoaded()) { - predictor.setInputImage(image); - runModel(); - } - } catch (IOException e) { - Toast.makeText(getActivity(), "Load image failed!", Toast.LENGTH_SHORT).show(); - e.printStackTrace(); - } - } - - public void onLoadModelFailed() { - } - - public void onRunModelSuccessed() { - // Obtain results and update UI - tvInferenceTime.setText("Inference time: " + predictor.inferenceTime() + " ms"); - Bitmap outputImage = predictor.outputImage(); - if (outputImage != null) { - ivInputImage.setImageBitmap(outputImage); - } - tvOutputResult.setText(predictor.outputResult()); - tvOutputResult.scrollTo(0, 0); - } - - public void onRunModelFailed() {} - - public void onImageChanged(Bitmap image) { - // Rerun model if users pick test image from gallery or camera - if (image != null && predictor.isLoaded()) { - 
predictor.setInputImage(image); - runModel(); - } - } - - @Override - public void onActivityResult(int requestCode, int resultCode, Intent data) { - super.onActivityResult(requestCode, resultCode, data); - if (resultCode == RESULT_OK && data != null) { - switch (requestCode) { - case OPEN_GALLERY_REQUEST_CODE: - try { - ContentResolver resolver = getActivity().getContentResolver(); - Uri uri = data.getData(); - Bitmap image = MediaStore.Images.Media.getBitmap(resolver, uri); - String[] proj = {MediaStore.Images.Media.DATA}; - Cursor cursor = getActivity().managedQuery(uri, proj, null, null, null); - cursor.moveToFirst(); - onImageChanged(image); - } catch (IOException e) { - Log.e(TAG, e.toString()); - } - break; - case TAKE_PHOTO_REQUEST_CODE: - Bundle extras = data.getExtras(); - Bitmap image = (Bitmap) extras.get("data"); - onImageChanged(image); - break; - default: - break; - } - } - } - - private void openGallery() { - Intent intent = new Intent(Intent.ACTION_PICK, null); - intent.setDataAndType(MediaStore.Images.Media.EXTERNAL_CONTENT_URI, "image/*"); - startActivityForResult(intent, OPEN_GALLERY_REQUEST_CODE); - } - - private void takePhoto() { - Intent takePhotoIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE); - if (takePhotoIntent.resolveActivity(getActivity().getPackageManager()) != null) { - startActivityForResult(takePhotoIntent, TAKE_PHOTO_REQUEST_CODE); - } - } - - @Override - public void onResume() { - super.onResume(); - SharedPreferences sharedPreferences = PreferenceManager.getDefaultSharedPreferences(getActivity()); - boolean settingsChanged = false; - String model_path = sharedPreferences.getString("NULL", - getString(R.string.MODEL_PATH_DEFAULT)); - String label_path = sharedPreferences.getString(getString(R.string.LABEL_PATH_KEY), - getString(R.string.SSD_LABEL_PATH_DEFAULT)); - String image_path = sharedPreferences.getString("NULL", - getString(R.string.IMAGE_PATH_DEFAULT)); - settingsChanged |= !model_path.equalsIgnoreCase(modelPath); - 
settingsChanged |= !label_path.equalsIgnoreCase(labelPath); - settingsChanged |= !image_path.equalsIgnoreCase(imagePath); - int cpu_thread_num = Integer.parseInt(sharedPreferences.getString(getString(R.string.CPU_THREAD_NUM_KEY), - getString(R.string.CPU_THREAD_NUM_DEFAULT))); - settingsChanged |= cpu_thread_num != cpuThreadNum; - String cpu_power_mode = - sharedPreferences.getString(getString(R.string.CPU_POWER_MODE_KEY), - getString(R.string.CPU_POWER_MODE_DEFAULT)); - settingsChanged |= !cpu_power_mode.equalsIgnoreCase(cpuPowerMode); - String input_color_format = - sharedPreferences.getString("NULL", - getString(R.string.INPUT_COLOR_FORMAT_DEFAULT)); - settingsChanged |= !input_color_format.equalsIgnoreCase(inputColorFormat); - long[] input_shape = - Utils.parseLongsFromString(sharedPreferences.getString("NULL", - getString(R.string.INPUT_SHAPE_DEFAULT)), ","); - float[] input_mean = - Utils.parseFloatsFromString(sharedPreferences.getString(getString(R.string.INPUT_MEAN_KEY), - getString(R.string.INPUT_MEAN_DEFAULT)), ","); - float[] input_std = - Utils.parseFloatsFromString(sharedPreferences.getString(getString(R.string.INPUT_STD_KEY) - , getString(R.string.INPUT_STD_DEFAULT)), ","); - settingsChanged |= input_shape.length != inputShape.length; - settingsChanged |= input_mean.length != inputMean.length; - settingsChanged |= input_std.length != inputStd.length; - if (!settingsChanged) { - for (int i = 0; i < input_shape.length; i++) { - settingsChanged |= input_shape[i] != inputShape[i]; - } - for (int i = 0; i < input_mean.length; i++) { - settingsChanged |= input_mean[i] != inputMean[i]; - } - for (int i = 0; i < input_std.length; i++) { - settingsChanged |= input_std[i] != inputStd[i]; - } - } - float score_threshold = - Float.parseFloat(sharedPreferences.getString(getString(R.string.SCORE_THRESHOLD_KEY), - getString(R.string.SSD_SCORE_THRESHOLD_DEFAULT))); - settingsChanged |= scoreThreshold != score_threshold; - if (settingsChanged) { - modelPath = 
model_path; - labelPath = label_path; - imagePath = image_path; - cpuThreadNum = cpu_thread_num; - cpuPowerMode = cpu_power_mode; - inputColorFormat = input_color_format; - inputShape = input_shape; - inputMean = input_mean; - inputStd = input_std; - scoreThreshold = score_threshold; - // Update UI - tvInputSetting.setText("Model: " + modelPath.substring(modelPath.lastIndexOf("/") + 1) + "\n" + "CPU" + - " Thread Num: " + Integer.toString(cpuThreadNum) + "\n" + "CPU Power Mode: " + cpuPowerMode); - tvInputSetting.scrollTo(0, 0); - // Reload model if configure has been changed - loadModel(); - } - } - - @Override - public void onDestroy() { - if (predictor != null) { - predictor.releaseModel(); - } - worker.quit(); - super.onDestroy(); - } -} \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/Predictor.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/Predictor.java deleted file mode 100644 index 07c1bf059fd1a4137c295205a4523ecbf8007d11..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/Predictor.java +++ /dev/null @@ -1,369 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.content.Context; -import android.graphics.Bitmap; -import android.graphics.Canvas; -import android.graphics.Paint; -import android.util.Log; -import com.baidu.paddle.lite.MobileConfig; -import com.baidu.paddle.lite.PaddlePredictor; -import com.baidu.paddle.lite.PowerMode; -import com.baidu.paddle.lite.Tensor; -import com.baidu.paddledetection.common.Utils; - -import java.io.File; -import java.io.InputStream; -import java.util.Date; -import java.util.Vector; - -import static android.graphics.Color.*; - -public class Predictor { - private static final String TAG = Predictor.class.getSimpleName(); - public boolean isLoaded = false; - public int warmupIterNum = 1; - public int inferIterNum = 1; 
- public int cpuThreadNum = 1; - public String cpuPowerMode = "LITE_POWER_HIGH"; - public String modelPath = ""; - public String modelName = ""; - protected PaddlePredictor paddlePredictor = null; - protected float inferenceTime = 0; - // Only for object detection - protected Vector wordLabels = new Vector(); - protected String inputColorFormat = "RGB"; - protected long[] inputShape = new long[]{1, 3, 320, 320}; - protected float[] inputMean = new float[]{0.485f, 0.456f, 0.406f}; - protected float[] inputStd = new float[]{0.229f, 0.224f, 0.225f}; - protected float scoreThreshold = 0.5f; - protected Bitmap inputImage = null; - protected Bitmap outputImage = null; - protected String outputResult = ""; - protected float preprocessTime = 0; - protected float postprocessTime = 0; - - public Predictor() { - } - - public boolean init(Context appCtx, String modelPath, String labelPath, int cpuThreadNum, String cpuPowerMode, - String inputColorFormat, - long[] inputShape, float[] inputMean, - float[] inputStd, float scoreThreshold) { - if (inputShape.length != 4) { - Log.i(TAG, "Size of input shape should be: 4"); - return false; - } - if (inputMean.length != inputShape[1]) { - Log.i(TAG, "Size of input mean should be: " + Long.toString(inputShape[1])); - return false; - } - if (inputStd.length != inputShape[1]) { - Log.i(TAG, "Size of input std should be: " + Long.toString(inputShape[1])); - return false; - } - if (inputShape[0] != 1) { - Log.i(TAG, "Only one batch is supported in the image classification demo, you can use any batch size in " + - "your Apps!"); - return false; - } - if (inputShape[1] != 1 && inputShape[1] != 3) { - Log.i(TAG, "Only one/three channels are supported in the image classification demo, you can use any " + - "channel size in your Apps!"); - return false; - } - if (!inputColorFormat.equalsIgnoreCase("RGB") && !inputColorFormat.equalsIgnoreCase("BGR")) { - Log.i(TAG, "Only RGB and BGR color format is supported."); - return false; - } - isLoaded = 
loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode); - if (!isLoaded) { - return false; - } - isLoaded = loadLabel(appCtx, labelPath); - if (!isLoaded) { - return false; - } - this.inputColorFormat = inputColorFormat; - this.inputShape = inputShape; - this.inputMean = inputMean; - this.inputStd = inputStd; - this.scoreThreshold = scoreThreshold; - return true; - } - - protected boolean loadModel(Context appCtx, String modelPath, int cpuThreadNum, String cpuPowerMode) { - // Release model if exists - releaseModel(); - - // Load model - if (modelPath.isEmpty()) { - return false; - } - String realPath = modelPath; - if (!modelPath.substring(0, 1).equals("/")) { - // Read model files from custom path if the first character of mode path is '/' - // otherwise copy model to cache from assets - realPath = appCtx.getCacheDir() + "/" + modelPath; - Utils.copyDirectoryFromAssets(appCtx, modelPath, realPath); - } - if (realPath.isEmpty()) { - return false; - } - MobileConfig config = new MobileConfig(); - config.setModelFromFile(realPath + File.separator + "model.nb"); - config.setThreads(cpuThreadNum); - if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_HIGH")) { - config.setPowerMode(PowerMode.LITE_POWER_HIGH); - } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_LOW")) { - config.setPowerMode(PowerMode.LITE_POWER_LOW); - } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_FULL")) { - config.setPowerMode(PowerMode.LITE_POWER_FULL); - } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_NO_BIND")) { - config.setPowerMode(PowerMode.LITE_POWER_NO_BIND); - } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_RAND_HIGH")) { - config.setPowerMode(PowerMode.LITE_POWER_RAND_HIGH); - } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_RAND_LOW")) { - config.setPowerMode(PowerMode.LITE_POWER_RAND_LOW); - } else { - Log.e(TAG, "unknown cpu power mode!"); - return false; - } - paddlePredictor = PaddlePredictor.createPaddlePredictor(config); - - this.cpuThreadNum = cpuThreadNum; - 
this.cpuPowerMode = cpuPowerMode; - this.modelPath = realPath; - this.modelName = realPath.substring(realPath.lastIndexOf("/") + 1); - return true; - } - - public void releaseModel() { - paddlePredictor = null; - isLoaded = false; - cpuThreadNum = 1; - cpuPowerMode = "LITE_POWER_HIGH"; - modelPath = ""; - modelName = ""; - } - - protected boolean loadLabel(Context appCtx, String labelPath) { - wordLabels.clear(); - // Load word labels from file - try { - InputStream assetsInputStream = appCtx.getAssets().open(labelPath); - int available = assetsInputStream.available(); - byte[] lines = new byte[available]; - assetsInputStream.read(lines); - assetsInputStream.close(); - String words = new String(lines); - String[] contents = words.split("\n"); - for (String content : contents) { - wordLabels.add(content); - } - Log.i(TAG, "Word label size: " + wordLabels.size()); - } catch (Exception e) { - Log.e(TAG, e.getMessage()); - return false; - } - return true; - } - - public Tensor getInput(int idx) { - if (!isLoaded()) { - return null; - } - return paddlePredictor.getInput(idx); - } - - public Tensor getOutput(int idx) { - if (!isLoaded()) { - return null; - } - return paddlePredictor.getOutput(idx); - } - - public boolean runModel() { - if (inputImage == null || !isLoaded()) { - return false; - } - - // Set input shape - Tensor inputTensor = getInput(0); - inputTensor.resize(inputShape); - - // Pre-process image, and feed input tensor with pre-processed data - Date start = new Date(); - int channels = (int) inputShape[1]; - int width = (int) inputShape[3]; - int height = (int) inputShape[2]; - float[] inputData = new float[channels * width * height]; - if (channels == 3) { - int[] channelIdx = null; - if (inputColorFormat.equalsIgnoreCase("RGB")) { - channelIdx = new int[]{0, 1, 2}; - } else if (inputColorFormat.equalsIgnoreCase("BGR")) { - channelIdx = new int[]{2, 1, 0}; - } else { - Log.i(TAG, "Unknown color format " + inputColorFormat + ", only RGB and BGR color 
format is " + - "supported!"); - return false; - } - int[] channelStride = new int[]{width * height, width * height * 2}; - for (int y = 0; y < height; y++) { - for (int x = 0; x < width; x++) { - int color = inputImage.getPixel(x, y); - float[] rgb = new float[]{(float) red(color) / 255.0f, (float) green(color) / 255.0f, - (float) blue(color) / 255.0f}; - inputData[y * width + x] = (rgb[channelIdx[0]] - inputMean[0]) / inputStd[0]; - inputData[y * width + x + channelStride[0]] = (rgb[channelIdx[1]] - inputMean[1]) / inputStd[1]; - inputData[y * width + x + channelStride[1]] = (rgb[channelIdx[2]] - inputMean[2]) / inputStd[2]; - } - } - } else if (channels == 1) { - for (int y = 0; y < height; y++) { - for (int x = 0; x < width; x++) { - int color = inputImage.getPixel(x, y); - float gray = (float) (red(color) + green(color) + blue(color)) / 3.0f / 255.0f; - inputData[y * width + x] = (gray - inputMean[0]) / inputStd[0]; - } - } - } else { - Log.i(TAG, "Unsupported channel size " + Integer.toString(channels) + ", only channel 1 and 3 is " + - "supported!"); - return false; - } - inputTensor.setData(inputData); - Date end = new Date(); - preprocessTime = (float) (end.getTime() - start.getTime()); - - // Warm up - for (int i = 0; i < warmupIterNum; i++) { - paddlePredictor.run(); - } - // Run inference - start = new Date(); - for (int i = 0; i < inferIterNum; i++) { - paddlePredictor.run(); - } - end = new Date(); - inferenceTime = (end.getTime() - start.getTime()) / (float) inferIterNum; - - // Fetch output tensor - Tensor outputTensor = getOutput(0); - - // Post-process - start = new Date(); - long outputShape[] = outputTensor.shape(); - long outputSize = 1; - for (long s : outputShape) { - outputSize *= s; - } - outputImage = inputImage; - outputResult = new String(); - Canvas canvas = new Canvas(outputImage); - Paint rectPaint = new Paint(); - rectPaint.setStyle(Paint.Style.STROKE); - rectPaint.setStrokeWidth(1); - Paint txtPaint = new Paint(); - 
txtPaint.setTextSize(12); - txtPaint.setAntiAlias(true); - int txtXOffset = 4; - int txtYOffset = (int) (Math.ceil(-txtPaint.getFontMetrics().ascent)); - int imgWidth = outputImage.getWidth(); - int imgHeight = outputImage.getHeight(); - int objectIdx = 0; - final int[] objectColor = {0xFFFF00CC, 0xFFFF0000, 0xFFFFFF33, 0xFF0000FF, 0xFF00FF00, - 0xFF000000, 0xFF339933}; - for (int i = 0; i < outputSize; i += 6) { - float score = outputTensor.getFloatData()[i + 1]; - if (score < scoreThreshold) { - continue; - } - int categoryIdx = (int) outputTensor.getFloatData()[i]; - String categoryName = "Unknown"; - if (wordLabels.size() > 0 && categoryIdx >= 0 && categoryIdx < wordLabels.size()) { - categoryName = wordLabels.get(categoryIdx); - } - float rawLeft = outputTensor.getFloatData()[i + 2]; - float rawTop = outputTensor.getFloatData()[i + 3]; - float rawRight = outputTensor.getFloatData()[i + 4]; - float rawBottom = outputTensor.getFloatData()[i + 5]; - float clampedLeft = Math.max(Math.min(rawLeft, 1.f), 0.f); - float clampedTop = Math.max(Math.min(rawTop, 1.f), 0.f); - float clampedRight = Math.max(Math.min(rawRight, 1.f), 0.f); - float clampedBottom = Math.max(Math.min(rawBottom, 1.f), 0.f); - float imgLeft = clampedLeft * imgWidth; - float imgTop = clampedTop * imgWidth; - float imgRight = clampedRight * imgHeight; - float imgBottom = clampedBottom * imgHeight; - int color = objectColor[objectIdx % objectColor.length]; - rectPaint.setColor(color); - txtPaint.setColor(color); - canvas.drawRect(imgLeft, imgTop, imgRight, imgBottom, rectPaint); - canvas.drawText(objectIdx + "." + categoryName + ":" + String.format("%.3f", score), - imgLeft + txtXOffset, imgTop + txtYOffset, txtPaint); - outputResult += objectIdx + "." 
+ categoryName + " - " + String.format("%.3f", score) + - " [" + String.format("%.3f", rawLeft) + "," + String.format("%.3f", rawTop) + "," + String.format("%.3f", rawRight) + "," + String.format("%.3f", rawBottom) + "]\n"; - objectIdx++; - } - end = new Date(); - postprocessTime = (float) (end.getTime() - start.getTime()); - return true; - } - - - public boolean isLoaded() { - return paddlePredictor != null && isLoaded; - } - - public String modelPath() { - return modelPath; - } - - public String modelName() { - return modelName; - } - - public int cpuThreadNum() { - return cpuThreadNum; - } - - public String cpuPowerMode() { - return cpuPowerMode; - } - - public float inferenceTime() { - return inferenceTime; - } - - public Bitmap inputImage() { - return inputImage; - } - - public Bitmap outputImage() { - return outputImage; - } - - public String outputResult() { - return outputResult; - } - - public float preprocessTime() { - return preprocessTime; - } - - public float postprocessTime() { - return postprocessTime; - } - - - public void setInputImage(Bitmap image) { - if (image == null) { - return; - } - // Scale image to the size of input tensor - Bitmap rgbaImage = image.copy(Bitmap.Config.ARGB_8888, true); - Bitmap scaleImage = Bitmap.createScaledBitmap(rgbaImage, (int) inputShape[3], (int) inputShape[2], true); - this.inputImage = scaleImage; - } -} diff --git a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/SettingsActivity.java b/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/SettingsActivity.java deleted file mode 100644 index f26cf05a05c5e34c2bb39d0e09164c07c862d720..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/java/com/baidu/paddledetection/detection/SettingsActivity.java +++ /dev/null @@ -1,272 +0,0 @@ -package com.baidu.paddledetection.detection; - -import android.content.Context; -import android.content.SharedPreferences; -import 
android.os.Bundle; -import android.preference.EditTextPreference; -import android.preference.ListPreference; -import android.preference.PreferenceManager; -import androidx.appcompat.app.ActionBar; - -import android.widget.Toast; - -import com.baidu.paddledetection.common.AppCompatPreferenceActivity; -import com.baidu.paddledetection.common.Utils; -import com.baidu.paddledetection.detection.R; - -import java.util.ArrayList; -import java.util.List; - -public class SettingsActivity extends AppCompatPreferenceActivity implements SharedPreferences.OnSharedPreferenceChangeListener { - private static final String TAG = SettingsActivity.class.getSimpleName(); - - static public int selectedModelIdx = -1; - static public String modelDir = ""; - static public String labelPath = ""; - static public int cpuThreadNum = 0; - static public String cpuPowerMode = ""; - static public int inputWidth = 0; - static public int inputHeight = 0; - static public float[] inputMean = new float[]{}; - static public float[] inputStd = new float[]{}; - static public float scoreThreshold = 0.0f; - - ListPreference lpChoosePreInstalledModel = null; - EditTextPreference etModelDir = null; - EditTextPreference etLabelPath = null; - ListPreference lpCPUThreadNum = null; - ListPreference lpCPUPowerMode = null; - EditTextPreference etInputWidth = null; - EditTextPreference etInputHeight = null; - EditTextPreference etInputMean = null; - EditTextPreference etInputStd = null; - EditTextPreference etScoreThreshold = null; - - List preInstalledModelDirs = null; - List preInstalledLabelPaths = null; - List preInstalledCPUThreadNums = null; - List preInstalledCPUPowerModes = null; - List preInstalledInputWidths = null; - List preInstalledInputHeights = null; - List preInstalledInputMeans = null; - List preInstalledInputStds = null; - List preInstalledScoreThresholds = null; - - @Override - public void onCreate(Bundle savedInstanceState) { - super.onCreate(savedInstanceState); - 
addPreferencesFromResource(R.xml.settings); - ActionBar supportActionBar = getSupportActionBar(); - if (supportActionBar != null) { - supportActionBar.setDisplayHomeAsUpEnabled(true); - } - - // Initialize pre-installed models - preInstalledModelDirs = new ArrayList(); - preInstalledLabelPaths = new ArrayList(); - preInstalledCPUThreadNums = new ArrayList(); - preInstalledCPUPowerModes = new ArrayList(); - preInstalledInputWidths = new ArrayList(); - preInstalledInputHeights = new ArrayList(); - preInstalledInputMeans = new ArrayList(); - preInstalledInputStds = new ArrayList(); - preInstalledScoreThresholds = new ArrayList(); - preInstalledModelDirs.add(getString(R.string.MODEL_DIR_DEFAULT)); - preInstalledLabelPaths.add(getString(R.string.LABEL_PATH_DEFAULT)); - preInstalledCPUThreadNums.add(getString(R.string.CPU_THREAD_NUM_DEFAULT)); - preInstalledCPUPowerModes.add(getString(R.string.CPU_POWER_MODE_DEFAULT)); - preInstalledInputWidths.add(getString(R.string.INPUT_WIDTH_DEFAULT)); - preInstalledInputHeights.add(getString(R.string.INPUT_HEIGHT_DEFAULT)); - preInstalledInputMeans.add(getString(R.string.INPUT_MEAN_DEFAULT)); - preInstalledInputStds.add(getString(R.string.INPUT_STD_DEFAULT)); - preInstalledScoreThresholds.add(getString(R.string.SCORE_THRESHOLD_DEFAULT)); - // Add yolov3_mobilenet_v3_for_hybrid_cpu_npu for CPU and huawei NPU - if (Utils.isSupportedNPU()) { - preInstalledModelDirs.add("models/yolov3_mobilenet_v3_for_hybrid_cpu_npu"); - preInstalledLabelPaths.add("labels/coco-labels-2014_2017.txt"); - preInstalledCPUThreadNums.add("1"); // Useless for NPU - preInstalledCPUPowerModes.add("LITE_POWER_HIGH"); // Useless for NPU - preInstalledInputWidths.add("320"); - preInstalledInputHeights.add("320"); - preInstalledInputMeans.add("0.485,0.456,0.406"); - preInstalledInputStds.add("0.229,0.224,0.225"); - preInstalledScoreThresholds.add("0.2"); - } else { - Toast.makeText(this, "NPU model is not supported by your device.", Toast.LENGTH_LONG).show(); - } - 
// Setup UI components - lpChoosePreInstalledModel = - (ListPreference) findPreference(getString(R.string.CHOOSE_PRE_INSTALLED_MODEL_KEY)); - String[] preInstalledModelNames = new String[preInstalledModelDirs.size()]; - for (int i = 0; i < preInstalledModelDirs.size(); i++) { - preInstalledModelNames[i] = preInstalledModelDirs.get(i).substring(preInstalledModelDirs.get(i).lastIndexOf("/") + 1); - } - lpChoosePreInstalledModel.setEntries(preInstalledModelNames); - lpChoosePreInstalledModel.setEntryValues(preInstalledModelDirs.toArray(new String[preInstalledModelDirs.size()])); - lpCPUThreadNum = (ListPreference) findPreference(getString(R.string.CPU_THREAD_NUM_KEY)); - lpCPUPowerMode = (ListPreference) findPreference(getString(R.string.CPU_POWER_MODE_KEY)); - etModelDir = (EditTextPreference) findPreference(getString(R.string.MODEL_DIR_KEY)); - etModelDir.setTitle("Model dir (SDCard: " + Utils.getSDCardDirectory() + ")"); - etLabelPath = (EditTextPreference) findPreference(getString(R.string.LABEL_PATH_KEY)); - etLabelPath.setTitle("Label path (SDCard: " + Utils.getSDCardDirectory() + ")"); - etInputWidth = (EditTextPreference) findPreference(getString(R.string.INPUT_WIDTH_KEY)); - etInputHeight = (EditTextPreference) findPreference(getString(R.string.INPUT_HEIGHT_KEY)); - etInputMean = (EditTextPreference) findPreference(getString(R.string.INPUT_MEAN_KEY)); - etInputStd = (EditTextPreference) findPreference(getString(R.string.INPUT_STD_KEY)); - etScoreThreshold = (EditTextPreference) findPreference(getString(R.string.SCORE_THRESHOLD_KEY)); - } - - private void reloadSettingsAndUpdateUI() { - SharedPreferences sharedPreferences = getPreferenceScreen().getSharedPreferences(); - - String selected_model_dir = sharedPreferences.getString(getString(R.string.CHOOSE_PRE_INSTALLED_MODEL_KEY), - getString(R.string.MODEL_DIR_DEFAULT)); - int selected_model_idx = lpChoosePreInstalledModel.findIndexOfValue(selected_model_dir); - if (selected_model_idx >= 0 && selected_model_idx 
< preInstalledModelDirs.size() && selected_model_idx != selectedModelIdx) { - SharedPreferences.Editor editor = sharedPreferences.edit(); - editor.putString(getString(R.string.MODEL_DIR_KEY), preInstalledModelDirs.get(selected_model_idx)); - editor.putString(getString(R.string.LABEL_PATH_KEY), preInstalledLabelPaths.get(selected_model_idx)); - editor.putString(getString(R.string.CPU_THREAD_NUM_KEY), preInstalledCPUThreadNums.get(selected_model_idx)); - editor.putString(getString(R.string.CPU_POWER_MODE_KEY), preInstalledCPUPowerModes.get(selected_model_idx)); - editor.putString(getString(R.string.INPUT_WIDTH_KEY), preInstalledInputWidths.get(selected_model_idx)); - editor.putString(getString(R.string.INPUT_HEIGHT_KEY), preInstalledInputHeights.get(selected_model_idx)); - editor.putString(getString(R.string.INPUT_MEAN_KEY), preInstalledInputMeans.get(selected_model_idx)); - editor.putString(getString(R.string.INPUT_STD_KEY), preInstalledInputStds.get(selected_model_idx)); - editor.putString(getString(R.string.SCORE_THRESHOLD_KEY), preInstalledScoreThresholds.get(selected_model_idx)); - editor.commit(); - lpChoosePreInstalledModel.setSummary(selected_model_dir); - selectedModelIdx = selected_model_idx; - } - - String model_dir = sharedPreferences.getString(getString(R.string.MODEL_DIR_KEY), - getString(R.string.MODEL_DIR_DEFAULT)); - String label_path = sharedPreferences.getString(getString(R.string.LABEL_PATH_KEY), - getString(R.string.LABEL_PATH_DEFAULT)); - String cpu_thread_num = sharedPreferences.getString(getString(R.string.CPU_THREAD_NUM_KEY), - getString(R.string.CPU_THREAD_NUM_DEFAULT)); - String cpu_power_mode = sharedPreferences.getString(getString(R.string.CPU_POWER_MODE_KEY), - getString(R.string.CPU_POWER_MODE_DEFAULT)); - String input_width = sharedPreferences.getString(getString(R.string.INPUT_WIDTH_KEY), - getString(R.string.INPUT_WIDTH_DEFAULT)); - String input_height = sharedPreferences.getString(getString(R.string.INPUT_HEIGHT_KEY), - 
getString(R.string.INPUT_HEIGHT_DEFAULT)); - String input_mean = sharedPreferences.getString(getString(R.string.INPUT_MEAN_KEY), - getString(R.string.INPUT_MEAN_DEFAULT)); - String input_std = sharedPreferences.getString(getString(R.string.INPUT_STD_KEY), - getString(R.string.INPUT_STD_DEFAULT)); - String score_threshold = sharedPreferences.getString(getString(R.string.SCORE_THRESHOLD_KEY), - getString(R.string.SCORE_THRESHOLD_DEFAULT)); - - etModelDir.setSummary(model_dir); - etLabelPath.setSummary(label_path); - lpCPUThreadNum.setValue(cpu_thread_num); - lpCPUThreadNum.setSummary(cpu_thread_num); - lpCPUPowerMode.setValue(cpu_power_mode); - lpCPUPowerMode.setSummary(cpu_power_mode); - etInputWidth.setSummary(input_width); - etInputWidth.setText(input_width); - etInputHeight.setSummary(input_height); - etInputHeight.setText(input_height); - etInputMean.setSummary(input_mean); - etInputMean.setText(input_mean); - etInputStd.setSummary(input_std); - etInputStd.setText(input_std); - etScoreThreshold.setSummary(score_threshold); - etScoreThreshold.setText(score_threshold); - } - - static boolean checkAndUpdateSettings(Context ctx) { - boolean settingsChanged = false; - SharedPreferences sharedPreferences = PreferenceManager.getDefaultSharedPreferences(ctx); - - String model_dir = sharedPreferences.getString(ctx.getString(R.string.MODEL_DIR_KEY), - ctx.getString(R.string.MODEL_DIR_DEFAULT)); - settingsChanged |= !modelDir.equalsIgnoreCase(model_dir); - modelDir = model_dir; - - String label_path = sharedPreferences.getString(ctx.getString(R.string.LABEL_PATH_KEY), - ctx.getString(R.string.LABEL_PATH_DEFAULT)); - settingsChanged |= !labelPath.equalsIgnoreCase(label_path); - labelPath = label_path; - - String cpu_thread_num = sharedPreferences.getString(ctx.getString(R.string.CPU_THREAD_NUM_KEY), - ctx.getString(R.string.CPU_THREAD_NUM_DEFAULT)); - settingsChanged |= cpuThreadNum != Integer.parseInt(cpu_thread_num); - cpuThreadNum = Integer.parseInt(cpu_thread_num); - - 
String cpu_power_mode = sharedPreferences.getString(ctx.getString(R.string.CPU_POWER_MODE_KEY), - ctx.getString(R.string.CPU_POWER_MODE_DEFAULT)); - settingsChanged |= !cpuPowerMode.equalsIgnoreCase(cpu_power_mode); - cpuPowerMode = cpu_power_mode; - - String input_width = sharedPreferences.getString(ctx.getString(R.string.INPUT_WIDTH_KEY), - ctx.getString(R.string.INPUT_WIDTH_DEFAULT)); - settingsChanged |= inputWidth != Integer.parseInt(input_width); - inputWidth = Integer.parseInt(input_width); - - String input_height = sharedPreferences.getString(ctx.getString(R.string.INPUT_HEIGHT_KEY), - ctx.getString(R.string.INPUT_HEIGHT_DEFAULT)); - settingsChanged |= inputHeight != Integer.parseInt(input_height); - inputHeight = Integer.parseInt(input_height); - - String input_mean = sharedPreferences.getString(ctx.getString(R.string.INPUT_MEAN_KEY), - ctx.getString(R.string.INPUT_MEAN_DEFAULT)); - float[] array_data = Utils.parseFloatsFromString(input_mean, ","); - settingsChanged |= array_data.length != inputMean.length; - if (!settingsChanged) { - for (int i = 0; i < array_data.length; i++) { - settingsChanged |= array_data[i] != inputMean[i]; - } - } - inputMean = array_data; - - String input_std = sharedPreferences.getString(ctx.getString(R.string.INPUT_STD_KEY), - ctx.getString(R.string.INPUT_STD_DEFAULT)); - array_data = Utils.parseFloatsFromString(input_std, ","); - settingsChanged |= array_data.length != inputStd.length; - if (!settingsChanged) { - for (int i = 0; i < array_data.length; i++) { - settingsChanged |= array_data[i] != inputStd[i]; - } - } - inputStd = array_data; - - String score_threshold = sharedPreferences.getString(ctx.getString(R.string.SCORE_THRESHOLD_KEY), - ctx.getString(R.string.SCORE_THRESHOLD_DEFAULT)); - settingsChanged |= scoreThreshold != Float.parseFloat(score_threshold); - scoreThreshold = Float.parseFloat(score_threshold); - - return settingsChanged; - } - - static void resetSettings() { - selectedModelIdx = -1; - modelDir = ""; - 
labelPath = ""; - cpuThreadNum = 0; - cpuPowerMode = ""; - inputWidth = 0; - inputHeight = 0; - inputMean = new float[]{}; - inputStd = new float[]{}; - scoreThreshold = 0; - } - - @Override - protected void onResume() { - super.onResume(); - getPreferenceScreen().getSharedPreferences().registerOnSharedPreferenceChangeListener(this); - reloadSettingsAndUpdateUI(); - } - - @Override - protected void onPause() { - super.onPause(); - getPreferenceScreen().getSharedPreferences().unregisterOnSharedPreferenceChangeListener(this); - } - - @Override - public void onSharedPreferenceChanged(SharedPreferences sharedPreferences, String key) { - reloadSettingsAndUpdateUI(); - } -} diff --git a/static/deploy/android_demo/app/src/main/res/drawable-v24/camera.png b/static/deploy/android_demo/app/src/main/res/drawable-v24/camera.png deleted file mode 100644 index 6bbeb0a7514f2b0d3ed93b10d5e807416c691833..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable-v24/camera.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable-v24/ic_launcher_foreground.xml b/static/deploy/android_demo/app/src/main/res/drawable-v24/ic_launcher_foreground.xml deleted file mode 100644 index 1f6bb290603d7caa16c5fb6f61bbfdc750622f5c..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable-v24/ic_launcher_foreground.xml +++ /dev/null @@ -1,34 +0,0 @@ - - - - - - - - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable-v24/photo.png b/static/deploy/android_demo/app/src/main/res/drawable-v24/photo.png deleted file mode 100644 index 7a534189a9fb3ebd2e89e02dbd31144b391483af..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable-v24/photo.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/btn_switch_default.png 
b/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/btn_switch_default.png deleted file mode 100644 index b9e66c7f605dd5a02d13f04284a046810b292add..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/btn_switch_default.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/btn_switch_pressed.png b/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/btn_switch_pressed.png deleted file mode 100644 index 9544133bdade8f57552f9ab22976be3172c95b86..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/btn_switch_pressed.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/camera.png b/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/camera.png deleted file mode 100644 index 6bbeb0a7514f2b0d3ed93b10d5e807416c691833..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/camera.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/photo.png b/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/photo.png deleted file mode 100644 index 7a534189a9fb3ebd2e89e02dbd31144b391483af..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable-xxhdpi-v4/photo.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable/btn_settings.xml b/static/deploy/android_demo/app/src/main/res/drawable/btn_settings.xml deleted file mode 100644 index 917897b99981d18082d18a87a4ad5176ad8e8f8d..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_settings.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/btn_settings_default.xml 
b/static/deploy/android_demo/app/src/main/res/drawable/btn_settings_default.xml deleted file mode 100644 index e19589a97e419249eaacd05f3d75deeeada3e128..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_settings_default.xml +++ /dev/null @@ -1,13 +0,0 @@ - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/btn_settings_pressed.xml b/static/deploy/android_demo/app/src/main/res/drawable/btn_settings_pressed.xml deleted file mode 100644 index c4af2a042de3a8ae00ab253f889a20dedffa4874..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_settings_pressed.xml +++ /dev/null @@ -1,13 +0,0 @@ - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter.xml b/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter.xml deleted file mode 100644 index 4f9826d3ae340b54046a48e4250a9d7e0b9d9139..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter_default.xml b/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter_default.xml deleted file mode 100644 index 234ca014a76b9647959814fa28e0c02324a8d814..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter_default.xml +++ /dev/null @@ -1,17 +0,0 @@ - - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter_pressed.xml b/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter_pressed.xml deleted file mode 100644 index accc7acedb91cc4fb8171d78eeba24eaa6b0c2db..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_shutter_pressed.xml +++ /dev/null @@ -1,17 +0,0 @@ - - - - - diff --git 
a/static/deploy/android_demo/app/src/main/res/drawable/btn_switch.xml b/static/deploy/android_demo/app/src/main/res/drawable/btn_switch.xml deleted file mode 100644 index 691e8c2e97d7a65d580e4d12d6b77608083b5617..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/btn_switch.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/camera.png b/static/deploy/android_demo/app/src/main/res/drawable/camera.png deleted file mode 100644 index 6bbeb0a7514f2b0d3ed93b10d5e807416c691833..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable/camera.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable/ic_launcher_background.xml b/static/deploy/android_demo/app/src/main/res/drawable/ic_launcher_background.xml deleted file mode 100644 index 0d025f9bf6b67c63044a36a9ff44fbc69e5c5822..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/drawable/ic_launcher_background.xml +++ /dev/null @@ -1,170 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/static/deploy/android_demo/app/src/main/res/drawable/photo.png b/static/deploy/android_demo/app/src/main/res/drawable/photo.png deleted file mode 100644 index 7a534189a9fb3ebd2e89e02dbd31144b391483af..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable/photo.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/drawable/photo1.png b/static/deploy/android_demo/app/src/main/res/drawable/photo1.png deleted file mode 100644 index 41ebaaab61702b751f0243455ca5cc1b6d6e8700..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/drawable/photo1.png and /dev/null differ diff --git 
a/static/deploy/android_demo/app/src/main/res/layout-land/fragment_camera.xml b/static/deploy/android_demo/app/src/main/res/layout-land/fragment_camera.xml deleted file mode 100644 index ef3da245f2e3169441fc3980de25e2889b558f00..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/layout-land/fragment_camera.xml +++ /dev/null @@ -1,99 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/layout/activity_main.xml b/static/deploy/android_demo/app/src/main/res/layout/activity_main.xml deleted file mode 100644 index 9c96440bc2b1bc566214b74b6fac1246bdec8655..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/layout/activity_main.xml +++ /dev/null @@ -1,25 +0,0 @@ - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/layout/content_main.xml b/static/deploy/android_demo/app/src/main/res/layout/content_main.xml deleted file mode 100644 index c285383882907af11efef688be1a7cf96541d2c3..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/layout/content_main.xml +++ /dev/null @@ -1,20 +0,0 @@ - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/layout/fragment_camera.xml b/static/deploy/android_demo/app/src/main/res/layout/fragment_camera.xml deleted file mode 100644 index e1e7f41a94ffdbfa2bcb72b603da9eb13a6a852f..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/layout/fragment_camera.xml +++ /dev/null @@ -1,98 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/layout/fragment_content.xml b/static/deploy/android_demo/app/src/main/res/layout/fragment_content.xml deleted file mode 100644 index 
3534e92acb774f3c128e8c38d9e4c929498f5450..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/layout/fragment_content.xml +++ /dev/null @@ -1,37 +0,0 @@ - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/layout/fragment_photo.xml b/static/deploy/android_demo/app/src/main/res/layout/fragment_photo.xml deleted file mode 100644 index 04871b37d59625762ee84b5befdd9cdab84de96a..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/layout/fragment_photo.xml +++ /dev/null @@ -1,121 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/menu/menu_main.xml b/static/deploy/android_demo/app/src/main/res/menu/menu_main.xml deleted file mode 100644 index 3a711c72f24fe60a993a45538df653a050d749b1..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/menu/menu_main.xml +++ /dev/null @@ -1,10 +0,0 @@ -

    - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-anydpi-v26/ic_launcher.xml b/static/deploy/android_demo/app/src/main/res/mipmap-anydpi-v26/ic_launcher.xml deleted file mode 100644 index eca70cfe52eac1ba66ba280a68ca7be8fcf88a16..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/mipmap-anydpi-v26/ic_launcher.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-anydpi-v26/ic_launcher_round.xml b/static/deploy/android_demo/app/src/main/res/mipmap-anydpi-v26/ic_launcher_round.xml deleted file mode 100644 index eca70cfe52eac1ba66ba280a68ca7be8fcf88a16..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/mipmap-anydpi-v26/ic_launcher_round.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-hdpi/ic_launcher.png b/static/deploy/android_demo/app/src/main/res/mipmap-hdpi/ic_launcher.png deleted file mode 100644 index 898f3ed59ac9f3248734a00e5902736c9367d455..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-hdpi/ic_launcher.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-hdpi/ic_launcher_round.png b/static/deploy/android_demo/app/src/main/res/mipmap-hdpi/ic_launcher_round.png deleted file mode 100644 index dffca3601eba7bf5f409bdd520820e2eb5122c75..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-hdpi/ic_launcher_round.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-mdpi/ic_launcher.png b/static/deploy/android_demo/app/src/main/res/mipmap-mdpi/ic_launcher.png deleted file mode 100644 index 64ba76f75e9ce021aa3d95c213491f73bcacb597..0000000000000000000000000000000000000000 Binary files 
a/static/deploy/android_demo/app/src/main/res/mipmap-mdpi/ic_launcher.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-mdpi/ic_launcher_round.png b/static/deploy/android_demo/app/src/main/res/mipmap-mdpi/ic_launcher_round.png deleted file mode 100644 index dae5e082342fcdeee5db8a6e0b27028e2d2808f5..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-mdpi/ic_launcher_round.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-xhdpi/ic_launcher.png b/static/deploy/android_demo/app/src/main/res/mipmap-xhdpi/ic_launcher.png deleted file mode 100644 index e5ed46597ea8447d91ab1786a34e30f1c26b18bd..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-xhdpi/ic_launcher.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-xhdpi/ic_launcher_round.png b/static/deploy/android_demo/app/src/main/res/mipmap-xhdpi/ic_launcher_round.png deleted file mode 100644 index 14ed0af35023e4f1901cf03487b6c524257b8483..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-xhdpi/ic_launcher_round.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-xxhdpi/ic_launcher.png b/static/deploy/android_demo/app/src/main/res/mipmap-xxhdpi/ic_launcher.png deleted file mode 100644 index b0907cac3bfd8fbfdc46e1108247f0a1055387ec..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-xxhdpi/ic_launcher.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.png b/static/deploy/android_demo/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.png deleted file mode 100644 index d8ae03154975f397f8ed1b84f2d4bf9783ecfa26..0000000000000000000000000000000000000000 Binary files 
a/static/deploy/android_demo/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-xxxhdpi/ic_launcher.png b/static/deploy/android_demo/app/src/main/res/mipmap-xxxhdpi/ic_launcher.png deleted file mode 100644 index 2c18de9e66108411737e910f5c1972476f03ddbf..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-xxxhdpi/ic_launcher.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.png b/static/deploy/android_demo/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.png deleted file mode 100644 index beed3cdd2c32af5114a7dc70b9ef5b698eb8797e..0000000000000000000000000000000000000000 Binary files a/static/deploy/android_demo/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.png and /dev/null differ diff --git a/static/deploy/android_demo/app/src/main/res/navigation/nav_graph.xml b/static/deploy/android_demo/app/src/main/res/navigation/nav_graph.xml deleted file mode 100644 index ab0abc2506c8f6a80ed19f63b0cac580acb9e8fa..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/navigation/nav_graph.xml +++ /dev/null @@ -1,34 +0,0 @@ - - - - - - - - - - - - - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/values/arrays.xml b/static/deploy/android_demo/app/src/main/res/values/arrays.xml deleted file mode 100644 index 8c99734d1485615198f4ce3dd4e905305f7f0ce4..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/values/arrays.xml +++ /dev/null @@ -1,31 +0,0 @@ - - - - 1 threads - 2 threads - 4 threads - 8 threads - - - 1 - 2 - 4 - 8 - - - HIGH(only big cores) - LOW(only LITTLE cores) - FULL(all cores) - NO_BIND(depends on system) - RAND_HIGH - RAND_LOW - - - LITE_POWER_HIGH - LITE_POWER_LOW - LITE_POWER_FULL - LITE_POWER_NO_BIND - LITE_POWER_RAND_HIGH - 
LITE_POWER_RAND_LOW - - \ No newline at end of file diff --git a/static/deploy/android_demo/app/src/main/res/values/colors.xml b/static/deploy/android_demo/app/src/main/res/values/colors.xml deleted file mode 100644 index 1fdccc1e88c3cb44fbd1526cdea829ebe413802b..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/values/colors.xml +++ /dev/null @@ -1,10 +0,0 @@ - - - #6200EE - #3700B3 - #1E90FF - #FF000000 - #00000000 - #00000000 - #FFFFFFFF - diff --git a/static/deploy/android_demo/app/src/main/res/values/dimens.xml b/static/deploy/android_demo/app/src/main/res/values/dimens.xml deleted file mode 100644 index 377274d30426f4d5e64e7d16383d581084b65a74..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/values/dimens.xml +++ /dev/null @@ -1,18 +0,0 @@ - - - 26dp - 36dp - 34dp - 60dp - 16dp - 67dp - 67dp - 56dp - 56dp - 46dp - 46dp - 32dp - 24dp - 16dp - 16dp - diff --git a/static/deploy/android_demo/app/src/main/res/values/strings.xml b/static/deploy/android_demo/app/src/main/res/values/strings.xml deleted file mode 100644 index 1e2ee16e8bb482e9dfe79599315c30560043f69c..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/values/strings.xml +++ /dev/null @@ -1,37 +0,0 @@ - - PaddleDetection - - models/yolov3_mobilenet_v3_for_cpu - - labels/coco-labels-2014_2017.txt - 1 - LITE_POWER_HIGH - 320 - 320 - 0.485,0.456,0.406 - 0.229,0.224,0.225 - 0.2 - First Fragment - Second Fragment - 敬请期待新功能 - - - models/ssdlite_mobilenet_v3_large_for_cpu_nb - labels/coco-labels-background.txt - images/home.jpg - 1,3,320,320 - RGB - 0.5 - - CHOOSE_INSTALLED_MODEL_KEY - MODEL_DIR_KEY - LABEL_PATH_KEY - CPU_THREAD_NUM_KEY - CPU_POWER_MODE_KEY - INPUT_WIDTH_KEY - INPUT_HEIGHT_KEY - INPUT_MEAN_KEY - INPUT_STD_KEY - SCORE_THRESHOLD_KEY - - diff --git a/static/deploy/android_demo/app/src/main/res/values/styles.xml b/static/deploy/android_demo/app/src/main/res/values/styles.xml 
deleted file mode 100644 index 853262016a2ffab26185bf9f3dcd59e10605630a..0000000000000000000000000000000000000000 --- a/static/deploy/android_demo/app/src/main/res/values/styles.xml +++ /dev/null @@ -1,25 +0,0 @@ - - - - - - - - - -