1 Star 0 Fork 0

MythWU/dlib-models

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
Loading...
README
CC0-1.0

dlib-models

This repository contains trained models created by me (Davis King). They are provided as part of the dlib example programs, which are intended to be educational documents that explain how to use various parts of the dlib library. As far as I am concerned, anyone can do whatever they want with these model files as I've released them into the public domain. Details describing how each model was created are summarized below.

dlib_face_recognition_resnet_model_v1.dat.bz2

This model is a ResNet network with 29 conv layers. It's essentially a version of the ResNet-34 network from the paper Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun with a few layers removed and the number of filters per layer reduced by half.

The network was trained from scratch on a dataset of about 3 million faces. This dataset is derived from a number of datasets. The face scrub dataset (http://vintage.winklerbros.net/facescrub.html), the VGG dataset (http://www.robots.ox.ac.uk/~vgg/data/vgg_face/), and then a large number of images I scraped from the internet. I tried as best I could to clean up the dataset by removing labeling errors, which meant filtering out a lot of stuff from VGG. I did this by repeatedly training a face recognition CNN and then using graph clustering methods and a lot of manual review to clean up the dataset. In the end about half the images are from VGG and face scrub. Also, the total number of individual identities in the dataset is 7485. I made sure to avoid overlap with identities in LFW.

The network training started with randomly initialized weights and used a structured metric loss that tries to project all the identities into non-overlapping balls of radius 0.6. The loss is basically a type of pair-wise hinge loss that runs over all pairs in a mini-batch and includes hard-negative mining at the mini-batch level.

The resulting model obtains a mean error of 0.993833 with a standard deviation of 0.00272732 on the LFW benchmark.

mmod_dog_hipsterizer.dat.bz2

This dataset is trained on the data from the Columbia Dogs dataset, which was introduced in the paper:

Dog Breed Classification Using Part Localization
Jiongxin Liu, Angjoo Kanazawa, Peter Belhumeur, David W. Jacobs 
European Conference on Computer Vision (ECCV), Oct. 2012. 

The original dataset is not fully annotated. So I created a new fully annotated version which is available here: http://dlib.net/files/data/CU_dogs_fully_labeled.tar.gz

mmod_human_face_detector.dat.bz2

This is trained on this dataset: http://dlib.net/files/data/dlib_face_detection_dataset-2016-09-30.tar.gz.
I created the dataset by finding face images in many publicly available image datasets (excluding the FDDB dataset). In particular, there are images from ImageNet, AFLW, Pascal VOC, the VGG dataset, WIDER, and face scrub.

All the annotations in the dataset were created by me using dlib's imglab tool.

resnet34_1000_imagenet_classifier.dnn.bz2

This is trained on the venerable ImageNet dataset.

shape_predictor_5_face_landmarks.dat.bz2

This is a 5 point landmarking model which identifies the corners of the eyes and bottom of the nose. It is trained on the dlib 5-point face landmark dataset, which consists of 7198 faces. I created this dataset by downloading images from the internet and annotating them with dlib's imglab tool.

The exact program that produced the model file can be found here.

This model is designed to work well with dlib's HOG face detector and the CNN face detector (the one in mmod_human_face_detector.dat).

shape_predictor_68_face_landmarks.dat.bz2

This is trained on the ibug 300-W dataset (https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/)

C. Sagonas, E. Antonakos, G, Tzimiropoulos, S. Zafeiriou, M. Pantic. 
300 faces In-the-wild challenge: Database and results. 
Image and Vision Computing (IMAVIS), Special Issue on Facial Landmark Localisation "In-The-Wild". 2016.

The license for this dataset excludes commercial use and Stefanos Zafeiriou, one of the creators of the dataset, asked me to include a note here saying that the trained model therefore can't be used in a commerical product. So you should contact a lawyer or talk to Imperial College London to find out if it's OK for you to use this model in a commercial product.

Also note that this model file is designed for use with dlib's HOG face detector. That is, it expects the bounding boxes from the face detector to be aligned a certain way, the way dlib's HOG face detector does it. It won't work as well when used with a face detector that produces differently aligned boxes, such as the CNN based mmod_human_face_detector.dat face detector.

shape_predictor_68_face_landmarks_GTX.dat.bz2

The GTX model is the result of applying a set of training strategies and implementation optimization described in:

Alvarez Casado, C., Bordallo Lopez, M. 
Real-time face alignment: evaluation methods, training strategies and implementation optimization. 
Springer Journal of Real-time image processing, 2021

The resulted model is smaller, faster, smoother and more accurate. You can find all the details related to the training and testing in the next Gitlab repository: https://gitlab.com/visualhealth/vhpapers/real-time-facealignment

This is trained on the ibug 300-W dataset (https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/)

C. Sagonas, E. Antonakos, G, Tzimiropoulos, S. Zafeiriou, M. Pantic. 
300 faces In-the-wild challenge: Database and results. 
Image and Vision Computing (IMAVIS), Special Issue on Facial Landmark Localisation "In-The-Wild". 2016.

The license for this dataset excludes commercial use and Stefanos Zafeiriou, one of the creators of the dataset, asked me to include a note here saying that the trained model therefore can't be used in a commerical product. So you should contact a lawyer or talk to Imperial College London to find out if it's OK for you to use this model in a commercial product.

Also note that this model file with increased robustness to face detectors. However, it works best when the bounding boxes are squared, as it is que case with both dlib's HOG face detector or the CNN based mmod_human_face_detector.dat face detector. It won't work as well when used with other face detectors that produce rectangular boxes.

mmod_rear_end_vehicle_detector.dat.bz2

This model is trained on the dlib rear end vehicles dataset. The dataset contains images from vehicle dashcams which I manually annotated using dlib's imglab tool.

mmod_front_and_rear_end_vehicle_detector.dat.bz2

This model is trained on the dlib front and rear end vehicles dataset. The dataset contains images from vehicle dashcams which I manually annotated using dlib's imglab tool.

dnn_gender_classifier_v1.dat.bz2

This model is a gender classifier trained using a private dataset of about 200k different face images and was generated according to the network definition and settings given in Minimalistic CNN-based ensemble model for gender prediction from face images. Even if the dataset used for the training is different from that used by G. Antipov et al, the classification results on the LFW evaluation are similar overall (± 97.3%). To take up the authors' proposal to join the results of three networks, a simplification was made by finally presenting RGB images, thus simulating three "grayscale" networks via the three image planes. Better results could be probably obtained with a more complex and deeper network, but the performance of the classification is nevertheless surprising compared to the simplicity of the network used and thus its very small size.

This gender model is provided for free by Cydral Technology and is licensed under the Creative Commons Zero v1.0 Universal.

dnn_age_predictor_v1.dat.bz2

The initial source for the model's creation came from the document of Z. Qawaqneh et al.: "Deep Convolutional Neural Network for Age Estimation based on VGG-Face Model". However, our research has led us to significant improvements in the CNN model, allowing us to estimate the age of a person outperforming the state-of-the-art results in terms of the exact accuracy and for 1-off accuracy.

This model is thus an age predictor leveraging a ResNet-10 architecture and trained using a private dataset of about 110k different labelled images. During the training, we used an optimization and data augmentation pipeline and considered several sizes for the entry image.

This age predictor model is provided for free by Cydral Technology and is licensed under the Creative Commons Zero v1.0 Universal.

resnet50_1000_imagenet_classifier.dnn.bz2

This is trained on the venerable ImageNet dataset. The model was trained using dlib's example but with the ResNet50 model defined in resnet.h and a crop size of 224.

The performance of this model is summarized in the following table:

# crops top-1 acc top-5 acc
1 0.77308 0.93352
10 0.77426 0.93310
CC0 1.0 Universal Statement of Purpose The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work"). Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others. For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights. 1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following: i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work; ii. moral rights retained by the original author(s) and/or performer(s); iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work; iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below; v. rights protecting the extraction, dissemination, use and reuse of data in a Work; vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof. 2. Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose. 3. Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose. 4. Limitations and Disclaimers. a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document. b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law. c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work. d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work. For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>

简介

暂无描述 展开 收起
README
CC0-1.0
取消

发行版

暂无发行版

贡献者

全部

语言

近期动态

不能加载更多了
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
Python
1
https://gitee.com/mythwu/dlib-models.git
git@gitee.com:mythwu/dlib-models.git
mythwu
dlib-models
dlib-models
master

搜索帮助