Application Domain: Face Verification
Version: 1.1
Modified: 2021.09.25
Size: 4M
Framework: TensorFlow 1.15.0
Model Format: ckpt, pb, om
Precision: Mixed
Categories: Demo
Description: Training and inference code for the MobileFaceNet face recognition network, based on the TensorFlow framework
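MobileFaceNet is a highly efficient CNN model: the model is only about 4 MB in size (fewer than 1 million parameters), yet it achieves competitive accuracy on mobile and embedded devices. After the last convolutional layer, MobileFaceNet uses a global depthwise convolution in place of the usual global average pooling, which further improves performance.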
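To make the difference between global average pooling and a global depthwise convolution concrete, here is a minimal pure-Python sketch. It is not the repository's code: the function names, the tiny 2 x 2 x 2 feature map, and the kernel values are illustrative assumptions. The global depthwise convolution has one weight per spatial position and channel, and global average pooling is the special case in which every weight equals 1/(H*W).

```python
def global_depthwise_conv(feature_map, kernel):
    """Reduce an H x W x C feature map to C values.

    Each channel has its own H x W kernel (depthwise), and the kernel covers
    the whole spatial extent (global), so the output is 1 x 1 x C.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    output = [0.0] * channels
    for i in range(height):
        for j in range(width):
            for c in range(channels):
                output[c] += feature_map[i][j][c] * kernel[i][j][c]
    return output


def global_average_pool(feature_map):
    """Global average pooling: every position gets the same weight 1/(H*W)."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    uniform = [[[1.0 / (height * width)] * channels for _ in range(width)]
               for _ in range(height)]
    return global_depthwise_conv(feature_map, uniform)


# Hypothetical 2 x 2 feature map with 2 channels.
features = [[[1.0, 10.0], [2.0, 20.0]],
            [[3.0, 30.0], [4.0, 40.0]]]
# Hypothetical learned kernel that weights the top-left position most heavily.
learned = [[[0.7, 0.7], [0.1, 0.1]],
           [[0.1, 0.1], [0.1, 0.1]]]

pooled = global_average_pool(features)
weighted = global_depthwise_conv(features, learned)
```

With learned weights the network can treat different positions of the final feature map unequally, which average pooling cannot do.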
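Training dataset preprocessing (MS1M-refined dataset)
Test dataset preprocessing (LFW dataset)
Training hyperparameters
The script enables mixed precision by default. The precision_mode parameter is set as follows.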
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
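For hardware environment setup, see the "Driver and Firmware Installation and Upgrade Guide" in the documentation of each hardware product. Firmware and drivers that match the CANN version must be installed on the hardware device.
Docker must be installed on the host, and the image is obtained by logging in to Ascend Hub.
The images supported by the current model are listed in Table 1.
Table 1 Image list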
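Model training uses the MS1M-refined dataset; the preprocessed TFRecord data can be obtained here.
After obtaining the dataset, place it in the MobileFaceNet/datasets/tfrecords/ directory and specify the dataset path in the training script; it can then be used directly.
The LFW dataset is used for validation and can be downloaded from this link.
Click "Download Now" and choose a suitable download method to get the source package.
Before starting training, configure the environment variables that the program needs.
For the environment variable configuration, see:
Single-card training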
sh train_1p_full.sh > ../logs/loss+perf.txt
Users who only need to run a few training steps can use the script that trains for a small number of steps:
sh train_1p_less.sh
Validation
sh test.sh > ../logs/test.txt
sh om_eval.sh
python3 calculate_om_acc.py
Parameters | NPU | GPU |
---|---|---|
Resource | Ascend 910 | GPU |
TensorFlow Version | 1.15.0 | 1.15.0 |
Dataset | LFW | LFW |
Training Parameters | epoch=10, batch_size=64 | epoch=10, batch_size=128 |
Optimizer | ADAM | ADAM |
Loss | 1.93 | 1.82 |
Validation Accuracy | 98.7% | 98.5% (99.2% in paper) |
Speed | 600 samples/s | 200 samples/s |
Total time | 23 hours | 60 hours |
Parameters | |
---|---|
Resource | Ascend 910; CPU 2.60 GHz, 24 cores; Memory 72 GB |
TensorFlow Version | 1.15.0 |
Dataset | LFW |
batch_size | 100 |
Evaluation Accuracy | 98.7% |
Total time | 102 s |
python3 freeze_graph.py
atc --model=/root/zjx/MFN/pb_model/MobileFaceNet.pb --framework=3 --output=/root/zjx/MFN/tf_MobileFaceNet --soc_version=Ascend310 --input_shape='Placeholder:1,112,112,3' --precision_mode=allow_fp32_to_fp16 --op_select_implmode=high_precision
python3 image2bin.py
# Ascend310 om model evaluation script
# model: om model path
# input dir: bin file path
# output dir: om_output path
msame --model './tf_MobileFaceNet.om' --input './bin_input' --output './om_output'
python3 calculate_om_acc.py
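The image2bin.py step above turns each test image into a raw binary file that msame can feed to the om model. The repository's actual conversion is not shown here, so the following is only a sketch under stated assumptions: the function name `image_to_bin` is hypothetical, the data is assumed to be float32 in height x width x channel order to match the `1,112,112,3` input shape, and the normalization constants (`mean=127.5`, `scale=0.0078125`) are an assumed common choice for this kind of network rather than confirmed values.

```python
import os
import tempfile
from array import array


def image_to_bin(pixels, bin_path, mean=127.5, scale=0.0078125):
    """Write one H x W x 3 image (nested lists of 0-255 values) as raw floats.

    mean and scale are assumed normalization constants, not values confirmed
    by the repository's image2bin.py.
    """
    flat = array("f")  # single-precision floats, 4 bytes each on common platforms
    for row in pixels:
        for pixel in row:
            for value in pixel:
                flat.append((value - mean) * scale)
    with open(bin_path, "wb") as handle:
        flat.tofile(handle)
    return len(flat)


# Hypothetical all-gray 112 x 112 RGB image matching the 1,112,112,3 input shape.
image = [[[128, 128, 128] for _ in range(112)] for _ in range(112)]
out_path = os.path.join(tempfile.mkdtemp(), "sample.bin")
count = image_to_bin(image, out_path)
size_bytes = os.path.getsize(out_path)
```

A quick sanity check for any such converter is that the file size equals 112 * 112 * 3 values times the element size, since a size mismatch is a typical reason for an offline model to reject its input.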
Parameters | |
---|---|
Resource | Ascend 310; CPU 2.60 GHz, 24 cores; Memory 72 GB |
TensorFlow Version | 1.15.0 |
Dataset | LFW |
batch_size | 1 |
Evaluation Accuracy | 98.7% |
Total time | 0.305 s |
├── README.md
├── requirements.txt
├── LICENSE
├── author.txt
├── modelzoo_level.txt
├── Dockerfile
├── imgs
├── scripts
│   ├── test.sh                 # test script
│   ├── om_eval.sh              # om inference script
│   ├── train_1p_less.sh        # training script for a small number of steps
│   └── train_1p_full.sh        # full training script
├── MobileFaceNet_Tensorflow
│   ├── inference.py            # test code
│   ├── train_nets.py           # training code
│   ├── image2bin.py            # converts the test data to bin format
│   ├── fusion_switch.cfg       # rules that disable problematic fusions
│   ├── arch
│   │   ├── img
│   │   └── txt
│   ├── datasets
│   │   ├── faces_ms1m_112x112
│   │   └── tfrecords
│   ├── losses
│   ├── nets
│   ├── output
│   └── utils
├── log
│   ├── loss+perf.txt
│   └── test.txt
├── freeze_graph.py             # freezes the parameters and generates the pb model
└── calculate_om_acc.py         # computes the accuracy of the om model inference results
-max_epoch Number of training epochs; the default is 10
-train_batch_size Training batch size; the default is 64
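As an illustration of how these two options and their defaults could be declared, here is a minimal `argparse` sketch. It is not taken from train_nets.py: the `build_parser` helper is hypothetical, and the double-dash spelling of the flags is an assumption.

```python
import argparse


def build_parser():
    """Build a parser exposing the two documented training options."""
    parser = argparse.ArgumentParser(
        description="MobileFaceNet training options (sketch)")
    parser.add_argument("--max_epoch", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--train_batch_size", type=int, default=64,
                        help="training batch size")
    return parser


# No arguments: the documented defaults apply.
defaults = build_parser().parse_args([])
# Hypothetical override for a short trial run.
custom = build_parser().parse_args(["--max_epoch", "2", "--train_batch_size", "32"])
```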
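Note: the current code supports only single-card training and validation.
Start single-card training with the training command in "Model Training".
The complete training log is stored in the ../logs folder.
Inference is run at intervals during training, and the validation accuracy is written to the log.
Part of the log output during training is shown below.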
WARNING:tensorflow:From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/npu_optimizer.py:284: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From train_nets.py:97: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_initializable_iterator(dataset)`.
begin db lfw convert.
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000, 112, 112, 3)
2021-09-25 16:02:22.272842: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-09-25 16:02:22.306371: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2021-09-25 16:02:22.312458: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55df78ed3bf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-09-25 16:02:22.312507: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-09-25 16:02:22.744753: W tf_adapter/util/ge_plugin.cc:130] [GePlugin] can not find Environment variable : JOB_ID
2021-09-25 16:02:26.027238: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-09-25 16:02:26.027304: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-09-25 16:02:26.032384: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node init is null.
... ...
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:`tf.train.start_queue_runners()` was called when no queue runners were defined. You can safely remove the call to this deprecated function.
============== start training ===============
2021-09-25 16:03:11.789699: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-09-25 16:03:11.789765: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-09-25 16:03:11.790043: W tf_adapter/util/infershape_util.cc:337] The shape of node MobileFaceNet/MobileFaceNet/Conv2d_0/BatchNorm/cond/FusedBatchNormV3 output 5 is ?, unknown shape.
2021-09-25 16:03:11.790135: W tf_adapter/util/infershape_util.cc:337] The shape of node MobileFaceNet/MobileFaceNet/Conv2d_0/BatchNorm/cond/FusedBatchNormV3_1 output 5 is ?, unknown shape.
......
epoch 0, step 100, total loss is: 38.39, inference loss is: 38.25, reg_loss is: 0.14
epoch 0, step 200, total loss is: 29.42, inference loss is: 29.28, reg_loss is: 0.14
epoch 0, step 300, total loss is: 28.49, inference loss is: 28.34, reg_loss is: 0.14
epoch 0, step 400, total loss is: 25.56, inference loss is: 25.41, reg_loss is: 0.15
epoch 0, step 500, total loss is: 27.22, inference loss is: 27.07, reg_loss is: 0.15
epoch 0, step 600, total loss is: 24.44, inference loss is: 24.29, reg_loss is: 0.15
epoch 0, step 700, total loss is: 25.18, inference loss is: 25.03, reg_loss is: 0.15
epoch 0, step 800, total loss is: 20.31, inference loss is: 20.16, reg_loss is: 0.15
epoch 0, step 900, total loss is: 21.15, inference loss is: 21.00, reg_loss is: 0.15
epoch 0, step 1000, total loss is: 23.45, inference loss is: 23.29, reg_loss is: 0.15
epoch 0, step 1100, total loss is: 21.60, inference loss is: 21.45, reg_loss is: 0.16
epoch 0, step 1200, total loss is: 20.70, inference loss is: 20.54, reg_loss is: 0.16
epoch 0, step 1300, total loss is: 18.63, inference loss is: 18.47, reg_loss is: 0.16
epoch 0, step 1400, total loss is: 19.59, inference loss is: 19.43, reg_loss is: 0.16
epoch 0, step 1500, total loss is: 19.24, inference loss is: 19.08, reg_loss is: 0.16
epoch 0, step 1600, total loss is: 16.85, inference loss is: 16.69, reg_loss is: 0.16
epoch 0, step 1700, total loss is: 17.70, inference loss is: 17.53, reg_loss is: 0.16
epoch 0, step 1800, total loss is: 18.93, inference loss is: 18.76, reg_loss is: 0.17
epoch 0, step 1900, total loss is: 19.98, inference loss is: 19.81, reg_loss is: 0.17
epoch 0, step 2000, total loss is: 15.79, inference loss is: 15.62, reg_loss is: 0.17
============== Validation: accuracy on 12000 LFW images is: 0.83717 ===============
epoch 0, step 2100, total loss is: 15.67, inference loss is: 15.50, reg_loss is: 0.17
epoch 0, step 2200, total loss is: 15.03, inference loss is: 14.85, reg_loss is: 0.17
epoch 0, step 2300, total loss is: 18.76, inference loss is: 18.59, reg_loss is: 0.17
epoch 0, step 2400, total loss is: 20.88, inference loss is: 20.70, reg_loss is: 0.18
epoch 0, step 2500, total loss is: 18.35, inference loss is: 18.17, reg_loss is: 0.18
epoch 0, step 2600, total loss is: 19.18, inference loss is: 19.01, reg_loss is: 0.18
epoch 0, step 2700, total loss is: 21.52, inference loss is: 21.34, reg_loss is: 0.18
epoch 0, step 2800, total loss is: 16.99, inference loss is: 16.81, reg_loss is: 0.18
epoch 0, step 2900, total loss is: 19.77, inference loss is: 19.59, reg_loss is: 0.18
epoch 0, step 3000, total loss is: 18.93, inference loss is: 18.75, reg_loss is: 0.19
epoch 0, step 3100, total loss is: 19.50, inference loss is: 19.31, reg_loss is: 0.19
epoch 0, step 3200, total loss is: 19.26, inference loss is: 19.07, reg_loss is: 0.19
epoch 0, step 3300, total loss is: 18.88, inference loss is: 18.69, reg_loss is: 0.19
epoch 0, step 3400, total loss is: 14.71, inference loss is: 14.52, reg_loss is: 0.19
epoch 0, step 3500, total loss is: 17.95, inference loss is: 17.76, reg_loss is: 0.19
epoch 0, step 3600, total loss is: 18.98, inference loss is: 18.78, reg_loss is: 0.19
epoch 0, step 3700, total loss is: 17.28, inference loss is: 17.08, reg_loss is: 0.20
epoch 0, step 3800, total loss is: 16.27, inference loss is: 16.08, reg_loss is: 0.20
epoch 0, step 3900, total loss is: 15.79, inference loss is: 15.59, reg_loss is: 0.20
epoch 0, step 4000, total loss is: 16.52, inference loss is: 16.32, reg_loss is: 0.20
============== Validation: accuracy on 12000 LFW images is: 0.88000 ===============
epoch 0, step 4100, total loss is: 17.89, inference loss is: 17.69, reg_loss is: 0.20
epoch 0, step 4200, total loss is: 17.22, inference loss is: 17.02, reg_loss is: 0.20
... ...
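The loss and validation lines in the log above have a regular shape, so they can be pulled out of loss+perf.txt for plotting or comparison. The following is a small sketch, not part of the repository: the helper name `parse_training_log` and the dictionary keys are illustrative choices, and the regular expressions are written against the line format shown in this excerpt only.

```python
import re

LOSS_RE = re.compile(
    r"epoch (\d+), step (\d+), total loss is: ([\d.]+), "
    r"inference loss is: ([\d.]+), reg_loss is: ([\d.]+)"
)
VAL_RE = re.compile(r"Validation: accuracy on (\d+) LFW images is: ([\d.]+)")


def parse_training_log(lines):
    """Return (loss_records, validation_accuracies) parsed from log lines."""
    losses, accuracies = [], []
    for line in lines:
        loss_match = LOSS_RE.search(line)
        if loss_match:
            epoch, step, total, inference, reg = loss_match.groups()
            losses.append({
                "epoch": int(epoch),
                "step": int(step),
                "total_loss": float(total),
                "inference_loss": float(inference),
                "reg_loss": float(reg),
            })
            continue
        val_match = VAL_RE.search(line)
        if val_match:
            accuracies.append(float(val_match.group(2)))
    return losses, accuracies


# Three lines copied from the log excerpt above.
sample = [
    "epoch 0, step 1900, total loss is: 19.98, inference loss is: 19.81, reg_loss is: 0.17",
    "epoch 0, step 2000, total loss is: 15.79, inference loss is: 15.62, reg_loss is: 0.17",
    "============== Validation: accuracy on 12000 LFW images is: 0.83717 ===============",
]
losses, accuracies = parse_training_log(sample)
```

To process a full run, the same function can be called on the lines of the log file, for example `parse_training_log(open(path))` with `path` pointing at the saved log.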
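Start testing with the test command in "Model Training".
The complete validation log is stored in the ../logs folder.
Part of the log output during validation is shown below.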
WARNING:tensorflow:From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/npu_optimizer.py:284: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From inference.py:58: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
begin db lfw convert.
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000, 112, 112, 3)
... ...
2021-09-26 10:51:17.759372: W tf_adapter/util/ge_plugin.cc:130] [GePlugin] can not find Environment variable : JOB_ID
2021-09-26 10:51:20.801079: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-09-26 10:51:20.801131: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-09-26 10:51:20.805008: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node init is null.
Restoring pretrained model: ../MobileFaceNet_Tensorflow/output/ckpt_best
model_checkpoint_path: "../MobileFaceNet_Tensorflow/output/ckpt_best/MobileFaceNet_best.ckpt"
all_model_checkpoint_paths: "../MobileFaceNet_Tensorflow/output/ckpt_best/MobileFaceNet_best.ckpt"
2021-09-26 10:51:32.358933: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-09-26 10:51:32.358985: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-09-26 10:51:32.362829: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/restore_all is null.
testing...
best_threshold_index 123 0.9872222222222222
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9866666666666667
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9879629629629629
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9875925925925926
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9874074074074074
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9862962962962963
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9875925925925926
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9877777777777778
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9866666666666667
thresholds max: 1.23 <=> min: 1.23
best_threshold_index 123 0.9864814814814815
thresholds max: 1.23 <=> min: 1.23
total time 102.303 to evaluate 12000 images of lfw
Accuracy: 0.987
Testing Done
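The log above reports `best_threshold_index 123` together with a threshold of 1.23, which is consistent with sweeping candidate distance thresholds in steps of 0.01 and keeping the one with the highest verification accuracy. The evaluation code itself is not shown here, so the following is only a sketch of that idea under stated assumptions: the function names, the step size, the number of steps, and the six example distances are all hypothetical, and the per-fold splitting visible in the log is left out.

```python
def accuracy_at(threshold, distances, is_same):
    """Fraction of pairs classified correctly when distance < threshold means 'same person'."""
    correct = sum((d < threshold) == same for d, same in zip(distances, is_same))
    return correct / len(distances)


def best_threshold(distances, is_same, step=0.01, num_steps=400):
    """Sweep thresholds 0, step, 2*step, ... and return (index, threshold, accuracy).

    Ties are resolved in favor of the smallest threshold.
    """
    best_index, best_acc = 0, -1.0
    for index in range(num_steps):
        acc = accuracy_at(index * step, distances, is_same)
        if acc > best_acc:
            best_index, best_acc = index, acc
    return best_index, best_index * step, best_acc


# Hypothetical embedding distances for six face pairs.
distances = [0.40, 0.85, 1.104, 1.352, 1.90, 2.40]
is_same = [True, True, True, False, False, False]
index, threshold, accuracy = best_threshold(distances, is_same)
```

In this toy example every threshold between the largest same-person distance and the smallest different-person distance separates the pairs perfectly, and the sweep returns the first such grid point.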