# sunOCR

**Repository Path**: sun-yulin/sun-ocr

## Basic Information

- **Project Name**: sunOCR
- **Description**: An offline OCR application built with Qt and PaddleOCR
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 6
- **Forks**: 0
- **Created**: 2021-08-17
- **Last Updated**: 2024-01-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

This project is an offline OCR application built on the open-source project PaddleOCR. The GUI is implemented with Qt.

* https://gitee.com/paddlepaddle/PaddleOCR

## Screenshot

The finished application looks like this:

![Screenshot](效果图.jpg)

## Building PaddleOCR

PaddleOCR is built with CMake. The build procedure mainly follows these articles:

* https://blog.csdn.net/stq054188/article/details/118683950
* https://blog.csdn.net/weixin_38562726/article/details/115319859

After the build succeeds, wrap the project source: export the functions you need as a dynamic-link library, using the `main` function shipped with the PaddleOCR project as a reference. This project exports three functions:

* Read the PaddleOCR config file and load the models

```c++
// det, cls and rec are global pointers declared elsewhere in the DLL source.
DLLEXPORT bool readOCRConfig(char* filepath)
{
    OCRConfig config = PaddleOCR::OCRConfig(filepath);

    // Text detector
    det = new DBDetector(config.det_model_dir, config.use_gpu, config.gpu_id,
                         config.gpu_mem, config.cpu_math_library_num_threads,
                         config.use_mkldnn, config.max_side_len, config.det_db_thresh,
                         config.det_db_box_thresh, config.det_db_unclip_ratio,
                         config.use_polygon_score, config.visualize,
                         config.use_tensorrt, config.use_fp16);

    // Optional text-direction classifier
    if (config.use_angle_cls) {
        cls = new Classifier(config.cls_model_dir, config.use_gpu, config.gpu_id,
                             config.gpu_mem, config.cpu_math_library_num_threads,
                             config.use_mkldnn, config.cls_thresh,
                             config.use_tensorrt, config.use_fp16);
    }

    // Text recognizer
    rec = new CRNNRecognizer(config.rec_model_dir, config.use_gpu, config.gpu_id,
                             config.gpu_mem, config.cpu_math_library_num_threads,
                             config.use_mkldnn, config.char_list_file,
                             config.use_tensorrt, config.use_fp16);

    std::cout << "config over" << std::endl;
    return true;  // the original fell off the end of a bool function
}
```

* Release the loaded models

```c++
DLLEXPORT bool releaseOCRConfig()
{
    if (det != nullptr) {
        delete det;
        det = nullptr;
    }
    if (cls != nullptr) {
        delete cls;
        cls = nullptr;
    }
    if (rec != nullptr) {
        delete rec;
        rec = nullptr;
    }
    return true;  // the original fell off the end of a bool function
}
```

* Recognize the text in an image

```c++
// Returns an array of rec_length C strings allocated with new[];
// returns NULL and sets rec_flag to false on failure.
DLLEXPORT char** PaddleOCRRec(cv::Mat& mat, bool& rec_flag, int& rec_length)
{
    if (!mat.data) {
        rec_flag = false;
        return NULL;
    }

    std::vector<std::string> str_res;
    std::vector<std::vector<std::vector<int>>> boxes;

    // Detect text boxes, then recognize the text inside each box.
    det->Run(mat, boxes);
    if (!rec->RunPaddleOCR(boxes, mat, cls, str_res)) {
        rec_flag = false;
        return NULL;
    }

    // Copy every recognized string into a NUL-terminated C string.
    rec_length = static_cast<int>(str_res.size());
    char** rec_char = new char*[rec_length];
    for (int i = 0; i < rec_length; ++i) {
        int length = static_cast<int>(str_res[i].length());
        rec_char[i] = new char[length + 1];
        strcpy_s(rec_char[i], length + 1, str_res[i].c_str());
    }

    rec_flag = true;
    return rec_char;
}
```

`RunPaddleOCR` is a member function added to `CRNNRecognizer`. It is adapted from the class's original recognition function, which prints the recognized strings directly and does not return them; the wrapper returns them through an output vector instead.

```c++
/*
 * Wrapped OCR recognition function: appends the recognized strings to recString.
 */
bool CRNNRecognizer::RunPaddleOCR(std::vector<std::vector<std::vector<int>>> boxes,
                                  cv::Mat& img, Classifier* cls,
                                  std::vector<std::string>& recString)
{
    int box_size = static_cast<int>(boxes.size());
    if (box_size <= 0) {  // no text was detected
        return false;
    }

    cv::Mat srcimg;
    img.copyTo(srcimg);
    cv::Mat crop_img;
    cv::Mat resize_img;

    for (int i = 0; i < box_size; i++) {
        // Walk the boxes in reverse order.
        crop_img = GetRotateCropImage(srcimg, boxes[box_size - i - 1]);
        if (cls != nullptr) {
            crop_img = cls->Run(crop_img);
        }

        float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
        this->resize_op_.Run(crop_img, resize_img, wh_ratio, this->use_tensorrt_);
        this->normalize_op_.Run(&resize_img, this->mean_, this->scale_, this->is_scale_);

        std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
        this->permute_op_.Run(&resize_img, input.data());

        // Inference.
        auto input_names = this->predictor_->GetInputNames();
        auto input_t = this->predictor_->GetInputHandle(input_names[0]);
        input_t->Reshape({ 1, 3, resize_img.rows, resize_img.cols });
        input_t->CopyFromCpu(input.data());
        this->predictor_->Run();

        std::vector<float> predict_batch;
        auto output_names = this->predictor_->GetOutputNames();
        auto output_t = this->predictor_->GetOutputHandle(output_names[0]);
        auto predict_shape = output_t->shape();
        int out_num = std::accumulate(predict_shape.begin(), predict_shape.end(), 1,
                                      std::multiplies<int>());
        predict_batch.resize(out_num);
        output_t->CopyToCpu(predict_batch.data());

        // CTC decode: skip the blank label (index 0) and collapse repeated labels.
        std::vector<std::string> str_res;
        std::string rec_string;
        int argmax_idx;
        int last_index = 0;
        float score = 0.f;
        int count = 0;
        float max_value = 0.0f;
        for (int n = 0; n < predict_shape[1]; n++) {
            argmax_idx = int(Utility::argmax(&predict_batch[n * predict_shape[2]],
                                             &predict_batch[(n + 1) * predict_shape[2]]));
            max_value = float(*std::max_element(&predict_batch[n * predict_shape[2]],
                                                &predict_batch[(n + 1) * predict_shape[2]]));
            if (argmax_idx > 0 && (!(n > 0 && argmax_idx == last_index))) {
                score += max_value;
                count += 1;
                str_res.push_back(label_list_[argmax_idx]);
            }
            last_index = argmax_idx;
        }
        if (count > 0) {
            score /= count;  // mean confidence; computed but not returned
        }

        for (size_t k = 0; k < str_res.size(); k++) {
            //std::cout << str_res[k];
            rec_string += str_res[k];
        }
        recString.push_back(rec_string);
        //std::cout << "\tscore: " << score << std::endl;
    }

    return !recString.empty();
}
```

## GUI

The GUI is built with Qt 5.14.0. Its main function is screen capture: the captured image is converted to a `cv::Mat` and passed to the wrapped PaddleOCR DLL, and the recognized text is then displayed on screen.

The screen-capture implementation mainly follows:

* https://zhuanlan.zhihu.com/p/212230990
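
`PaddleOCRRec` hands the GUI a `char**` whose buffers were allocated with `new[]` inside the DLL, and none of the three exported functions shown in this README releases them. The sketch below illustrates one way the allocation could be paired with a release routine. Both `toCStringArray` and `freeOCRResult` are hypothetical names introduced here for illustration; they are not part of the original project, and the sketch uses only the standard library so it can be compiled on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies each string into its own heap buffer, mirroring the allocation
// pattern used by PaddleOCRRec. The caller owns the returned array.
char** toCStringArray(const std::vector<std::string>& lines, int& count) {
    count = static_cast<int>(lines.size());
    assert(count >= 0);
    char** out = new char*[count];
    for (int i = 0; i < count; ++i) {
        const std::size_t len = lines[i].size();
        out[i] = new char[len + 1];
        // Copy len characters plus the terminating '\0'.
        std::memcpy(out[i], lines[i].c_str(), len + 1);
    }
    return out;
}

// Hypothetical counterpart (an assumption, not in the original project):
// frees every string and then the array itself. A null pointer is ignored.
void freeOCRResult(char** result, int count) {
    if (result == nullptr) {
        return;
    }
    for (int i = 0; i < count; ++i) {
        delete[] result[i];
    }
    delete[] result;
}
```

If such a helper were exported from the same DLL, the Qt side could copy the strings into its own containers, display them, and then call it with the `rec_length` value it received. Keeping allocation and release in the same module avoids freeing memory with a different runtime than the one that allocated it.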