Serving PaddlePaddle Models with ONNX + Triton: A PaddleOCR Example

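Quickly deploy a PaddleOCR model with ONNX and Triton Inference Server, with a complete workflow and code covering preprocessing, model invocation, and post-processing.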

David_Uv · Published 2024-04-10 14:09:25

Preface

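PaddleOCR is an ultra-lightweight OCR toolkit open-sourced by Baidu PaddlePaddle. It includes dozens of text detection and recognition models.
We previously deployed the text detection model DBNet as a service with TorchServe. For text recognition we chose the SVTR model from PaddleOCR (the scene text recognition algorithm SVTR). Chaining the two models yields the complete text content of an input file.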

Serving PaddleOCR

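The official documentation lists several inference deployment options for Paddle models: https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/README_ch.md
Here we convert the model files with Paddle2ONNX and deploy them with triton-inference-server. The focus is on how to use the external interface exposed after deployment: preprocessing, calling the model endpoint, and post-processing.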

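Model files

The SVTR recognition model can be used as released, or you can train your own following the official guide and convert the trained weights to an inference model. The converted directory contains three files: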

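/inference/rec_svtr_tiny_stn_ch/
    ├── inference.pdiparams         # parameter file of the recognition inference model
    ├── inference.pdiparams.info    # parameter metadata of the recognition inference model; can be ignored
    └── inference.pdmodel           # program file of the recognition inference model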

rec_svtr can be used directly for inference:

python3 tools/infer/predict_rec.py --image_dir='./example/images/test.png' --rec_model_dir='./inference/rec_svtr_tiny_stn_ch/' --rec_algorithm='SVTR' --rec_image_shape='3,64,256' --rec_char_dict_path='./ppocr/utils/ppocr_keys_v1.txt'

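Model conversion

Open Neural Network Exchange (ONNX) is an open, standardized format for machine learning and deep learning models. Models from frameworks such as TensorFlow, PyTorch, Scikit-learn, Keras, and SparkML can be converted to it.

The main advantage of ONNX is that the format is accepted across platforms and frameworks, which makes it easier to optimize model performance.

Paddle2ONNX converts models from the PaddlePaddle format to the ONNX format.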

Install

pip install paddle2onnx -i https://pypi.tuna.tsinghua.edu.cn/simple

Convert

paddle2onnx --model_dir rec_svtr_tiny_stn_ch \
            --model_filename inference.pdmodel \
            --params_filename inference.pdiparams \
            --save_file model.onnx \
            --enable_dev_version True

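When the command finishes, you have the ONNX model model.onnx. Place it in a model directory with a version subdirectory (here paddleOCR/1/), so that the directory structure looks like this: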

/inference/rec_svtr_tiny_stn_ch/
    ├── inference.pdiparams
    ├── inference.pdiparams.info
    ├── inference.pdmodel
    └── paddleOCR
        └── 1
            └── model.onnx
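
Triton expects each model to live in `<repository>/<model name>/<version>/`, usually next to a `config.pbtxt`. The article does not show a configuration file (for ONNX models Triton can often derive one automatically), so the following is only a sketch of what one might look like. The tensor names and shapes are taken from the request shown later in this article; the `platform` value, `max_batch_size: 0`, and the helper name `write_model_repo` are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical config; tensor names and shapes follow the request used later on.
CONFIG_PBTXT = """\
name: "paddleOCR"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "x"
    data_type: TYPE_FP32
    dims: [ 1, 3, 64, 256 ]
  }
]
output [
  {
    name: "softmax_12.tmp_0"
    data_type: TYPE_FP32
    dims: [ 1, 40, 6625 ]
  }
]
"""


def write_model_repo(repo_root, model_name="paddleOCR", version="1"):
    """Create <repo_root>/<model_name>/<version>/ and write config.pbtxt."""
    model_dir = Path(repo_root) / model_name
    version_dir = model_dir / version
    version_dir.mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(CONFIG_PBTXT, encoding="utf-8")
    return version_dir, config_path
```

Calling `write_model_repo("./inference/rec_svtr_tiny_stn_ch")` would create the `paddleOCR/1/` directory shown above; `model.onnx` still has to be copied into it by hand.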

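Deployment

Our team has many similar models, so we chose Triton Inference Server to deploy and manage them uniformly.
NVIDIA Triton Inference Server is part of the NVIDIA AI platform. It is open-source inference serving software that helps standardize model deployment and execution and delivers fast, scalable AI in production. It supports major frameworks including TensorFlow, NVIDIA® TensorRT™, PyTorch, and ONNX Runtime, as well as custom backends, and it can run multiple models concurrently to improve throughput and utilization.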

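Getting Triton
Build it with the scripts in the source tree, or pull the Docker image directly. See triton-quickstart for a detailed tutorial; we do not repeat it here.
Running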

tritonserver --model-repository=/paddleOCR/inference/rec_svtr_tiny_stn_ch/
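
Before sending inference requests it can help to confirm that the server is up. Triton's HTTP API exposes a readiness endpoint at `/v2/health/ready`. The sketch below probes it with the standard library only; the base address `http://localhost:8000` (Triton's default HTTP port) and the function names are assumptions, not part of the original setup.

```python
import urllib.error
import urllib.request


def ready_url(base_url):
    """Build the readiness URL from a base address such as http://localhost:8000."""
    return base_url.rstrip("/") + "/v2/health/ready"


def server_is_ready(base_url, timeout=2.0):
    """Return True if the server answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: treat as not ready.
        return False
```

A deployment script could poll `server_is_ready("http://localhost:8000")` in a loop and only start sending images once it returns True.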

The model can now be called for inference through the external interface.

Preprocessing

A model deployed with Triton must have the name, data type, and shape of its inputs specified.
inputs and outputs
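Our input images vary in size, so each image is resized to a fixed size chosen for the scenario, which gives a uniform shape. The model in this article uses (1, 3, 64, 256). The processed image is then serialized to raw bytes.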

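import cv2
import numpy as np


def preprocess(image_file):
    img = cv2.imread(image_file)
    if img is None:
        raise FileNotFoundError(f"cannot read image: {image_file}")
    rec_image_shape = 3, 64, 256

    imgC, imgH, imgW = rec_image_shape
    # Resize to the fixed width and height the model expects
    norm_img = cv2.resize(
        img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
    norm_img = norm_img.astype('float32')
    # HWC -> CHW, then scale [0, 255] to [-1, 1]
    norm_img = norm_img.transpose((2, 0, 1)) / 255
    norm_img -= 0.5
    norm_img /= 0.5

    # Add the batch dimension: (3, 64, 256) -> (1, 3, 64, 256)
    norm_img = norm_img[np.newaxis, :]
    # copy() returns a C-contiguous array; tobytes() serializes it for the request
    norm_img = norm_img.copy()
    image_buffer = norm_img.tobytes()
    return image_buffer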
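
As a quick sanity check on the preprocessing arithmetic: the normalization maps pixel values from [0, 255] to [-1, 1], and a float32 tensor of shape (1, 3, 64, 256) should serialize to 196608 bytes. The helpers below are illustrative only (their names are made up here) and use nothing beyond the standard library.

```python
import math


def normalize_pixel(value):
    """Apply the same arithmetic as preprocess() to a single pixel value."""
    return (value / 255 - 0.5) / 0.5


def expected_buffer_size(shape=(1, 3, 64, 256), bytes_per_element=4):
    """Size in bytes of a float32 tensor with the given shape."""
    return math.prod(shape) * bytes_per_element
```

Comparing `len(image_buffer)` with `expected_buffer_size()` before sending a request is a cheap way to catch a wrong dtype or a missing resize.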

Calling the model

Specify the structure and parameters of the input data, then call the model. Implementation:

import json
import requests


def paddleOCR_rec(image_buffer):
    raw_data = {
        "inputs": [{
            "name": "x",
            "datatype": "FP32",
            "shape": [1, 3, 64, 256],
            "parameters": {
                "binary_data_size": len(image_buffer)
            }
        }],
        "outputs": [{
            "name": "softmax_12.tmp_0",
            "datatype": "FP32",
            "shape": [1, 40, 6625],
            "parameters": {
                "binary_data": False
            }
        }]
    }
    request_buffer = json.dumps(raw_data).encode()
    
    data = request_buffer + image_buffer
    
    header = {
        "Content-Type": "application/octet-stream",
        "Accept": "*/*",
        "Inference-Header-Content-Length": str(len(request_buffer)),
        "Content-Length": str(len(image_buffer) + len(request_buffer))
    }
    
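    # Replace [API address] with your endpoint; a stock Triton server exposes
    # this route as /v2/models/<name>/versions/<version>/infer.
    url = "https://[API address]/models/paddleOCR/versions/1/infer/"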
    response = requests.post(url=url, data=data, headers=header)
    return response
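
The request body built above is a JSON header followed immediately by the raw tensor bytes, and the `Inference-Header-Content-Length` header tells the server where the JSON part ends. The sketch below packs and then splits such a body to make the layout visible. It uses a toy four-value tensor rather than a real image, and the helper names are hypothetical.

```python
import json
import struct


def build_body(header, tensor_bytes):
    """Concatenate the JSON header and the raw tensor bytes.

    Returns the body and the JSON length that goes into the
    Inference-Header-Content-Length HTTP header.
    """
    header_bytes = json.dumps(header).encode("utf-8")
    return header_bytes + tensor_bytes, len(header_bytes)


def split_body(body, json_length):
    """Inverse of build_body: recover the JSON header and the tensor bytes."""
    header = json.loads(body[:json_length].decode("utf-8"))
    return header, body[json_length:]


# Toy tensor: four float32 values in little-endian order.
tensor = struct.pack("<4f", 0.0, 0.5, -0.5, 1.0)
header = {
    "inputs": [{
        "name": "x",
        "datatype": "FP32",
        "shape": [1, 4],
        "parameters": {"binary_data_size": len(tensor)},
    }]
}
body, json_length = build_body(header, tensor)
parsed_header, parsed_tensor = split_body(body, json_length)
```

The round trip recovers both parts unchanged, which is the property the server relies on when it reads `binary_data_size` bytes after the JSON header.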

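Parameter details

raw_data defines the structure of the model's inputs and outputs. The output shape [1, 40, 6625] holds the mapping to the text content: 40 is the number of positions in the output sequence, and 6625 is the size of the character set defined by the model's dictionary file, counting the "blank" entry and the space character that the post-processing code adds.
Dictionary file: ppocr_keys_v1.txt, source: https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/utils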
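
To make the index layout concrete, the sketch below builds the index-to-character table the same way the post-processing code does: index 0 is "blank", dictionary line i (counting from 0) lands at index i + 1, and the space character takes the last index. A three-character toy dictionary stands in for ppocr_keys_v1.txt, and `build_charset` is a name chosen for this example.

```python
def build_charset(dict_lines, use_space_char=True):
    """Return the index-to-character table used for decoding."""
    chars = [line.rstrip("\r\n") for line in dict_lines]
    if use_space_char:
        chars.append(" ")
    return ["blank"] + chars


toy_dict = ["o\n", "n\n", "e\n"]   # stand-in for the real dictionary file
charset = build_charset(toy_dict)
# The table has len(toy_dict) + 2 entries: blank, the dictionary, and space.
```

With the real dictionary file, the length of this table should match the last dimension of the model output.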

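Text mapping
Apply argmax to each of the 40 vectors to find the index of its largest element. Indices fall in [0, 6624], which gives an array of shape (1, 40). Each nonzero index maps to one entry of the dictionary file and so to one recognized character. Index 0 corresponds to "blank" and is skipped, and consecutive repeated indices are collapsed into one. For example, 4245, 4547, and 3332 map to the letters "o", "n", and "e", giving the result "one".
Confidence mapping
Likewise, apply max to each of the 40 vectors to get its largest value, which is the confidence of that output position. The confidences of the positions kept in the final text are averaged to give the confidence of the whole text.
Both steps are implemented in the post-processing section.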
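
The two mappings can be illustrated with a small dependency-free sketch of greedy decoding on a toy output. The four-entry character table and the probability values are invented for the example; the real model emits 40 vectors of 6625 values each.

```python
def greedy_ctc_decode(probs, charset, blank=0):
    """Decode one sequence of per-position probability vectors.

    Returns the decoded text and the mean confidence of the kept positions.
    """
    indices = [max(range(len(step)), key=step.__getitem__) for step in probs]
    confidences = [max(step) for step in probs]

    chars, kept = [], []
    previous = None
    for idx, conf in zip(indices, confidences):
        # Keep a position only if it is not blank and differs from the previous one.
        if idx != blank and idx != previous:
            chars.append(charset[idx])
            kept.append(conf)
        previous = idx
    mean_conf = sum(kept) / len(kept) if kept else 0.0
    return "".join(chars), mean_conf


charset = ["blank", "o", "n", "e"]
probs = [
    [0.1, 0.8, 0.05, 0.05],   # "o"
    [0.1, 0.7, 0.1, 0.1],     # "o" again, collapsed as a duplicate
    [0.9, 0.05, 0.03, 0.02],  # blank, skipped
    [0.1, 0.1, 0.7, 0.1],     # "n"
    [0.1, 0.0, 0.0, 0.9],     # "e"
]
text, confidence = greedy_ctc_decode(probs, charset)
```

Here `text` is "one", and the confidence is the mean of the three kept positions (0.8, 0.7, and 0.9).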

Post-processing

import json
import numpy as np


def decode(text_index, text_prob=None, is_remove_duplicate=True):
    """ convert text-index into text-label. """
    result_list = []
    ignored_tokens = [0]
    batch_size = len(text_index)

    character_str = []
    with open("./ppocr_keys_v1.txt", "rb") as fin:
        lines = fin.readlines()
        for line in lines:
            line = line.decode('utf-8').strip("\n").strip("\r\n")
            character_str.append(line)
    
    character_str.append(" ")
    dict_character = list(character_str)
    character = ['blank'] + dict_character
    
    for batch_idx in range(batch_size):
        selection = np.ones(len(text_index[batch_idx]), dtype=bool)
        if is_remove_duplicate:
            selection[1:] = text_index[batch_idx][1:] != text_index[
                                                             batch_idx][:-1]
        for ignored_token in ignored_tokens:
            selection &= text_index[batch_idx] != ignored_token
        
        char_list = [character[text_id] for text_id in text_index[batch_idx][selection]]
        if text_prob is not None:
            conf_list = text_prob[batch_idx][selection]
        else:
            conf_list = [1] * len(char_list)
        if len(conf_list) == 0:
            conf_list = [0]
        text = ''.join(char_list)
        result_list.append({
            "result": text,
            "confidence": np.mean(conf_list).tolist()
        })
    return result_list


def postprocess(response):
    pred = np.array(response.json()["outputs"][0]['data'])
    preds = pred.reshape(1, 40, 6625)
    preds_idx = preds.argmax(axis=2)
    preds_prob = preds.max(axis=2)
    
    rec_result = decode(preds_idx, preds_prob, is_remove_duplicate=True)
    
    return json.dumps(rec_result)
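
postprocess reads the output from the JSON body because the request sets "binary_data" to False. For a 1 x 40 x 6625 tensor that means 265000 numbers encoded as text. If the request instead set "binary_data" to True, Triton would return the raw bytes after the JSON part, with the JSON length given in the response's `Inference-Header-Content-Length` header. The following is a sketch of parsing such a response, not part of the original code; it assumes a single little-endian FP32 output, and `parse_binary_response` is a hypothetical helper.

```python
import json
import struct


def parse_binary_response(body, json_length):
    """Split a binary-extension response into its shape and FP32 values."""
    header = json.loads(body[:json_length].decode("utf-8"))
    output = header["outputs"][0]
    size = output["parameters"]["binary_data_size"]
    raw = body[json_length:json_length + size]
    values = struct.unpack("<%df" % (size // 4), raw)
    return output["shape"], list(values)


# Simulated response with a 1 x 2 x 3 output instead of 1 x 40 x 6625.
raw = struct.pack("<6f", 0.1, 0.7, 0.2, 0.9, 0.05, 0.05)
header_bytes = json.dumps({
    "outputs": [{
        "name": "softmax_12.tmp_0",
        "datatype": "FP32",
        "shape": [1, 2, 3],
        "parameters": {"binary_data_size": len(raw)},
    }]
}).encode("utf-8")
shape, values = parse_binary_response(header_bytes + raw, len(header_bytes))
```

With requests, `json_length` would come from `int(response.headers["Inference-Header-Content-Length"])` and `body` from `response.content`; the flat list of values can then be reshaped exactly as in postprocess.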

Example

example

Recognition result

[{"result": "onecard", "confidence": 0.9846975037029811}]

Complete code

import json
import cv2
import requests
import numpy as np

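def preprocess(image_file):
    img = cv2.imread(image_file)
    if img is None:
        raise FileNotFoundError(f"cannot read image: {image_file}")
    rec_image_shape = 3, 64, 256
    imgC, imgH, imgW = rec_image_shape
    # Resize to the fixed width and height the model expects
    norm_img = cv2.resize(
        img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
    norm_img = norm_img.astype('float32')
    # HWC -> CHW, then scale [0, 255] to [-1, 1]
    norm_img = norm_img.transpose((2, 0, 1)) / 255
    norm_img -= 0.5
    norm_img /= 0.5

    # Add the batch dimension: (3, 64, 256) -> (1, 3, 64, 256)
    norm_img = norm_img[np.newaxis, :]
    # copy() returns a C-contiguous array; tobytes() serializes it for the request
    norm_img = norm_img.copy()
    image_buffer = norm_img.tobytes()
    return image_buffer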

def paddleOCR_rec(image_buffer):
    raw_data = {
        "inputs": [{
            "name": "x",
            "datatype": "FP32",
            "shape": [1, 3, 64, 256],
            "parameters": {
                "binary_data_size": len(image_buffer)
            }
        }],
        "outputs": [{
            "name": "softmax_12.tmp_0",
            "datatype": "FP32",
            "shape": [1, 40, 6625],
            "parameters": {
                "binary_data": False
            }
        }]
    }
    request_buffer = json.dumps(raw_data).encode()
    
    data = request_buffer + image_buffer
    
    header = {
        "Content-Type": "application/octet-stream",
        "Accept": "*/*",
        "Inference-Header-Content-Length": str(len(request_buffer)),
        "Content-Length": str(len(image_buffer) + len(request_buffer))
    }
    
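    # Replace [API address] with your endpoint; a stock Triton server exposes
    # this route as /v2/models/<name>/versions/<version>/infer.
    url = "https://[API address]/models/paddleOCR/versions/1/infer/"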
    response = requests.post(url=url, data=data, headers=header)
    return response
# Post-processing
def decode(text_index, text_prob=None, is_remove_duplicate=True):
    """ convert text-index into text-label. """
    result_list = []
    ignored_tokens = [0]
    batch_size = len(text_index)
    
    # The dictionary path and the trailing space character follow the
    # PaddleOCR arguments:
    #   --rec_char_dict_path (str), default "./ppocr/utils/ppocr_keys_v1.txt"
    #   --use_space_char (str2bool), default True
    
    character_str = []
    with open("./ppocr_keys_v1.txt", "rb") as fin:
        lines = fin.readlines()
        for line in lines:
            line = line.decode('utf-8').strip("\n").strip("\r\n")
            character_str.append(line)
    
    character_str.append(" ")
    dict_character = list(character_str)
    character = ['blank'] + dict_character
    
    for batch_idx in range(batch_size):
        selection = np.ones(len(text_index[batch_idx]), dtype=bool)
        if is_remove_duplicate:
            selection[1:] = text_index[batch_idx][1:] != text_index[
                                                             batch_idx][:-1]
        for ignored_token in ignored_tokens:
            selection &= text_index[batch_idx] != ignored_token
        
        char_list = [character[text_id] for text_id in text_index[batch_idx][selection]]
        if text_prob is not None:
            conf_list = text_prob[batch_idx][selection]
        else:
            conf_list = [1] * len(char_list)
        if len(conf_list) == 0:
            conf_list = [0]
        text = ''.join(char_list)
        result_list.append({
            "result": text,
            "confidence": np.mean(conf_list).tolist()
        })
    return result_list


def postprocess(response):
    pred = np.array(response.json()["outputs"][0]['data'])
    preds = pred.reshape(1, 40, 6625)
    preds_idx = preds.argmax(axis=2)
    preds_prob = preds.max(axis=2)
    
    rec_result = decode(preds_idx, preds_prob, is_remove_duplicate=True)
    
    return json.dumps(rec_result)


if __name__ == '__main__':
    import argparse
    
    parser = argparse.ArgumentParser(description='Script to paddleOCR.')
    parser.add_argument('--image_file', type=str, default="../example/images/en.png", help='Path to the text-line image to recognize')
    params = parser.parse_args()
    
    image_buffer = preprocess(params.image_file)
    response = paddleOCR_rec(image_buffer)
    rec_result = postprocess(response)
    
    print("rec_result:", rec_result)

Corrections and additions are welcome!
