[CUDA-BEVFusion] A walkthrough of tool/build_trt_engine.sh

🕗 Published 2025-01-23 15:15 | Computer Vision | Deep Learning

`build_trt_engine.sh`

# configure the environment
. tool/environment.sh

if [ "$ConfigurationStatus" != "Success" ]; then
    echo "Exit due to configure failure."
    exit
fi

# tensorrt version
# version=`trtexec | grep -m 1 TensorRT | sed -n "s/.*\[TensorRT v\([0-9]*\)\].*/\1/p"`

# resnet50/resnet50-int8/swint-tiny
base=model/$DEBUG_MODEL

# fp16/int8
precision=$DEBUG_PRECISION

# precision flags
trtexec_fp16_flags="--fp16"
trtexec_dynamic_flags="--fp16"
if [ "$precision" == "int8" ]; then
    trtexec_dynamic_flags="--fp16 --int8"
fi

function get_onnx_number_io(){

    # $1=model
    model=$1

    if [ ! -f "$model" ]; then
        echo The model [$model] not exists.
        return
    fi

    number_of_input=`python3 -c "import onnx;m=onnx.load('$model');print(len(m.graph.input), end='')"`
    number_of_output=`python3 -c "import onnx;m=onnx.load('$model');print(len(m.graph.output), end='')"`
    # echo The model [$model] has $number_of_input inputs and $number_of_output outputs.
}

function compile_trt_model(){

    # $1: name
    # $2: precision_flags
    # $3: number_of_input
    # $4: number_of_output
    # $5: extra_flags
    name=$1
    precision_flags=$2
    number_of_input=$3
    number_of_output=$4
    extra_flags=$5
    result_save_directory=$base/build
    onnx=$base/$name.onnx

    if [ -f "${result_save_directory}/$name.plan" ]; then
        echo Model ${result_save_directory}/$name.plan already build 🙋🙋🙋.
        return
    fi
    
    # Remove the onnx dependency
    # get_onnx_number_io $onnx
    # echo $number_of_input  $number_of_output

    input_flags="--inputIOFormats="
    output_flags="--outputIOFormats="
    for i in $(seq 1 $number_of_input); do
        input_flags+=fp16:chw,
    done

    for i in $(seq 1 $number_of_output); do
        output_flags+=fp16:chw,
    done

    input_flags=${input_flags%?}
    output_flags=${output_flags%?}

    cmd="--onnx=$base/$name.onnx ${precision_flags} ${input_flags} ${output_flags} ${extra_flags} \
        --saveEngine=${result_save_directory}/$name.plan \
        --memPoolSize=workspace:2048 --verbose --dumpLayerInfo \
        --dumpProfile --separateProfileRun \
        --profilingVerbosity=detailed --exportLayerInfo=${result_save_directory}/$name.json"

    mkdir -p $result_save_directory
    echo Building the model: ${result_save_directory}/$name.plan, this will take several minutes. Wait a moment 🤗🤗🤗~.
    trtexec $cmd > ${result_save_directory}/$name.log 2>&1
    if [ $? != 0 ]; then
        echo 😥 Failed to build model ${result_save_directory}/$name.plan.
        echo You can check the error message by ${result_save_directory}/$name.log 
        exit 1
    fi
}

# maybe int8 / fp16
compile_trt_model "camera.backbone" "$trtexec_dynamic_flags" 2 2
compile_trt_model "fuser" "$trtexec_dynamic_flags" 2 1

# fp16 only
compile_trt_model "camera.vtransform" "$trtexec_fp16_flags" 1 1

# for myelin layernorm head.bbox, may occur a tensorrt bug at layernorm fusion but faster
compile_trt_model "head.bbox" "$trtexec_fp16_flags" 1 6

# for layernorm version head.bbox.onnx, accurate but slower
# compile_trt_model "head.bbox.layernormplugin" "$trtexec_fp16_flags" 1 6 "--plugins=libcustom_layernorm.so"

This is a Bash script that builds TensorRT engines. TensorRT is NVIDIA's high-performance deep learning inference library; it optimizes trained models and accelerates their inference.

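1. Environment configuration

The script sources tool/environment.sh to configure the environment. That file is not shown here; it presumably sets the required environment variables and paths.
If configuration fails (ConfigurationStatus is not "Success"), the script prints an error message and exits. Because exit is called without an argument, the exit status is that of the preceding echo, so the script reports 0 even in this failure case.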
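The guard above can be sketched as follows. This is a minimal illustration, not the project's code: `configure_environment` is a hypothetical stand-in for tool/environment.sh (whose contents are not shown), and `check_configuration` is a helper name introduced only for this example. Unlike the bare `exit` in the script, it returns an explicit non-zero status on failure.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for tool/environment.sh, which is not shown in this article.
# It only sets the status variable that the build script inspects.
configure_environment() {
    ConfigurationStatus="Success"
}

# Illustrative helper: succeed only when configuration reported "Success".
check_configuration() {
    if [ "$ConfigurationStatus" != "Success" ]; then
        echo "Exit due to configure failure." >&2
        return 1
    fi
}

configure_environment
if check_configuration; then
    echo "Environment configured."
fi
```

Returning a non-zero status lets a calling script or CI job detect the failure, which the original bare `exit` does not guarantee.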

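2. TensorRT version check

A commented-out line shows how the TensorRT version could be extracted from the trtexec output. Since it is disabled, nothing in this script depends on the version. The reason is not stated; it may have been dropped to reduce dependencies or simplify the flow.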
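To see what the commented-out pipeline would extract, here is a small sketch that feeds it a sample line instead of calling trtexec. The banner text in `sample_banner` is an assumed example of what trtexec prints; the real output depends on the installed version.

```shell
#!/usr/bin/env bash

# Assumed example of a trtexec banner line; real output varies by version.
sample_banner='&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # trtexec'

# Same grep/sed pipeline as the commented-out line, fed from the sample text.
version=$(printf '%s\n' "$sample_banner" \
    | grep -m 1 TensorRT \
    | sed -n 's/.*\[TensorRT v\([0-9]*\)\].*/\1/p')

echo "$version"
```

The sed expression keeps only the digits between `[TensorRT v` and `]`, so the sample above yields 8503.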

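3. Model and precision settings

base=model/$DEBUG_MODEL: sets the model base directory. DEBUG_MODEL is a variable, presumably exported by tool/environment.sh, that selects the model variant (resnet50, resnet50-int8, or swint-tiny, according to the comment).
precision=$DEBUG_PRECISION: sets the precision (fp16 or int8). DEBUG_PRECISION is likewise a variable expected to be set beforehand.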

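4. Precision flags

trtexec_fp16_flags and trtexec_dynamic_flags hold the precision flags passed to trtexec. trtexec_fp16_flags is always --fp16, and trtexec_dynamic_flags defaults to --fp16 as well.
If the precision is int8, trtexec_dynamic_flags becomes --fp16 --int8, so both flags are passed together.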
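The flag selection can be sketched as a small function. `select_dynamic_flags` is a name introduced for illustration; the script itself uses plain variable assignments.

```shell
#!/usr/bin/env bash

# Illustrative helper mirroring the script's logic:
# always enable fp16, and add int8 only when requested.
select_dynamic_flags() {
    local precision="$1"
    local flags="--fp16"
    if [ "$precision" = "int8" ]; then
        flags="--fp16 --int8"
    fi
    printf '%s\n' "$flags"
}

select_dynamic_flags fp16
select_dynamic_flags int8
```

Any value other than int8 falls back to --fp16, which matches the script's behavior when DEBUG_PRECISION is unset or misspelled.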

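5. Function `get_onnx_number_io`

This function reads the number of inputs and outputs of an ONNX model.
It runs a Python one-liner that loads the model with the onnx package and prints len(m.graph.input) and len(m.graph.output); the results are stored in the global variables number_of_input and number_of_output.
If the model file does not exist, the function prints a message and returns.
Its only call site, inside compile_trt_model, is commented out ("Remove the onnx dependency"), so the function is currently unused and the counts are passed by hand.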
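A slightly more defensive variant might look like the sketch below. `count_onnx_io` is a hypothetical name, not part of the script; it needs python3 with the onnx package installed, prints both counts on one line, and returns a non-zero status when the file is missing.

```shell
#!/usr/bin/env bash

# Illustrative variant of get_onnx_number_io; needs python3 with the onnx package.
# Prints "<inputs> <outputs>" on success and returns non-zero on failure.
count_onnx_io() {
    local model="$1"
    if [ ! -f "$model" ]; then
        echo "The model [$model] does not exist." >&2
        return 1
    fi
    # The path is passed via sys.argv so that quotes in it cannot break the Python code.
    python3 -c 'import sys, onnx
m = onnx.load(sys.argv[1])
print(len(m.graph.input), len(m.graph.output))' "$model"
}

# Usage (the model path is an assumed example):
# read -r number_of_input number_of_output < <(count_onnx_io model/resnet50/fuser.onnx)
```

Loading the model once instead of twice halves the startup cost, and the explicit return status lets the caller stop early instead of continuing with empty counts.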

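6. Function `compile_trt_model`

This function builds one TensorRT engine.
Parameters:
- name: the model name; the ONNX file is read from $base/$name.onnx.
- precision_flags: the precision flags (--fp16, or --fp16 --int8).
- number_of_input and number_of_output: the number of model inputs and outputs.
- extra_flags: additional trtexec flags, such as --plugins=....
The function first checks whether a built engine ($base/build/$name.plan) already exists; if so, it returns without rebuilding.
It then builds input_flags and output_flags from the input and output counts: one fp16:chw entry per tensor, separated by commas, with the trailing comma removed by ${input_flags%?}. These become the --inputIOFormats and --outputIOFormats arguments that specify the I/O data type and layout.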
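The flag construction can be tried in isolation with the sketch below. `build_io_flags` is a helper name introduced for illustration; the script builds the two strings inline.

```shell
#!/usr/bin/env bash

# Illustrative helper: append one "fp16:chw," entry per tensor,
# then drop the trailing comma.
build_io_flags() {
    local prefix="$1"
    local count="$2"
    local flags="$prefix"
    local i
    for i in $(seq 1 "$count"); do
        flags="${flags}fp16:chw,"
    done
    # ${flags%?} removes the last character, here the trailing comma.
    printf '%s\n' "${flags%?}"
}

build_io_flags "--inputIOFormats=" 2
build_io_flags "--outputIOFormats=" 6
```

Note that `${flags%?}` removes whatever the last character is, so a count of zero would strip the `=` instead of a comma; the script never passes zero.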
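It then runs trtexec to build the engine. Under $base/build this produces the .plan file, a per-layer .json file (from --exportLayerInfo), and a .log file that captures all trtexec output.
If trtexec fails, the script prints the path of the log file and exits with status 1.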
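The run, log, and check pattern can be sketched without TensorRT installed. `run_with_log` is a hypothetical helper, and `echo` and `false` stand in for a succeeding and a failing trtexec call.

```shell
#!/usr/bin/env bash

# Illustrative helper: run a command, capture stdout and stderr in a log file,
# and report the log location if the command fails.
run_with_log() {
    local log="$1"
    shift
    if ! "$@" > "$log" 2>&1; then
        echo "Failed. You can check the error message in $log" >&2
        return 1
    fi
}

build_dir=$(mktemp -d)   # temporary directory standing in for $base/build
run_with_log "$build_dir/ok.log" echo "pretend build succeeded"
run_with_log "$build_dir/fail.log" false || echo "build failed as expected"
```

Passing the command as separate arguments ("$@") avoids the word-splitting that the script relies on when it expands the unquoted $cmd string; this is a design alternative, not what the script does.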

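7. Building the models

Finally, the script calls compile_trt_model for several models:
- camera.backbone and fuser use the dynamic precision flags (fp16, or fp16 plus int8).
- camera.vtransform is built with fp16 only.
- head.bbox is also built with fp16. The comments describe two variants:
  - The default build leaves layernorm to TensorRT's Myelin fusion. It is faster, but according to the comment it may hit a TensorRT bug in layernorm fusion.
  - The head.bbox.layernormplugin build loads libcustom_layernorm.so through --plugins. It is described as accurate but slower, and its call is commented out.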
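The active calls can be summarized with a dry-run sketch that prints what would be built instead of invoking trtexec. `list_builds` is a hypothetical helper; the names, flags, and counts are copied from the script's four uncommented calls.

```shell
#!/usr/bin/env bash

# Illustrative dry run: print "name|precision flags|inputs|outputs"
# for each engine the script builds by default.
list_builds() {
    local dynamic_flags="$1"
    local fp16_flags="--fp16"
    printf '%s|%s|%s|%s\n' "camera.backbone"   "$dynamic_flags" 2 2
    printf '%s|%s|%s|%s\n' "fuser"             "$dynamic_flags" 2 1
    printf '%s|%s|%s|%s\n' "camera.vtransform" "$fp16_flags"    1 1
    printf '%s|%s|%s|%s\n' "head.bbox"         "$fp16_flags"    1 6
}

list_builds "--fp16 --int8"
```

Only the first two rows change with the chosen precision; camera.vtransform and head.bbox stay at fp16 regardless.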

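8. Summary

The script uses TensorRT to build the ONNX models of the selected model variant and precision into .plan engine files for efficient inference on NVIDIA hardware.
It guards against a failed environment configuration, skips engines that are already built, and passes hard-coded input/output counts and per-model precision flags to trtexec.
If a build fails, the script points the user to the trtexec log file for troubleshooting.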

Original article: https://blog.csdn.net/old_power/article/details/145305917
