Compiling Models

Qualcomm® AI Hub supports compiling models trained with the following frameworks:

PyTorch
ONNX
AI Model Efficiency Toolkit (AIMET) quantized models
TensorFlow (via ONNX)

Any of the above models can be compiled to the following target runtimes:

TensorFlow Lite (recently renamed LiteRT; recommended for Android developers)
ONNX (recommended for Windows developers)
Qualcomm® AI Engine Direct (QNN) context binary (SoC-specific)
Qualcomm® AI Engine Direct (QNN) model library (OS-specific)
Qualcomm® AI Engine Direct (QNN) DLC (hardware-agnostic)

To specify the version of Qualcomm® AI Engine Direct, include --qairt_version in the compile options. See Common Options.
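Every example on this page passes compile flags as a single `options` string. The sketch below assembles that string from its parts. It is only a sketch under stated assumptions. `build_compile_options` is a hypothetical helper. The allowed `--target_runtime` values are simply the ones used in the examples on this page, and Hub itself may accept others. The version `"2.31"` is a placeholder, so check Common Options for the values Hub actually accepts.

```python
from typing import List, Optional, Sequence

# --target_runtime values used in the examples on this page. TensorFlow Lite is
# what the examples produce when no --target_runtime flag is passed at all.
KNOWN_TARGET_RUNTIMES = {
    "onnx",
    "precompiled_qnn_onnx",
    "qnn_context_binary",
    "qnn_dlc",
    "qnn_lib_aarch64_android",
}


def build_compile_options(
    target_runtime: Optional[str] = None,
    qairt_version: Optional[str] = None,
    extra: Sequence[str] = (),
) -> str:
    """Hypothetical helper: join compile flags into one options string."""
    parts: List[str] = []
    if target_runtime is not None:
        # Restricting to known values is a convenience of this sketch, not a Hub rule.
        if target_runtime not in KNOWN_TARGET_RUNTIMES:
            raise ValueError(f"unknown target runtime: {target_runtime!r}")
        parts += ["--target_runtime", target_runtime]
    if qairt_version is not None:
        parts += ["--qairt_version", qairt_version]
    # Any further flags, already split into tokens, e.g. ["--quantize_full_type", "int8"].
    parts += list(extra)
    return " ".join(parts)


# Usage (requires qai_hub and an account, so shown as comments):
# options = build_compile_options("qnn_dlc", qairt_version="2.31")
# hub.submit_compile_job(model="mobilenet_v2.pt", device=..., options=options,
#                        input_specs=dict(image=(1, 3, 224, 224)))
```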

Compiling PyTorch to TensorFlow Lite

To compile a PyTorch model, you must first generate a TorchScript model in memory with PyTorch's torch.jit.trace method. After tracing, you can compile the model with the submit_compile_job() API.

TensorFlow Lite models can run on the CPU, on the GPU (using the GPU delegate), or on the NPU (using the QNN delegate).

import torch
import torchvision

import qai_hub as hub

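# Using pre-trained MobileNet (the weights= API replaces the deprecated pretrained=True)
torch_model = torchvision.models.mobilenet_v2(
    weights=torchvision.models.MobileNet_V2_Weights.DEFAULT
)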
torch_model.eval()

# Trace model
input_shape: tuple[int, ...] = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
pt_model = torch.jit.trace(torch_model, example_input)

# Compile model on a specific device
compile_job = hub.submit_compile_job(
    pt_model,
    name="MobileNet_V2",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=input_shape),
)

# Download the optimized compiled model
compile_job.download_target_model("MobileNet_V2.tflite")

If you already have a traced or scripted torch model saved with torch.jit.save, you can submit it directly. The example below uses mobilenet_v2.pt and also profiles the compiled model.

import qai_hub as hub

# Compile a model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)

# Profile the compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)

# Download the optimized compiled model
compile_job.download_target_model("MobileNet_V2.tflite")

Compiling a PyTorch Model to a QNN Model Library

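Qualcomm® AI Hub supports compiling PyTorch models to QNN model libraries and profiling the results. In this example, we take mobilenet_v2.pt and compile it into a QNN model library (a .so file) for the ARM64 Android platform (aarch64_android).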

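A model library is an OS-specific deployment mechanism that does not depend on the SoC. Note that the Qualcomm® AI Engine Direct SDK does not guarantee ABI compatibility of model libraries across SDK versions. A model library compiled with one SDK version may therefore fail to run with a different SDK version. For details, see Qualcomm® AI Engine Direct Options.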

import qai_hub as hub

# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. See this example for how to profile this model on the Snapdragon® Neural Processing Unit (NPU).

Compiling a PyTorch Model to a QNN DLC

Qualcomm® AI Hub supports compiling PyTorch models to QNN DLCs and profiling the results. In this example, we take mobilenet_v2.pt and compile it into a QNN DLC (a .dlc file).

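DLCs are hardware-agnostic. The Qualcomm® AI Engine Direct SDK guarantees that DLCs remain compatible with newer SDK versions. A DLC compiled with one SDK version can therefore run on any newer SDK version. For details, see Qualcomm® AI Engine Direct Options.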

import qai_hub as hub

# Compile a model to QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_dlc",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. See this example for how to profile this model on the Snapdragon® Neural Processing Unit (NPU).
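The SDK compatibility rules described above for model libraries and DLCs can be written as a small check. The sketch below is a minimal illustration under a few assumptions:

- `may_run` and `parse_version` are hypothetical helpers.
- Version strings are assumed to be dotted, such as "2.31.0".
- The model-library rule is read conservatively as "same SDK version only". The SDK merely declines to guarantee cross-version ABI compatibility, so other versions might still work.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted version string such as "2.31.0" (assumed format) into ints."""
    parts = [int(p) for p in version.split(".")]
    # Pad to three components so "2.31" and "2.31.0" compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def may_run(artifact: str, compiled_with: str, runtime_sdk: str) -> bool:
    """Hypothetical check: is an artifact built with one SDK version safe on another?"""
    compiled = parse_version(compiled_with)
    runtime = parse_version(runtime_sdk)
    if artifact == "dlc":
        # DLCs are guaranteed to work with the same or a newer SDK version.
        return runtime >= compiled
    if artifact == "model_library":
        # No cross-version ABI guarantee: only the exact same version is treated as safe.
        return runtime == compiled
    raise ValueError(f"unknown artifact kind: {artifact!r}")
```

For a DLC, any newer runtime passes the check. A model library passes only when the versions match exactly.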

Compiling a PyTorch Model to a QNN Context Binary

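Qualcomm® AI Hub supports compiling PyTorch models to QNN context binaries and profiling the results. In this example, we take mobilenet_v2.pt and compile it into a QNN context binary optimized for a specific device. Because context binaries are optimized for specific hardware, each one can be compiled for only a single device.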

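A context binary is an SoC-specific deployment mechanism. When you compile for a device, the model is expected to be deployed on that same device. The format itself is OS-agnostic, so the same model can be deployed on Android, Linux, or Windows. Context binaries are designed for the NPU only.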

import qai_hub as hub

# Compile a model to QNN context binary
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_context_binary",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. See this example for how to profile this model on the Snapdragon® Neural Processing Unit (NPU).

QNN context binaries can also be embedded in an ONNX model.

Compiling to Precompiled QNN ONNX

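Qualcomm® AI Hub supports compiling and profiling precompiled ONNX Runtime models. Such a model is compatible with ONNX Runtime and contains a precompiled QNN binary, which ONNX Runtime can run on Snapdragon devices. See the documentation here for more details.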

Advantages of using precompiled QNN ONNX:

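Easy deployment: works on Android, Linux, or Windows.
Performance: equivalent to a QNN context binary.
Simple inference code: ONNX Runtime runs inference on the compiled model through the QNN Execution Provider.
Large models: suitable for large models (>1 GB) such as LLMs and Stable Diffusion.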

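Note that QNN context binaries are OS-agnostic but device-specific. Context binaries are also available only for the NPU. In this example, suppose we want to target the Snapdragon® 8 Elite: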

import qai_hub as hub

# Compile a model to a precompiled QNN ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Snapdragon 8 Elite QRD"),
    options="--target_runtime precompiled_qnn_onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

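The compiled model is a directory with a .onnx extension, which can be zipped. It contains one ONNX file and one QNN context binary file. If you upload your own precompiled ONNX Runtime model, it should follow this folder structure: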

<modeldir>.onnx
   ├── <model>.onnx
   └── <model>.bin

Note that the ONNX model refers to the QNN context binary by a relative path. If you rename or move the .bin file, make sure that reference stays valid.
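Before you upload your own precompiled model, a quick local check of the folder layout described above can catch packaging mistakes. The following is a minimal sketch that uses only the Python standard library. `check_precompiled_qnn_onnx_dir` is a hypothetical helper. It checks only file names and counts. It does not verify that the relative reference inside the .onnx file actually resolves to the .bin file.

```python
from pathlib import Path
from typing import List, Union


def check_precompiled_qnn_onnx_dir(path: Union[str, Path]) -> List[str]:
    """Hypothetical helper: list layout problems; an empty list means the layout looks right."""
    root = Path(path)
    problems: List[str] = []
    if root.suffix != ".onnx":
        problems.append(f"directory name should end in .onnx: {root.name}")
    if not root.is_dir():
        problems.append(f"not a directory: {root}")
        return problems

    files = [p for p in root.iterdir() if p.is_file()]
    onnx_files = [p for p in files if p.suffix == ".onnx"]
    bin_files = [p for p in files if p.suffix == ".bin"]
    if len(onnx_files) != 1:
        problems.append(f"expected exactly one .onnx file, found {len(onnx_files)}")
    if len(bin_files) != 1:
        problems.append(f"expected exactly one .bin file, found {len(bin_files)}")

    # Anything else in the directory is outside the documented structure.
    others = sorted(p.name for p in files if p.suffix not in (".onnx", ".bin"))
    if others:
        problems.append(f"unexpected files: {others}")
    return problems
```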

Compiling a PyTorch Model for ONNX Runtime

Qualcomm® AI Hub supports compiling PyTorch models for ONNX Runtime. In this example, we take mobilenet_v2.pt and compile it into an ONNX model, which can then be profiled with ONNX Runtime.

ONNX Runtime supports execution on the CPU, on the GPU (using the DML Execution Provider), or on the NPU (using the QNN Execution Provider):

import qai_hub as hub

# Compile a model to an ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
# Download the optimized compiled model
compile_job.download_target_model("MobileNet_V2.onnx")
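When you run the compiled ONNX model (or a precompiled QNN ONNX model) with ONNX Runtime, the execution provider decides whether inference runs on the CPU, GPU, or NPU. The sketch below builds a provider list in order of preference, using whatever the installed ONNX Runtime reports as available. The provider names and the `backend_path` provider option come from ONNX Runtime itself. `choose_providers` is a hypothetical helper. The default `QnnHtp.dll` assumes a Windows on Snapdragon setup; Android and Linux builds use a shared library such as `libQnnHtp.so` instead.

```python
from typing import Dict, List, Sequence, Tuple, Union

# ONNX Runtime accepts provider names or (name, options) tuples, highest priority first.
Provider = Union[str, Tuple[str, Dict[str, str]]]

# Preference order per target processor, falling back to the CPU.
_PREFERENCE = {
    "npu": ["QNNExecutionProvider", "CPUExecutionProvider"],
    "gpu": ["DmlExecutionProvider", "CPUExecutionProvider"],
    "cpu": ["CPUExecutionProvider"],
}


def choose_providers(
    available: Sequence[str], target: str = "npu", qnn_backend_path: str = "QnnHtp.dll"
) -> List[Provider]:
    """Hypothetical helper: keep only available providers, in preference order."""
    if target not in _PREFERENCE:
        raise ValueError(f"unknown target: {target!r}")
    chosen: List[Provider] = []
    for name in _PREFERENCE[target]:
        if name not in available:
            continue
        if name == "QNNExecutionProvider":
            # backend_path selects the QNN backend library; the HTP backend targets the NPU.
            chosen.append((name, {"backend_path": qnn_backend_path}))
        else:
            chosen.append(name)
    return chosen


# Usage (requires onnxruntime, so shown as comments):
# import onnxruntime as ort
# providers = choose_providers(ort.get_available_providers(), target="npu")
# session = ort.InferenceSession("MobileNet_V2.onnx", providers=providers)
```

Falling back to the CPU provider means the session still loads on machines without the requested accelerator.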

Compiling an ONNX Model to TensorFlow Lite or QNN

Qualcomm® AI Hub also supports compiling ONNX models to TensorFlow Lite, a QNN model library, or a QNN DLC. We use mobilenet_v2.onnx as the example.

import qai_hub as hub

# Compile a model to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
compile_job.download_target_model("MobileNet_V2.tflite")

# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
)
compile_job.download_target_model("MobileNet_V2.so")

# Compile a model to a QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_dlc",
)
compile_job.download_target_model("MobileNet_V2.dlc")

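Note that an ONNX model may be unquantized (as in the example above) or quantized (as shown in Quantization). If the source model is quantized, its quantization parameters are respected to produce a quantized deployable asset.

A model directory can also be used to supply an ONNX model with external weights. This directory has a .onnx extension, can optionally be zipped, and must contain one .onnx file and one weights file with a .data extension. It should follow this folder structure: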

<modeldir>.onnx
   ├── <model>.onnx
   └── <model>.data

Here <modeldir> and <model> can be any name. If your ONNX model does not follow this structure, use the following code to make it conform:

# if you have an ONNX model "file.onnx" which uses external weights,
# but does not adhere to Qualcomm AI Hub's required format, use this
# code to make it adhere

import onnx

model = onnx.load("file.onnx")
onnx.save(model, "new_file.onnx", save_as_external_data=True, location="new_file.data")

# place both "new_file.onnx" and "new_file.data" in a new directory with
# a .onnx extension, without any other files and upload that directory
# to Qualcomm AI Hub, either as is or as a .zip file

Note that the ONNX model refers to the weights file by a relative path. If you rename or move the weights file, make sure that reference stays valid.
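The manual step described in the code comments above can be scripted with the standard library. That step is to place both files in a new directory with a .onnx extension and optionally zip it. The following is a sketch with two assumptions:

- `package_external_weights` is a hypothetical helper.
- The zip archive keeps the directory as its top-level entry. This is modeled on the mobilenet_v2_onnx.aimet.zip example later on this page, which unzips into a directory.

The files keep their original names, so a relative weights reference such as location="new_file.data" still resolves.

```python
import shutil
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def package_external_weights(
    onnx_file: PathLike, data_file: PathLike, out_dir: PathLike, make_zip: bool = False
) -> Path:
    """Hypothetical helper: copy a model and its weights into <name>.onnx/, optionally zipped."""
    onnx_file, data_file, out_dir = Path(onnx_file), Path(data_file), Path(out_dir)
    if out_dir.suffix != ".onnx":
        raise ValueError("the model directory name must end in .onnx")
    # Refuse to overwrite an existing directory so no stray files get mixed in.
    out_dir.mkdir(parents=True, exist_ok=False)

    # Keep the original file names so the relative weights reference still resolves.
    shutil.copy2(onnx_file, out_dir / onnx_file.name)
    shutil.copy2(data_file, out_dir / data_file.name)
    if not make_zip:
        return out_dir

    # Produces <out_dir>.zip with the directory itself as the top-level archive entry.
    archive = shutil.make_archive(
        str(out_dir), "zip", root_dir=str(out_dir.parent), base_dir=out_dir.name
    )
    return Path(archive)
```

Either the returned directory or the .zip file can then be passed as the model argument of submit_compile_job.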

Compiling AIMET-Quantized Models to TensorFlow Lite or QNN

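AI Model Efficiency Toolkit (AIMET) is an open-source library that provides advanced quantization and compression techniques for trained neural network models. AIMET's QuantizationSimModel can be exported as an ONNX model (.onnx) plus an encodings file (.encodings) that holds the quantization parameters.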

To use such a model, create a directory whose name has the .aimet extension. It should contain the .onnx model and the corresponding encodings file:

<modeldir>.aimet
   ├── <model>.onnx
   ├── <model>.data (optional)
   └── <encodings>.encodings

Here <modeldir>, <model>, and <encodings> can be any name. <model>.data is required only if the ONNX model has external weights.
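The .aimet layout described above can be checked locally before you submit a job. This is a minimal standard-library sketch. `check_aimet_dir` is a hypothetical helper. It checks only file names and counts, and treats the .data file as optional, following the text above. It does not check whether the ONNX model actually uses external weights.

```python
from pathlib import Path
from typing import List, Union


def check_aimet_dir(path: Union[str, Path]) -> List[str]:
    """Hypothetical helper: list layout problems; an empty list means the layout looks right."""
    root = Path(path)
    problems: List[str] = []
    if root.suffix != ".aimet":
        problems.append(f"directory name should end in .aimet: {root.name}")
    if not root.is_dir():
        problems.append(f"not a directory: {root}")
        return problems

    files = [p for p in root.iterdir() if p.is_file()]
    by_suffix = {s: [p for p in files if p.suffix == s] for s in (".onnx", ".encodings", ".data")}
    if len(by_suffix[".onnx"]) != 1:
        problems.append(f"expected exactly one .onnx file, found {len(by_suffix['.onnx'])}")
    if len(by_suffix[".encodings"]) != 1:
        problems.append(f"expected exactly one .encodings file, found {len(by_suffix['.encodings'])}")
    # The weights file is optional: zero or one is fine.
    if len(by_suffix[".data"]) > 1:
        problems.append(f"expected at most one .data file, found {len(by_suffix['.data'])}")

    others = sorted(p.name for p in files if p.suffix not in by_suffix)
    if others:
        problems.append(f"unexpected files: {others}")
    return problems
```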

Let's use mobilenet_v2_onnx.aimet.zip as an example. After unzipping it into the mobilenet_v2_onnx.aimet directory, we can submit compile jobs as follows:

import qai_hub as hub

# Compile to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
compile_job.download_target_model("MobileNet_V2.tflite")

# Compile to a QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_dlc --quantize_full_type int8",
)
compile_job.download_target_model("MobileNet_V2.dlc")