Compiling Models

Qualcomm® AI Hub supports compiling models trained with:

PyTorch
ONNX
AI Model Efficiency Toolkit (AIMET) quantized models
TensorFlow (via ONNX)

These models can be compiled for the following target runtimes:

TensorFlow Lite (recently renamed LiteRT; recommended for Android developers)
ONNX (recommended for Windows developers)
Qualcomm® AI Engine Direct (QNN) context binary (SoC-specific)
Qualcomm® AI Engine Direct (QNN) model library (operating-system-specific)
Qualcomm® AI Engine Direct (QNN) DLC (hardware-agnostic)

To specify the version of Qualcomm® AI Engine Direct, include --qairt_version in the compile options. See Common Options.
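The options string is plain command-line-style text, so it can be assembled programmatically. The sketch below uses a hypothetical helper, build_compile_options, which is not part of the qai_hub API. It only accepts the --target_runtime values used in this document plus "tflite", which is assumed here to be the TensorFlow Lite value because the TensorFlow Lite examples omit the option. The version string "2.31" is an illustrative placeholder, not a verified release number.

```python
from typing import Optional

# --target_runtime values used in this document; "tflite" is an ASSUMED value
# for TensorFlow Lite, since the TensorFlow Lite examples here omit the option.
KNOWN_TARGET_RUNTIMES = {
    "tflite",
    "onnx",
    "precompiled_qnn_onnx",
    "qnn_context_binary",
    "qnn_lib_aarch64_android",
    "qnn_dlc",
}


def build_compile_options(target_runtime: str, qairt_version: Optional[str] = None) -> str:
    """Assemble the `options` string for hub.submit_compile_job (hypothetical helper)."""
    if target_runtime not in KNOWN_TARGET_RUNTIMES:
        raise ValueError(f"unknown target runtime: {target_runtime!r}")
    parts = ["--target_runtime", target_runtime]
    if qairt_version is not None:
        # The accepted version-string format is an assumption; see Common Options.
        parts += ["--qairt_version", qairt_version]
    return " ".join(parts)


print(build_compile_options("qnn_context_binary", qairt_version="2.31"))
# --target_runtime qnn_context_binary --qairt_version 2.31
```

The returned string would be passed as the options argument of submit_compile_job, for example options=build_compile_options("qnn_dlc"). Validating the runtime name locally only catches typos early; the service remains the authority on which values it accepts.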

Compiling PyTorch to TensorFlow Lite

To compile a PyTorch model, you must first create a TorchScript model in memory using the torch.jit.trace method. Once tracing is complete, you can compile the model with the submit_compile_job() API.

TensorFlow Lite models can run on the CPU, the GPU (using GPU delegation), or the NPU (using QNN delegation).

import torch
import torchvision

import qai_hub as hub

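# Using pre-trained MobileNet (the `weights` argument replaces the deprecated
# `pretrained=True` in torchvision 0.13 and later)
torch_model = torchvision.models.mobilenet_v2(
    weights=torchvision.models.MobileNet_V2_Weights.DEFAULT
)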
torch_model.eval()

# Trace model
input_shape: tuple[int, ...] = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
pt_model = torch.jit.trace(torch_model, example_input)

# Compile model on a specific device
compile_job = hub.submit_compile_job(
    pt_model,
    name="MyMobileNet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=input_shape),
)

assert isinstance(compile_job, hub.CompileJob)

If you already have a traced or scripted torch model saved with torch.jit.save, you can submit it directly. We use mobilenet_v2.pt as an example. In this example, we also profile the compiled model.

import qai_hub as hub

# Compile a model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

# Profile the compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Compiling a PyTorch Model to a QNN Model Library

Qualcomm® AI Hub supports compiling and profiling a PyTorch model to a QNN model library. In this example, we use mobilenet_v2.pt and compile it to a QNN model library (.so file) for the ARM64 Android platform (aarch64_android).

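A model library is an operating-system-specific deployment mechanism that is SoC-agnostic. However, the Qualcomm® AI Engine Direct SDK does not guarantee that model libraries are ABI-compatible across SDK versions: a model library compiled with one SDK version is not guaranteed to run with another. For details, see Qualcomm® AI Engine Direct Options.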

import qai_hub as hub

# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. To learn how to profile this model on the Snapdragon® Neural Processing Unit (NPU), see this example.
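Because model libraries carry no cross-version ABI guarantee, a deployment script might refuse to load one whose SDK version differs from the runtime's. The following is a minimal sketch under that conservative assumption; parse_version and model_library_is_safe are hypothetical helper names, and the dotted version strings are illustrative.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert "2.31.0" into (2, 31); trailing zero components are dropped
    so that "2.31" and "2.31.0" compare equal."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def model_library_is_safe(compiled_with: str, runtime: str) -> bool:
    """Treat a QNN model library as safe only on the exact SDK version it was
    compiled with, since ABI compatibility across versions is not guaranteed."""
    return parse_version(compiled_with) == parse_version(runtime)


print(model_library_is_safe("2.31.0", "2.31"))  # True
print(model_library_is_safe("2.31", "2.32"))    # False
```

Requiring an exact match is stricter than necessary when two versions happen to be compatible, but it follows directly from the absence of a guarantee.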

Compiling a PyTorch Model to a QNN DLC

Qualcomm® AI Hub supports compiling and profiling a PyTorch model to a QNN DLC. In this example, we use mobilenet_v2.pt and compile it to a QNN DLC (.bin file).

A DLC is hardware-agnostic. The Qualcomm® AI Engine Direct SDK guarantees that DLCs are compatible with later SDK versions: a DLC compiled with a given SDK version can also run on any later SDK version. For details, see Qualcomm® AI Engine Direct Options.

import qai_hub as hub

# Compile a model to QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_dlc",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. To learn how to profile this model on the Snapdragon® Neural Processing Unit (NPU), see this example.
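Unlike model libraries, DLCs are forward-compatible, so the check becomes "runtime version is at least the compile version". This is a minimal sketch of that rule; dlc_is_supported and parse_version are hypothetical helpers and the version strings are illustrative.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert "2.31.0" into (2, 31); trailing zero components are dropped
    so that "2.31" and "2.31.0" compare equal."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def dlc_is_supported(compiled_with: str, runtime: str) -> bool:
    """A DLC compiled with a given SDK version runs on that version or any later one."""
    return parse_version(runtime) >= parse_version(compiled_with)


print(dlc_is_supported("2.28", "2.31"))  # True: newer runtime
print(dlc_is_supported("2.31", "2.28"))  # False: older runtime
```

Python compares tuples element by element, which gives the expected ordering for dotted numeric versions.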

Compiling a PyTorch Model to a QNN Context Binary

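Qualcomm® AI Hub supports compiling and profiling a PyTorch model to a QNN context binary. In this example, we use mobilenet_v2.pt and compile it to a QNN context binary optimized to run on a specific device. Because context binaries are optimized for specific hardware, they can only be compiled for a single device.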

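A context binary is an SoC-specific deployment mechanism. When you compile for a device, the model is expected to be deployed to that same device. The format is operating-system-agnostic, so the same model can be deployed on Android, Linux, or Windows. Context binaries are designed for the NPU only.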

import qai_hub as hub

# Compile a model to QNN context binary
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_context_binary",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

The return value is an instance of CompileJob. To learn how to profile this model on the Snapdragon® Neural Processing Unit (NPU), see this example.
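The portability rules for the three QNN formats in this section can be summarized as data. The sketch below encodes them: model libraries are tied to an OS, context binaries are tied to an SoC, and DLCs are treated as unrestricted (reading "hardware-agnostic" broadly, which is an assumption). The names Portability and can_deploy, and the "soc_a"/"android" labels, are hypothetical. The sketch does not model SDK versions or the NPU-only restriction of context binaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portability:
    tied_to_os: bool   # must be deployed on the OS it was compiled for
    tied_to_soc: bool  # must be deployed on the SoC it was compiled for


# Portability rules per QNN target runtime, as described in this section.
PORTABILITY = {
    "qnn_lib_aarch64_android": Portability(tied_to_os=True, tied_to_soc=False),
    "qnn_dlc": Portability(tied_to_os=False, tied_to_soc=False),
    "qnn_context_binary": Portability(tied_to_os=False, tied_to_soc=True),
}


def can_deploy(target_runtime: str, compiled_soc: str, compiled_os: str,
               deploy_soc: str, deploy_os: str) -> bool:
    """Return True if an asset compiled for (compiled_soc, compiled_os) may be
    deployed on (deploy_soc, deploy_os) under the rules above."""
    rules = PORTABILITY[target_runtime]
    if rules.tied_to_soc and compiled_soc != deploy_soc:
        return False
    if rules.tied_to_os and compiled_os != deploy_os:
        return False
    return True


# A context binary moves across operating systems but not across SoCs.
print(can_deploy("qnn_context_binary", "soc_a", "android", "soc_a", "windows"))  # True
print(can_deploy("qnn_context_binary", "soc_a", "android", "soc_b", "android"))  # False
```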

QNN context binaries can also be embedded in ONNX models.

Compiling to a Precompiled QNN ONNX

Qualcomm® AI Hub supports compiling and profiling precompiled ONNX Runtime models. These are ONNX Runtime-compatible models that embed a precompiled QNN binary and can be run with ONNX Runtime on Snapdragon devices. More details are documented here.

Benefits of using precompiled QNN ONNX:

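Ease of deployment: works on Android, Linux, and Windows.
Improved performance: equivalent to QNN context binaries.
Simple inference code: ONNX Runtime runs inference on the compiled model using the QNN Execution Provider.
Large models: suitable for large models (>1 GB) such as LLMs and Stable Diffusion.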

QNN context binaries are operating-system-agnostic but device-specific, and they are designed for the NPU only. In this example, assume we are targeting the Snapdragon® 8 Elite.

import qai_hub as hub

# Compile a model to a precompiled QNN ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Snapdragon 8 Elite QRD"),
    options="--target_runtime precompiled_qnn_onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

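The compiled model is an optionally zipped directory (with a .onnx extension) containing an ONNX file and a QNN context binary file. If you upload a precompiled ONNX Runtime model that you compiled yourself, it must follow this folder structure: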

<modeldir>.onnx
   ├── <model>.onnx
   └── <model>.bin

Note that the ONNX model references the QNN context binary by relative path, so keep that reference in mind if you rename or move the .bin file.
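Before uploading a self-compiled precompiled QNN ONNX model, the folder layout above can be checked locally. This is a minimal sketch with a hypothetical helper, check_precompiled_qnn_onnx_dir; it only checks the one-.onnx, one-.bin layout shown above and does not verify that the ONNX file's relative reference actually points at the .bin file.

```python
import tempfile
from pathlib import Path
from typing import List


def check_precompiled_qnn_onnx_dir(model_dir: str) -> List[str]:
    """Return a list of layout problems for a precompiled QNN ONNX directory.
    An empty list means the directory matches the structure shown above."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if root.suffix != ".onnx":
        problems.append("directory name must end in .onnx")
    files = [p for p in root.iterdir() if p.is_file()]
    n_onnx = sum(p.suffix == ".onnx" for p in files)
    n_bin = sum(p.suffix == ".bin" for p in files)
    if n_onnx != 1:
        problems.append(f"expected one .onnx file, found {n_onnx}")
    if n_bin != 1:
        problems.append(f"expected one .bin file, found {n_bin}")
    return problems


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "mobilenet_v2.onnx"
    model_dir.mkdir()
    (model_dir / "model.onnx").touch()
    print(check_precompiled_qnn_onnx_dir(str(model_dir)))  # ['expected one .bin file, found 0']
    (model_dir / "model.bin").touch()
    print(check_precompiled_qnn_onnx_dir(str(model_dir)))  # []
```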

Compiling a PyTorch Model for ONNX Runtime

Qualcomm® AI Hub supports compiling PyTorch models for ONNX Runtime. In this example, we compile mobilenet_v2.pt to an ONNX model, which can then be profiled with ONNX Runtime.

ONNX Runtime supports execution on the CPU, the GPU (using the DML execution provider), or the NPU (using the QNN execution provider).

import qai_hub as hub

# Compile a model to an ONNX model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime onnx",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
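When the compiled ONNX model is later run with ONNX Runtime, the compute unit is chosen through the ordered list of execution providers. The sketch below maps the three compute units mentioned above to ONNX Runtime provider names and keeps the CPU provider last as a fallback for nodes the preferred provider does not take. providers_for is a hypothetical helper; each provider is only available in an ONNX Runtime build that includes it.

```python
from typing import List

# ONNX Runtime execution provider names for the compute units listed above.
EXECUTION_PROVIDERS = {
    "cpu": "CPUExecutionProvider",
    "gpu": "DmlExecutionProvider",
    "npu": "QNNExecutionProvider",
}


def providers_for(compute_unit: str) -> List[str]:
    """Return an ordered provider list: the preferred provider first,
    with the CPU provider appended as a fallback."""
    preferred = EXECUTION_PROVIDERS[compute_unit.lower()]
    if preferred == "CPUExecutionProvider":
        return [preferred]
    return [preferred, "CPUExecutionProvider"]


print(providers_for("npu"))  # ['QNNExecutionProvider', 'CPUExecutionProvider']
```

The list would be passed as onnxruntime.InferenceSession(model_path, providers=providers_for("npu")). The QNN provider may also need provider-specific options, which this sketch does not cover.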

Compiling an ONNX Model to TensorFlow Lite or QNN

Qualcomm® AI Hub also supports compiling ONNX models to TensorFlow Lite, a QNN model library, or a QNN DLC. We use mobilenet_v2.onnx as an example.

import qai_hub as hub

# Compile a model to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

# Compile a model to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android",
)
assert isinstance(compile_job, hub.CompileJob)

# Compile a model to a QNN DLC
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_dlc",
)
assert isinstance(compile_job, hub.CompileJob)

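An ONNX model can be unquantized (as in the examples above) or quantized (as shown in Quantization). If the source model is quantized, its quantization parameters are honored and a quantized deployable asset is produced. An ONNX model can also be a directory, to support external weights. This optionally zipped directory (with a .onnx extension) must contain exactly one .onnx file and exactly one weight file with a .data extension, in the following folder structure: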

<modeldir>.onnx
   ├── <model>.onnx
   └── <model>.data

<modeldir> and <model> can be any names. If your ONNX model does not follow this structure, use the following code to make it conform:

# if you have an ONNX model "file.onnx" which uses external weights,
# but does not adhere to Qualcomm AI Hub's required format, use this
# code to make it adhere

import onnx

model = onnx.load("file.onnx")
onnx.save(model, "new_file.onnx", save_as_external_data=True, location="new_file.data")

# place both "new_file.onnx" and "new_file.data" in a new directory with
# a .onnx extension, without any other files and upload that directory
# to Qualcomm AI Hub, either as is or as a .zip file

The ONNX model references the weight file by relative path, so keep that reference in mind if you rename or move the weight file.
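The manual step described in the comments above (placing the .onnx and .data files in a new directory with a .onnx extension, then optionally zipping it) can be scripted with the standard library. This is a sketch using a hypothetical helper, package_onnx_with_external_data, not part of qai_hub. It keeps the original file names so the relative weight reference stays valid, and it assumes that a zip whose entries sit under the top-level directory name is an acceptable upload format, which you should verify.

```python
import shutil
from pathlib import Path


def package_onnx_with_external_data(onnx_file: str, data_file: str,
                                    out_dir: str, make_zip: bool = False) -> Path:
    """Copy one .onnx file and its single .data weight file into a new
    directory whose name ends in .onnx; optionally zip that directory too."""
    onnx_path, data_path, target = Path(onnx_file), Path(data_file), Path(out_dir)
    if onnx_path.suffix != ".onnx" or data_path.suffix != ".data":
        raise ValueError("expected one .onnx file and one .data file")
    if target.suffix != ".onnx":
        raise ValueError("the output directory name must end in .onnx")
    target.mkdir(parents=True)  # raises FileExistsError if it already exists
    # Keep the original file names: the .onnx file locates its weights by relative path.
    shutil.copy2(onnx_path, target / onnx_path.name)
    shutil.copy2(data_path, target / data_path.name)
    if not make_zip:
        return target
    # Produces <out_dir>.zip whose entries are prefixed with the directory name.
    archive = shutil.make_archive(str(target), "zip",
                                  root_dir=str(target.parent), base_dir=target.name)
    return Path(archive)
```

For the files produced above, a call would look like package_onnx_with_external_data("new_file.onnx", "new_file.data", "new_model.onnx", make_zip=True), returning the path of new_model.onnx.zip.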

Compiling AIMET-Quantized Models to TensorFlow Lite or QNN

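AI Model Efficiency Toolkit (AIMET) is an open-source library that provides advanced model quantization and compression techniques for trained neural network models. AIMET's QuantizationSimModel can be exported to an ONNX model (.onnx) and an encodings file (.encodings) that contains the quantization parameters.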

To use such a model, create a directory with a .aimet extension. It must contain a single .onnx model and its corresponding encodings file.

<modeldir>.aimet
   ├── <model>.onnx
   ├── <model>.data (optional)
   └── <encodings>.encodings

<modeldir>, <model>, and <encodings> can be any names. <model>.data is required only if the ONNX model has external weights.
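The .aimet layout above can be checked locally in the same way as the other directory formats. This sketch uses a hypothetical helper, check_aimet_dir, and hypothetical file names; it checks for exactly one .onnx file, exactly one .encodings file, and at most one .data file, but it cannot tell whether a .data file is actually required by the model.

```python
import tempfile
from pathlib import Path
from typing import List


def check_aimet_dir(model_dir: str) -> List[str]:
    """Return layout problems for a .aimet directory (empty list if none)."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if root.suffix != ".aimet":
        problems.append("directory name must end in .aimet")
    suffixes = [p.suffix for p in root.iterdir() if p.is_file()]
    if suffixes.count(".onnx") != 1:
        problems.append(f"expected one .onnx file, found {suffixes.count('.onnx')}")
    if suffixes.count(".encodings") != 1:
        problems.append(f"expected one .encodings file, found {suffixes.count('.encodings')}")
    if suffixes.count(".data") > 1:
        problems.append(f"expected at most one .data file, found {suffixes.count('.data')}")
    return problems


# Demo with empty placeholder files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "mobilenet_v2_onnx.aimet"
    model_dir.mkdir()
    (model_dir / "mobilenet_v2.onnx").touch()
    (model_dir / "mobilenet_v2.encodings").touch()
    print(check_aimet_dir(str(model_dir)))  # []
```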

Take mobilenet_v2_onnx.aimet.zip as an example. After unzipping it into the mobilenet_v2_onnx.aimet directory, you can submit a compile job:

import qai_hub as hub

# Compile to TensorFlow Lite
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

# Compile to a QNN Model Library
compile_job = hub.submit_compile_job(
    model="mobilenet_v2_onnx.aimet",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_lib_aarch64_android --quantize_full_type int8",
)
assert isinstance(compile_job, hub.CompileJob)
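The example above assumes the archive has already been unpacked. That step can be done with the standard library, as in the sketch below. extract_aimet_zip is a hypothetical helper, and it assumes the archive contains a single top-level .aimet directory, which may not hold for every archive.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_aimet_zip(zip_path: str, dest: str) -> Path:
    """Extract an archive such as mobilenet_v2_onnx.aimet.zip into `dest`
    and return the single top-level .aimet directory it contains."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    found = [p for p in Path(dest).iterdir() if p.is_dir() and p.suffix == ".aimet"]
    if len(found) != 1:
        raise ValueError(f"expected one .aimet directory in {dest}, found {len(found)}")
    return found[0]


# Demo: build a tiny archive with placeholder files, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "example.aimet.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("example.aimet/model.onnx", b"")
        archive.writestr("example.aimet/model.encodings", b"{}")
    model_dir = extract_aimet_zip(str(zip_path), str(Path(tmp) / "extracted"))
    print(model_dir.name)  # example.aimet
```

The returned path would then be passed as model=str(model_dir) to hub.submit_compile_job.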