Profiling Models

When deploying a neural network model on device, many important questions arise:

  • What is the inference latency across the target hardware?

  • Does the model fit within a certain memory budget?

  • Can the model leverage neural processing units (NPUs)?

Profile jobs give you answers to these questions by running your model on physical devices in the cloud and analyzing the performance.
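As a rough illustration of how the three questions above map onto profile results, the sketch below summarizes a profile dictionary. The dictionary layout (the execution_summary and execution_detail keys, microseconds for time, bytes for memory) is an assumption made for this example and is not confirmed on this page. The helper summarize_profile and the sample data are hypothetical.

```python
from typing import Any, Dict


def summarize_profile(profile: Dict[str, Any], memory_budget_bytes: int) -> Dict[str, Any]:
    """Answer the latency, memory, and NPU questions from a profile dictionary.

    Assumed layout (an assumption of this sketch, not confirmed here):
      profile["execution_summary"]["estimated_inference_time"]        -> microseconds
      profile["execution_summary"]["estimated_inference_peak_memory"] -> bytes
      profile["execution_detail"] -> list of layers, each with a "compute_unit" key
    """
    summary = profile["execution_summary"]
    layers = profile.get("execution_detail", [])

    # Count how many layers ran on each compute unit (e.g. "NPU", "GPU", "CPU").
    unit_counts: Dict[str, int] = {}
    for layer in layers:
        unit = layer.get("compute_unit", "UNKNOWN")
        unit_counts[unit] = unit_counts.get(unit, 0) + 1

    peak_memory = summary["estimated_inference_peak_memory"]
    return {
        "latency_ms": summary["estimated_inference_time"] / 1000.0,
        "peak_memory_bytes": peak_memory,
        "fits_budget": peak_memory <= memory_budget_bytes,
        "npu_layer_fraction": unit_counts.get("NPU", 0) / len(layers) if layers else 0.0,
        "layers_per_unit": unit_counts,
    }


# Hypothetical sample data shaped like the assumed layout above.
sample_profile = {
    "execution_summary": {
        "estimated_inference_time": 2500,
        "estimated_inference_peak_memory": 40_000_000,
    },
    "execution_detail": [
        {"name": "conv1", "compute_unit": "NPU"},
        {"name": "conv2", "compute_unit": "NPU"},
        {"name": "relu", "compute_unit": "NPU"},
        {"name": "softmax", "compute_unit": "CPU"},
    ],
}

report = summarize_profile(sample_profile, memory_budget_bytes=64 * 1024 * 1024)
print(report)
```

With a real job, the dictionary would presumably come from the finished profile job (for example via its download_profile() method) rather than from hand-written sample data. Check the key names against an actual result before relying on them.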

Profiling a previously compiled model

Qualcomm® AI Hub supports profiling a previously compiled model. In this example, we profile a model that was previously compiled using submit_compile_job(). Note that the compiled model is retrieved from compile_job using get_target_model().

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

The return value is an instance of ProfileJob. To view a list of all your jobs, go to /jobs/.

Profiling a PyTorch model

This example requires PyTorch, which can be installed as follows.

pip3 install "qai-hub[torch]"

In this example, we optimize and profile a PyTorch model using Qualcomm® AI Hub.

from typing import List, Tuple

import torch

import qai_hub as hub


class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 2)

    def forward(self, x):
        return self.linear(x)


input_shapes: List[Tuple[int, ...]] = [(3, 5)]
torch_model = SimpleNet()

# Trace the model using random inputs
torch_inputs = tuple(torch.randn(shape) for shape in input_shapes)
pt_model = torch.jit.trace(torch_model, torch_inputs)

# Submit compile job
compile_job = hub.submit_compile_job(
    model=pt_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(x=input_shapes[0]),
)
assert isinstance(compile_job, hub.CompileJob)

# Submit profile job using results from compile job
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

For more information on options when uploading, compiling, and submitting a job, see upload_model(), submit_compile_job(), and submit_profile_job().

Profiling a TorchScript model

If you already have a traced or scripted TorchScript model (saved with torch.jit.save), you can submit it directly. We will use mobilenet_v2.pt as an example. As in the previous example, a TorchScript model can only be profiled after it has been compiled to a suitable target.

import qai_hub as hub

# Compile previously saved TorchScript model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling an ONNX model

Qualcomm® AI Hub also supports ONNX models. An ONNX model can be profiled in one of two ways: by compiling it to a target such as TensorFlow Lite and profiling the result, or by profiling it directly using the ONNX Runtime. We will use mobilenet_v2.onnx to demonstrate both methods. This first example compiles to a TensorFlow Lite target model.

import qai_hub as hub

compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

This example profiles the ONNX model directly using the ONNX Runtime.

import qai_hub as hub

profile_job = hub.submit_profile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Precompiled QNN ONNX models with a QNN context binary can also be profiled directly. In this example, we continue the compilation example from Compiling Precompiled QNN ONNX and profile the model:

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Snapdragon 8 Elite QRD"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a QNN Context Binary

Qualcomm® AI Hub supports profiling models in the QNN context binary format. In this example, we continue the example from Compiling PyTorch model to a QNN Context Binary and profile the model:

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a TensorFlow Lite model

Qualcomm® AI Hub also supports profiling a model in the .tflite format. We will use the SqueezeNet10 model as an example.

import qai_hub as hub

# Profile TensorFlow Lite model (from file)
profile_job = hub.submit_profile_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)

Profiling a model on multiple devices

It is often important to measure model performance on multiple devices. In this example, we profile on recent Snapdragon® 8 Gen 2 and Snapdragon® 8 Gen 3 devices for good test coverage. We reuse the SqueezeNet model from the TensorFlow Lite example, but this time we profile it on two devices.

import qai_hub as hub

devices = [
    hub.Device("Samsung Galaxy S23 (Family)"),  # Snapdragon 8 Gen 2
    hub.Device("Samsung Galaxy S24 (Family)"),  # Snapdragon 8 Gen 3
]

jobs = hub.submit_profile_job(model="SqueezeNet10.tflite", device=devices)

A separate profile job is created for each device.
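Because one job is created per device, it can help to index the returned jobs by device name. The sketch below shows one way to do that. It assumes each job object exposes device.name and job_id attributes. The helper jobs_by_device is hypothetical, and simple stand-in objects replace real jobs so the example runs without submitting anything.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def jobs_by_device(jobs: Iterable[Any]) -> Dict[str, str]:
    """Map each job's device name to its job ID.

    Assumes every job has `device.name` and `job_id` attributes
    (an assumption of this sketch).
    """
    return {job.device.name: job.job_id for job in jobs}


# Hypothetical stand-ins for the list returned when profiling on two devices.
stub_jobs = [
    SimpleNamespace(
        device=SimpleNamespace(name="Samsung Galaxy S23 (Family)"),
        job_id="jabc123",
    ),
    SimpleNamespace(
        device=SimpleNamespace(name="Samsung Galaxy S24 (Family)"),
        job_id="jdef456",
    ),
]

mapping = jobs_by_device(stub_jobs)
for device_name, job_id in mapping.items():
    print(f"{device_name}: {job_id}")
```

In practice the list passed to jobs_by_device would be the jobs value returned by submit_profile_job in the example above.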

Uploading a model for profiling

It is possible to upload a model (e.g. SqueezeNet10.tflite) without submitting a profile job.

import qai_hub as hub

hub_model = hub.upload_model("SqueezeNet10.tflite")
print(hub_model)

You can now run a profiling job using the uploaded model’s model_id.

import qai_hub as hub

# Retrieve model using ID
hub_model = hub.get_model("mabc123")

# Submit job
profile_job = hub.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
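Since the model ID is what links an upload to later jobs, one option is to record it locally so that a later session can look it up. The sketch below is a minimal example using only the Python standard library. The registry file name and both helper functions are hypothetical, and the ID "mabc123" is the placeholder used above.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def remember_model_id(registry: Path, local_file: str, model_id: str) -> None:
    """Record which model ID an uploaded file received."""
    known = json.loads(registry.read_text()) if registry.exists() else {}
    known[local_file] = model_id
    registry.write_text(json.dumps(known, indent=2))


def lookup_model_id(registry: Path, local_file: str) -> Optional[str]:
    """Return the stored model ID for a file, or None if none was recorded."""
    if not registry.exists():
        return None
    return json.loads(registry.read_text()).get(local_file)


# Demonstrate with a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    registry_path = Path(tmp) / "uploaded_models.json"
    remember_model_id(registry_path, "SqueezeNet10.tflite", "mabc123")
    found = lookup_model_id(registry_path, "SqueezeNet10.tflite")
    missing = lookup_model_id(registry_path, "other.tflite")

print(found, missing)
```

The looked-up ID can then be passed to hub.get_model() as in the example above.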

Profiling a previously uploaded model

We can reuse a model from a previous job to launch a new profile job (e.g., on a different device). This avoids uploading the same model multiple times.

import qai_hub as hub

# Get the model from the profile job
profile_job = hub.get_job("jabc123")
hub_model = profile_job.model

# Run the model from the job
new_profile_job = hub.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S22 Ultra 5G"),
)