Profiling Models

When deploying a neural network model on a device, several important questions arise:

What is the inference latency on the target hardware?
Does the model fit within a given memory budget?
Can the model take advantage of the neural processing unit (NPU)?

Profile jobs give you answers to these questions by running your model on physical devices in the cloud and analyzing the performance.
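The first two questions can be answered by comparing a finished job's results against a budget. The sketch below assumes the results have been retrieved as a Python dictionary (for example with ProfileJob.download_profile()). It also assumes that the dictionary contains an execution_summary entry with estimated_inference_time in microseconds and estimated_inference_peak_memory in bytes. Those key names and units should be checked against a real downloaded profile, and within_budget is a hypothetical helper.

```python
def within_budget(profile: dict, max_latency_ms: float, max_memory_mib: float) -> bool:
    """Return True if profiled latency and peak memory fit the budget.

    Hypothetical helper: the "execution_summary" layout and the units
    (microseconds, bytes) are assumptions about the downloaded profile.
    """
    summary = profile["execution_summary"]
    latency_ms = summary["estimated_inference_time"] / 1000.0  # us -> ms
    peak_mib = summary["estimated_inference_peak_memory"] / (1024 * 1024)  # bytes -> MiB
    return latency_ms <= max_latency_ms and peak_mib <= max_memory_mib


# Made-up numbers for illustration, not real measurements.
example_profile = {
    "execution_summary": {
        "estimated_inference_time": 850,  # 0.85 ms
        "estimated_inference_peak_memory": 12 * 1024 * 1024,  # 12 MiB
    }
}
print(within_budget(example_profile, max_latency_ms=1.0, max_memory_mib=16))  # True
print(within_budget(example_profile, max_latency_ms=0.5, max_memory_mib=16))  # False
```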

Profile jobs support the --qairt_version option to select a specific Qualcomm® AI Runtime version. If it is not specified, a version is selected according to Version Selection.
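The flag is given as part of a job's options string. The sketch below builds that string. It assumes that submit_profile_job() accepts the string through an options argument. The version number "2.31" is only a placeholder, and profile_options is a hypothetical helper.

```python
from typing import Optional


def profile_options(qairt_version: Optional[str] = None) -> str:
    """Build an options string for a profile job (hypothetical helper)."""
    parts = []
    if qairt_version is not None:
        # Pin a specific Qualcomm AI Runtime version. Leaving the flag out
        # lets the service fall back to its own version selection.
        parts.append(f"--qairt_version {qairt_version}")
    return " ".join(parts)


print(repr(profile_options()))  # ''
print(profile_options("2.31"))  # --qairt_version 2.31 (placeholder version)
```

Under that assumption, the result would be passed as, for example, options=profile_options("2.31") alongside model and device.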

Profiling a previously compiled model

Qualcomm® AI Hub supports profiling a previously compiled model. In this example, we profile a model that was previously compiled with submit_compile_job(). Note that the compiled model is retrieved from compile_job with get_target_model().

import qai_hub as hub

# Profile the previously compiled model (compile_job is the CompileJob
# returned by an earlier call to hub.submit_compile_job())
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

The return value is an instance of ProfileJob. To view a list of all your jobs, go to /jobs/.
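A ProfileJob's results can also help answer whether the model uses the NPU. As a sketch, assume the profile has been downloaded as a dictionary (for example with ProfileJob.download_profile()). Also assume it holds an execution_detail list whose entries carry a compute_unit field such as "NPU", "GPU", or "CPU". Those names are assumptions about the schema, and compute_unit_breakdown is a hypothetical helper that counts layers per compute unit.

```python
from collections import Counter


def compute_unit_breakdown(profile: dict) -> Counter:
    """Count profiled layers per compute unit (hypothetical helper).

    Assumes profile["execution_detail"] is a list of per-layer dicts,
    each with a "compute_unit" string.
    """
    return Counter(layer["compute_unit"] for layer in profile["execution_detail"])


# Made-up layers for illustration only.
example_profile = {
    "execution_detail": [
        {"name": "conv1", "compute_unit": "NPU"},
        {"name": "conv2", "compute_unit": "NPU"},
        {"name": "softmax", "compute_unit": "CPU"},
    ]
}
units = compute_unit_breakdown(example_profile)
print(units)  # Counter({'NPU': 2, 'CPU': 1})
share = units["NPU"] / sum(units.values())
print(f"{share:.0%} of layers on the NPU")  # 67% of layers on the NPU
```

A layer count is not the same as a time share: a few slow layers that fall back to the CPU can still dominate latency.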

Profiling a PyTorch model

This example requires PyTorch, which can be installed as follows.

pip3 install "qai-hub[torch]"

In this example, we compile and profile a PyTorch model using Qualcomm® AI Hub.

import torch

import qai_hub as hub


class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 2)

    def forward(self, x):
        return self.linear(x)


input_shapes: list[tuple[int, ...]] = [(3, 5)]
torch_model = SimpleNet()

# Trace the model using random inputs
torch_inputs = tuple(torch.randn(shape) for shape in input_shapes)
pt_model = torch.jit.trace(torch_model, torch_inputs)

# Submit compile job
compile_job = hub.submit_compile_job(
    model=pt_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(x=input_shapes[0]),
)
assert isinstance(compile_job, hub.CompileJob)

# Submit profile job using results from compile job
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

For more information on options when uploading, compiling, and submitting a job, see upload_model(), submit_compile_job(), and submit_profile_job().
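In the PyTorch example, the key x in input_specs matches the name of the argument to SimpleNet.forward(). When a model has several inputs, a hypothetical helper could derive input_specs from the forward() signature. That keeps the names and shapes in the same order. This is only a sketch, and this page does not state whether AI Hub requires the names to match forward(). The stand-in class only mimics the forward() signature so the example runs without PyTorch installed.

```python
import inspect


class SimpleNetStub:
    # Stand-in with the same forward() signature as SimpleNet above.
    def forward(self, x):
        return x


def input_specs_from_forward(model, shapes):
    """Pair forward() argument names with input shapes (hypothetical helper)."""
    # On a bound method, inspect.signature() already omits `self`.
    names = list(inspect.signature(model.forward).parameters)
    if len(names) != len(shapes):
        raise ValueError(f"forward() takes {len(names)} inputs, got {len(shapes)} shapes")
    return dict(zip(names, shapes))


print(input_specs_from_forward(SimpleNetStub(), [(3, 5)]))  # {'x': (3, 5)}
```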

Profiling a TorchScript model

If you already have a traced or scripted PyTorch model saved with torch.jit.save, you can submit the saved file directly to a compile job. We will use mobilenet_v2.pt as an example. As in the previous example, a TorchScript model can only be profiled after it has been compiled to a suitable target.

import qai_hub as hub

# Compile previously saved TorchScript model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)
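The shape in input_specs, (1, 3, 224, 224), describes a batch of one 3-channel 224×224 image in the batch, channel, height, width order that PyTorch uses. When reasoning about a memory budget, it can help to know how large that input tensor is. The sketch assumes float32 inputs (4 bytes per element). It covers only the input, not weights or intermediate activations, and input_nbytes is a hypothetical helper.

```python
import math


def input_nbytes(shape, bytes_per_element=4):
    """Size of one input tensor in bytes; the default of 4 assumes float32."""
    return math.prod(shape) * bytes_per_element


image_shape = (1, 3, 224, 224)  # batch, channels, height, width
print(math.prod(image_shape))     # 150528 elements
print(input_nbytes(image_shape))  # 602112 bytes (about 0.57 MiB)
```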

Profiling an ONNX model

Qualcomm® AI Hub also supports ONNX models. ONNX models can be profiled either by first compiling them to a target such as TensorFlow Lite or directly using ONNX Runtime. We will use mobilenet_v2.onnx to illustrate both methods. The first example compiles to a TensorFlow Lite target model.

import qai_hub as hub

compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

This example profiles the ONNX model directly using the ONNX Runtime.

import qai_hub as hub

profile_job = hub.submit_profile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Precompiled QNN ONNX models with a QNN context binary can also be profiled directly. In this example, we continue the compilation example in Compiling to a Precompiled QNN ONNX and profile the resulting model:

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Snapdragon 8 Elite QRD"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a QNN DLC

Qualcomm® AI Hub supports profiling models in the QNN DLC format. In this example, we continue the example from Compiling PyTorch model to a QNN DLC and profile the model:

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a QNN Context Binary

Qualcomm® AI Hub supports profiling models in the QNN context binary format. In this example, we continue the example from Compiling PyTorch model to a QNN Context Binary and profile the model:

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a TensorFlow Lite model

Qualcomm® AI Hub supports profiling a model in the .tflite format as well. We will use the SqueezeNet10 model.

import qai_hub as hub

# Profile TensorFlow Lite model (from file)
profile_job = hub.submit_profile_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)
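The examples on this page follow a simple rule. TorchScript (.pt) files are compiled before profiling, while ONNX (.onnx) and TensorFlow Lite (.tflite) files can be passed to submit_profile_job() as-is (ONNX can also be compiled first). A hypothetical helper encoding just that rule might look like the sketch below. It covers only the file types shown here and is not an exhaustive list of what AI Hub accepts.

```python
from pathlib import Path

# Hypothetical summary of the file types used on this page only.
PROFILE_DIRECTLY = {".tflite", ".onnx"}
COMPILE_FIRST = {".pt"}


def needs_compile(model_path: str) -> bool:
    """Return True if this page's examples compile the file before profiling."""
    suffix = Path(model_path).suffix.lower()
    if suffix in COMPILE_FIRST:
        return True
    if suffix in PROFILE_DIRECTLY:
        return False
    raise ValueError(f"no rule for {suffix!r} in this sketch")


print(needs_compile("mobilenet_v2.pt"))      # True
print(needs_compile("SqueezeNet10.tflite"))  # False
```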

Profiling a model on multiple devices

It is often important to measure model performance on multiple devices. In this example, we profile on Snapdragon® 8 Gen 2 and Snapdragon® 8 Gen 3 devices for good test coverage. We reuse the SqueezeNet model from the TensorFlow Lite example, but this time we profile it on two devices.

import qai_hub as hub

devices = [
    hub.Device("Samsung Galaxy S23 (Family)"),  # Snapdragon 8 Gen 2
    hub.Device("Samsung Galaxy S24 (Family)"),  # Snapdragon 8 Gen 3
]

# Passing a list of devices returns a list of jobs, one per device
jobs = hub.submit_profile_job(model="SqueezeNet10.tflite", device=devices)
assert isinstance(jobs, list)

A separate profile job is created for each device.
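Because each device gets its own job, comparing devices means collecting one result per job. As a sketch, assume the results are gathered into a dictionary keyed by device name. Also assume each profile exposes execution_summary / estimated_inference_time, which is an assumption about the schema. A hypothetical helper could then pick the fastest device.

```python
def fastest_device(profiles: dict) -> str:
    """Return the device name with the lowest estimated inference time.

    `profiles` maps a device name to that device's profile dictionary;
    the "execution_summary" / "estimated_inference_time" keys are assumed.
    """
    return min(
        profiles,
        key=lambda name: profiles[name]["execution_summary"]["estimated_inference_time"],
    )


# Made-up numbers for illustration only, not real measurements.
profiles = {
    "Samsung Galaxy S23 (Family)": {"execution_summary": {"estimated_inference_time": 900}},
    "Samsung Galaxy S24 (Family)": {"execution_summary": {"estimated_inference_time": 700}},
}
print(fastest_device(profiles))  # Samsung Galaxy S24 (Family)
```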

Uploading a model for profiling

It is possible to upload a model (e.g. SqueezeNet10.tflite) without submitting a profile job.

import qai_hub as hub

hub_model = hub.upload_model("SqueezeNet10.tflite")
print(hub_model)

You can then run a profile job on the uploaded model by retrieving it with its model_id. In the example below, mabc123 is a placeholder ID.

import qai_hub as hub

# Retrieve model using ID
hub_model = hub.get_model("mabc123")

# Submit job
profile_job = hub.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)

Profiling a previously uploaded model

We can reuse a model from a previous job to launch a new profile job (e.g., on a different device). This avoids uploading the same model multiple times.

import qai_hub as hub

# Get the model from the profile job
profile_job = hub.get_job("jabc123")
hub_model = profile_job.model

# Run the model from the job
new_profile_job = hub.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S22 Ultra 5G"),
)
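To avoid uploading the same file twice across separate runs, the model ID can be cached locally. The sketch below keys the cache on a SHA-256 hash of the file, so an edited model is treated as new. The JSON cache file and the helper names are hypothetical. No AI Hub calls are made, and the demonstration uses the placeholder ID mabc123 in a temporary directory.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(model_path: str) -> str:
    """SHA-256 of the model file, so an edited file gets a new cache key."""
    return hashlib.sha256(Path(model_path).read_bytes()).hexdigest()


def lookup_model_id(cache_path: str, model_path: str) -> Optional[str]:
    """Return a previously stored model ID for this exact file, if any."""
    cache_file = Path(cache_path)
    if not cache_file.exists():
        return None
    return json.loads(cache_file.read_text()).get(file_digest(model_path))


def store_model_id(cache_path: str, model_path: str, model_id: str) -> None:
    """Record the model ID obtained after uploading this file."""
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    cache[file_digest(model_path)] = model_id
    cache_file.write_text(json.dumps(cache, indent=2))


# Demonstration with a throwaway file (no network calls).
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp, "SqueezeNet10.tflite")
    model_file.write_bytes(b"placeholder bytes")
    cache = str(Path(tmp, "model_ids.json"))
    first = lookup_model_id(cache, str(model_file))   # None: nothing cached yet
    store_model_id(cache, str(model_file), "mabc123")
    second = lookup_model_id(cache, str(model_file))  # "mabc123"
```

In practice, a cache hit would be passed to hub.get_model(). On a miss, the file would be uploaded with hub.upload_model() and the returned model's model_id recorded with store_model_id().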