Profiling Models
When deploying a neural network model on device, many important questions arise:
What is the inference latency across the target hardware?
Does the model fit within a certain memory budget?
Is my model able to leverage neural processing units?
Profile jobs give you answers to these questions by running your model on physical devices in the cloud and analyzing the performance.
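The questions above can be turned into a simple budget check once the profile results are available as a dictionary (for example, from the profile job's download_profile() method). The sketch below is only illustrative: the key names (execution_summary, estimated_inference_time, estimated_inference_peak_memory, execution_detail, compute_unit), the units (microseconds and bytes), and the helper check_profile are assumptions, so verify them against the results you actually receive.

```python
from typing import Any, Dict


def check_profile(
    profile: Dict[str, Any], max_latency_ms: float, max_memory_mb: float
) -> Dict[str, Any]:
    """Compare a profile dictionary against latency and memory budgets.

    Assumed (not confirmed) layout: times in microseconds, memory in bytes,
    and one entry per layer carrying a "compute_unit" label.
    """
    summary = profile["execution_summary"]
    latency_ms = summary["estimated_inference_time"] / 1000.0
    memory_mb = summary["estimated_inference_peak_memory"] / (1024 * 1024)

    layers = profile.get("execution_detail", [])
    npu_layers = sum(1 for layer in layers if layer.get("compute_unit") == "NPU")

    return {
        "latency_ms": latency_ms,
        "latency_ok": latency_ms <= max_latency_ms,
        "memory_mb": memory_mb,
        "memory_ok": memory_mb <= max_memory_mb,
        "npu_fraction": npu_layers / len(layers) if layers else 0.0,
    }


# Hand-written stand-in for real profile results.
example_profile = {
    "execution_summary": {
        "estimated_inference_time": 2500,
        "estimated_inference_peak_memory": 50 * 1024 * 1024,
    },
    "execution_detail": [
        {"name": "conv1", "compute_unit": "NPU"},
        {"name": "conv2", "compute_unit": "NPU"},
        {"name": "softmax", "compute_unit": "CPU"},
        {"name": "fc", "compute_unit": "NPU"},
    ],
}

report = check_profile(example_profile, max_latency_ms=5.0, max_memory_mb=64.0)
print(report)
```

The function takes a plain dictionary rather than a job object, so the same check can be reused on saved results without contacting the service.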
Profiling a previously compiled model
Qualcomm® AI Hub supports profiling a previously compiled model.
In this example, we profile a model that was previously compiled using submit_compile_job(). Note how the compiled model is obtained from compile_job using get_target_model().
import qai_hub as hub

# compile_job is the CompileJob returned by an earlier submit_compile_job() call
# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23"),
)
assert isinstance(profile_job, hub.ProfileJob)
The return value is an instance of ProfileJob. To view a list of all your jobs, go to /jobs/.
Profiling a PyTorch model
This example requires PyTorch, which can be installed as follows.
pip3 install "qai-hub[torch]"
In this example, we optimize and profile a PyTorch model using Qualcomm® AI Hub.
from typing import List, Tuple

import torch

import qai_hub as hub


class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 2)

    def forward(self, x):
        return self.linear(x)


input_shapes: List[Tuple[int, ...]] = [(3, 5)]
torch_model = SimpleNet()

# Trace the model using random inputs
torch_inputs = tuple(torch.randn(shape) for shape in input_shapes)
pt_model = torch.jit.trace(torch_model, torch_inputs)

# Submit compile job
compile_job = hub.submit_compile_job(
    model=pt_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(x=input_shapes[0]),
)
assert isinstance(compile_job, hub.CompileJob)

# Submit profile job using results from compile job
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)
For more information on options when uploading, compiling, and submitting a job, see upload_model(), submit_compile_job(), and submit_profile_job().
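In the example above, input_specs maps the forward() argument name x to its shape. For models with several inputs, that mapping could be derived instead of typed by hand. The following is a minimal sketch under the assumption that the input names should match the forward() parameter names, as they do in the example; build_input_specs and StandInNet are hypothetical names, and a plain class stands in for a torch.nn.Module so the snippet runs without PyTorch.

```python
import inspect
from typing import Any, Dict, Sequence, Tuple

Shape = Tuple[int, ...]


def build_input_specs(model: Any, shapes: Sequence[Shape]) -> Dict[str, Shape]:
    """Pair each forward() parameter name with the shape at the same position."""
    # The signature of a bound method does not include `self`.
    names = list(inspect.signature(model.forward).parameters)
    if len(names) != len(shapes):
        raise ValueError(
            f"forward() takes {len(names)} inputs but {len(shapes)} shapes were given"
        )
    return dict(zip(names, shapes))


class StandInNet:
    """Stand-in for a module with the same forward(x) signature as SimpleNet."""

    def forward(self, x):
        return x


specs = build_input_specs(StandInNet(), [(3, 5)])
print(specs)  # {'x': (3, 5)}
```

The resulting dictionary has the same form as dict(x=input_shapes[0]) in the compile call above.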
Profiling a TorchScript model
If you already have a traced or scripted Torch model saved with torch.jit.save, you can submit it directly. We will use mobilenet_v2.pt as an example. As in the previous example, a TorchScript model can only be profiled after it has been compiled to a suitable target.
import qai_hub as hub

# Compile previously saved torchscript model
compile_job = hub.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)
Profiling an ONNX model
Qualcomm® AI Hub also supports ONNX models. An ONNX model can be profiled either by first compiling it to a target such as TensorFlow Lite, or directly using the ONNX Runtime. We will use mobilenet_v2.onnx as an example of both methods. This example compiles to a TensorFlow Lite target model.
import qai_hub as hub

compile_job = hub.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)
This example profiles the ONNX model directly using the ONNX Runtime.
import qai_hub as hub

profile_job = hub.submit_profile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23"),
)
assert isinstance(profile_job, hub.ProfileJob)
Precompiled QNN ONNX models with a QNN context binary can also be profiled directly. We will use mobilenet_v2_qnn.onnx as an example.
import qai_hub as hub

profile_job = hub.submit_profile_job(
    model="mobilenet_v2_qnn.onnx",
    device=hub.Device("Samsung Galaxy S23"),
)
assert isinstance(profile_job, hub.ProfileJob)
Profiling a QNN Context Binary
Qualcomm® AI Hub supports profiling models in the QNN context binary format. We will use mobilenet_v2.bin, compiled for the Samsung Galaxy S23.
import qai_hub as hub

profile_job = hub.submit_profile_job(
    model="mobilenet_v2.bin",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)
Profiling a TensorFlow Lite model
Qualcomm® AI Hub supports profiling a model in the .tflite format as well. We will use the SqueezeNet10 model.
import qai_hub as hub

# Profile TensorFlow Lite model (from file)
profile_job = hub.submit_profile_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
Profiling a model on multiple devices
It is often important to measure performance on multiple devices. In this example, we profile on recent Snapdragon® 8 Gen 2 and Snapdragon® 8 Gen 3 devices for good test coverage. We reuse the SqueezeNet model from the TensorFlow Lite example, but this time we profile it on two devices.
import qai_hub as hub

devices = [
    hub.Device("Samsung Galaxy S23 (Family)"),  # Snapdragon 8 Gen 2
    hub.Device("Samsung Galaxy S24 (Family)"),  # Snapdragon 8 Gen 3
]

jobs = hub.submit_profile_job(model="SqueezeNet10.tflite", device=devices)
A separate profile job is created for each device.
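Because one job is created per device, it can be handy to keep track of which job belongs to which device. The sketch below is a hypothetical helper (jobs_by_device is not part of qai_hub) that assumes the jobs come back in the same order as the devices and that each device exposes a name attribute. The simple stand-in objects, including their job_id values, let it run without submitting anything.

```python
from types import SimpleNamespace
from typing import Any, Dict, Sequence


def jobs_by_device(devices: Sequence[Any], jobs: Any) -> Dict[str, Any]:
    """Map each device name to its job; accepts a single job or a list of jobs."""
    job_list = list(jobs) if isinstance(jobs, (list, tuple)) else [jobs]
    if len(job_list) != len(devices):
        raise ValueError(
            f"expected {len(devices)} jobs, one per device, got {len(job_list)}"
        )
    # Assumption: job order matches device order.
    return {device.name: job for device, job in zip(devices, job_list)}


# Stand-ins for hub.Device and hub.ProfileJob objects (illustrative values only).
devices = [
    SimpleNamespace(name="Samsung Galaxy S23 (Family)"),
    SimpleNamespace(name="Samsung Galaxy S24 (Family)"),
]
jobs = [SimpleNamespace(job_id="job-a"), SimpleNamespace(job_id="job-b")]

lookup = jobs_by_device(devices, jobs)
for device_name, job in lookup.items():
    print(f"{device_name}: {job.job_id}")
```

With real objects, devices and jobs would be the list passed to and the value returned from submit_profile_job() in the example above.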
Uploading a model for profiling
It is possible to upload a model (e.g. SqueezeNet10.tflite) without submitting a profile job.
import qai_hub as hub
hub_model = hub.upload_model("SqueezeNet10.tflite")
print(hub_model)
You can now run a profiling job using the uploaded model’s model_id.
import qai_hub as hub

# Retrieve model using ID
hub_model = hub.get_model("mabc123")

# Submit job
profile_job = hub.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
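Since an uploaded model can be retrieved by its model_id, a script can remember that ID and skip repeat uploads of the same file. The following is a minimal sketch of that idea, not an AI Hub feature: upload_once, file_digest, and the in-memory cache are hypothetical, and a fake upload function stands in for something like `lambda path: hub.upload_model(path).model_id`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def upload_once(
    path: Path, upload_fn: Callable[[str], str], cache: Dict[str, str]
) -> str:
    """Upload the file only if its contents are new; return the model ID."""
    digest = file_digest(path)
    if digest not in cache:
        cache[digest] = upload_fn(str(path))
    return cache[digest]


# Fake uploader that records calls and hands out made-up IDs.
calls: List[str] = []


def fake_upload(path: str) -> str:
    calls.append(path)
    return f"m{len(calls):03d}"


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.tflite"
    model_path.write_bytes(b"placeholder bytes, not a real model")

    cache: Dict[str, str] = {}
    first_id = upload_once(model_path, fake_upload, cache)
    second_id = upload_once(model_path, fake_upload, cache)

print(first_id, second_id, len(calls))
```

Keying the cache on a content hash rather than the file name means a renamed copy is not uploaded again, while a modified file is.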
Profiling a previously uploaded model
We can reuse a model from a previous job to launch a new profile job (e.g., on a different device). This avoids uploading the same model multiple times.
import qai_hub as hub

# Get the model from the profile job
profile_job = hub.get_job("jabc123")
hub_model = profile_job.model

# Run the model from the job
new_profile_job = hub.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S22 Ultra 5G"),
)