Release Notes

Released October 28

  • New device: The Snapdragon 8 Elite was announced at Snapdragon Summit and is available to all users by specifying device = hub.Device("Snapdragon 8 Elite QRD"); see the sketch after this list.

  • New device: The automotive device Snapdragon Cockpit Gen 4 (SA8295P) is now ready to use in AI Hub. Select it with --device "SA8295P ADP" --device-os 14.

  • Once you’ve signed in to AI Hub with SSO, you will automatically be redirected to the page of interest.
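
For reference, both new devices can be targeted from the Python client; a minimal sketch, where the model file is a placeholder:

    import qai_hub as hub

    # Target the new Snapdragon 8 Elite reference device by name.
    device = hub.Device("Snapdragon 8 Elite QRD")

    # The automotive board is selected by name and OS version, mirroring
    # the CLI flags --device "SA8295P ADP" --device-os 14.
    auto_device = hub.Device("SA8295P ADP", "14")

    # Submit a profile job to the new device ("model.tflite" is illustrative).
    job = hub.submit_profile_job(model="model.tflite", device=device)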

Released October 14

  • (Beta) Qualcomm AI Hub now enables converting float32 models to use integer numerics (e.g. int8, int16). This beta feature can be used via the submit_quantize_job API to quantize a PyTorch model; see the sketch after this list. Check out more details and examples in our documentation.

  • Int64 is now supported (in both inference and profiling jobs)!

  • Upgraded to QNN 2.27
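
A minimal sketch of the quantize flow, assuming a float32 ONNX model is already on hand (per the documentation, PyTorch models are first compiled to ONNX) and using random, purely illustrative calibration data:

    import numpy as np
    import qai_hub as hub

    # One or more calibration samples per model input (random here;
    # use representative samples in practice).
    calibration_data = {"image": [np.random.rand(1, 3, 224, 224).astype(np.float32)]}

    # Quantize the float32 ONNX model to int8 weights and activations.
    quantize_job = hub.submit_quantize_job(
        model="model.onnx",
        calibration_data=calibration_data,
        weights_dtype=hub.QuantizeDtype.INT8,
        activations_dtype=hub.QuantizeDtype.INT8,
    )
    quantized_model = quantize_job.get_target_model()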

Released October 7

  • Improved support for rank-0 (scalar) tensors in inference jobs.

  • Updated job states when a job is submitted (and fixed a pesky UI bug) to give more clarity about the stage your job is in.

  • Improved error messages in many cases, including: use of data types not supported by the profiler, invalid TFLite model files, and out-of-memory errors on many devices.

  • Client version 0.17.0 (pip install qai-hub==0.17.0) was released; it includes fixes for HTTP retries that should make uploading and downloading data much more reliable!

  • New device support! You can now launch jobs targeting the Snapdragon X Plus on AI Hub by specifying device = hub.Device("Snapdragon X Plus 8-Core CRD").

Released September 23

  • The chipset attribute for all proxy devices has been renamed to include the suffix -proxy. For example, chipset:qualcomm-qcs6490 is now chipset:qualcomm-qcs6490-proxy. Device names remain unchanged.

  • Upgraded ONNX Runtime to 1.19.2 and TFLite to 2.17.

Released September 11

  • Upgraded to QNN 2.26

  • Models page now has a dropdown to filter by creator, making it easier to search for models owned by others in your organization.

  • Various bug fixes across the UI, including updated visualization for QNN models. Check it out and let us know if you hit any issues!

Released August 26

  • Since Aug 13th, Hub no longer throws an exception upon job creation if the user already has the maximum allowed number of jobs running. Instead, new jobs are put into a pending state and automatically scheduled for execution as existing jobs finish. Python client version 0.14.1 adds a new property named pending to job objects: jobs waiting for available backend capacity return True for pending and False for running.

  • Upgraded to QNN 2.25.

  • get_job_summaries is available in the client from this version (0.15.0) onward. The get_jobs API is deprecated; use get_job_summaries in its place (see the sketch after this list).

  • We recommend updating to client version 0.15.0 (pip install qai-hub==0.15.0) and updating your client each release to ensure you’re using all the latest features of Qualcomm AI Hub!
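
A minimal sketch of the queueing behavior and the new listing API (the model and device are placeholders):

    import qai_hub as hub

    job = hub.submit_profile_job(
        model="model.tflite",
        device=hub.Device("Samsung Galaxy S24"),
    )

    # With client >= 0.14.1, a queued job reports pending=True (and
    # running=False) until backend capacity frees up.
    print(job.pending, job.running)

    # With client >= 0.15.0, list recent jobs with get_job_summaries,
    # which replaces the deprecated get_jobs.
    for summary in hub.get_job_summaries():
        print(summary)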

Released August 12

  • New client version, 0.14.0, is available!

  • Intermediate Assets: When you submit a compile job, you will now see an “intermediate assets” tab on the compile job page. This new feature allows AI Hub to save intermediate states of the compilation as first-class models on AI Hub. For instance, if you submit a TorchScript model for TFLite compilation, an intermediate ONNX model will be saved and accessible; see the sketch after this list.

  • Job concurrency limits: Instead of returning an error, Hub will now automatically queue jobs past per-user max limits. If you previously handled the error that was thrown, that handling is no longer needed to submit jobs.
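
A minimal sketch of a compile job that produces intermediate assets (the torchvision model is illustrative):

    import torch
    import torchvision
    import qai_hub as hub

    # Trace a PyTorch model to TorchScript.
    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
    traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))

    # Compiling TorchScript to TFLite saves the intermediate ONNX model,
    # which appears under the new "intermediate assets" tab.
    compile_job = hub.submit_compile_job(
        model=traced,
        device=hub.Device("Samsung Galaxy S24"),
        input_specs={"image": (1, 3, 224, 224)},
        options="--target_runtime tflite",
    )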

Released July 29

  • Updated ONNX Runtime to 1.18

  • Qualcomm AI Hub has extended support to include Snapdragon Ride platforms. Check out our pre-optimized AI Hub models available for automotive devices, test these models on real automotive devices via AI Hub, and let us know if you hit any issues!

Released July 15

  • Improvements to memory estimates on Android devices now allow much more precise ranges. The profiler is better at excluding extraneous heap usage, leading to tighter memory ranges. Try submitting a new job and check out the memory ranges!

  • Updated QNN to 2.24.0 and ONNX to 1.16.0.

  • Added int16 support for ONNX Runtime

Released July 1

  • AI Hub jobs can be shared with your organization automatically. To add users to your organization, email ai-hub-support@qti.qualcomm.com with the email addresses of your team.

  • AI Hub jobs can also be shared outside of your organization, including with Qualcomm to obtain support. Click the “Share” button in the top right of any job and specify the email address of an AI Hub user; the job (and its associated model assets) will be shared. Access can also be revoked by removing an email address from the job.

  • Improved error messaging for AIMET models that fail to compile

  • Documentation updated for precompiled_qnn_onnx.

  • Added detailed titles for AI Hub webpages. Now, when you have a page open, the title specifies the page you are on, as well as the job name where applicable.

  • Release notes from AI Hub’s previous releases can now be found in our documentation for reference.

Released Jun 17

  • Windows devices are now widely available on AI Hub, including the brand new Snapdragon X Elite and the previous generation Snapdragon 8cx Gen 3 reference designs. When you run qai-hub list-devices, you will see them listed. Target the X Elite by specifying device = hub.Device("Snapdragon X Elite CRD").

  • Support for compiling precompiled QNN ONNX models! Use options="--target_runtime precompiled_qnn_onnx" to specify that you’re using a pre-compiled ONNX Runtime model; see the sketch after this list. (NOTE: there is a typo in the docs that will be fixed next release, please use the option as specified above)

  • Added documentation around supported ONNX Runtime Options

  • Expanded steps in the Getting Started Quick Example to include submitting an inference job, downloading the model, and more.

  • Additional error details highlighted on profile and inference jobs: if your job fails, check out the new section titled Additional Information from the Runtime Log. This provides key details to help you debug without having to expand and scroll through the runtime log.

  • Updated to QNN version 2.23
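
A minimal sketch combining the new Windows device with the precompiled QNN ONNX option (the model path is a placeholder):

    import qai_hub as hub

    # Target the new Windows-on-Snapdragon reference design.
    device = hub.Device("Snapdragon X Elite CRD")

    # Use the option exactly as spelled here (note the docs typo above).
    compile_job = hub.submit_compile_job(
        model="model.onnx",
        device=device,
        options="--target_runtime precompiled_qnn_onnx",
    )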

Released Jun 4

  • Added a list of device families: you’ll see these listed when you use qai-hub list-devices (Google Pixel 3a Family, Samsung Galaxy S21 Family, etc.). This should help with device provisioning times; please use this option when applicable!

  • Updated to QNN version 2.22.6

  • Support for 64-bit input types via the compile option --truncate_64bit_io; see the sketch after this list.
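
A minimal sketch of both additions, assuming a family name can be passed to hub.Device like any other device name (the model path is a placeholder):

    import qai_hub as hub

    # Targeting a device family lets Hub pick any available unit in the
    # family, which should help with provisioning times.
    device = hub.Device("Samsung Galaxy S21 Family")

    # Truncate 64-bit model inputs/outputs to 32-bit during compilation.
    compile_job = hub.submit_compile_job(
        model="model.onnx",
        device=device,
        options="--truncate_64bit_io",
    )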

Released May 17

  • Added support for Snapdragon X Elite NPU on Windows via the ONNX QNN Execution Provider and Snapdragon X Elite GPU on Windows via the ONNX DirectML Execution Provider – for early access sign up here!

  • QNN version 2.22 support (compiled assets now target QNN 2.22, instead of QNN version 2.20)

  • Windows support in AI Hub!

  • w4a8 support for QNN (--quantize_full_type w4a8)

  • Additional context for when each runtime should be used in our documentation.

  • Deprecation of target runtime qnn_bin. Please use --target_runtime qnn_context_binary now; see the sketch after this list. Context binaries are compiled specifically for a device’s hardware architecture. More information can be found in our documentation.
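
A minimal sketch combining the new target runtime with the new w4a8 option (the traced model and shapes are placeholders):

    import qai_hub as hub

    # Compile to a QNN context binary (replacing the deprecated qnn_bin),
    # quantizing to 4-bit weights and 8-bit activations.
    compile_job = hub.submit_compile_job(
        model="model.pt",
        device=hub.Device("Samsung Galaxy S24"),
        input_specs={"image": (1, 3, 224, 224)},
        options="--target_runtime qnn_context_binary --quantize_full_type w4a8",
    )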

Released May 6

  • Documentation now includes an example for compiling ONNX models to TFLite or QNN, as well as profiling directly using the ONNX Runtime.

  • The default configuration for the ONNX Runtime now takes advantage of options for the highest speed in profiling/inference. The optimization level is now set to 3, which provides the most optimized model by default.

  • Upgraded TensorFlowLite to 2.16.1 (for profile jobs)

  • Additional performance fixes for compilation jobs.

Released Apr 22

  • Various performance improvements, improved error reporting, and additional layer support have been added!

  • Added QCS8450 proxy devices.

  • Upgraded to the latest ONNX Runtime version (1.17.3).

  • Updated documentation for ONNX runtime models

  • Introduced IO options for ONNX Runtime

  • Added support for w4a16 quantization for the QNN path.

Released Apr 8

  • Introduced ONNX Runtime (.onnx) and NPU support. Try it out by specifying options="--target_runtime onnx" when submitting compile jobs; see the sketch after this list.

  • Improvements to ONNX Runtime, including many speedups.

  • Added model visualization for ONNX runtime models

  • Increased logging for compile jobs.

  • More proxy devices for IoT: check out the QCS8250 and QCS8550 proxy devices.

  • Upgraded to TensorFlow 2.15.0.

  • Added support for int16 and w8a16 quantization via Hub.
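
A minimal sketch of the new ONNX Runtime target (the traced model and shapes are placeholders):

    import qai_hub as hub

    # Compile for ONNX Runtime; w8a16 quantization can be requested the
    # same way via --quantize_full_type w8a16.
    compile_job = hub.submit_compile_job(
        model="model.pt",
        device=hub.Device("Samsung Galaxy S24"),
        input_specs={"image": (1, 3, 224, 224)},
        options="--target_runtime onnx",
    )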

Released Mar 25

  • Added more Galaxy S24 devices for running jobs.

  • Upgraded to the latest QNN version 2.20.

  • Increased model upload limit to 10 GB.

  • Added support to convert AIMET (.onnx + encodings) quantized models to ONNX and run on-device via ONNX Runtime.

  • Added an optimization: constant-folding reshapes for depthwise convolutions in TFLite models.

  • Additional checks to prevent incorrect input names from being passed via compile options.

Released Mar 11

  • Introduced devices with Snapdragon® 8 Gen 3 chipset to AI Hub. Target the Snapdragon® 8 Gen 3 by specifying device = hub.Device("Samsung Galaxy S24").

Released Feb 28

  • Qualcomm AI Hub launched at MWC 2024.

  • Support for QAI Hub models to provide performance and accuracy numbers on various mobile devices via TFLite and QNN runtimes.