Release Notes
Released December 13, 2024
Added translation of the ONNX NonMaxSuppression op to its TFLite equivalent.
Warning: we will be deprecating the AIMET PyTorch model (.pt) upload path as part of our deployment on January 6th. We recommend using ONNX models (.onnx) and an encodings file (.encodings) with the quantization parameters instead.
Released November 25, 2024
Upgraded to QNN 2.28.2 (QNN 2.28.0 for automotive devices).
Various improvements for uploading Llama-family models to AI Hub. We’ve addressed feedback from users who were experiencing timeouts when uploading these LLMs to AI Hub. Let us know if you continue to experience issues.
You can now compress FP32 weights to FP16 by adding --quantize_weight_type float16 to compile options.
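This sketch shows what the option trades off, using only the standard library. It is not AI Hub's implementation. It packs a few sample FP32 weights as IEEE 754 half precision (the float16 format) with Python's struct module, then compares storage size and round-trip error. The helper name pack_weights is hypothetical.

```python
import struct

def pack_weights(weights):
    """Hypothetical helper: pack weights as float32 and float16 (IEEE 754 half).

    Returns (fp32_bytes, fp16_bytes, values_restored_from_fp16).
    struct.pack raises OverflowError for values too large for float16
    (largest finite float16 value: 65504).
    """
    n = len(weights)
    fp32_bytes = struct.pack(f"<{n}f", *weights)  # 4 bytes per weight
    fp16_bytes = struct.pack(f"<{n}e", *weights)  # 2 bytes per weight
    restored = list(struct.unpack(f"<{n}e", fp16_bytes))
    return fp32_bytes, fp16_bytes, restored

weights = [0.1, -1.5, 3.14159, 1e-3]  # sample values, not from a real model
fp32_bytes, fp16_bytes, restored = pack_weights(weights)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(len(fp32_bytes), len(fp16_bytes))  # 16 8
print(f"max round-trip error: {max_err:.1e}")
```

Halving the bytes per weight is where the size savings come from. The cost is precision: float16 keeps roughly three significant decimal digits.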
New Auto devices available in AI Hub! We now provide SA8775P and SA7255P ADP devices.
Released November 11, 2024
Announcing: link jobs! This combines multiple models into a single context binary so weights can be shared between graphs, saving disk space. Link jobs are exclusive to QNN context binaries for the Hexagon Tensor Processor (HTP).
Improvements to the qai-hub client addressing common issues: client version 0.19.0 includes more fixes for upload errors. Additionally, the upload size limit has been raised from 5 GB to 10 GB (compressed), and large files are now uploaded in multiple parts.
ONNX version has been updated to 1.17.0.
Updated examples in our documentation, specifically for quantization benchmarking and compile jobs.
Released October 28, 2024
New device: Snapdragon 8 Elite, announced at Snapdragon Summit, is available to all users by specifying
device = hub.Device("Snapdragon 8 Elite QRD")
New device: the automotive Snapdragon Cockpit Gen 4 (SA8295P) is now ready to use in AI Hub. Select it with
--device "SA8295P ADP" --device-os 14
Once you’ve signed in to AI Hub with SSO, you will automatically be redirected to the page you were trying to reach.
Released October 14, 2024
(Beta) Qualcomm AI Hub now enables converting float32 models to use integer numerics (e.g. int8, int16). This beta feature can be used via the submit_quantize_job API to quantize a PyTorch model. Check out more details and examples in our documentation.
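This sketch illustrates what converting float32 values to integer numerics involves. It applies textbook asymmetric (affine) int8 quantization, where a scale and a zero point map the observed float range onto [-128, 127]. The scheme is shown for intuition only and is an assumption. It does not describe how submit_quantize_job chooses its quantization parameters, and all helper names are hypothetical.

```python
def affine_params(values, qmin=-128, qmax=127):
    """Scale and zero point mapping the range of `values` onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # widen the range so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    # Python's round() rounds halves to even; results are clamped to the int8 range.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    return [(q - zero_point) * scale for q in qvalues]

x = [-1.0, 0.0, 0.5, 2.0]  # sample float32 values
scale, zp = affine_params(x)
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
print(scale, zp, q)
```

Each dequantized value lands within half a quantization step (scale / 2) of the original. That rounding error is the accuracy cost of moving to int8.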
Int64 is now supported (both inference and profiling jobs)!
Upgraded to QNN 2.27.
Released October 7, 2024
Improved support for rank-0 (scalar) tensors in inference jobs.
Job states are now updated when a job is submitted (and we fixed a pesky UI bug), giving more clarity about which stage your job is in.
Improved error messages in many cases, including: use of data types not supported by the profiler, invalid TFLite model files and out of memory errors on many devices.
Client version 0.17.0 has been released:
pip install qai-hub==0.17.0
This release includes fixes for HTTP retries that should make uploading and downloading data much more reliable!
New device support! You can now launch jobs that target the Snapdragon X Plus on AI Hub by specifying
device = hub.Device("Snapdragon X Plus 8-Core CRD")
Released September 23, 2024
The chipset attribute for all proxy devices has been renamed to include the suffix -proxy. For example, chipset:qualcomm-qcs6490 is now chipset:qualcomm-qcs6490-proxy. Device names remain unchanged.
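If your scripts select devices by chipset attribute, a small helper can rewrite old-style attribute strings. The function below is hypothetical. It assumes you already know which chipsets belong to proxy devices; the set here contains only the qcs6490 example from this note. Only the chipset attribute changes, and device names are unaffected.

```python
def migrate_chipset_attribute(attribute, proxy_chipsets):
    """Append the -proxy suffix to chipset attributes that refer to proxy devices.

    attribute: e.g. "chipset:qualcomm-qcs6490"
    proxy_chipsets: chipset names known to be proxy devices (an assumption;
    build the real set from your own device listing).
    """
    key, sep, value = attribute.partition(":")
    if key != "chipset" or not sep:
        return attribute  # not a chipset attribute; leave it untouched
    if value in proxy_chipsets and not value.endswith("-proxy"):
        return f"chipset:{value}-proxy"
    return attribute  # already migrated, or not a proxy chipset

PROXY_CHIPSETS = {"qualcomm-qcs6490"}  # illustrative only
print(migrate_chipset_attribute("chipset:qualcomm-qcs6490", PROXY_CHIPSETS))
# chipset:qualcomm-qcs6490-proxy
```

The helper is idempotent, so running it twice over a config file is safe.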
Upgraded ONNX Runtime to 1.19.2 and TFLite to 2.17.
Released September 11, 2024
Upgraded to QNN 2.26.
Models page now has a dropdown to filter by creator, making it easier to search for models owned by others in your organization.
Various bug fixes across the UI, including updated visualization for QNN models. Check it out and let us know if you hit any issues!
Released August 26, 2024
Since August 13, Hub no longer throws an exception at job creation if the user already has the maximum allowed number of jobs running. Instead, new jobs are put into a pending state and automatically scheduled for execution as existing jobs finish. In Python client version 0.14.1, we have added a new property named pending to job objects. Jobs in the pending state, waiting for available backend capacity, will now return True when pending is called and False when running is called.
Upgraded to QNN 2.25.
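With jobs now queued, a freshly submitted job may sit in the pending state before it starts running. The polling loop below is a minimal sketch and the helper name wait_until_started is hypothetical. It assumes you wire is_pending to the client's pending check for your job; that wiring is not shown and is an assumption. The sleep parameter lets the loop be exercised without real delays.

```python
import time

def wait_until_started(is_pending, poll_seconds=5.0, timeout_seconds=600.0,
                       sleep=time.sleep):
    """Poll until a queued job leaves the pending state.

    is_pending: zero-argument callable returning True while the job waits
    for backend capacity (hypothetical wiring to the job's pending check).
    Returns the number of polls made; raises TimeoutError if still pending.
    """
    waited = 0.0
    polls = 0
    while True:
        polls += 1
        if not is_pending():
            return polls
        if waited >= timeout_seconds:
            raise TimeoutError("job is still pending")
        sleep(poll_seconds)
        waited += poll_seconds

# Simulated job that is pending for two polls, then starts.
states = iter([True, True, False])
polls = wait_until_started(lambda: next(states), sleep=lambda s: None)
print(polls)  # 3
```

Bounding the wait with a timeout keeps a script from hanging indefinitely when the queue is long.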
get_job_summaries is available in the client from this version (0.15.0) onward. The get_jobs API is deprecated, and get_job_summaries should be used in its place.
We recommend updating to client version 0.15.0:
pip install qai-hub==0.15.0
We also recommend updating your client with each release to ensure you’re using all the latest features of Qualcomm AI Hub!
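Scripts that must run against clients both older and newer than 0.15.0 could use a small shim that prefers get_job_summaries and falls back to the deprecated get_jobs. The helper list_job_summaries is hypothetical. Calling both functions without arguments is an assumption, so check the client documentation for the filters each one accepts.

```python
import warnings
from types import SimpleNamespace

def list_job_summaries(client):
    """Prefer get_job_summaries (client 0.15.0+) over the deprecated get_jobs.

    client: the imported qai_hub module, or any object exposing these functions.
    """
    if hasattr(client, "get_job_summaries"):
        return client.get_job_summaries()
    warnings.warn("client is older than 0.15.0; falling back to deprecated get_jobs",
                  DeprecationWarning, stacklevel=2)
    return client.get_jobs()

# Stand-in clients for illustration; with a real install you would pass qai_hub.
new_client = SimpleNamespace(get_job_summaries=lambda: ["summary-1"],
                             get_jobs=lambda: ["job-1"])
old_client = SimpleNamespace(get_jobs=lambda: ["job-1"])
print(list_job_summaries(new_client))  # ['summary-1']
```

Once every environment runs 0.15.0 or later, the shim can be dropped in favor of calling get_job_summaries directly.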
Released August 12, 2024
New client version, 0.14.0 is available!
Intermediate Assets: When you submit a compile job, you will now see an “intermediate assets” tab on the compile job page. This new feature allows AI Hub to save intermediate states of the compilation as first-class models on AI Hub. For instance, if you submit a TorchScript model for TFLite compilation, an intermediate ONNX model will be saved and will be accessible.
Job concurrency limits: instead of returning an error, Hub will now automatically queue jobs that exceed the per-user maximum. If you previously wrapped job submission in error handling to catch this error, that is no longer needed.
Released July 29, 2024
Updated ONNX Runtime to 1.18.
Qualcomm AI Hub has extended support to include Snapdragon Ride platforms. Check out our pre-optimized AI Hub models for automotive devices, test these models on real automotive devices via AI Hub, and let us know if you hit any issues!
Released July 15, 2024
Improvements to memory estimates on Android devices have allowed for much more precise ranges. The profiler’s ability to avoid exogenous heap usage was improved, leading to smaller memory ranges. Try submitting a new job and check out the memory ranges!
Updated QNN to 2.24.0 and ONNX to 1.16.0.
Added int16 support for ONNX Runtime.
Released July 1, 2024
AI Hub jobs can be shared with your organization automatically. To add users to your organization, email ai-hub-support@qti.qualcomm.com with the email addresses of your team.
AI Hub jobs can also be shared outside of your organization and with Qualcomm to obtain support. Click the “Share” button in the top right of any job and specify the email address of an AI Hub user; the job (and its associated model assets) will be shared. Access can also be revoked by removing an email address from the job.
Improved error messaging for AIMET models that fail to compile.
Documentation updated for precompiled_qnn_onnx.
Added detailed titles for AI Hub web pages: the title of an open page now identifies the page you are on, as well as the job name where applicable.
Release notes from AI Hub’s previous releases can now be found in our documentation for reference.
Released June 17, 2024
Windows devices are now widely available on AI Hub, including the brand-new Snapdragon X Elite and the previous-generation Snapdragon 8cx Gen 3 reference designs. When you run qai-hub list-devices, you will see them listed. Target the X Elite by specifying
device = hub.Device("Snapdragon X Elite CRD")
Support for compiling precompiled QNN ONNX models! Use
options="--target_runtime precompiled_qnn_onnx"
to specify that you’re using a precompiled ONNX Runtime model. (NOTE: there is a typo in the docs that will be fixed in the next release; please use the option as specified above.)
Added documentation about supported ONNX Runtime options.
Expanded steps in the Getting Started Quick Example to include submitting an inference job, downloading the model and more.
Additional error details highlighted on profile and inference jobs: if your job fails, check out the new section titled Additional Information from the Runtime Log. This provides key details to help you debug without having to expand and scroll through the runtime log.
Updated to QNN version 2.23.
Released June 4, 2024
Added device families: you’ll now see them listed when you run qai-hub list-devices (e.g. Google Pixel 3a Family, Samsung Galaxy S21 Family). Targeting a device family should help reduce device provisioning times, so please use this option when applicable!
Updated to QNN version 2.22.6.
Support for 64-bit input types via the compile option
--truncate_64bit_io
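The option name suggests that 64-bit inputs and outputs are narrowed to 32 bits. Assuming that, narrowing integer data is lossless only if every value fits in the int32 range. The hypothetical check below uses only the standard library and flags values that would not survive the narrowing, so you can verify your input data before relying on the option.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def out_of_int32_range(values):
    """Return the (int64) values that cannot be represented as int32."""
    return [v for v in values if not INT32_MIN <= v <= INT32_MAX]

token_ids = [0, 42, 2**31 - 1]  # sample data that fits in int32
large_ids = [5, 2**40]          # sample data that does not
print(out_of_int32_range(token_ids))  # []
print(out_of_int32_range(large_ids))  # [1099511627776]
```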
Released May 17, 2024
Added support for the Snapdragon X Elite NPU on Windows via the ONNX Runtime QNN Execution Provider, and for the Snapdragon X Elite GPU on Windows via the ONNX Runtime DirectML Execution Provider. For early access, sign up here!
QNN version 2.22 support (compiled assets now target QNN 2.22, instead of QNN version 2.20).
Windows support in AI Hub!
w4a8 support for QNN via
--quantize_full_type w4a8
Added context to our documentation about when each runtime should be used.
Deprecation of target runtime qnn_bin. Please use --target_runtime qnn_context_binary instead. Context binaries are compiled specifically for each device’s hardware architecture. More information can be found in our documentation.
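Existing scripts that still pass the deprecated runtime can be updated mechanically. The sketch below uses a hypothetical helper name. It tokenizes an options string with shlex, swaps qnn_bin for qnn_context_binary after --target_runtime, and rejoins the tokens. It only handles the space-separated form used in these notes. The result is the kind of string you would pass as options= when submitting a compile job.

```python
import shlex

def migrate_target_runtime(options: str) -> str:
    """Replace the deprecated '--target_runtime qnn_bin' with qnn_context_binary."""
    tokens = shlex.split(options)
    for i, token in enumerate(tokens[:-1]):  # iterate over a copy; edit the original list
        if token == "--target_runtime" and tokens[i + 1] == "qnn_bin":
            tokens[i + 1] = "qnn_context_binary"
    return shlex.join(tokens)  # requires Python 3.8+

print(migrate_target_runtime("--target_runtime qnn_bin --quantize_full_type w4a8"))
# --target_runtime qnn_context_binary --quantize_full_type w4a8
```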
Released May 6, 2024
Documentation now includes an example for compiling ONNX models to TFLite or QNN as well as profiling directly using the ONNX Runtime.
The default ONNX Runtime configuration now uses the options that give the highest speed for profiling and inference. It is now set to 3, which provides the most optimized model by default.
Upgraded TensorFlow Lite to 2.16.1 (for profile jobs).
Additional performance fixes for compilation jobs.
Released April 22, 2024
Various performance improvements, improved error reporting and additional layer support have been added!
Added QCS8450 Proxy devices (see note in thread).
Upgraded to latest ONNX runtime version (1.17.3).
Updated documentation for ONNX runtime models.
Introduced IO options for ONNX Runtime.
Added support for w4a16 quantization for QNN path.
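Schemes such as w4a16 (and w4a8 and w8a16 from other releases) give the bit width of the weights (w) and the activations (a). The hypothetical parser below, paired with a rough storage estimate, shows why lower weight widths shrink model size. The estimate counts raw weight bits only and ignores quantization parameters and any packing overhead.

```python
import re

def parse_scheme(scheme):
    """Parse a 'w<bits>a<bits>' string such as 'w4a16' into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"w(\d+)a(\d+)", scheme)
    if match is None:
        raise ValueError(f"unrecognized scheme: {scheme!r}")
    return int(match.group(1)), int(match.group(2))

def weight_megabytes(num_params, weight_bits):
    """Raw weight storage in MB (10**6 bytes), ignoring scales and zero points."""
    return num_params * weight_bits / 8 / 1e6

params = 7_000_000  # hypothetical model size
print("fp32", weight_megabytes(params, 32), "MB")
for scheme in ("w8a16", "w4a16", "w4a8"):
    w_bits, a_bits = parse_scheme(scheme)
    print(scheme, w_bits, a_bits, weight_megabytes(params, w_bits), "MB")
```

Activation width mostly affects runtime memory and numerical precision rather than file size, which is why the estimate depends only on the weight bits.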
Released April 8, 2024
Introduced ONNX Runtime (.onnx) and NPU support. Try it out by specifying
options="--target_runtime onnx"
when submitting compile jobs.
Improvements to ONNX Runtime, including many speedups.
Added model visualization for ONNX runtime models.
Increased logging for compile jobs.
More proxy devices for IoT: check out the QCS8250 and QCS8550 proxy devices.
Upgraded to TensorFlow 2.15.0.
Added support for int16, w8a16 quantization via Hub.
Released March 25, 2024
Added more Galaxy S24 devices for running jobs.
Upgraded to the latest QNN version 2.20.
Increased model upload limit to 10 GB.
Added support to convert AIMET (.onnx + encodings) quantized models to ONNX and run on-device via ONNX Runtime.
Added optimization: constant folding reshape for depthwise convolutions for TFLite models.
Additional checks to prevent incorrect input names being passed via compile options.
Released March 11, 2024
Introduced devices with Snapdragon® 8 Gen 3 chipset to AI Hub. Target the Snapdragon® 8 Gen 3 by specifying
device = hub.Device("Samsung Galaxy S24")
Released February 28, 2024
Qualcomm AI Hub launched at MWC 2024.
Support for ~75 QAI Hub Models to provide performance and accuracy numbers on various mobile devices via TFLite and QNN runtimes.