Release Notes

Released October 28

  • New device: The Snapdragon 8 Elite was announced at Snapdragon Summit and is available to all users by specifying device = hub.Device("Snapdragon 8 Elite QRD"); see the sketch after this list.

  • New device: The automotive device Snapdragon Cockpit Gen 4 (SA8295P) is now ready to use in AI Hub. Select it with --device "SA8295P ADP" --device-os 14.

  • Once you’ve signed in to AI Hub with SSO, you will automatically be redirected to the page of interest.
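
For reference, both new devices can be targeted from the Python client; a minimal sketch, where the model file is a placeholder:

    import qai_hub as hub

    # Target the new Snapdragon 8 Elite reference device by name.
    device = hub.Device("Snapdragon 8 Elite QRD")

    # The automotive board is selected by name and OS version, mirroring
    # the CLI flags --device "SA8295P ADP" --device-os 14.
    auto_device = hub.Device("SA8295P ADP", "14")

    # Submit a profile job to the new device ("model.tflite" is illustrative).
    job = hub.submit_profile_job(model="model.tflite", device=device)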

Released October 14

  • (Beta) Qualcomm AI Hub now enables converting float32 models to use integer numerics (e.g. int8, int16). This beta feature can be used via the submit_quantize_job API to quantize a PyTorch model; see the sketch after this list. Check out more details and examples in our documentation.

  • Int64 is now supported (in both inference and profiling jobs)!

  • Upgraded to QNN 2.27
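
A minimal sketch of the quantize flow, assuming a float32 ONNX model is already on hand (per the documentation, PyTorch models are first compiled to ONNX) and using random, purely illustrative calibration data:

    import numpy as np
    import qai_hub as hub

    # One or more calibration samples per model input (random here;
    # use representative samples in practice).
    calibration_data = {"image": [np.random.rand(1, 3, 224, 224).astype(np.float32)]}

    # Quantize the float32 ONNX model to int8 weights and activations.
    quantize_job = hub.submit_quantize_job(
        model="model.onnx",
        calibration_data=calibration_data,
        weights_dtype=hub.QuantizeDtype.INT8,
        activations_dtype=hub.QuantizeDtype.INT8,
    )
    quantized_model = quantize_job.get_target_model()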

Released October 7

  • Improved support for rank-0 (scalar) tensors in inference jobs.

  • Updated job states when a job is submitted (and fixed a pesky UI bug) to give more clarity about the stage your job is in.

  • Improved error messages in many cases, including: use of data types not supported by the profiler, invalid TFLite model files, and out-of-memory errors on many devices.

  • Client version 0.17.0 (pip install qai-hub==0.17.0) was released; it includes fixes for HTTP retries that should make uploading and downloading data much more reliable!

  • New device support! You can now launch jobs targeting the Snapdragon X Plus on AI Hub by specifying device = hub.Device("Snapdragon X Plus 8-Core CRD").

Released September 23

  • The chipset attribute for all proxy devices has been renamed to include the suffix -proxy. For example, chipset:qualcomm-qcs6490 is now chipset:qualcomm-qcs6490-proxy. Device names remain unchanged.

  • Upgraded ONNX Runtime to 1.19.2 and TFLite to 2.17.

Released September 11

  • Upgraded to QNN 2.26

  • Models page now has a dropdown to filter by creator, making it easier to search for models owned by others in your organization.

  • Various bug fixes across the UI, including updated visualization for QNN models. Check it out and let us know if you hit any issues!

Released August 26

  • Since Aug 13th, Hub no longer throws an exception upon job creation if the user already has the maximum allowed number of jobs running. Instead, new jobs are put into a pending state and automatically scheduled for execution as existing jobs finish. Python client version 0.14.1 adds a new property named pending to job objects: jobs waiting for available backend capacity return True for pending and False for running.

  • Upgraded to QNN 2.25.

  • get_job_summaries is available in the client from this version (0.15.0) onward. The get_jobs API is deprecated; use get_job_summaries in its place (see the sketch after this list).

  • We recommend updating to client version 0.15.0 (pip install qai-hub==0.15.0) and updating your client each release to ensure you’re using all the latest features of Qualcomm AI Hub!
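
A minimal sketch of the queueing behavior and the new listing API (the model and device are placeholders):

    import qai_hub as hub

    job = hub.submit_profile_job(
        model="model.tflite",
        device=hub.Device("Samsung Galaxy S24"),
    )

    # With client >= 0.14.1, a queued job reports pending=True (and
    # running=False) until backend capacity frees up.
    print(job.pending, job.running)

    # With client >= 0.15.0, list recent jobs with get_job_summaries,
    # which replaces the deprecated get_jobs.
    for summary in hub.get_job_summaries():
        print(summary)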

Released August 12

  • New client version, 0.14.0, is available!

  • Intermediate Assets: When you submit a compile job, you will now see an “intermediate assets” tab on the compile job page. This new feature allows AI Hub to save intermediate states of the compilation as first-class models on AI Hub. For instance, if you submit a TorchScript model for TFLite compilation, an intermediate ONNX model will be saved and accessible; see the sketch after this list.

  • Job concurrency limits: Instead of returning an error, Hub will now automatically queue jobs past per-user max limits. If you previously handled the error that was thrown, that handling is no longer needed to submit jobs.
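
A minimal sketch of a compile job that produces intermediate assets (the torchvision model is illustrative):

    import torch
    import torchvision
    import qai_hub as hub

    # Trace a PyTorch model to TorchScript.
    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
    traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))

    # Compiling TorchScript to TFLite saves the intermediate ONNX model,
    # which appears under the new "intermediate assets" tab.
    compile_job = hub.submit_compile_job(
        model=traced,
        device=hub.Device("Samsung Galaxy S24"),
        input_specs={"image": (1, 3, 224, 224)},
        options="--target_runtime tflite",
    )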

Released July 29

  • Updated ONNX Runtime to 1.18

  • Qualcomm AI Hub has extended support to include Snapdragon Ride platforms. Check out our pre-optimized AI Hub models available for automotive devices, test these models on real automotive devices via AI Hub, and let us know if you hit any issues!

Released July 15

  • Improvements to memory estimates on Android devices now allow much more precise ranges. The profiler is better at excluding extraneous heap usage, leading to tighter memory ranges. Try submitting a new job and check out the memory ranges!

  • Updated QNN to 2.24.0 and ONNX to 1.16.0.

  • Added int16 support for ONNX Runtime

Released July 1

  • AI Hub jobs can be shared with your organization automatically. To add users to your organization, email ai-hub-support@qti.qualcomm.com with the email addresses of your team.

  • AI Hub jobs can also be shared outside of your organization, including with Qualcomm to obtain support. Click the “Share” button in the top right of any job and specify the email address of an AI Hub user; the job (and its associated model assets) will be shared. Access can also be revoked by removing an email address from the job.

  • Improved error messaging for AIMET models that fail to compile

  • Documentation updated for precompiled_qnn_onnx.

  • Added detailed titles for AI Hub webpages. Now, when you have a page open, the title specifies the page you are on, as well as the job name where applicable.

  • Release notes from AI Hub’s previous releases can now be found in our documentation for reference.

Released Jun 17

  • Windows devices are now widely available on AI Hub, including the brand new Snapdragon X Elite and the previous generation Snapdragon 8cx Gen 3 reference designs. When you run qai-hub list-devices, you will see them listed. Target the X Elite by specifying device = hub.Device("Snapdragon X Elite CRD").

  • Support for compiling precompiled QNN ONNX models! Use options="--target_runtime precompiled_qnn_onnx" to specify that you’re using a pre-compiled ONNX Runtime model; see the sketch after this list. (NOTE: there is a typo in the docs that will be fixed next release, please use the option as specified above)

  • Added documentation around supported ONNX Runtime Options

  • Expanded steps in the Getting Started Quick Example to include submitting an inference job, downloading the model, and more.

  • Additional error details highlighted on profile and inference jobs: if your job fails, check out the new section titled Additional Information from the Runtime Log. This provides key details to help you debug without having to expand and scroll through the runtime log.

  • Updated to QNN version 2.23
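
A minimal sketch combining the new Windows device with the precompiled QNN ONNX option (the model path is a placeholder):

    import qai_hub as hub

    # Target the new Windows-on-Snapdragon reference design.
    device = hub.Device("Snapdragon X Elite CRD")

    # Use the option exactly as spelled here (note the docs typo above).
    compile_job = hub.submit_compile_job(
        model="model.onnx",
        device=device,
        options="--target_runtime precompiled_qnn_onnx",
    )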

Released Jun 4

  • Added a list of device families: you’ll see these listed when you use qai-hub list-devices (Google Pixel 3a Family, Samsung Galaxy S21 Family, etc.). This should help with device provisioning times; please use this option when applicable!

  • Updated to QNN version 2.22.6

  • Support for 64-bit input types via the compile option --truncate_64bit_io; see the sketch after this list.
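
A minimal sketch of both additions, assuming a family name can be passed to hub.Device like any other device name (the model path is a placeholder):

    import qai_hub as hub

    # Targeting a device family lets Hub pick any available unit in the
    # family, which should help with provisioning times.
    device = hub.Device("Samsung Galaxy S21 Family")

    # Truncate 64-bit model inputs/outputs to 32-bit during compilation.
    compile_job = hub.submit_compile_job(
        model="model.onnx",
        device=device,
        options="--truncate_64bit_io",
    )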

Released May 17

  • Added support for Snapdragon X Elite NPU on Windows via the ONNX QNN Execution Provider and Snapdragon X Elite GPU on Windows via the ONNX DirectML Execution Provider – for early access sign up here!

  • QNN version 2.22 support (compiled assets now target QNN 2.22, instead of QNN version 2.20)

  • Windows support in AI Hub!

  • w4a8 support for QNN (--quantize_full_type w4a8)

  • Additional context for when each runtime should be used in our documentation.

  • Deprecation of target runtime qnn_bin. Please use --target_runtime qnn_context_binary now; see the sketch after this list. Context binaries are compiled specifically for a device’s hardware architecture. More information can be found in our documentation.
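
A minimal sketch combining the new target runtime with the new w4a8 option (the traced model and shapes are placeholders):

    import qai_hub as hub

    # Compile to a QNN context binary (replacing the deprecated qnn_bin),
    # quantizing to 4-bit weights and 8-bit activations.
    compile_job = hub.submit_compile_job(
        model="model.pt",
        device=hub.Device("Samsung Galaxy S24"),
        input_specs={"image": (1, 3, 224, 224)},
        options="--target_runtime qnn_context_binary --quantize_full_type w4a8",
    )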

Released May 6

  • Documentation now includes an example for compiling ONNX models to TFLite or QNN, as well as profiling directly using the ONNX Runtime.

  • The default configuration for the ONNX Runtime now takes advantage of options for the highest speed in profiling/inference. The optimization level is now set to 3, which provides the most optimized model by default.

  • Upgraded TensorFlowLite to 2.16.1 (for profile jobs)

  • Additional performance fixes for compilation jobs.

Released Apr 22

  • Various performance improvements, improved error reporting, and additional layer support have been added!

  • Added QCS8450 proxy devices.

  • Upgraded to the latest ONNX Runtime version (1.17.3).

  • Updated documentation for ONNX runtime models

  • Introduced IO options for ONNX Runtime

  • Added support for w4a16 quantization for the QNN path.

Released Apr 8

  • Introduced ONNX Runtime (.onnx) and NPU support. Try it out by specifying options="--target_runtime onnx" when submitting compile jobs; see the sketch after this list.

  • Improvements to ONNX Runtime, including many speedups.

  • Added model visualization for ONNX runtime models

  • Increased logging for compile jobs.

  • More proxy devices for IoT: check out the QCS8250 and QCS8550 proxy devices.

  • Upgraded to TensorFlow 2.15.0.

  • Added support for int16 and w8a16 quantization via Hub.
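
A minimal sketch of the new ONNX Runtime target (the traced model and shapes are placeholders):

    import qai_hub as hub

    # Compile for ONNX Runtime; w8a16 quantization can be requested the
    # same way via --quantize_full_type w8a16.
    compile_job = hub.submit_compile_job(
        model="model.pt",
        device=hub.Device("Samsung Galaxy S24"),
        input_specs={"image": (1, 3, 224, 224)},
        options="--target_runtime onnx",
    )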

Released Mar 25

  • Added more Galaxy S24 devices for running jobs.

  • Upgraded to the latest QNN version 2.20.

  • Increased model upload limit to 10 GB.

  • Added support to convert AIMET (.onnx + encodings) quantized models to ONNX and run on-device via ONNX Runtime.

  • Added an optimization: constant-folding reshapes for depthwise convolutions in TFLite models.

  • Additional checks to prevent incorrect input names from being passed via compile options.

Released Mar 11

  • Introduced devices with Snapdragon® 8 Gen 3 chipset to AI Hub. Target the Snapdragon® 8 Gen 3 by specifying device = hub.Device("Samsung Galaxy S24").

Released Feb 28

  • Qualcomm AI Hub launched at MWC 2024.

  • Support for QAI Hub models to provide performance and accuracy numbers on various mobile devices via TFLite and QNN runtimes.