qai_hub.submit_inference_job
- submit_inference_job(model, device, inputs, name=None, options='', retry=True)
Submits an inference job.
- Parameters:
  - model (Union[Model, MLModel, bytes, str, Path]) – Model to run inference with. Must be one of the following: (1) a Model object from a compile job via qai_hub.CompileJob.get_target_model(), (2) any TargetModel, (3) a path to any TargetModel.
  - device (Union[Device, List[Device]]) – Devices on which to run the job.
  - inputs (Union[Dataset, Mapping[str, List[ndarray]], str]) – If a Dataset, its schema must match the model. For example, if the model is a target model from a compile job submitted with input_shapes=dict(a=(1, 2), b=(1, 3)), the dataset must also be created with dict(a=<list_of_np_array>, b=<list_of_np_array>). See qai_hub.submit_compile_job() for details. If a Dict, it is uploaded as a new Dataset, equivalent to calling qai_hub.upload_dataset() with an arbitrary name; note that dicts are ordered in Python 3.7+ and we rely on that order to match the schema (see the sketch after this list). If a str, it is a path to a Dataset in h5 format.
  - name (Optional[str]) – Optional name for the job. Job names need not be unique.
  - options (str) – CLI-like flag options. See Profile Options.
  - retry (bool) – If job creation fails due to rate limiting, keep retrying periodically until creation succeeds.
- Returns:
job – The inference job, or a list of inference jobs when multiple devices are specified.
- Return type:
InferenceJob | List[InferenceJob]
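When a list of devices is passed, a list of jobs is returned, one per device. A minimal sketch (the device names are illustrative):

import qai_hub as hub
import numpy as np

input_tensor = np.random.random((1, 3, 227, 227)).astype(np.float32)

# Submitting to multiple devices returns one InferenceJob per device.
jobs = hub.submit_inference_job(
    "squeeze_net.tflite",
    device=[hub.Device("Samsung Galaxy S23"), hub.Device("Samsung Galaxy S22")],
    inputs=dict(image=[input_tensor]),
)
assert isinstance(jobs, list)  # one InferenceJob per device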
Examples
Submit a TFLite model for inference on a Samsung Galaxy S23:
import qai_hub as hub
import numpy as np

# TFLite model path
tflite_model = "squeeze_net.tflite"

# Set up input data
input_tensor = np.random.random((1, 3, 227, 227)).astype(np.float32)

# Submit inference job
job = hub.submit_inference_job(
    tflite_model,
    device=hub.Device("Samsung Galaxy S23"),
    name="squeeze_net (1, 3, 227, 227)",
    inputs=dict(image=[input_tensor]),
)

# Load the output data into a dictionary of numpy arrays
output_tensors = job.download_output_data()
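A sketch of the compile-job path from the model parameter above, assuming a source model at "model.pt" (a placeholder) and the input_shapes convention of qai_hub.submit_compile_job():

import qai_hub as hub
import numpy as np

# Compile a source model first ("model.pt" and its shape are placeholders;
# see qai_hub.submit_compile_job() for accepted source formats).
compile_job = hub.submit_compile_job(
    "model.pt",
    device=hub.Device("Samsung Galaxy S23"),
    input_shapes=dict(image=(1, 3, 224, 224)),
)

# Run inference with the compiled target model; the dict key must match
# the input_shapes schema used at compile time.
inference_job = hub.submit_inference_job(
    compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23"),
    inputs=dict(image=[np.random.random((1, 3, 224, 224)).astype(np.float32)]),
)
output_tensors = inference_job.download_output_data()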
For more examples, see Running Inference.