qai_hub.submit_compile_and_quantize_jobs
- submit_compile_and_quantize_jobs(model, device, calibration_data, name=None, input_specs=None, compile_options='', quantize_options='', weights_dtype=QuantizeDtype.INT8, activations_dtype=QuantizeDtype.INT8, retry=True)
Compiles a model to ONNX and runs a quantize job on the produced ONNX model.
The input model can be PyTorch or ONNX.
- Parameters:
  - model (Union[Model, TopLevelTracedModule, MLModel, ModelProto, bytes, str, Path]) – Model to compile and quantize.
  - device (Device) – Device for which to compile the ONNX model.
  - calibration_data (Union[Dataset, Mapping[str, List[ndarray]], str]) – Data, Dataset, or Dataset ID to use during calibration in the quantize job.
  - name (Optional[str]) – Optional name for both jobs. Job names need not be unique.
  - input_specs (Optional[Mapping[str, Union[Tuple[int, ...], Tuple[Tuple[int, ...], str]]]]) – Required if model is a PyTorch model. Keys in the dict (which is ordered in Python 3.7+) define the input names for the target model (e.g., a TFLite model) created from this job, and may be different from the names in the PyTorch model.
    An input shape can either be a Tuple[int, ...], e.g. (1, 2, 3), or a Tuple[Tuple[int, ...], str], e.g. ((1, 2, 3), "int32"). The latter form can be used to specify the type of the input. If a type is not specified, it defaults to "float32". Currently, only "float32", "int8", "int16", "int32", "uint8", and "uint16" are accepted types.
    For example, a PyTorch module with forward(self, x, y) may have input_specs=dict(a=(1, 2), b=(1, 3)). When using the resulting target model (e.g., a TFLite model) from this job, the inputs must have keys a and b, not x and y. Similarly, if this target model is used in an inference job (see qai_hub.submit_inference_job()), the dataset must have entries a, b in this order, not x, y.
    If model is an ONNX model, input_specs is optional. It can be used to overwrite the model's input names and the dynamic extents of the input shapes. If input_specs is not None, it must be compatible with the model, or the server will return an error.
  - compile_options (str) – CLI-like flag options for the compile job. See Compile Options.
  - quantize_options (str) – CLI-like flag options for the quantize job. See Quantize Options.
  - weights_dtype (QuantizeDtype) – The data type to which weights will be quantized.
  - activations_dtype (QuantizeDtype) – The data type to which activations will be quantized.
  - retry (bool) – If compile job creation fails due to rate limiting, keep retrying periodically until creation succeeds.
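To make the two accepted input_specs entry forms concrete, here is a minimal sketch of how a shape entry could be normalized to the (shape, dtype) form. normalize_spec is a hypothetical helper written for illustration only; it is not part of the qai_hub API:

```python
# Hypothetical helper (not part of qai_hub) illustrating the two accepted
# input_specs entry forms and the "float32" default.
ACCEPTED_DTYPES = {"float32", "int8", "int16", "int32", "uint8", "uint16"}

def normalize_spec(spec):
    if spec and isinstance(spec[0], int):
        # Bare shape tuple, e.g. (1, 2, 3): dtype defaults to "float32".
        return (spec, "float32")
    # (shape, dtype) pair, e.g. ((1, 2, 3), "int32").
    shape, dtype = spec
    if dtype not in ACCEPTED_DTYPES:
        raise ValueError(f"unsupported input dtype: {dtype}")
    return (shape, dtype)

print(normalize_spec((1, 3, 224, 224)))   # ((1, 3, 224, 224), 'float32')
print(normalize_spec(((1, 2, 3), "int32")))  # ((1, 2, 3), 'int32')
```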
- Returns:
jobs – Returns a tuple of CompileJob and QuantizeJob.
- Return type:
Tuple[CompileJob, QuantizeJob]
Examples
Submit a PyTorch model for compilation and quantization:
import torch
import numpy as np
import qai_hub as hub

pt_model = torch.jit.load("mobilenet_v2.pt")
input_shape = (1, 3, 224, 224)
calibration_data = {"image": [np.random.randn(*input_shape).astype(np.float32)]}
compile_job, quantize_job = hub.submit_compile_and_quantize_jobs(
    pt_model,
    hub.Device("Samsung Galaxy S23"),
    calibration_data,
    input_specs={"image": (input_shape, "float32")},
    weights_dtype=hub.QuantizeDtype.INT8,
    activations_dtype=hub.QuantizeDtype.INT8,
    name="mobilenet",
)
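The example above passes a single random calibration sample. In the Mapping[str, List[ndarray]] form, each key is a model input name and each list entry is one calibration sample; a sketch with several samples (the "image" key and the sample count here are illustrative, and real calibration data should come from representative inputs rather than random noise):

```python
import numpy as np

# Build calibration data as Mapping[str, List[ndarray]]: one key per model
# input, each value a list of arrays (one array per calibration sample).
rng = np.random.default_rng(0)
num_samples = 8  # illustrative sample count
calibration_data = {
    "image": [
        rng.standard_normal((1, 3, 224, 224)).astype(np.float32)
        for _ in range(num_samples)
    ]
}
print(len(calibration_data["image"]))  # 8
```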