# Use Models
|
|
## Build Models from Yacs Config
From a yacs config object,
models (and their sub-models) can be built by
functions such as `build_model`, `build_backbone`, `build_roi_heads`:
```python
from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module
```
|
|
`build_model` only builds the model structure and fills it with random parameters.
See below for how to load an existing checkpoint to the model and how to use the `model` object.
|
|
### Load/Save a Checkpoint
```python
from detectron2.checkpoint import DetectionCheckpointer
DetectionCheckpointer(model).load(file_path_or_url)  # load a file, usually from cfg.MODEL.WEIGHTS

checkpointer = DetectionCheckpointer(model, save_dir="output")
checkpointer.save("model_999")  # save to output/model_999.pth
```
|
|
Detectron2's checkpointer recognizes models in PyTorch's `.pth` format, as well as the `.pkl` files
in our model zoo.
See [API doc](../modules/checkpoint.html#detectron2.checkpoint.DetectionCheckpointer)
for more details about its usage.
|
|
The model files can be arbitrarily manipulated using `torch.{load,save}` for `.pth` files or
`pickle.{dump,load}` for `.pkl` files.
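As a sketch of the kind of manipulation this allows (the checkpoint layout below is a hypothetical stand-in, not the exact model-zoo schema), a `.pkl`-style checkpoint can be serialized and inspected with plain `pickle`:

```python
import io
import pickle

# Hypothetical stand-in for a .pkl checkpoint: a dict mapping
# parameter names to weight arrays (real files may differ in layout).
ckpt = {"model": {"backbone.stem.conv1.weight": [0.0] * 4}}

buf = io.BytesIO()         # in-memory file; use open(path, "wb") for a real file
pickle.dump(ckpt, buf)

buf.seek(0)
loaded = pickle.load(buf)  # the same nested dict, ready to edit and re-dump
```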
|
|
### Use a Model
|
|
A model can be called by `outputs = model(inputs)`, where `inputs` is a `list[dict]`.
Each dict corresponds to one image, and the required keys
depend on the type of model and whether the model is in training or evaluation mode.
For example, in order to do inference,
all existing models expect the "image" key, and optionally "height" and "width".
The detailed format of inputs and outputs of existing models is explained below.
|
|
__Training__: When in training mode, all models are required to be used under an `EventStorage`.
The training statistics will be put into the storage:
```python
from detectron2.utils.events import EventStorage
with EventStorage() as storage:
    losses = model(inputs)
```
|
|
__Inference__: If you only want to do simple inference using an existing model,
[DefaultPredictor](../modules/engine.html#detectron2.engine.defaults.DefaultPredictor)
is a wrapper around the model that provides such functionality.
It includes default behavior such as model loading and preprocessing,
and operates on a single image rather than on batches. See its documentation for usage.
|
|
You can also run inference directly like this:
```python
model.eval()
with torch.no_grad():
    outputs = model(inputs)
```
|
|
### Model Input Format
|
|
Users can implement custom models that support any arbitrary input format.
Here we describe the standard input format that all builtin models support in detectron2.
They all take a `list[dict]` as the inputs. Each dict
corresponds to information about one image.
|
|
The dict may contain the following keys:
|
|
* "image": `Tensor` in (C, H, W) format. The meaning of the channels is defined by `cfg.INPUT.FORMAT`.
  Image normalization, if any, will be performed inside the model using
  `cfg.MODEL.PIXEL_{MEAN,STD}`.
* "height", "width": the **desired** output height and width **in inference**, which is not necessarily the same
  as the height or width of the `image` field.
  For example, if resizing is used as a preprocessing step, the `image` field contains the resized image,
  but you may want the outputs to be in the **original** resolution.
  If provided, the model will produce output in this resolution,
  rather than in the resolution of the `image` input to the model. This is more efficient and accurate.
* "instances": an [Instances](../modules/structures.html#detectron2.structures.Instances)
  object for training, with the following fields:
  + "gt_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each instance.
  + "gt_classes": `Tensor` of long type, a vector of N labels, in range [0, num_categories).
  + "gt_masks": a [PolygonMasks](../modules/structures.html#detectron2.structures.PolygonMasks)
    or [BitMasks](../modules/structures.html#detectron2.structures.BitMasks) object storing N masks, one for each instance.
  + "gt_keypoints": a [Keypoints](../modules/structures.html#detectron2.structures.Keypoints)
    object storing N keypoint sets, one for each instance.
* "sem_seg": `Tensor[int]` in (H, W) format. The semantic segmentation ground truth for training.
  Values represent category labels starting from 0.
* "proposals": an [Instances](../modules/structures.html#detectron2.structures.Instances)
  object used only in Fast R-CNN style models, with the following fields:
  + "proposal_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing P proposal boxes.
  + "objectness_logits": `Tensor`, a vector of P scores, one for each proposal.

For inference of builtin models, only the "image" key is required; "height" and "width" are optional.
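For example, a minimal inference input could be built as follows (the image tensor here is a dummy placeholder; a real pipeline would fill it with actual pixel data in the channel order given by `cfg.INPUT.FORMAT`):

```python
import torch

# Dummy 640x480 image in (C, H, W) layout; "height"/"width" ask the model
# to produce outputs at the original 960x1280 resolution instead.
image = torch.zeros(3, 480, 640, dtype=torch.uint8)
inputs = [{"image": image, "height": 960, "width": 1280}]
# outputs = model(inputs)  # one output dict per input dict
```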
|
|
We currently don't define a standard input format for panoptic segmentation training,
because the models use custom formats produced by custom data loaders.
|
|
#### How it connects to the data loader:
|
|
The output of the default [DatasetMapper](../modules/data.html#detectron2.data.DatasetMapper) is a dict
that follows the above format.
After the data loader performs batching, it becomes `list[dict]`, which the builtin models support.
|
|
|
|
### Model Output Format
|
|
When in training mode, the builtin models output a `dict[str->ScalarTensor]` with all the losses.
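A typical training loop sums this dict into a single scalar before backpropagation (the loss names and values below are made up for illustration):

```python
import torch

# Hypothetical loss dict of the shape described above; in a real
# training loop it would come from `loss_dict = model(inputs)`.
loss_dict = {"loss_cls": torch.tensor(0.5), "loss_box_reg": torch.tensor(0.25)}
total_loss = sum(loss_dict.values())  # single scalar tensor
# total_loss.backward()  # in a real loop, after zeroing gradients
```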
|
|
When in inference mode, the builtin models output a `list[dict]`, one dict for each image.
Based on the tasks the model is doing, each dict may contain the following fields:
|
|
* "instances": [Instances](../modules/structures.html#detectron2.structures.Instances)
  object with the following fields:
  * "pred_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each detected instance.
  * "scores": `Tensor`, a vector of N confidence scores.
  * "pred_classes": `Tensor`, a vector of N labels in range [0, num_categories).
  * "pred_masks": a `Tensor` of shape (N, H, W), masks for each detected instance.
  * "pred_keypoints": a `Tensor` of shape (N, num_keypoint, 3).
    Each row in the last dimension is (x, y, score). Confidence scores are larger than 0.
* "sem_seg": `Tensor` of (num_categories, H, W), the semantic segmentation prediction.
* "proposals": [Instances](../modules/structures.html#detectron2.structures.Instances)
  object with the following fields:
  * "proposal_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes)
    object storing N boxes.
  * "objectness_logits": `Tensor`, a vector of N confidence scores.
* "panoptic_seg": a tuple of `(pred: Tensor, segments_info: Optional[list[dict]])`.
  The `pred` tensor has shape (H, W), containing the segment id of each pixel.

  * If `segments_info` exists, each dict describes one segment id in `pred` and has the following fields:

    * "id": the segment id
    * "isthing": whether the segment is a thing or stuff
    * "category_id": the category id of this segment.

    If a pixel's id does not exist in `segments_info`, it is considered to be the void label
    defined in [Panoptic Segmentation](https://arxiv.org/abs/1801.00868).

  * If `segments_info` is None, all pixel values in `pred` must be ≥ -1.
    Pixels with value -1 are assigned the void label.
    Otherwise, the category id of each pixel is obtained by
    `category_id = pixel // metadata.label_divisor`.
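As a small sketch of that decoding rule (the `label_divisor` value here is a hypothetical example; the real one comes from the dataset metadata):

```python
label_divisor = 1000  # hypothetical metadata.label_divisor
pixel = 2017          # one segment id from the `pred` tensor

# Integer division recovers the semantic category of this pixel.
category_id = pixel // label_divisor
```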
|
|
### Partially execute a model:
|
|
Sometimes you may want to obtain an intermediate tensor inside a model,
such as the input of a certain layer or the output before post-processing.
Since there are typically hundreds of intermediate tensors, there isn't an API that provides
the intermediate result you need.
You have the following options:
|
|
1. Write a (sub)model. Following the [tutorial](./write-models.md), you can
   rewrite a model component (e.g. a head of a model), such that it
   does the same thing as the existing component, but returns the output
   you need.
2. Partially execute a model. You can create the model as usual,
   but use custom code to execute it instead of its `forward()`. For example,
   the following code obtains mask features before the mask head.
|
|
```python
import torch
from detectron2.modeling import build_model
from detectron2.structures import ImageList

images = ImageList.from_tensors(...)  # preprocessed input tensor
model = build_model(cfg)
model.eval()
with torch.no_grad():
    features = model.backbone(images.tensor)
    proposals, _ = model.proposal_generator(images, features)
    instances, _ = model.roi_heads(images, features, proposals)
    mask_features = [features[f] for f in model.roi_heads.in_features]
    mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
```
|
|
3. Use [forward hooks](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks).
   Forward hooks can help you obtain inputs or outputs of a certain module.
   If they are not exactly what you want, they can at least be used together with partial execution
   to obtain other tensors.
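As a minimal, self-contained sketch of the hook mechanism (shown on a toy `torch.nn` model rather than a detectron2 model; the API is the same):

```python
import torch
import torch.nn as nn

# Toy model standing in for a detectron2 model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
captured = {}

def save_output(module, inputs, output):
    # Called on every forward pass of the hooked module.
    captured["relu_out"] = output.detach()

handle = model[1].register_forward_hook(save_output)  # hook the ReLU
model(torch.randn(1, 4))
handle.remove()  # remove the hook once the tensor has been captured
```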
|
|
All options require you to read documentation and sometimes code
of the existing models to understand the internal logic,
in order to write code to obtain the internal tensors.
|
|