# Training

From the previous tutorials, you may now have a custom model and a data loader.
To run training, users typically prefer one of the following two styles:

### Custom Training Loop

With a model and a data loader ready, everything else needed to write a training loop can
be found in PyTorch, and you are free to write the training loop yourself.
This style allows researchers to manage the entire training logic more clearly and have full control.
One such example is provided in [tools/plain_train_net.py](../../tools/plain_train_net.py).

Any customization of the training logic is then easily controlled by the user.

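For example, the core of such a custom loop might look like the following sketch. It assumes `model` and `data_loader` are the objects built in the previous tutorials, and the optimizer settings are placeholders to replace with your own:

```python
import torch

# Bare-bones training loop sketch; see tools/plain_train_net.py for a complete version
# with checkpointing, LR scheduling, logging, and evaluation.
# `model` and `data_loader` are assumed to come from the previous tutorials.
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9)

for iteration, data in enumerate(data_loader):
    loss_dict = model(data)           # detectron2 models return a dict of losses in training mode
    losses = sum(loss_dict.values())  # reduce to a single scalar for backprop
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
    if iteration % 20 == 0:
        print(f"iter {iteration}: total loss = {losses.item():.3f}")
```
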
### Trainer Abstraction

We also provide a standardized "trainer" abstraction with a
hook system that helps simplify the standard training behavior.
It includes the following two instantiations:

* [SimpleTrainer](../modules/engine.html#detectron2.engine.SimpleTrainer)
  provides a minimal training loop for single-cost single-optimizer single-data-source training, with nothing else.
  Other tasks (checkpointing, logging, etc.) can be implemented using
  [the hook system](../modules/engine.html#detectron2.engine.HookBase).
* [DefaultTrainer](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer) is a `SimpleTrainer` initialized from a
  yacs config, used by
  [tools/train_net.py](../../tools/train_net.py) and many scripts.
  It includes more standard default behaviors that one might want to opt into,
  including default configurations for the optimizer, learning rate schedule,
  logging, evaluation, and checkpointing (see the usage sketch after this list).

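For reference, a typical way to launch training with `DefaultTrainer` from a yacs config looks roughly like the sketch below. The particular config file and output directory are only example choices, and the datasets named in the config must already be registered:

```python
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# The config below is just an example choice; swap in your own yacs config.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml")
cfg.OUTPUT_DIR = "./output"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)  # with resume=False, initialize from cfg.MODEL.WEIGHTS
trainer.train()
```
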
To customize a `DefaultTrainer`:

1. For simple customizations (e.g. change optimizer, evaluator, LR scheduler, data loader, etc.), override [its methods](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer) in a subclass, just like [tools/train_net.py](../../tools/train_net.py) (see the sketch after this list).
2. For extra tasks during training, check the
   [hook system](../modules/engine.html#detectron2.engine.HookBase) to see if it's supported.

   As an example, to print hello during training:
   ```python
   from detectron2.engine import HookBase

   class HelloHook(HookBase):
       def after_step(self):
           if self.trainer.iter % 100 == 0:
               print(f"Hello at iteration {self.trainer.iter}!")
   ```
3. Using a trainer+hook system means there will always be some non-standard behaviors that cannot be supported, especially in research.
   For this reason, we intentionally keep the trainer & hook system minimal, rather than powerful.
   If anything cannot be achieved by such a system, it's easier to start from [tools/plain_train_net.py](../../tools/plain_train_net.py) to implement custom training logic manually.

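As an example of the first style, the following sketch subclasses `DefaultTrainer` and plugs in a COCO evaluator by overriding `build_evaluator`; the evaluator choice and output folder here are illustrative, and `tools/train_net.py` contains the full version:

```python
import os
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # Called by DefaultTrainer.test() for each dataset in cfg.DATASETS.TEST.
        # COCOEvaluator is just an example; pick the evaluator that matches your dataset.
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, output_dir=output_folder)
```

A custom hook such as the `HelloHook` above can then be attached to any trainer with `trainer.register_hooks([HelloHook()])` before calling `trainer.train()`.
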
### Logging of Metrics

During training, detectron2 models and the trainer put metrics into a centralized [EventStorage](../modules/utils.html#detectron2.utils.events.EventStorage).
You can use the following code to access it and log metrics to it:
```python
from detectron2.utils.events import get_event_storage

# inside the model:
if self.training:
    value = ...  # compute the value from inputs
    storage = get_event_storage()
    storage.put_scalar("some_accuracy", value)
```

Refer to its documentation for more details.

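Hooks can also read from the same storage during training through `self.trainer.storage`. As a rough sketch (the `total_loss` key is the scalar the trainer itself records, and the printing interval is arbitrary):

```python
from detectron2.engine import HookBase

class LossPrinterHook(HookBase):
    def after_step(self):
        storage = self.trainer.storage
        if self.trainer.iter % 100 == 0 and "total_loss" in storage.histories():
            # median over the last 20 iterations, smoothing out per-batch noise
            print("total_loss:", storage.history("total_loss").median(20))
```
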
Metrics are then written to various destinations with [EventWriter](../modules/utils.html#module-detectron2.utils.events).
DefaultTrainer enables a few `EventWriter` instances with default configurations.
See above for how to customize them.

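For example, the set of writers can be changed by overriding `DefaultTrainer.build_writers` in a subclass. The sketch below simply rebuilds the three default writers, so you can add, drop, or reconfigure any of them:

```python
import os
from detectron2.engine import DefaultTrainer
from detectron2.utils.events import CommonMetricPrinter, JSONWriter, TensorboardXWriter

class Trainer(DefaultTrainer):
    def build_writers(self):
        # Rebuild the default trio of writers; adjust this list to taste.
        return [
            CommonMetricPrinter(self.max_iter),                             # prints losses, ETA, lr to stdout
            JSONWriter(os.path.join(self.cfg.OUTPUT_DIR, "metrics.json")),  # line-delimited JSON file
            TensorboardXWriter(self.cfg.OUTPUT_DIR),                        # tensorboard event files
        ]
```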