Training API

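InternEvo's training API is managed by internlm.core.trainer.Trainer. Once the training engine and scheduler have been defined, the Trainer API can be called to run model training, evaluation, gradient zeroing, and parameter updates.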

For detailed usage, see the Trainer API documentation and examples.

class internlm.core.trainer.Trainer(engine: Engine, schedule: BaseScheduler | None = None)

This class lets users run training and evaluation directly, without having to write their own scripts.

Parameters:
  • engine (Engine) – Engine responsible for the process function.

  • schedule (BaseScheduler, optional) – Runtime schedule. Defaults to None.

property engine

Returns the engine that is responsible for managing the training and evaluation process.

property schedule

Returns the runtime scheduler.

property uses_pipeline

Returns whether pipeline parallelism is used.

train()

Sets the model to training mode.

eval()

Sets the model to evaluation mode.
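A common pattern with train() and eval() is to switch into evaluation mode temporarily and return to training mode afterwards. The sketch below wraps that in a context manager. The names evaluation_mode and FakeTrainer are hypothetical and not part of InternEvo; the stand-in class exists only so the snippet runs without the library, and the helper assumes the trainer was in training mode before the block was entered.

```python
from contextlib import contextmanager


class FakeTrainer:
    """Hypothetical stand-in exposing only train() and eval(); not the InternEvo class."""

    def __init__(self):
        self.mode = "train"

    def train(self):
        self.mode = "train"

    def eval(self):
        self.mode = "eval"


@contextmanager
def evaluation_mode(trainer):
    """Switch to evaluation mode, then restore training mode on exit."""
    trainer.eval()
    try:
        yield trainer
    finally:
        # Assumption: the caller was training before entering the block.
        trainer.train()


trainer = FakeTrainer()
with evaluation_mode(trainer):
    mode_inside = trainer.mode  # evaluation work would go here
mode_after = trainer.mode
```

Because the helper relies only on the two documented methods, it should work with any object that exposes train() and eval() with the same meaning.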

zero_grad()

Sets the gradient of all parameters in the model to zero.

step()

Executes the parameter update step.

execute_schedule(data_iter: Iterable, **kwargs)

Runs the forward pass, loss computation, and backward pass for the model. Returns a tuple of (output, label, loss, moe_loss).

Parameters:
  • data_iter (Iterable) – The data iterator.

  • **kwargs – Additional keyword arguments.

Returns:

A tuple of (output, label, loss, moe_loss).

Return type:

Tuple[torch.Tensor]
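The methods above are typically combined into one training iteration: clear the gradients, run the schedule, then update the parameters. The following is a minimal sketch of that order, not InternEvo's own training loop. RecordingTrainer and run_one_iteration are hypothetical names introduced for illustration; the stand-in returns placeholder values instead of tensors so that the snippet runs without the library, and it assumes the four-element return tuple documented above.

```python
class RecordingTrainer:
    """Hypothetical stand-in that mimics the documented method names and records calls."""

    def __init__(self):
        self.calls = []

    def train(self):
        self.calls.append("train")

    def zero_grad(self):
        self.calls.append("zero_grad")

    def execute_schedule(self, data_iter, **kwargs):
        self.calls.append("execute_schedule")
        batch = next(data_iter)
        # Placeholder values; the real method is documented to return tensors.
        return batch, batch, 0.0, 0.0

    def step(self):
        self.calls.append("step")


def run_one_iteration(trainer, data_iter):
    """Run one training iteration using only the methods documented above."""
    trainer.zero_grad()
    output, label, loss, moe_loss = trainer.execute_schedule(data_iter)
    trainer.step()
    return loss, moe_loss


trainer = RecordingTrainer()
trainer.train()  # make sure the model is in training mode first
loss, moe_loss = run_one_iteration(trainer, iter([[1, 2, 3]]))
```

With a real Trainer instance, the same function could be called inside a loop over training steps, passing the project's own data iterator.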