edsnlp.training.optimizer

LinearSchedule

Bases: Schedule

Linear schedule for a parameter group. The schedule linearly increases the value from start_value to max_value during the first warmup_rate fraction of total_steps, then linearly decreases it to end_value over the remaining steps.

Parameters

PARAMETER DESCRIPTION
total_steps

The total number of steps, usually used to calculate ratios.

TYPE: Optional[int] DEFAULT: None

max_value

The maximum value to reach.

TYPE: Optional[Any] DEFAULT: None

start_value

The initial value.

TYPE: float DEFAULT: 0.0

path

The path to the attribute to set.

TYPE: Optional[Union[str, int, List[Union[str, int]]]] DEFAULT: None

warmup_rate

The fraction of total_steps spent in the warmup phase.

TYPE: float DEFAULT: 0.0

end_value

The final value to reach after the decay phase.

TYPE: float DEFAULT: 0.0
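As a rough illustration of the piecewise-linear behavior described above, here is a minimal standalone sketch (plain Python, not the edsnlp implementation) of the value such a schedule yields at a given step:

```python
def linear_schedule_value(step, total_steps, start_value=0.0,
                          max_value=5e-4, warmup_rate=0.1, end_value=0.0):
    # Illustrative sketch only; edsnlp's LinearSchedule may differ in details.
    warmup_steps = warmup_rate * total_steps
    if step < warmup_steps:
        # Warmup phase: interpolate from start_value up to max_value.
        return start_value + (max_value - start_value) * step / warmup_steps
    # Decay phase: interpolate from max_value down to end_value.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_value + (end_value - max_value) * progress
```

With total_steps=100 and warmup_rate=0.2, the value rises to max_value at step 20, then falls back to end_value at step 100.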

ScheduledOptimizer

Bases: Optimizer

Wrapper optimizer that supports schedules for the parameters and easy parameter selection, using each group's selector as a regex pattern to match parameter names.

Schedules are defined directly in the groups, in place of the scheduled value.

Examples

optim = ScheduledOptimizer(
    optim="adamw",
    module=model,
    groups=[
        # Exclude all parameters matching 'bias' from optimization.
        {
            "selector": "bias",
            "exclude": True,
        },
        # Parameters of the NER module's embedding receive this learning rate
        # schedule. If a parameter matches both 'transformer' and 'ner',
        # the first group settings take precedence due to the order.
        {
            "selector": "^ner[.]embedding",
            "lr": {
                "@schedules": "linear",
                "start_value": 0.0,
                "max_value": 5e-4,
                "warmup_rate": 0.2,
            },
        },
        # Parameters starting with 'ner' receive this learning rate schedule,
        # unless a 'lr' value has already been set by an earlier selector.
        {
            "selector": "^ner",
            "lr": {
                "@schedules": "linear",
                "start_value": 0.0,
                "max_value": 1e-4,
                "warmup_rate": 0.2,
            },
        },
        # Apply a weight_decay of 0.01 to all parameters not excluded.
        # This setting doesn't conflict with others and applies to all.
        {
            "selector": "",
            "weight_decay": 0.01,
        },
    ],
    total_steps=1000,
)

Parameters

PARAMETER DESCRIPTION
optim

The optimizer to use. If given as a string (like "adamw") or a type to instantiate, module and groups must also be provided.

TYPE: Union[str, Type[Optimizer], Optimizer]

module

The module to optimize. Usually the nlp pipeline object.

TYPE: Optional[Union[PipelineProtocol, Module]] DEFAULT: None

total_steps

The total number of steps, used for schedules.

TYPE: Optional[int] DEFAULT: None

groups

The groups to optimize. Each group is a dictionary containing:

  • a regex selector key to match the parameter of that group by their names (as listed by nlp.named_parameters())
  • and several other keys that define the optimizer parameters for that group, such as lr, weight_decay etc. The value for these keys can be a Schedule instance or a simple value
  • an exclude key that can be set to True to exclude the matched parameters from optimization

The matching is performed by running regex.search(selector, name), so you do not have to match the full name. Note that the order of the groups matters. If a parameter name matches multiple selectors, the configurations of these selectors are merged in reverse order (from the last matched selector to the first), so earlier selectors take precedence while later selectors can fill in options the earlier ones did not set. If a selector contains exclude=True, any parameter matching it is excluded from optimization.

TYPE: Optional[List[Group]] DEFAULT: None
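To make the selection and merge rules concrete, here is a small self-contained sketch (plain Python re, hypothetical parameter names — not edsnlp's actual code) of how a parameter's options could be resolved against a list of groups:

```python
import re

# Hypothetical parameter names and groups, mirroring the example above.
groups = [
    {"selector": "bias", "exclude": True},
    {"selector": "^ner[.]embedding", "lr": 5e-4},
    {"selector": "^ner", "lr": 1e-4},
    {"selector": "", "weight_decay": 0.01},
]

def resolve(name):
    # Selection uses regex.search, so a partial match is enough.
    matched = [g for g in groups if re.search(g["selector"], name)]
    if any(g.get("exclude") for g in matched):
        return None  # parameter is excluded from optimization
    config = {}
    # Merge from the last matched selector to the first: earlier
    # selectors overwrite options set by later ones.
    for g in reversed(matched):
        config.update({k: v for k, v in g.items() if k != "selector"})
    return config
```

Here resolve("ner.embedding.weight") picks the 5e-4 schedule (the more specific, earlier selector wins over "^ner") while still receiving weight_decay=0.01 from the catch-all group, and resolve("classifier.bias") returns None because it matches the excluding "bias" group.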