## Background

https://huggingface.co/spaces/basicv8vc/learning-rate-scheduler-online

https://share.streamlit.io/basicv8vc/scheduler-online

## Learning Rate Scheduling Strategies

The following is based on OneFlow v0.7.0.

### The LRScheduler Base Class

All of the schedulers below derive from a common base class:

```python
LRScheduler(optimizer: Optimizer, last_step: int = -1, verbose: bool = False)
```

### ConstantLR

```python
oneflow.optim.lr_scheduler.ConstantLR(
    optimizer: Optimizer,
    factor: float = 1.0 / 3,
    total_iters: int = 5,
    last_step: int = -1,
    verbose: bool = False,
)
```

ConstantLR is almost the same as a fixed learning rate; the only difference is that during the first total_iters steps, the learning rate is the initial learning rate * factor.
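The rule can be sketched in plain Python (an illustrative helper, not OneFlow's implementation):

```python
def constant_lr(base_lr, factor, total_iters, step):
    """Learning rate at a given step under ConstantLR (sketch)."""
    # During the first total_iters steps the rate is scaled by factor;
    # afterwards it returns to the initial learning rate.
    return base_lr * factor if step < total_iters else base_lr
```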

### LinearLR

```python
oneflow.optim.lr_scheduler.LinearLR(
    optimizer: Optimizer,
    start_factor: float = 1.0 / 3,
    end_factor: float = 1.0,
    total_iters: int = 5,
    last_step: int = -1,
    verbose: bool = False,
)
```

LinearLR is also close to a fixed learning rate; the difference is that during the first total_iters steps, the learning rate first increases or decreases linearly, and is then fixed at the initial learning rate * end_factor.
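A sketch of the formula (illustrative helper, not OneFlow's implementation):

```python
def linear_lr(base_lr, start_factor, end_factor, total_iters, step):
    """Learning rate at a given step under LinearLR (sketch)."""
    # The scale factor moves linearly from start_factor to end_factor
    # over total_iters steps, then stays at end_factor.
    progress = min(step, total_iters) / total_iters
    factor = start_factor + (end_factor - start_factor) * progress
    return base_lr * factor
```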


### ExponentialLR

```python
oneflow.optim.lr_scheduler.ExponentialLR(
    optimizer: Optimizer,
    gamma: float,
    last_step: int = -1,
    verbose: bool = False,
)
```

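ExponentialLR applies the standard exponential decay rule, multiplying the learning rate by gamma on every step (the StepLR description below confirms the per-step decay). A sketch in plain Python (illustrative helper, not OneFlow's implementation):

```python
def exponential_lr(base_lr, gamma, step):
    """Learning rate at a given step under ExponentialLR (sketch)."""
    # The rate is multiplied by gamma once per step.
    return base_lr * gamma ** step
```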

### StepLR

```python
oneflow.optim.lr_scheduler.StepLR(
    optimizer: Optimizer,
    step_size: int,
    gamma: float = 0.1,
    last_step: int = -1,
    verbose: bool = False,
)
```

StepLR is similar to ExponentialLR; the difference is that instead of adjusting the learning rate on every call to step(), it adjusts it only once every step_size steps.
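A sketch of the rule (illustrative helper, not OneFlow's implementation):

```python
def step_lr(base_lr, step_size, gamma, step):
    """Learning rate at a given step under StepLR (sketch)."""
    # The rate is multiplied by gamma once every step_size steps.
    return base_lr * gamma ** (step // step_size)
```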


### MultiStepLR

```python
oneflow.optim.lr_scheduler.MultiStepLR(
    optimizer: Optimizer,
    milestones: list,
    gamma: float = 0.1,
    last_step: int = -1,
    verbose: bool = False,
)
```

StepLR adjusts the learning rate every step_size steps, whereas MultiStepLR adjusts it at user-specified milestones. For example, with milestones = [2, 5, 9], the learning rate is lr on [0, 2), lr * gamma on [2, 5), lr * gamma² on [5, 9), and lr * gamma³ on [9, ∞).
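The intervals above can be sketched with a bisection over the milestones (illustrative helper, not OneFlow's implementation):

```python
import bisect

def multi_step_lr(base_lr, milestones, gamma, step):
    """Learning rate at a given step under MultiStepLR (sketch)."""
    # The rate is multiplied by gamma each time step passes a milestone;
    # bisect_right counts how many milestones are <= step.
    return base_lr * gamma ** bisect.bisect_right(milestones, step)
```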


### PolynomialLR

```python
oneflow.optim.lr_scheduler.PolynomialLR(
    optimizer,
    steps: int,
    end_learning_rate: float = 0.0001,
    power: float = 1.0,
    cycle: bool = False,
    last_step: int = -1,
    verbose: bool = False,
)
```

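The signature mirrors TensorFlow-style polynomial decay. Assuming those semantics with cycle=False, the rate decays from the initial value to end_learning_rate over `steps` steps along a curve of the given power (power=1.0 gives linear decay). A sketch, not OneFlow's implementation:

```python
def polynomial_lr(base_lr, steps, end_lr, power, step):
    """Learning rate under PolynomialLR with cycle=False (sketch)."""
    # Decay polynomially from base_lr to end_lr over `steps` steps,
    # then hold at end_lr.
    step = min(step, steps)
    return (base_lr - end_lr) * (1 - step / steps) ** power + end_lr
```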

### CosineDecayLR

```python
oneflow.optim.lr_scheduler.CosineDecayLR(
    optimizer: Optimizer,
    decay_steps: int,
    alpha: float = 0.0,
    last_step: int = -1,
    verbose: bool = False,
)
```
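Assuming TensorFlow-style cosine decay semantics, the learning rate falls from the initial value toward base_lr * alpha over decay_steps steps along a half cosine curve, with alpha setting the floor as a fraction of the initial rate. A sketch, not OneFlow's implementation:

```python
import math

def cosine_decay_lr(base_lr, decay_steps, alpha, step):
    """Learning rate under CosineDecayLR (sketch, TF-style semantics)."""
    step = min(step, decay_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    # alpha is the floor as a fraction of the initial learning rate.
    return base_lr * ((1 - alpha) * cosine + alpha)
```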

### CosineAnnealingLR

```python
oneflow.optim.lr_scheduler.CosineAnnealingLR(
    optimizer: Optimizer,
    T_max: int,
    eta_min: float = 0.0,
    last_step: int = -1,
    verbose: bool = False,
)
```

CosineAnnealingLR is very similar to CosineDecayLR, except that it includes not only a cosine decay phase but also a cosine increase phase: during the first T_max steps the learning rate decays from lr to eta_min along a cosine curve, and once cur_step > T_max it rises back to lr along a cosine curve, repeating this cycle indefinitely.
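A sketch of the oscillation (illustrative helper, not OneFlow's implementation):

```python
import math

def cosine_annealing_lr(base_lr, T_max, eta_min, step):
    """Learning rate under CosineAnnealingLR (sketch)."""
    # The rate follows a cosine curve between base_lr and eta_min with
    # period 2 * T_max: it decays for T_max steps, rises back over the
    # next T_max steps, and repeats.
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / T_max)) / 2
```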


### CosineAnnealingWarmRestarts

```python
oneflow.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer: Optimizer,
    T_0: int,
    T_mult: int = 1,
    eta_min: float = 0.0,
    decay_rate: float = 1.0,
    restart_limit: int = 0,
    last_step: int = -1,
    verbose: bool = False,
)
```

CosineAnnealingWarmRestarts runs cosine annealing within each cycle and then restarts: the first cycle lasts T_0 steps, and each subsequent cycle is T_mult times longer. decay_rate scales the peak learning rate at each restart, so with decay_rate < 1 every restart begins from a lower peak.

(Figure: schedule with T_mult=1, decay_rate=1)

(Figure: schedule with T_mult=1, decay_rate=0.5)
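A sketch of the T_mult=1 case in plain Python (hypothetical helper, assuming each restart's peak is scaled by decay_rate):

```python
import math

def warm_restarts_lr(base_lr, T_0, eta_min, decay_rate, step):
    """Sketch of CosineAnnealingWarmRestarts for T_mult=1: every cycle
    has length T_0 and each restart's peak is scaled by decay_rate."""
    cycle, step_in_cycle = divmod(step, T_0)
    peak = base_lr * decay_rate ** cycle
    return eta_min + (peak - eta_min) * (1 + math.cos(math.pi * step_in_cycle / T_0)) / 2
```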

### LambdaLR

`oneflow.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_step=-1, verbose=False)`

LambdaLR is arguably the most flexible strategy, since the schedule is specified by a user-provided function lr_lambda. For example, the Noam scheduler from the Transformer paper can be implemented like this:

```python
def rate(step, model_size, factor, warmup):
    """
    We default step to 1 for the LambdaLR function
    to avoid raising zero to a negative power.
    """
    if step == 0:
        step = 1
    return factor * (
        model_size ** (-0.5) * min(step ** (-0.5), step * warmup ** (-1.5))
    )
```

```python
model = CustomTransformer(...)
optimizer = flow.optim.Adam(
    model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9
)
lr_scheduler = LambdaLR(
    optimizer=optimizer,
    lr_lambda=lambda step: rate(step, d_model, factor=1, warmup=3000),
)
```

### SequentialLR

```python
oneflow.optim.lr_scheduler.SequentialLR(
    optimizer: Optimizer,
    schedulers: Sequence[LRScheduler],
    milestones: Sequence[int],
    interval_rescaling: Union[Sequence[bool], bool] = False,
    last_step: int = -1,
    verbose: bool = False,
)
```
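SequentialLR runs the given schedulers one after another, switching at the specified milestones. The dispatch can be sketched as follows (hypothetical helper; it represents each scheduler by a step-to-lr function and ignores interval_rescaling, which controls whether each scheduler's step counter restarts at its milestone):

```python
import bisect

def sequential_lr(scheduler_fns, milestones, step):
    """Sketch of SequentialLR's dispatch: pick the active scheduler by
    comparing the current step against the milestones."""
    i = bisect.bisect_right(milestones, step)
    return scheduler_fns[i](step)
```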

### WarmupLR

```python
oneflow.optim.lr_scheduler.WarmupLR(
    scheduler_or_optimizer: Union[LRScheduler, Optimizer],
    warmup_factor: float = 1.0 / 3,
    warmup_iters: int = 5,
    warmup_method: str = "linear",
    warmup_prefix: bool = False,
    last_step=-1,
    verbose=False,
)
```

WarmupLR is a subclass of SequentialLR that combines two LRSchedulers, the first of which is either a ConstantLR or a LinearLR.
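The warmup phase itself can be sketched in plain Python (hypothetical helper for warmup_method="linear"; after warmup_iters the wrapped scheduler takes over):

```python
def warmup_lr(base_lr, warmup_factor, warmup_iters, step):
    """Sketch of the linear warmup phase: the scale factor ramps from
    warmup_factor to 1 over warmup_iters steps."""
    if step >= warmup_iters:
        return base_lr  # the wrapped scheduler takes over from here
    alpha = step / warmup_iters
    return base_lr * (warmup_factor + (1 - warmup_factor) * alpha)
```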

### ChainedScheduler

`oneflow.optim.lr_scheduler.ChainedScheduler(schedulers)`

`lr ==> LRScheduler_1 ==> LRScheduler_2 ==> ... ==> LRScheduler_N`
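As the chain diagram suggests, each scheduler in the chain is applied to the output of the previous one; for multiplicative schedulers this amounts to composing their scale factors. A sketch under that assumption (hypothetical helper representing each scheduler by its scale-factor function):

```python
def chained_lr(base_lr, factor_fns, step):
    """Sketch of ChainedScheduler: apply each scheduler's scale factor
    in sequence to the learning rate."""
    lr = base_lr
    for fn in factor_fns:
        lr *= fn(step)
    return lr
```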

### ReduceLROnPlateau

```python
oneflow.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",
    factor=0.1,
    patience=10,
    threshold=1e-4,
    threshold_mode="rel",
    cooldown=0,
    min_lr=0,
    eps=1e-8,
    verbose=False,
)
```

```python
optimizer = flow.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = flow.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')
for epoch in range(10):
    train(...)
    val_loss = validate(...)
    # Note: this should be called after validate().
    scheduler.step(val_loss)
```
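The core reduction rule can be sketched in plain Python (hypothetical helper for mode="min" and threshold_mode="rel"; it ignores cooldown and min_lr):

```python
def plateau_step(lr, best, current, num_bad, factor, patience, threshold):
    """One step of ReduceLROnPlateau's core rule (sketch): reduce lr by
    `factor` after `patience` epochs without sufficient improvement.
    Returns the updated (lr, best, num_bad) state."""
    if current < best * (1 - threshold):  # sufficient improvement
        return lr, current, 0
    num_bad += 1
    if num_bad > patience:
        return lr * factor, best, 0
    return lr, best, num_bad
```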

## Practice

https://github.com/basicv8vc/oneflow-cifar100-lr-scheduler

(Published with the author's permission. Original article:

https://zhuanlan.zhihu.com/p/520719314 )