When training deep networks, scheduling the learning rate matters a great deal. Exponential learning-rate decay is the most widely used schedule; its learning rate evolves as shown in the figure below:
The red line shows standard exponential decay. The blue line is step decay, which holds the learning rate constant for a stretch before dropping it. The advantages of these decay schedules are fast convergence and simplicity.
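As a sketch of what these two schedules compute, the functions below map a step index to a learning rate. The initial rate, decay factor gamma, and step size are hypothetical values chosen only for illustration, not taken from the text:

```python
def exponential_lr(lr0: float, t: int, gamma: float = 0.95) -> float:
    """Exponential decay: lr_t = lr0 * gamma ** t."""
    return lr0 * gamma ** t

def step_lr(lr0: float, t: int, step_size: int = 10, gamma: float = 0.5) -> float:
    """Step decay: hold the rate for `step_size` steps, then scale it by gamma."""
    return lr0 * gamma ** (t // step_size)

# Example: starting from lr0 = 0.1, compare both schedules over 30 steps.
for t in range(0, 30, 10):
    print(t, exponential_lr(0.1, t), step_lr(0.1, t))
```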
Loshchilov and Hutter proposed the cosine annealing strategy. Its simplified version reduces the learning rate from the initial value down to zero along a cosine curve. Assuming the total number of batches is $T$, the learning rate at batch $t$ is given by:

$$\eta_t = \frac{1}{2}\,\eta_0\left(1 + \cos\frac{t\pi}{T}\right)$$

where $\eta_0$ is the initial learning rate.
As the figure shows, cosine decay lowers the learning rate slowly at the start, almost linearly through the middle, and slowly again at the end.
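Translated directly into code, the simplified schedule looks like the sketch below, where lr0 and T stand in for the initial learning rate and the total number of batches:

```python
import math

def cosine_lr(lr0: float, t: int, T: int) -> float:
    """Simplified cosine annealing: lr_t = 0.5 * lr0 * (1 + cos(pi * t / T))."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * t / T))

# The rate starts at lr0 (t = 0), passes lr0 / 2 at the midpoint (t = T / 2),
# and reaches 0 at the final batch (t = T).
T = 100
for t in (0, 25, 50, 75, 100):
    print(t, cosine_lr(0.1, t, T))
```

PyTorch ships an equivalent schedule as torch.optim.lr_scheduler.CosineAnnealingLR, whose eta_min argument sets a floor other than zero if the rate should not decay all the way to 0.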