Cosine Annealing Learning Rate

2021. 3. 15. 10:43

Cosine Annealing is a type of learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then rapidly increases it again. Resetting the learning rate acts like a simulated restart of the learning process. Reusing good weights as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.

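  • The Cosine Annealing learning rate varies the learning rate from its maximum value to its minimum value along a cosine curve during training, which helps the model escape local minima or saddle points.
  • As a result, it can improve the model's generalization performance.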
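
The schedule described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a particular library's implementation: the function name `cosine_annealing_lr`, the parameters `cycle_len`, `lr_max`, and `lr_min`, and their default values are all assumptions chosen for the example.

```python
import math

def cosine_annealing_lr(step, cycle_len, lr_max=0.1, lr_min=0.001):
    """Learning rate at `step` under cosine annealing with warm restarts.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Position inside the current cycle; it resets to 0 at every restart.
    t = step % cycle_len
    # Cosine factor is 1 at the start of a cycle and falls toward 0 at its end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / cycle_len))
    return lr_min + (lr_max - lr_min) * cosine

# Two cycles of 10 steps each: the rate decays, then jumps back up at step 10.
schedule = [cosine_annealing_lr(s, cycle_len=10) for s in range(20)]
```

The model weights are not touched at the restart; only the learning rate jumps back to `lr_max`, which is what makes it a "warm" restart.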

Warm-up Learning Rate

cosine_schedule_with_warmup
cosine_with_hard_restarts_schedule_with_warmup method

  • Warm-up: used to align (stabilize) the network parameters at the start of training
  • Transformer architectures typically use a single warm-up phase.
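
A single warm-up followed by cosine decay, optionally with hard restarts, might look like the sketch below. It is a standalone illustration that assumes a linear warm-up. The helper name `warmup_cosine_factor` and its parameters (`warmup_steps`, `total_steps`, `num_cycles`) are hypothetical and do not reproduce the code of the schedulers named above.

```python
import math

def warmup_cosine_factor(step, warmup_steps, total_steps, num_cycles=1):
    """Multiplier in [0, 1] to apply to the base learning rate.

    Hypothetical helper for illustration; the linear warm-up is an assumption.
    """
    if step < warmup_steps:
        # Single linear warm-up: ramp the multiplier from 0 up to 1.
        return step / max(1, warmup_steps)
    # Fraction of the post-warm-up training that has been completed.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # With num_cycles > 1 the decay repeats, jumping back to 1 at each hard restart.
    cycle_pos = (num_cycles * progress) % 1.0
    return 0.5 * (1.0 + math.cos(math.pi * cycle_pos))

base_lr = 5e-5  # assumed base learning rate for the example
lrs = [base_lr * warmup_cosine_factor(s, warmup_steps=10, total_steps=110) for s in range(110)]
```

With `num_cycles=1` this is warm-up followed by one cosine decay. A larger `num_cycles` gives the hard-restart variant, in which the warm-up still happens only once.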