Cosine Annealing Learning Rate
2021. 3. 15. 10:43 · Uncategorized
Cosine Annealing is a learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then rapidly increases it again. Resetting the learning rate acts like a simulated restart of the learning process. Re-using good weights as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.
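- During training, Cosine Annealing learning rate moves the learning rate from its maximum value to its minimum value along a cosine curve, which helps the model escape local minima or saddle points.
- As a result, it improves the model's generalization performance.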
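
The cosine schedule with warm restarts can be sketched in a few lines of plain Python. This is a minimal illustration and not tied to any particular framework: the function name `cosine_annealing_lr`, the fixed cycle length `period`, and the default values of `lr_max` and `lr_min` are all assumptions made for the example. Within one cycle the rate follows lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / period)), and it jumps back to lr_max when a new cycle starts.

```python
import math


def cosine_annealing_lr(step, period, lr_max=0.1, lr_min=0.001):
    """Learning rate at `step` for cosine annealing with warm restarts.

    `period`, `lr_max`, and `lr_min` are illustrative values, not taken
    from the post.
    """
    # Position inside the current cycle; wraps to 0 at every restart.
    t_cur = step % period
    # Cosine factor goes from 1 (cycle start) toward 0 (cycle end).
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * t_cur / period))
    return lr_min + (lr_max - lr_min) * cos_factor


if __name__ == "__main__":
    # Print two cycles of length 10 to see the decay and the restart.
    for step in range(20):
        print(step, round(cosine_annealing_lr(step, period=10), 5))
```

Because the step index wraps before it reaches `period`, the last step of each cycle sits slightly above `lr_min`, and the next step is back at `lr_max`. The model weights are not touched at that point, which is what makes the restart "warm".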
Warm-up Learning Rate
- Warm-up: used to align the network parameters early in training.
- In Transformer architectures, a single warm-up phase is typically used.
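
A single warm-up phase can be sketched as follows. The post does not say which warm-up shape is used, so this example assumes a linear ramp; the function name `warmup_lr`, the parameters `warmup_steps` and `base_lr`, and the choice to hold the rate constant after warm-up are hypothetical simplifications. In practice the warm-up is usually followed by some decay schedule, such as the cosine annealing described earlier.

```python
def warmup_lr(step, warmup_steps, base_lr=0.001):
    """Linear warm-up: ramp the learning rate up to `base_lr`, then hold it.

    The linear shape and the constant rate afterwards are assumptions
    made for illustration.
    """
    if step < warmup_steps:
        # step + 1 so the very first step already uses a non-zero rate.
        return base_lr * (step + 1) / warmup_steps
    return base_lr


if __name__ == "__main__":
    # Four warm-up steps, then the rate stays at base_lr.
    for step in range(8):
        print(step, warmup_lr(step, warmup_steps=4))
```

The ramp happens only once at the start of training, which matches the single warm-up mentioned above.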