Additional Readings

Deep Reinforcement Learning

Optimization

  • Online Learning Rate Adaptation with Hypergradient Descent
    • Reduces the need for learning rate scheduling for SGD, SGD with Nesterov momentum, and Adam
    • Uses hypergradients (gradients of the loss w.r.t. the learning rate), obtained via reverse-mode automatic differentiation, to update the learning rate online alongside the weight updates (see the sketch after this list)
    • Adds little computational overhead because it only needs to keep one extra copy of the previous step's gradients in memory
    • Severely under-appreciated paper
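
A minimal NumPy sketch of the plain-SGD variant (SGD-HD) under the paper's update rule, assuming a deterministic gradient oracle; the names sgd_hd, grad_fn, and beta here are illustrative stand-ins, not the paper's code:

```python
import numpy as np

def sgd_hd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD with hypergradient descent: adapt the learning rate alpha online.

    grad_fn -- returns the (possibly stochastic) gradient of the loss at theta
    alpha   -- initial learning rate, updated every step
    beta    -- hypergradient step size (the learning rate for alpha itself)
    """
    prev_grad = np.zeros_like(theta)  # the one extra gradient copy kept in memory
    for _ in range(steps):
        grad = grad_fn(theta)
        # The hypergradient of the loss w.r.t. alpha is -grad . prev_grad,
        # so gradient descent on alpha *adds* beta times the dot product.
        alpha = alpha + beta * np.dot(grad, prev_grad)
        theta = theta - alpha * grad
        prev_grad = grad
    return theta, alpha

# Example: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta, alpha = sgd_hd(lambda t: t, np.array([3.0, -2.0]))
```

Because the hypergradient reduces to a dot product between the current and previous gradients, the only extra state is prev_grad, which is exactly the memory cost noted above.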
