A momentum optimizer that measures the gradient at the look-ahead point instead of the current position.
NAG is a back-seat driver, but useful for once. It looks down the road first. Then it yells “turn now!” before the car drifts.
NAG, short for Nesterov Accelerated Gradient, speeds up Gradient Descent in model training. It often acts as a smarter Momentum step for SGD.
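The look-ahead idea can be sketched in a few lines. This is a minimal illustration of one common way to write the NAG update, not a reference implementation: the names `nag_step` and `quadratic_grad`, the toy loss, and the values `lr=0.1` and `momentum=0.9` are all assumptions chosen for the example.

```python
import numpy as np

def nag_step(params, velocity, grad_fn, lr=0.1, momentum=0.9):
    """One NAG update (illustrative sketch; names and defaults are assumptions)."""
    # Look down the road: where would the current velocity carry us?
    lookahead = params + momentum * velocity
    # Measure the gradient at the look-ahead point, not at params.
    grad = grad_fn(lookahead)
    # Correct the velocity with that gradient, then move.
    velocity = momentum * velocity - lr * grad
    params = params + velocity
    return params, velocity

def quadratic_grad(w):
    # Gradient of the toy loss 0.5 * ||w||^2, whose minimum is at 0.
    return w

w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = nag_step(w, v, quadratic_grad)
```

Plain Momentum would call `grad_fn(params)` instead of `grad_fn(lookahead)`. In this sketch, that single line is the only difference between the two.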
Momentum
NAG is Momentum with a look-ahead check before the correction.
SGD
NAG often works as a faster update rule for SGD.
Gradient Descent
NAG adds look-ahead momentum to Gradient Descent, so it often reaches low loss in fewer steps.
Adam
Adam also uses momentum, but it scales each parameter's step size using a running average of squared gradients.