From: Dopamine, uncertainty and TD learning

Dependence of the ramp on learning rate. The shape of the ramp, but not the height of its peak, depends on the learning rate. The graph shows simulated activity for the case of p_r = 0.5 near the time of the expected reward, for different learning rates, averaged over both rewarded and unrewarded trials. According to TD learning with persistent, asymmetrically coded prediction errors, averaging activity over rewarded and unrewarded trials produces a ramp up to the time of reward. The height of the peak of the ramp is determined by the ratio of rewarded to unrewarded trials; the breadth of the ramp, however, is determined by the rate at which these error signals back-propagate from the time of the (expected) reward to the time of the predictive stimulus. A higher learning rate results in a larger fraction of the error propagating back, and thus a higher ramp before the time of reward. With lower learning rates, the ramp becomes negligible, although the positive activity (on average) at the time of reward is still maintained. Note that although the learning rate used in the simulations depicted in Figure 1b,d was 0.8, this should not be taken as the literal synaptic learning rate of the neural substrate, given our schematic representation of the stimulus. In a more realistic representation, in which a population of neurons is active at every timestep, a much lower learning rate would produce similar results.
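The effect described in this caption can be reproduced qualitatively with a minimal sketch. This is not the simulation code behind the figure. It assumes a tapped-delay-line representation (one value weight per timestep between stimulus and reward), no discounting, a unit reward delivered with probability p_r = 0.5, and negative prediction errors scaled by a hypothetical factor `neg_scale = 1/6` to mimic asymmetric coding. The function name `simulate_ramp` and all parameter names and defaults are illustrative.

```python
import numpy as np


def simulate_ramp(alpha, p_reward=0.5, n_steps=10, n_trials=20000,
                  neg_scale=1 / 6, burn_in=1000, seed=0):
    """Average asymmetrically coded TD error at each within-trial timestep.

    Assumptions (not taken from the figure's actual code): one value weight
    per timestep, discount factor 1, reward of 1 at the last timestep with
    probability p_reward, negative errors multiplied by neg_scale.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_steps + 1)      # v[n_steps] is the terminal state, fixed at 0
    total = np.zeros(n_steps)

    for trial in range(n_trials):
        r = np.zeros(n_steps)
        r[-1] = 1.0 if rng.random() < p_reward else 0.0

        # TD(0) error for every timestep: delta_t = r_t + V(t+1) - V(t)
        delta = r + v[1:] - v[:-1]
        # Learning persists on every trial, so errors never vanish
        v[:-1] += alpha * delta

        if trial >= burn_in:
            # Asymmetric coding: negative errors are compressed
            total += np.where(delta < 0, neg_scale * delta, delta)

    return total / (n_trials - burn_in)


if __name__ == "__main__":
    for alpha in (0.8, 0.2):
        ramp = simulate_ramp(alpha)
        print(f"alpha={alpha}:", np.round(ramp, 3))
```

Under these assumptions the last entry of the returned array (the time of reward) should come out close to 0.5 * 0.5 - (1/6) * 0.5 * 0.5, roughly 0.21, for either learning rate, because the value just before reward averages p_r regardless of how strongly it fluctuates. The entries at earlier timesteps should be noticeably larger for `alpha=0.8` than for `alpha=0.2`, since a higher learning rate passes a larger fraction of the trial-to-trial error back toward the stimulus.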

