Behavioural procedure

Maze habituation. One week after the beginning of food deprivation, animals received four days of habituation to the apparatus configured as a cross-maze. The mouse was placed in the central area of the maze. On day one, five pellets were placed in each of the four arms (three along the length of each arm and two in the food-well at its end). The number of pellets was gradually decreased over the four days. On the last day of maze habituation, only one pellet was located in each of the four food-wells. Each mouse was given a maximum of three 12-min habituation trials per day. Once all pellets were consumed or after 12 min had passed, the mouse was removed from the maze, the maze was re-baited, and the next habituation trial began. This procedure served to habituate the animals to the maze and to repeated handling. Between habituation trials, the mouse was placed in a holding cage with heavy absorbent paper on the floor in order to avoid the potential transfer of olfactory cues to the test apparatus. On the last day of maze habituation, animals consumed all of the pellets in the three habituation trials in a mean of 5 min.

Turn bias. Each mouse's turn-bias was determined after maze habituation and before discrimination learning. The maze was arranged in a T- or Y-configuration with the start-arm being S (south), W (west) or E (east) across trials but never N (north). The maze configuration (Y-maze vs. T-maze) was counterbalanced across the different experimental groups. The mouse was placed in the start-arm and had the choice of turning left or right, with both arms baited in order to delay any association between response and reinforcement. The start-arm for each trial was predetermined in a pseudorandom order identical for each mouse. Each animal was given seven trials. A trial comprised one left and one right response. For example, if the mouse turned left, it was allowed to consume the pellet and was thereafter returned to the start-arm to make a new choice.
If it chose left once more, the mouse was immediately returned to the start-arm. The trial continued until the mouse had turned right. To calculate the turn-bias, the first turns of all trials were summed, with the majority of responses determining the mouse's turn-bias to left or right.

Spatial discrimination (Fig. 1A). Again, the start-arm was S, W or E across trials but never N. The start-arm for each trial was predetermined in a pseudorandom order identical for each mouse. Over every nine trials, each arm served as the start-arm an equal number of times but never for more than two consecutive trials. The mouse had the choice of turning 90° (T-maze) or 45° (Y-maze) to the left or right. In spatial discrimination learning, the mice were always trained to turn against their own turn-bias. After approximately every 7th trial, the maze was rotated 90° to minimise extra-maze cues. After making a response, the mouse was removed from the maze and returned to the holding cage while the maze was set up for the next trial. The inter-trial interval was approximately 40 s. If a mouse made nine consecutive correct responses, it was given a probe-trial. In the probe-trial, the use of an egocentric response strategy was pitted against the use of exteroceptive cues by using N as the start-arm. If the probe-trial response was correct, egocentric spatial discrimination was deemed completed and the animal was returned to its home-cage. If it was incorrect, a further five correct responses led to a new probe-trial. Each animal was given 25 trials/day. However, if the animal had completed ≥6 consecutive correct responses by the end of the 25th trial, it was given the chance to reach criterion. Nine consecutive correct responses followed by a correct probe-trial were used as the criterion in egocentric spatial discrimination learning as well as in all subsequent tests involving contingency shifts.

Full reversal test (Fig. 1B).
Here the contingencies from the initial spatial discrimination were reversed. For example, an animal previously trained to turn right now had to turn left. Thus, the bait was moved to the opposite arm without any additional changes in the maze configuration.

Perseverance test (Fig. 1C). Here the previously correct arm remained open while a novel arm replaced the previously incorrect arm. For example, a previously incorrect arm 90° to the left was replaced by a novel arm 45° to the left. Only the novel arm was baited. Hence, altered performance in this test condition must be due to a change in the association of reward, as the previously incorrect response alternative is no longer present. Thus, the only acquired association that could influence choice behaviour in this test condition was the previous CS+.

Learned non-reward test (Fig. 1D). Here the previously incorrect arm remained open while a novel arm replaced the previously correct arm. For example, a previously correct arm 90° to the right was replaced by a novel arm 45° to the right. Only the previously incorrect arm was baited. Hence, altered performance in this test condition must be due to a change in the association of learned non-reward, as the previously correct response alternative is no longer present. Thus, the only acquired association that could influence choice behaviour in this test condition was the previous CS−.
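
The scheduling and scoring rules described under Turn bias and Spatial discrimination can be illustrated with a minimal Python sketch. It is not the procedure actually used to generate the trial orders or score the animals. All function names are hypothetical, turns are coded as 'L'/'R' for illustration, the start-arm order is drawn by simple rejection sampling, and the run constraint is assumed to apply across block boundaries, which the text does not specify.

```python
import random
from collections import Counter

START_ARMS = ("S", "W", "E")  # N is reserved for probe-trials


def turn_bias(first_turns):
    """Return 'L' or 'R', whichever was the majority first turn across trials."""
    counts = Counter(first_turns)
    if counts["L"] == counts["R"]:
        raise ValueError("no majority; an odd number of trials avoids ties")
    return "L" if counts["L"] > counts["R"] else "R"


def start_arm_block(rng, previous=()):
    """Draw one nine-trial block: each arm three times, never three times in a row.

    `previous` holds the start-arms that ended the preceding block, so the
    run constraint also holds across blocks (an assumption of this sketch).
    """
    block = list(START_ARMS) * 3
    while True:
        rng.shuffle(block)
        seq = list(previous[-2:]) + block
        if all(not (seq[i] == seq[i + 1] == seq[i + 2])
               for i in range(len(seq) - 2)):
            return list(block)


def trials_to_run(responses, run_length=9):
    """1-based index of the trial completing the first run of correct responses."""
    streak = 0
    for i, correct in enumerate(responses, start=1):
        streak = streak + 1 if correct else 0
        if streak == run_length:
            return i  # a probe-trial would follow this trial
    return None
```

Seeding the generator, for example `start_arm_block(random.Random(1))`, gives the same start-arm order for every mouse. An animal whose `turn_bias` is 'L' would then be trained to turn right.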