Experiments 1 & 2 – Egocentric Cognitive Flexibility

Animals
Experiment 1 used 72 male C57BL/6J mice (Charles River, UK) with a mean weight of 24.9 g (SEM ±0.1) at the start of the experiment. Experiment 2 used 33 male mice bred at the University of Sussex (18 WT; 15 KO) with a mean weight of 25.9 g (SEM ±0.4) at the start of the experiment. One week before food deprivation, animals were singly housed in a controlled environment held at 21±2°C and 50±15% relative humidity on a 12:12 h light-dark cycle (lights on at 07:00 h). One week before behavioural training began, animals were food deprived to 85–90% of their ad libitum weight. During this week, animals were handled daily for 5–10 min, after which 3–4 sucrose pellets were placed in each home cage to reduce neophobia. On the last day of the week, animals in Experiment 1 received a sham saline injection (4 ml/kg) to habituate them to the injection procedure. Animals were fed 2.5–3.0 g of standard laboratory chow (Special Diet Service Ltd, Witham, UK) daily, 1 h after the completion of behavioural training and testing. The experiments were licensed under the UK Animals (Scientific Procedures) Act 1986 (Project Licence 70/6654) following approval by the University of Sussex Local Ethical Review Committee.
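As a worked illustration of the deprivation criterion, the sketch below computes the 85–90% weight window from an animal's ad libitum weight. The function names are hypothetical, and the 25.0 g weight is an illustrative value close to the Experiment 1 group mean, not a recorded animal.

```python
def target_weight_range(ad_lib_weight_g: float,
                        lower_fraction: float = 0.85,
                        upper_fraction: float = 0.90) -> tuple:
    """Return the (min, max) body weight in grams that keeps an animal
    at 85-90% of its free-feeding (ad libitum) weight."""
    if ad_lib_weight_g <= 0:
        raise ValueError("ad libitum weight must be positive")
    return ad_lib_weight_g * lower_fraction, ad_lib_weight_g * upper_fraction


def within_target(current_g: float, ad_lib_weight_g: float) -> bool:
    """True if today's weight lies inside the deprivation window."""
    low, high = target_weight_range(ad_lib_weight_g)
    return low <= current_g <= high


# Illustrative value only: a mouse weighing 25.0 g before deprivation.
low, high = target_weight_range(25.0)
print(f"{low:.2f}-{high:.2f} g")   # 21.25-22.50 g
print(within_target(22.0, 25.0))   # True
```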

Apparatus
The experiments used an eight-arm radial maze made of clear Plexiglas and elevated 55 cm above the floor. Each arm (33.5×5×8.3 cm) extended from a circular central platform (15.5 cm diameter). Access to the arms was controlled by inserting or removing clear Plexiglas inserts at the entrance to each arm. Black-painted vial bottle tops (80 mm diameter, 40 mm deep) served as food wells. The maze was enclosed by a featureless circular ‘tent’ of blackout material, within which the maze could be rotated. A red light bulb and a bullet camera were located 63 cm above the central platform. The camera was connected to a monitor and DVD recorder located in the corner of the room. Before each mouse was placed in the maze, the maze was wiped with a sponge moistened with disinfectant to minimise intra-maze olfactory cues. The animals' choice behaviour was observed on the monitor, which was kept at minimal luminance to minimise visual cues.

Drug
SB242084 (6-chloro-5-methyl-1-[2-(2-methylpyridyl-3-oxy)-pyrid-5-yl carbamoyl] indoline hydrochloride; Tocris, Bristol, UK) was first dissolved in PEG400 (Sigma-Aldrich, Poole, UK) at 20% of the final required volume, and the solution was then made up to the final volume with 10% (w/v) hydroxypropyl-beta-cyclodextrin (Fluka, Poole, UK). The stock solution was aliquoted into vials holding the quantity required for each test day and frozen at −80°C. Each animal in Experiment 1 was dosed at 0.5 mg/kg subcutaneously (s.c.) in the nape of the neck, at a volume of 4 ml/kg, 30 min before behavioural testing.
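The stated dose and injection volume fix the working concentration (0.5 mg/kg ÷ 4 ml/kg = 0.125 mg/ml) and the volume given to each mouse. The sketch below makes that arithmetic explicit; the helper names and the 25.0 g body weight are illustrative assumptions, and the vehicle split simply applies the stated 20% PEG400 fraction to an arbitrary final volume.

```python
DOSE_MG_PER_KG = 0.5      # SB242084 dose stated in the text
VOLUME_ML_PER_KG = 4.0    # injection volume stated in the text
PEG400_FRACTION = 0.20    # PEG400 share of the final volume stated in the text


def solution_concentration_mg_per_ml(dose=DOSE_MG_PER_KG, volume=VOLUME_ML_PER_KG):
    """Concentration at which `volume` ml/kg delivers `dose` mg/kg."""
    return dose / volume


def injection_volume_ml(body_weight_g, volume=VOLUME_ML_PER_KG):
    """Volume to inject for a given body weight (grams converted to kg)."""
    return volume * body_weight_g / 1000.0


def vehicle_split_ml(final_volume_ml, peg_fraction=PEG400_FRACTION):
    """Split a final volume into PEG400 and 10% (w/v) cyclodextrin portions."""
    peg = final_volume_ml * peg_fraction
    return peg, final_volume_ml - peg


print(solution_concentration_mg_per_ml())  # 0.125
print(injection_volume_ml(25.0))           # 0.1
print(vehicle_split_ml(10.0))              # (2.0, 8.0)
```

A 25 g mouse would therefore receive 0.1 ml containing 12.5 µg of compound.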

Breeding and genotyping
The 5-HT2CR KO and WT animals used in Experiment 2 were on a C57BL/6J background and were generated as previously described [16]. The original 5-HT2CR KO breeding stock was a gift from L. Tecott and was produced as described in [29]. Wild-type males were crossed with females heterozygous for the X-linked 5-HT2CR mutation, generating WT and KO male offspring. Genotyping was performed by PCR on ear-punch tissue samples. The wild-type allele was detected using primers for the 5-HT2CR gene sequences flanking the Neo insertion: m5h2c (5′-AGTTGATGTTCATCTCAGGTGGC-3′) and 3N2 (5′-GGGTCCTATAGATCGAGGTACC-3′). The mutant allele was detected using primers complementary to neomycin resistance gene (Neo) sequences: NeoD (5′-CACCTTGCTCCTGCCGAGAAA-3′) and NeoH (5′-AGAAGGCGATAGAAGGCGATG-3′). Breeding animals had been backcrossed onto the C57BL/6J background for more than 20 generations, and the mice used here were 10–24 weeks old (age-matched across genotypes) at the start of the experiment.
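Because the 5-HT2CR gene is X-linked, male offspring carry a single allele, so each male should show a product from only one primer pair. The sketch below encodes that decision rule. The function name and the "repeat" label are hypothetical, band sizes are not given here, and the rule assumes that the flanking m5h2c/3N2 primers yield no scored product from the Neo-disrupted allele.

```python
def call_male_genotype(wt_band: bool, neo_band: bool) -> str:
    """Assign a genotype to a male offspring from PCR band presence.

    wt_band:  product from the m5h2c/3N2 primers (intact 5-HT2CR allele)
    neo_band: product from the NeoD/NeoH primers (Neo-disrupted allele)
    Assumption: males are hemizygous, so exactly one band is expected.
    """
    if wt_band != neo_band:
        return "WT" if wt_band else "KO"
    # Both bands (unexpected in a hemizygous male) or neither (failed PCR):
    # flag the sample for re-genotyping rather than guessing.
    return "repeat"


for bands in [(True, False), (False, True), (True, True), (False, False)]:
    print(bands, call_male_genotype(*bands))
```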

Behavioural procedure
Maze habituation. One week after the start of food deprivation, animals received four days of habituation to the apparatus configured as a cross-maze. The mouse was placed in the central area of the maze. On day one, five pellets were placed in each of the four arms (three along the length of the arm and two in the food well at its end). The number of pellets was gradually reduced over the four days, so that on the last day of maze habituation only one pellet was placed in each of the four food wells. Each mouse received three habituation trials per day, each lasting a maximum of 12 min. Once all pellets had been consumed, or after 12 min had passed, the mouse was removed from the maze, the maze was re-baited, and the next habituation trial began. This procedure habituated the animals both to the maze and to repeated handling. Between habituation trials, the mouse was placed in a holding cage lined with highly absorbent paper to avoid transferring olfactory cues to the test apparatus. On the last day of maze habituation, animals consumed all of the pellets in the three habituation trials within a mean of 5 min.
Turn bias. Each mouse's turn bias was determined after maze habituation and before discrimination learning. The maze was set in a T- or Y-configuration, with the start-arm being S (south), W (west) or E (east) across trials but never N (north). The maze configuration (Y-maze vs. T-maze) was counterbalanced across experimental groups. The mouse was placed in the start-arm and could turn left or right; both arms were baited to delay any association between response and reinforcement. The start-arm for each trial was predetermined in a pseudorandom order that was identical for every mouse. Each animal was given seven trials, and a trial was complete only once the mouse had made one left and one right response. For example, if the mouse turned left, it was allowed to consume the pellet and was then returned to the start-arm to make a new choice. If it chose left again, it was immediately returned to the start-arm, and the trial continued until the mouse turned right. The first turn of each trial was recorded, and the direction chosen first on the majority of the seven trials defined the mouse's turn bias (left or right).
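The turn-bias calculation reduces to a majority vote over the first turn of each trial, and an odd number of trials (seven) guarantees that a majority exists. The sketch below assumes a hypothetical record format in which each trial is the ordered list of turns ("L"/"R") made until both directions had been chosen; the function names and the example record are invented for illustration.

```python
from collections import Counter


def first_turns(trials):
    """Extract the first turn of each trial.

    Each trial is the ordered list of turns ('L' or 'R') the mouse made;
    a trial ends once both directions have been chosen.
    """
    firsts = []
    for turns in trials:
        if not {"L", "R"} <= set(turns):
            raise ValueError("a complete trial contains a left and a right turn")
        firsts.append(turns[0])
    return firsts


def turn_bias(trials):
    """Return the direction ('L' or 'R') chosen first on most trials."""
    firsts = first_turns(trials)
    if len(firsts) % 2 == 0:
        raise ValueError("use an odd number of trials so a majority always exists")
    counts = Counter(firsts)
    return "L" if counts["L"] > counts["R"] else "R"


def trained_direction(bias):
    """Spatial discrimination trains each mouse to turn against its own bias."""
    return "R" if bias == "L" else "L"


# Hypothetical record of seven trials for one mouse.
trials = [["L", "R"], ["R", "L"], ["L", "L", "R"], ["L", "R"],
          ["R", "R", "L"], ["L", "R"], ["R", "L"]]
bias = turn_bias(trials)
print(bias, trained_direction(bias))  # L R
```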
Spatial discrimination (Fig. 1A). As before, the start-arm was S, W or E across trials but never N, and the start-arm for each trial was predetermined in a pseudorandom order identical for each mouse. Within every nine trials, each arm served as the start-arm an equal number of times, and no arm served as the start-arm on more than two consecutive trials. The mouse could turn 90° (T-maze) or 45° (Y-maze) to the left or right. In spatial discrimination learning, mice were always trained to turn against their own turn bias. After approximately every seventh trial, the maze was rotated 90° to minimise extra-maze cues. After each response, the mouse was removed from the maze and returned to the holding cage while the maze was set up for the next trial. The inter-trial interval was approximately 40 s. If a mouse made nine consecutive correct responses, it was given a probe trial in which N was used as the start-arm, so that the use of an egocentric response strategy was pitted against the use of exteroceptive cues. If the mouse responded correctly on the probe trial, egocentric spatial discrimination was deemed complete and the animal was returned to its home cage. If it responded incorrectly, a further five correct responses led to a new probe trial. Each animal was given 25 trials/day; however, if an animal had made ≥6 consecutive correct responses by the end of the 25th trial, it was allowed to continue in an attempt to reach criterion. Nine consecutive correct responses followed by a correct probe trial served as the criterion in egocentric spatial discrimination learning and in all subsequent tests involving contingency shifts.
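The start-arm constraints and the acquisition criterion can be expressed as a small state machine. The sketch below is one interpretation, not the authors' software: it assumes non-overlapping blocks of nine trials for the start-arm rule, treats the "further five correct responses" after a failed probe as consecutive, and assumes that an error on a standard trial breaks the run and restores the nine-response requirement. All names are hypothetical.

```python
import random
from dataclasses import dataclass

FULL_RUN = 9         # consecutive correct responses before a probe (from the text)
RETRY_RUN = 5        # further correct responses before a repeat probe (from the text)
TRIALS_PER_DAY = 25  # standard daily session length (from the text)
EXTENSION_RUN = 6    # run at trial 25 that allows the session to continue (from the text)


def start_arm_sequence(n_blocks, seed=0):
    """Concatenate blocks of nine trials in which S, W and E each appear three
    times, with no arm used as the start-arm on more than two consecutive trials."""
    rng = random.Random(seed)  # fixed seed gives the same order for every mouse
    seq = []
    for _ in range(n_blocks):
        while True:  # rejection sampling: reshuffle until the constraint holds
            block = ["S", "W", "E"] * 3
            rng.shuffle(block)
            cand = seq + block
            # seq is already valid, so only triples ending in the new block are checked.
            if all(not (cand[i] == cand[i - 1] == cand[i - 2])
                   for i in range(max(2, len(seq)), len(cand))):
                seq = cand
                break
    return seq


@dataclass
class CriterionTracker:
    streak: int = 0
    required: int = FULL_RUN
    probe_due: bool = False
    reached: bool = False

    def record_trial(self, correct):
        """Record a standard trial (start-arm S, W or E)."""
        if self.reached or self.probe_due:
            raise RuntimeError("a probe trial (or nothing) is expected now")
        if correct:
            self.streak += 1
            self.probe_due = self.streak >= self.required
        else:
            # Assumption: an error breaks the run and restores the nine-trial requirement.
            self.streak = 0
            self.required = FULL_RUN

    def record_probe(self, correct):
        """Record a probe trial (start-arm N)."""
        if not self.probe_due:
            raise RuntimeError("no probe trial is due")
        self.probe_due = False
        if correct:
            self.reached = True
        else:
            self.streak = 0
            self.required = RETRY_RUN


def session_continues(trials_today, streak):
    """Stop at 25 trials unless the current run of correct responses is >= 6."""
    return trials_today < TRIALS_PER_DAY or streak >= EXTENSION_RUN


arms = start_arm_sequence(2)
tracker = CriterionTracker()
for _ in range(FULL_RUN):
    tracker.record_trial(True)
tracker.record_probe(False)      # failed probe: five more correct responses needed
for _ in range(RETRY_RUN):
    tracker.record_trial(True)
tracker.record_probe(True)
print(len(arms), tracker.reached)  # 18 True
```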
Full reversal test (Fig. 1B). Here the contingencies from the initial spatial discrimination were reversed: for example, an animal previously trained to turn right now had to turn left. The bait was therefore moved to the opposite arm, with no other change to the maze configuration.
Perseverance test (Fig. 1C). Here the previously correct arm remained open while a novel arm replaced the previously incorrect arm. For example, a previously incorrect arm 90° to the left was replaced by a novel arm 45° to the left. Only the novel arm was baited. Because the previously incorrect response alternative was no longer present, any altered performance in this test condition must be due to a change in the previously acquired reward association; the only acquired association that could influence choice behaviour in this condition was the previous CS+.
Learned non-reward test (Fig. 1D). Here the previously incorrect arm remained open while a novel arm replaced the previously correct arm. For example, a previously correct arm 90° to the right was replaced by a novel arm 45° to the right. Only the previously incorrect arm was baited. Because the previously correct response alternative was no longer present, any altered performance in this test condition must be due to a change in the learned non-reward association; the only acquired association that could influence choice behaviour in this condition was the previous CS−.
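The three contingency-shift tests differ only in which arm roles remain open and which is baited. The sketch below encodes them as a lookup table, with arm roles labelled relative to the initial spatial discrimination (CS+, CS−, novel); the table and function names are hypothetical. Filtering out the novel arm recovers the argument above: reversal leaves both acquired associations in play, whereas the perseverance and learned non-reward tests isolate the previous CS+ and CS− respectively.

```python
# Arm roles relative to the initial spatial discrimination:
# "CS+" = previously rewarded arm, "CS-" = previously non-rewarded arm,
# "novel" = an arm not used before.
SHIFT_TESTS = {
    "full_reversal":      {"open": ("CS+", "CS-"),   "baited": "CS-"},
    "perseverance":       {"open": ("CS+", "novel"), "baited": "novel"},
    "learned_non_reward": {"open": ("CS-", "novel"), "baited": "CS-"},
}


def acquired_associations(test):
    """Previously trained arm roles still available in a test, i.e. the
    associations that could influence choice behaviour."""
    return tuple(arm for arm in SHIFT_TESTS[test]["open"] if arm != "novel")


for name, cfg in SHIFT_TESTS.items():
    print(f"{name}: open={cfg['open']}, baited={cfg['baited']}, "
          f"acquired={acquired_associations(name)}")
```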