For each sample, the model's predictions were evaluated on the validation set, using the binary (0/1) outcome of the same reference model as the target outcome.
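As an illustration of this evaluation step, the following minimal sketch scores a model's predicted probabilities against the 0/1 outcomes of a reference model on a validation set. The function name `evaluate_against_reference`, the 0.5 threshold, and the two metrics (agreement rate and Brier score) are assumptions made for the example; the text does not state which performance measures were used.

```python
def evaluate_against_reference(pred_prob, ref_outcome, threshold=0.5):
    """Score predicted probabilities against a reference model's 0/1 outcomes.

    pred_prob   : predicted probabilities for the validation set
    ref_outcome : 0/1 outcomes produced by the reference model (the target)
    threshold   : assumed cut-off for turning probabilities into 0/1 labels
    """
    if len(pred_prob) != len(ref_outcome):
        raise ValueError("pred_prob and ref_outcome must have the same length")
    n = len(pred_prob)
    if n == 0:
        raise ValueError("validation set is empty")

    # Dichotomize the predictions so they can be compared with the 0/1 target.
    pred_label = [1 if p >= threshold else 0 for p in pred_prob]

    # Agreement rate: share of validation cases where both models coincide.
    agreement = sum(int(a == b) for a, b in zip(pred_label, ref_outcome)) / n

    # Brier score: mean squared difference between probability and 0/1 target.
    brier = sum((p - y) ** 2 for p, y in zip(pred_prob, ref_outcome)) / n

    return {"agreement": agreement, "brier": brier}


# Hypothetical validation data, for illustration only.
pred = [0.9, 0.2, 0.6, 0.4]
ref = [1, 0, 0, 0]
scores = evaluate_against_reference(pred, ref)
```

In a setting with several samples, the same function would be called once per sample, each time with the predictions of the model fitted on that sample and the same reference outcomes.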