Reference models
In the current study, we evaluated the following modelling techniques, using default settings as far as possible:
Logistic regression (LR)
Classification and regression trees (CART)
Support vector machines (SVM)
Neural nets (NN)
Random forest (RF)
For a description of these modelling techniques, based on the work of various authors [12, 15, 20, 21], we refer to Additional file 1.
As reference points for this evaluation, we first applied each modelling technique to each entire artificial cohort, yielding an LR, a CART, an SVM, an NN and an RF reference model per cohort. These models were fitted using the default optimization settings. Next, we used each reference model to generate predicted probabilities of the outcome. We then generated a new 0/1 outcome for each reference model by comparing its predicted probabilities with a random number drawn from a uniform (0,1) distribution. Using this new 0/1 outcome, we evaluated the five modelling techniques. The R code for the construction of the reference models is provided in Additional file 2.
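The outcome-generation step described above can be illustrated with a minimal Python sketch. It is not the R code of Additional file 2. It assumes that the outcome is set to 1 when the uniform random number falls below the predicted probability, a direction the text does not state explicitly. The function name `simulate_outcome`, the seed and the beta-distributed probabilities are hypothetical stand-ins for the predictions of a fitted reference model.

```python
import numpy as np


def simulate_outcome(probabilities, rng):
    """Draw one 0/1 outcome per individual.

    Assumption: outcome = 1 when a uniform(0,1) draw is below the
    predicted probability, so that P(outcome = 1) equals that probability.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    u = rng.uniform(0.0, 1.0, size=probabilities.shape)
    return (u < probabilities).astype(int)


rng = np.random.default_rng(2024)  # arbitrary seed, for reproducibility only

# Hypothetical predicted probabilities standing in for one reference model
p_ref = rng.beta(2.0, 5.0, size=100_000)

# New 0/1 outcome whose true data-generating mechanism is the reference model
y_new = simulate_outcome(p_ref, rng)

# The observed event rate should be close to the mean predicted probability
print(p_ref.mean(), y_new.mean())
```

Under this assumption, the reference model is by construction the true data-generating mechanism for the new outcome, which is what allows it to serve as a reference point when the five techniques are evaluated on that outcome.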