Knowledge is transferred from the teacher networks to the student network through a loss function that penalizes disagreement between the student's predictions and those of the teachers.
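The text does not specify the form of this loss, so the following is only a minimal sketch of one common choice, not necessarily the one used here. It assumes the teachers' temperature-softened output distributions are averaged, that the student is matched to that average with a KL divergence, and that the result is mixed with the ordinary cross-entropy on the ground-truth label. The function names and the `temperature` and `alpha` parameters are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    label, temperature=2.0, alpha=0.5):
    """Hypothetical multi-teacher distillation loss (assumed form).

    alpha weights the soft (teacher-matching) term; 1 - alpha weights the
    hard cross-entropy term on the ground-truth label.
    """
    # Soft targets: average of the teachers' softened distributions (assumption).
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n_teachers = len(teacher_probs)
    avg_teacher = [sum(col) / n_teachers for col in zip(*teacher_probs)]

    # Soft term: KL from the averaged teacher to the softened student,
    # scaled by T^2 so its gradient magnitude stays comparable across T.
    student_soft = softmax(student_logits, temperature)
    soft_loss = kl_divergence(avg_teacher, student_soft) * temperature ** 2

    # Hard term: standard cross-entropy at temperature 1.
    student_hard = softmax(student_logits)
    hard_loss = -math.log(student_hard[label] + 1e-12)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    loss = multi_teacher_distillation_loss(
        student_logits=[0.2, 1.1, 2.5],
        teacher_logits_list=[[0.1, 1.0, 3.0], [0.3, 0.8, 2.7]],
        label=2,
    )
    print(f"distillation loss: {loss:.4f}")
```

With `alpha=1.0` and a single teacher whose logits equal the student's, the soft term vanishes and the loss is zero. Other aggregation schemes (for example, weighting teachers unequally or matching intermediate features) would fit the same interface but are not shown here.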