Fig. 7 Knowledge distillation with multiple teacher networks and a single target student network. Knowledge is transferred from the teacher networks to the student network through a loss function.
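
As a rough illustration of the transfer shown in Fig. 7, the sketch below computes one possible distillation loss for a single example. The caption does not specify the loss, so the choices here are assumptions: the teachers' temperature-softened output distributions are averaged with equal weight, and the student is penalized by the KL divergence from that average, scaled by the squared temperature. The names `log_softmax`, `multi_teacher_distillation_loss`, and `temperature` are hypothetical and used only for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    temperature=2.0):
    """KL(mean teacher distribution || student distribution) for one example.

    Assumption: teachers are combined by an equal-weight average of their
    softened probabilities; other weighting schemes are equally plausible.
    """
    student_log_probs = log_softmax(student_logits, temperature)
    teacher_probs = [
        [math.exp(lp) for lp in log_softmax(t, temperature)]
        for t in teacher_logits_list
    ]
    n_teachers = len(teacher_probs)
    n_classes = len(student_logits)
    # Equal-weight ensemble of the teachers' softened predictions.
    target = [
        sum(p[i] for p in teacher_probs) / n_teachers
        for i in range(n_classes)
    ]
    # KL divergence; terms with zero target probability contribute nothing.
    kl = sum(
        t * (math.log(t) - s_lp)
        for t, s_lp in zip(target, student_log_probs)
        if t > 0.0
    )
    # The T^2 factor is a common convention to keep gradient magnitudes
    # comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    student = [3.0, 1.0, 0.0]
    teachers = [[0.0, 1.0, 3.0], [0.5, 1.5, 2.5]]
    print(multi_teacher_distillation_loss(student, teachers))
```

In this sketch the loss is zero when the student's softened distribution matches the averaged teacher distribution and grows as the two diverge. In practice such a term is often combined with a standard supervised loss on ground-truth labels, which is omitted here.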