on consisting of multiple teacher networks and a target student network. The knowledge is transferred from the teacher networks to the student network using a loss function. Gradient-weighted class activation maps (Grad-CAM) [59] i