obtain a small model with high performance. In the distillation process, knowledge was transferred from the teacher networks to the student network to