minimize knowledge loss. The target was the output of the teacher networks; these outputs were called soft labels. The studen