Id | Subject | Object | Predicate | Lexical cue
T255 | 0-233 | Sentence | denotes | We adopted a knowledge distillation method in the training phase; a small model (called the student network) was trained to mimic an ensemble of multiple models (called the teacher networks) to obtain a small model with high performance.
T256 | 234-365 | Sentence | denotes | In the distillation process, knowledge was transferred from the teacher networks to the student network to minimize knowledge loss.
T257 | 366-455 | Sentence | denotes | The targets were the outputs of the teacher networks; these outputs were called soft labels.
T258 | 456-645 | Sentence | denotes | The student network also learned from the ground-truth labels (also called hard labels), thereby minimizing the knowledge loss from the student network, whose targets were the hard labels.
T259 | 646-792 | Sentence | denotes | Therefore, the overall loss function of the student network incorporated both the knowledge distillation loss and the knowledge loss from the student network.
T260 | 793-1036 | Sentence | denotes | After the student network had been well trained, the task of the teacher networks was complete, and the student model could be run quickly on a regular computer, making it suitable for hospitals without extensive computing resources.
T261 | 1037-1168 | Sentence | denotes | As a result of the knowledge distillation method, the CNNCF achieved high performance with far fewer parameters than the teacher networks.
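
The sentences annotated in rows T255-T259 describe a student network trained against both the soft labels produced by an ensemble of teacher networks and the ground-truth hard labels, with an overall loss that combines the two terms. The following sketch shows one common way such a combined distillation loss is formed, assuming a PyTorch-style setup; the temperature, the weighting factor alpha, the averaging of the teacher ensemble, and the function name distillation_loss are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=4.0, alpha=0.7):
    """Combined loss: soft-label (distillation) term plus hard-label term.

    `temperature` and `alpha` are illustrative hyperparameters; the source
    text does not state the values used.
    """
    # Soft-label term: KL divergence between the temperature-softened
    # teacher and student distributions. The T^2 factor keeps the gradient
    # magnitude of this term comparable to the hard-label term.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    # Overall loss combines both terms with a weighting factor.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical soft labels from an ensemble of two teacher networks,
    # obtained here by averaging their logits over a batch of 8 examples
    # with 3 classes.
    teacher_logits = torch.stack([torch.randn(8, 3), torch.randn(8, 3)]).mean(dim=0)
    student_logits = torch.randn(8, 3, requires_grad=True)
    hard_labels = torch.randint(0, 3, (8,))
    loss = distillation_loss(student_logits, teacher_logits, hard_labels)
    loss.backward()  # gradients flow only into the student tensor here
    print(f"combined loss: {loss.item():.4f}")
```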
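In this sketch the soft-label term transfers the teachers' knowledge through their output distributions, while the hard-label term keeps the student anchored to the ground truth; the weight alpha and the temperature would normally be tuned on a validation set, since the source text does not specify them.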