Id | Subject | Object | Predicate | Lexical cue
T255 | 0-233 | Sentence | denotes | We adopted a knowledge distillation method in the training phase; a small model (called the student network) was trained to mimic an ensemble of multiple models (called the teacher networks) to obtain a small model with high performance.
T256 | 234-365 | Sentence | denotes | In the distillation process, knowledge was transferred from the teacher networks to the student network to minimize knowledge loss.
T257 | 366-455 | Sentence | denotes | The targets were the outputs of the teacher networks; these outputs were called soft labels.
T258 | 456-645 | Sentence | denotes | The student network also learned from the ground-truth labels (also called hard labels), thereby minimizing the knowledge loss from the student network, whose targets were the hard labels.
T259 | 646-792 | Sentence | denotes | Therefore, the overall loss function of the student network incorporated both the knowledge distillation loss and the knowledge loss from the student network.
T260 | 793-1036 | Sentence | denotes | After the student network had been well trained, the task of the teacher networks was complete, and the student model could be run quickly on a regular computer, making it suitable for hospitals without extensive computing resources.
T261 | 1037-1168 | Sentence | denotes | As a result of the knowledge distillation method, the CNNCF achieved high performance with far fewer parameters than the teacher networks.
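
The sentences annotated in rows T255-T259 describe a student network trained against both the soft labels produced by an ensemble of teacher networks and the ground-truth hard labels, with an overall loss that combines the two terms. The following sketch shows one common way such a combined distillation loss is formed, assuming a PyTorch-style setup; the temperature, the weighting factor alpha, the averaging of the teacher ensemble, and the function name distillation_loss are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=4.0, alpha=0.7):
    """Combined loss: soft-label (distillation) term plus hard-label term.

    `temperature` and `alpha` are illustrative hyperparameters; the source
    text does not state the values used.
    """
    # Soft-label term: KL divergence between the temperature-softened
    # teacher and student distributions. The T^2 factor keeps the gradient
    # magnitude of this term comparable to the hard-label term.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    # Overall loss combines both terms with a weighting factor.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical soft labels from an ensemble of two teacher networks,
    # obtained here by averaging their logits over a batch of 8 examples
    # with 3 classes.
    teacher_logits = torch.stack([torch.randn(8, 3), torch.randn(8, 3)]).mean(dim=0)
    student_logits = torch.randn(8, 3, requires_grad=True)
    hard_labels = torch.randint(0, 3, (8,))
    loss = distillation_loss(student_logits, teacher_logits, hard_labels)
    loss.backward()  # gradients flow only into the student tensor here
    print(f"combined loss: {loss.item():.4f}")
```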
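In this sketch the soft-label term transfers the teachers' knowledge through their output distributions, while the hard-label term keeps the student anchored to the ground truth; the weight alpha and the temperature would normally be tuned on a validation set, since the source text does not specify them.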