Each dataset contains hundreds or thousands of labels. These labels are highly imbalanced, and each protein is annotated with only a small fraction of them (e.g., each protein in the Human dataset has 13.52 BP labels on average, out of a total of 3413 BP labels). MacroF1 averages the per-label F1 scores with equal weight, so it is driven more by the labels associated with few proteins, whereas MicroF1 pools the predictions over all labels, so it is affected more by the labels associated with many proteins. Since labels with few annotated proteins are generally harder to predict, the algorithms obtain larger values of MicroF1 than of MacroF1. The difference between MNet and the other algorithms (including SW, which also addresses label imbalance) is more pronounced on MacroF1 than on MicroF1. This observation indicates that MNet handles label imbalance much better than the other methods.
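The relation between the two metrics can be illustrated with a small sketch. It assumes the standard definitions (MacroF1 as the unweighted mean of per-label F1, MicroF1 as F1 over counts pooled across labels), which may differ in detail from the exact formulas used in the experiments. The toy annotation matrix and the helper names `f1_from_counts` and `micro_macro_f1` are hypothetical and serve only as an illustration; they are not data or code from the evaluation.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (MicroF1, MacroF1) for binary protein-by-label matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # MacroF1: every label contributes equally, however few proteins carry it.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels

    # MicroF1: counts are pooled first, so frequent labels dominate.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Hypothetical toy data: 10 proteins, 3 labels.
# Label 0 is frequent (8 proteins); labels 1 and 2 are rare (1 protein each).
y_true = [[1, 0, 0]] * 8 + [[0, 1, 0], [0, 0, 1]]
# A predictor that recovers the frequent label but misses both rare labels.
y_pred = [[1, 0, 0]] * 8 + [[0, 0, 0], [0, 0, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"MicroF1 = {micro:.3f}, MacroF1 = {macro:.3f}")
```

In this toy case the predictor is perfect on the frequent label and fails on the two rare ones, so MicroF1 stays high while MacroF1 drops to one third. Under these definitions, an improvement on rare labels shows up mainly in MacroF1.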