The initial learning rate was set to 0.01, and stochastic gradient descent (SGD) was used as the optimizer [44].
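As a rough illustration of what this setting means, the sketch below applies the plain SGD update rule, p ← p − lr · g, with a learning rate of 0.01. The toy linear model, the squared-error loss, and the helper names `sgd_step` and `fit_line` are assumptions made for illustration only; the text does not specify momentum, weight decay, or the schedule by which the learning rate changes from its initial value, so none of these are modeled here.

```python
def sgd_step(params, grads, lr=0.01):
    """Return the parameters after one plain SGD update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


def fit_line(data, lr=0.01, epochs=2000):
    """Fit y = w * x + b by per-sample SGD on a squared-error loss.

    Hypothetical toy problem: it only illustrates the update rule and is
    not the model or loss used in the paper.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            # Gradients of 0.5 * err**2 with respect to w and b.
            w, b = sgd_step([w, b], [err * x, err], lr)
    return w, b


if __name__ == "__main__":
    # Noise-free samples from y = 2x + 1 (made-up data for the demo).
    samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
    w, b = fit_line(samples)
    print(f"w = {w:.4f}, b = {b:.4f}")
```

With a fixed learning rate of 0.01 each update moves the parameters by 1% of the gradient, so convergence is steady but takes many passes over the data; this is one reason an initial rate is commonly paired with a decay schedule.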