using the CAVS. The initial learning rate was 0.01, and the optimization function was the stochastic gradient descent