The loss function was the mean squared error (MSE), which was minimized using the stochastic gradient descent (SGD) optimizer with a learning rate of a2 (in this study, a2 = 0.01).
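As an illustration of this training setup, the following minimal sketch minimizes an MSE loss with per-sample SGD at a learning rate of 0.01. Only the loss, the optimizer, and the learning rate come from the text. The linear model, the toy data, the epoch count, the batch size of one, and the names `mse` and `sgd_fit` are assumptions made for the example and do not describe the model used in this study.

```python
import random


def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def sgd_fit(xs, ys, lr=0.01, epochs=500, seed=0):
    """Fit y ~ w * x + b by per-sample SGD on the squared error.

    lr=0.01 matches the learning rate a2 stated in the text; the linear
    model, epoch count, and batch size of one are illustrative assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of err**2 is 2*err*x with respect to w and 2*err with respect to b.
            w -= lr * 2.0 * err * xs[i]
            b -= lr * 2.0 * err
    return w, b


# Toy, noise-free data drawn from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = sgd_fit(xs, ys)
final_loss = mse([w * x + b for x in xs], ys)
print(f"w={w:.4f}, b={b:.4f}, MSE={final_loss:.2e}")
```

With a step size this small the updates stay stable on the toy data, at the cost of needing several hundred passes to converge; a larger learning rate would converge faster but risks divergence when the inputs are not scaled.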