Using multiple stacks of small kernel filters increases the network’s depth, which results in improving complex feature learning while decreasing computation costs.