Voice patterns are detected in images using the VGG model.
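
The sentence above does not say how the images are produced or which VGG variant is used. As a rough illustration only, assuming the images are spectrograms (frequency on one axis, time on the other), the sketch below shows a single VGG-style stage: a 3x3 convolution with zero padding, a ReLU, and a 2x2 max pool. The function names, the toy 4x4 input, and the hand-picked kernel are all hypothetical; a real VGG network stacks many such stages with learned kernels and many channels.

```python
# Illustrative sketch only: one VGG-style stage in pure Python.
# The input values and the kernel are invented for this example;
# real VGG kernels are learned during training.

def conv3x3_same(image, kernel):
    """3x3 cross-correlation with zero padding of 1 (output size == input size)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    # Positions outside the image count as zero padding.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out

def relu(feature_map):
    """Clamp negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [
            max(
                feature_map[2 * i][2 * j],
                feature_map[2 * i][2 * j + 1],
                feature_map[2 * i + 1][2 * j],
                feature_map[2 * i + 1][2 * j + 1],
            )
            for j in range(w)
        ]
        for i in range(h)
    ]

def vgg_style_stage(image, kernel):
    """Convolution, then ReLU, then pooling: the basic VGG building pattern."""
    return max_pool2x2(relu(conv3x3_same(image, kernel)))

# Hypothetical spectrogram patch: rows are frequency bins, columns are time frames.
# A steady tone shows up as one bright horizontal band.
spectrogram = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Hand-picked kernel that responds to horizontal bands (assumption, not learned).
band_kernel = [
    [-1.0, -1.0, -1.0],
    [2.0, 2.0, 2.0],
    [-1.0, -1.0, -1.0],
]

features = vgg_style_stage(spectrogram, band_kernel)
print(features)  # [[6.0, 6.0], [0.0, 0.0]]
```

The pooled output is nonzero only in the half of the patch that contains the band, which is the kind of localized response a later classifier layer could use. In practice one would load a pretrained VGG from a deep learning library rather than hand-code the layers.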