Human behaviors and actions can be detected as objects from the video frames using a trained deep learning model.