In total, 603 images of coughing, 634 of hugging, 608 of handshaking, and 623 of door touching, collected from free sources found through Google image searches, were used for transfer learning of a YOLOv3 model pre-trained on the COCO dataset [62]. Objects were labelled with a semi-automatic method, and the Darknet library was used for training. For each image, all people were detected and their labels written to a text file, while the algorithm aggregated intersecting person bounding boxes into a single bounding box. Because this process can generate wrong labels, the images were manually checked and misclassified objects corrected. The images were then split into 80 percent for training and 20 percent for testing. To improve the accuracy of the model, the configuration in Table 3 was used.
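The aggregation step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes boxes are given as `(x1, y1, x2, y2)` tuples and repeatedly replaces any two intersecting boxes with the smallest box enclosing both, until no intersections remain.

```python
def boxes_intersect(a, b):
    """Axis-aligned intersection test for (x1, y1, x2, y2) boxes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_boxes(a, b):
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def aggregate(boxes):
    """Merge intersecting person boxes until none intersect.

    Hypothetical sketch of the labelling step in which intersecting
    bounding boxes of people are combined into a single box.
    """
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_intersect(boxes[i], boxes[j]):
                    boxes[i] = merge_boxes(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Two people hugging, for example, yield two overlapping detections that collapse into one box covering the pair, which is then labelled as a single interaction instance.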
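The 80/20 split can be performed with a simple shuffled partition; the function below is a minimal sketch (the function name and fixed seed are illustrative, not from the paper).

```python
import random

def split_dataset(image_paths, train_fraction=0.8, seed=42):
    """Shuffle the image list and split it into training and test subsets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_fraction)
    return paths[:cut], paths[cut:]
```

A fixed seed keeps the split reproducible across runs, so the same images always land in the training and test sets.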