The precision, recall, and F-score of video-based and audio-based risky behavior detection are listed in Table 5 and Table 7, respectively. Table 8 reports the processing time of each developed functionality (video-based person density, video-based physical distancing, video-based risky behavior detection, and audio-based risky behavior detection) on several platforms: a Jetson NX, a laptop, and an Android smartphone. The speed of a deep learning engine depends heavily on the available graphics and computing processors, so the functionalities are also evaluated on a laptop with more powerful processing units: an NVIDIA GeForce RTX 2070 GPU (compute capability 7.5) and an Intel Core i7 CPU. As a result, the Jetson NX runs slower than the laptop. Video-based risky behavior detection achieves the best time performance because it involves only an object detection task. Audio-based risky behavior detection, in contrast, splits the audio signal into time frames, converts each frame into a spectrogram image, and detects voice patterns in these images with a VGG model; these additional steps make audio processing slower than video object detection. Video-based person density and video-based physical distancing are also slower than plain object detection, because they add object tracking on top of detection.
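For reference, precision, recall, and F-score can be computed from per-class counts of true positives (TP), false positives (FP), and false negatives (FN). The following is a minimal sketch that assumes "F-score" means the usual F1 (the harmonic mean of precision and recall); the `DetectionCounts` type and the example counts are hypothetical and are not taken from Tables 5 or 7.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical container for TP, FP, and FN counts of one behavior class."""
    tp: int
    fp: int
    fn: int


def precision(c: DetectionCounts) -> float:
    # Fraction of reported detections that are correct: TP / (TP + FP).
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: DetectionCounts) -> float:
    # Fraction of ground-truth events that were detected: TP / (TP + FN).
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_score(c: DetectionCounts, beta: float = 1.0) -> float:
    # F-beta score; beta = 1 gives F1, the harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    counts = DetectionCounts(tp=8, fp=2, fn=4)  # illustrative counts only
    # prints "precision=0.800 recall=0.667 f1=0.727"
    print(f"precision={precision(counts):.3f} recall={recall(counts):.3f} f1={f_score(counts):.3f}")
```

With `beta = 1`, F1 simplifies to 2TP / (2TP + FP + FN), which gives 16/22 ≈ 0.727 for the illustrative counts above.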
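The audio pipeline described above (split the signal into time frames, convert each frame into a spectrogram image, classify the image with a VGG model) can be sketched with NumPy as follows. The window length, hop size, FFT size, Hann window, dB scaling, and 8-bit normalization are all assumptions made for illustration, since the text does not specify them; the function names are hypothetical, and the VGG classification step itself is omitted.

```python
import numpy as np


def segment_audio(signal: np.ndarray, sample_rate: int, window_s: float, hop_s: float) -> np.ndarray:
    """Split a mono signal into fixed-length, possibly overlapping time windows.

    Returns shape (num_windows, window_len). Trailing samples that do not fill a
    whole window are dropped (an assumption; zero-padding is another option).
    """
    window_len = int(round(window_s * sample_rate))
    hop_len = int(round(hop_s * sample_rate))
    if len(signal) < window_len:
        return np.empty((0, window_len), dtype=signal.dtype)
    num_windows = 1 + (len(signal) - window_len) // hop_len
    return np.stack([signal[i * hop_len:i * hop_len + window_len] for i in range(num_windows)])


def log_spectrogram(segment: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Magnitude short-time Fourier transform in decibels, shape (n_fft // 2 + 1, num_frames)."""
    if len(segment) < n_fft:
        raise ValueError("segment is shorter than one FFT frame")
    window = np.hanning(n_fft)
    num_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack([segment[i * hop:i * hop + n_fft] * window for i in range(num_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (num_frames, n_fft // 2 + 1)
    return 20.0 * np.log10(magnitude.T + 1e-10)      # frequency on rows, time on columns


def to_uint8_image(spec_db: np.ndarray) -> np.ndarray:
    """Min-max scale a dB spectrogram into an 8-bit grayscale image."""
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / (hi - lo) if hi > lo else np.zeros_like(spec_db)
    return (scaled * 255).astype(np.uint8)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(3 * sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic 3 s test tone, not real data
    windows = segment_audio(tone, sr, window_s=1.0, hop_s=0.5)
    images = [to_uint8_image(log_spectrogram(w)) for w in windows]
    print(windows.shape, images[0].shape)
```

Each resulting image would then be resized to the classifier's input resolution before inference. The per-window STFT and image conversion run before any neural network is invoked, which is consistent with audio processing taking longer than video object detection alone.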
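The text does not state how the timings in Table 8 were collected. One common approach is to average per-input wall-clock latency after a few warm-up calls and report frames per second as its reciprocal. The sketch below assumes exactly that; `benchmark` and its `warmup` parameter are hypothetical names. If the model runs asynchronously on a GPU (as PyTorch CUDA kernels do), the timed callable should synchronize (for example with `torch.cuda.synchronize()`) before returning; otherwise the measured time excludes the actual computation.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def benchmark(process: Callable[[T], object], inputs: Iterable[T], warmup: int = 5) -> Dict[str, float]:
    """Time `process` on every input after `warmup` untimed calls.

    Warm-up calls are excluded because the first inferences on a GPU usually
    include one-off costs such as memory allocation and kernel setup.
    """
    items = list(inputs)
    if not items:
        raise ValueError("need at least one input to benchmark")
    for item in items[:warmup]:
        process(item)  # untimed warm-up
    latencies = []
    for item in items:
        start = time.perf_counter()
        process(item)
        latencies.append(time.perf_counter() - start)
    mean_s = statistics.mean(latencies)
    return {
        "mean_ms": 1000.0 * mean_s,
        "median_ms": 1000.0 * statistics.median(latencies),
        "fps": 1.0 / mean_s if mean_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would pass a detector's per-frame inference call.
    print(benchmark(lambda n: sum(range(n)), [100_000] * 50))
```

Reporting the median alongside the mean helps expose occasional slow frames, which are more likely on thermally constrained devices such as a smartphone or a Jetson NX.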