> top > docs > PMC:7796058 > spans > 56257-59410 > annotations

PMC:7796058 / 56257-59410 JSONTXT

Annnotations TAB JSON ListView MergeView

LitCovid-PubTator

Id Subject Object Predicate Lexical cue tao:has_database_id
408 626-631 Species denotes human Tax:9606
409 114-122 Disease denotes coughing MESH:D003371
411 2007-2015 Disease denotes coughing MESH:D003371
415 2857-2862 Species denotes human Tax:9606
416 2564-2569 Disease denotes cough MESH:D003371
417 3097-3105 Disease denotes coughing MESH:D003371

LitCovid-PD-HP

Id Subject Object Predicate Lexical cue hp_id
T11 114-122 Phenotype denotes coughing http://purl.obolibrary.org/obo/HP_0012735
T12 2007-2015 Phenotype denotes coughing http://purl.obolibrary.org/obo/HP_0012735
T13 2564-2569 Phenotype denotes cough http://purl.obolibrary.org/obo/HP_0012735
T14 3097-3105 Phenotype denotes coughing http://purl.obolibrary.org/obo/HP_0012735

LitCovid-sentences

Id Subject Object Predicate Lexical cue
T406 0-4 Sentence denotes 5.6.
T407 5-41 Sentence denotes Audio-Based Risky Behavior Detection
T408 42-185 Sentence denotes This section examines an audio classification algorithm that recognizes coughing and sneezing using an audio sensor with an embedded DL engine.
T409 186-244 Sentence denotes The methodology for audio detection is shown in Figure 13.
T410 245-410 Sentence denotes This figure shows the four main steps of the audio DL process.The recording needs to first be preprocessed for noise before being used for extracting sound features.
T411 411-551 Sentence denotes The most commonly known time-frequency feature is the short-time Fourier transform [67], Mel spectrogram [68], and wavelet spectrogram [69].
T412 552-744 Sentence denotes The Mel spectrogram was based on a nonlinear frequency scale motivated by human auditory perception and provides a more compact spectral representation of sounds when compared to the STFT [3].
T413 745-832 Sentence denotes To compute a Mel spectrogram, we first convert the sample audio files into time series.
T414 833-926 Sentence denotes Next, its magnitude spectrogram is computed, and then mapped onto the Mel scale with power 2.
T415 927-974 Sentence denotes The end result would be a Mel spectrogram [70].
T416 975-1069 Sentence denotes The last step in preprocessing would be to convert Mel spectrograms into log Mel spectrograms.
T417 1070-1164 Sentence denotes Then the image results would be introduced as an input to the deep learning modelling process.
T418 1165-1379 Sentence denotes Convolutional neural network (CNN) architectures use multiple blocks of successive convolution and pooling operations for feature learning and down sampling along the time and feature dimensions, respectively [71].
T419 1380-1474 Sentence denotes The VGG16 is a pre-trained CNN [72] used as a base model for transfer learning (Table 6) [73].
T420 1475-1659 Sentence denotes VGG16 is a famous CNN architecture that uses multiple stacks of small kernel filters (3 by 3) instead of the shallow architecture of two or three layers with large kernel filters [74].
T421 1660-1824 Sentence denotes Using multiple stacks of small kernel filters increases the network’s depth, which results in improving complex feature learning while decreasing computation costs.
T422 1825-1903 Sentence denotes VGG16 architecture includes 16 convolutional and three fully connected layers.
T423 1904-2158 Sentence denotes Audio-based risky behavior detection is based on complex features and distinguishable behaviors (e.g., coughing, sneezing, background noise), which requires a deeper CNN model than shallow architecture (i.e., two or three-layer architecture) offers [75].
T424 2159-2261 Sentence denotes VGG16 has been adopted for audio event detection and demonstrated significant literature results [71].
T425 2262-2365 Sentence denotes The feature maps were flattened to obtain the fully connected layer after the last convolutional layer.
T426 2366-2499 Sentence denotes For most CNN-based architectures, only the last convolutional layer activations are connected to the final classification layer [76].
T427 2500-2600 Sentence denotes The ESC-50 [77] and AudioSet [78] datasets were used to extract cough and sneezing training samples.
T428 2601-2756 Sentence denotes The ESC-50 dataset is a labelled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification.
T429 2757-2916 Sentence denotes AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labelled, 10 s sound clips taken from YouTube videos.
T430 2917-3036 Sentence denotes Over 5000 samples were extracted for the transfer learning CNN model which was then divided to train and test datasets.
T431 3037-3119 Sentence denotes We examined the performance of the trained CNN models using coughing and sneezing.
T432 3120-3153 Sentence denotes The results are shown in Table 7.