PMC:7033698

Id Subject Object Predicate Lexical cue
T1 0-48 Sentence denotes HIV-1 did not contribute to the 2019-nCoV genome
T2 49-79 Sentence denotes Emerging Microbes & Infections
T3 80-82 Sentence denotes C.
T4 83-94 Sentence denotes Xiao et al.
T5 96-197 Sentence denotes When a new pathogen causes a global epidemic in humans, one key question is where it comes from.
T6 198-295 Sentence denotes This is especially important for a zoonotic infectious disease that jumps from animals to humans.
T7 296-417 Sentence denotes Knowing the origin of such a pathogen is critical to develop means to block further transmission and to develop vaccines.
T8 418-642 Sentence denotes Discovery of the origin of a new human pathogen is a sophisticated process that requires extensive and rigorous scientific validation and generally takes many years, as was the case for HIV-1 [1], SARS [2] and MERS [3].
T9 643-807 Sentence denotes Unfortunately, before the natural sources of new pathogens are clearly defined, conspiracy theories that the new pathogens are man-made often surface as the source.
T10 808-875 Sentence denotes However, in all cases, such theories have been debunked in history.
T11 876-971 Sentence denotes Infection from an emerging pathogenic coronavirus was first reported in December 2019 in China.
T12 972-1097 Sentence denotes It has now affected over 42,000 people and caused over 1,000 deaths in 25 countries (https://2019ncov.Chinacdc.Cn/2019-Ncov).
T13 1098-1249 Sentence denotes The complete genome of this new virus was quickly sequenced and made public on January 12, only about 2 weeks after the disease was first observed [4].
T14 1250-1333 Sentence denotes It was named 2019-nCoV the following day by the World Health Organization (WHO).
T15 1334-1429 Sentence denotes Phylogenetic analysis shows that 2019-nCoV is a new member of coronaviruses that infect humans.
T16 1430-1522 Sentence denotes It is genetically homogeneous but distinct from the coronaviruses that cause SARS and MERS [5,6].
T17 1523-1796 Sentence denotes However, it shares a high level of genetic similarity (96.3%) with a bat coronavirus, RaTG13, which was obtained from a bat in Yunnan in 2013, suggesting that RaTG13-like viruses are most likely the reservoir, but not the immediate source, of the current 2019-nCoV viruses [7].
T18 1797-1965 Sentence denotes The lack of a definite origin for 2019-nCoV has led to speculation that 2019-nCoV might be derived from genetic manipulation, or even created for use as a bioweapon.
T19 1966-2015 Sentence denotes This notion has been fully debunked in the media.
T20 2016-2234 Sentence denotes A recent informally presented report, however, showed that 2019-nCoV had four insertions in the spike glycoprotein gene that is critical for the virus to enter the target cells when compared to other coronaviruses [8].
T21 2235-2506 Sentence denotes It was claimed that these inserts were either identical or similar to the motifs in the highly variable (V) regions (V1, V4 and V5) in the envelope glycoprotein or in the Gag protein of some unique HIV-1 strains from three different countries (Thailand, Kenya and India).
T22 2507-2758 Sentence denotes Together with the structure modelling analysis, the authors speculated that these motif insertions sharing similarity with HIV-1 proteins could provide an enhanced affinity towards host cell receptors and increase the range of host cells of 2019-nCoV.
T23 2759-2860 Sentence denotes This study implies that 2019-nCoV might be generated by gaining gene fragments from the HIV-1 genome.
T24 2861-2993 Sentence denotes The current report carefully examined the sequences of 2019-nCoV, other CoV viruses and HIV-1, as well as the GenBank database.
T25 2994-3151 Sentence denotes Our results demonstrated no evidence that the sequences of these four inserts are HIV-1 specific or that the 2019-nCoV viruses obtained these insertions from HIV-1.
T26 3152-3346 Sentence denotes First, a BLAST search of these motifs against GenBank shows that the top 100 identical or highly homologous hits are all from host genes of mammals, insects, bacteria and other organisms.
T27 3347-3426 Sentence denotes There are only a few hits on coronaviruses, but none of them are HIV-1 related.
T28 3427-3609 Sentence denotes A BLAST search against the viral sequence database also showed that these insertion sequences exist widely in all kinds of viruses, from bacteriophages and influenza to giant eukaryotic viruses (Table 1).
T29 3610-3741 Sentence denotes This search found more hits for coronaviruses than the search against the entire database, and a few hits on HIV-1 sequences (Table 1).
T30 3742-3967 Sentence denotes However, while 100% matches between the insertion 1 and 2 sequences and HIV sequences were found in 19 entries, the matches between the insertion 3 and 4 sequences and HIV-1 sequences were rather poor (from 42% to 88%).
T31 3968-4169 Sentence denotes Moreover, the insertion 4 sequence ambiguously hit multiple different genes (gag, pol and env) in the HIV-1 genome, suggesting that similarities (as low as 42%) between them are too low to be reliable.
T32 4170-4334 Sentence denotes Searching these four insertion sequences against the HIV-1 Sequence Database (https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html) yielded similar results.
T33 4335-4437 Sentence denotes Sequences that completely match the insertion 3 and 4 sequences were not found in any HIV-1 sequences.
T34 4438-4570 Sentence denotes This clearly shows that these insertion sequences are widely present in living organisms, including viruses, but are not HIV-1 specific.
T35 4571-4777 Sentence denotes All these regions in HIV-1 envelope glycoprotein are highly variable with many large insertions and deletions, indicating that they are not essential for biological functions of HIV-1 envelope glycoprotein.
T36 4778-4988 Sentence denotes The detection of sequences completely matching insertions 1 and 2 in only a few HIV-1 strains demonstrates that the four insertions are very rare or not present among tens of thousands of natural HIV-1 sequences.
T37 4989-5110 Sentence denotes This also explains why homologs of the four insertion sequences could only be found independently in different HIV-1 genomes [8].
T38 5111-5268 Sentence denotes Because of their poor identity to, and rarity in, the HIV-1 sequences, HIV-1 could not be the source of those insertion sequences in the 2019-nCoV genome.
T39 5269-5277 Sentence denotes Table 1.
T40 5278-5354 Sentence denotes Blast search results of four insertion sequences against sequence databases.
T41 5355-5459 Sentence denotes Database Gene source Insertion 1 TNGTKR Insertion 2 HKNNKS Insertion 3 RSYLTPGDSSSG Insertion 4 QTNSPRRA
T42 5460-5498 Sentence denotes Whole database CoV 2 (2) 0 3 (3) 2 (2)
T43 5499-5512 Sentence denotes HIV-1 0 0 0 0
T44 5513-5552 Sentence denotes Prokaryotic 27 (27) 3 (3) 74 (0) 66 (1)
T45 5553-5593 Sentence denotes Eukaryotic 71 (71) 97 (97) 23 (0) 32 (1)
T46 5594-5641 Sentence denotes Only viral database CoV 3 (3) 3 (3) 5 (3) 3 (2)
T47 5642-5676 Sentence denotes HIV-1 18 (18) 1 (1) 4 (0)* 6 (0)**
T48 5677-5729 Sentence denotes Other Eukaryotic viruses 49 (2) 66 (8) 69 (0) 62 (0)
T49 5730-5778 Sentence denotes Prokaryotic viruses 29 (13) 30 (1) 21 (0) 28 (0)
T50 5779-5817 Sentence denotes Unclassified virus 1 (1) 0 1 (0) 1 (0)
T51 5818-6004 Sentence denotes Top 100 hits are analyzed and the numbers of 100% matches are shown in parentheses. * Similarity at 67%; ** Random hits in Gag, Pro and Env sequences with similarity between 42% and 88%.
T52 6005-6227 Sentence denotes Second, these insertions are present not only in 2019-nCoV viruses but also in three betaCoV sequences from bats: two (ZC45 and ZXC21) from Zhejiang deposited in GenBank in 2018 and RaTG13 from Yunnan obtained in 2013 [8].
T53 6228-6310 Sentence denotes The RaTG13 is much more similar to 2019-nCoV than both ZC45 and ZXC21 (Figure 1A).
T54 6311-6385 Sentence denotes The similarity of the spike protein between RaTG13 and 2019-nCoV is 97.7%.
T55 6386-6602 Sentence denotes In the RaTG13 genome, two inserts (HKNNKS and RSYLTPGDSSSG) are identical to those in 2019-nCoV, one has a single T → I substitution (TNGIKR), and the fourth lacks the four C-terminal amino acids (QTNS----) (Figure 1B).
T56 6603-6763 Sentence denotes ZC45 and ZXC21 are more divergent from 2019-nCoV than RaTG13, but both also contain similar insertions at three insertion sites, except insertion 4 (Figure 1B).
T57 6764-6881 Sentence denotes Furthermore, many other CoV viruses have similar insertions but with different sequences at the insertion 1 position.
T58 6882-7022 Sentence denotes These results clearly show that three out of four of these inserts existed naturally in three bat CoV viruses before 2019-nCoV was identified.
T59 7023-7147 Sentence denotes This undoubtedly refutes the possibility that 2019-nCoV was generated by obtaining gene fragments from the HIV-1 genome.
T60 7148-7235 Sentence denotes Instead, it is much more likely that 2019-nCoV originated from RaTG13-like CoV viruses.
T61 7236-7245 Sentence denotes Figure 1.
T62 7246-7478 Sentence denotes Sequence and structure analysis of 2019-nCoV and bat coronaviruses. (A) Phylogenetic tree analysis of the spike gene sequences. (B) Sequence alignment of suspected insertion sites between the 2019-nCoV and bat coronavirus sequences.
T63 7479-7530 Sentence denotes The deletions in the alignment are shown as dashes.
T64 7531-7689 Sentence denotes The numbers of insertions are indicated at the top of the alignment. (C) Structure comparison of the four insertions in the CoV spike protein and HIV-1 gp120.
T65 7690-7769 Sentence denotes The 2019-nCoV structure was modelled using the I-TASSER server with default parameters.
T66 7770-7883 Sentence denotes Only relevant domains with residues 1 to 708 (excluding residues 305 to 603) are presented as a ribbon diagram.
T67 7884-7977 Sentence denotes The four insertions were labelled and coloured in red, blue, green and magenta, respectively.
T68 7978-8042 Sentence denotes The HIV-1 gp120 structure (PDB 1GC1) is presented as a ribbon diagram.
T69 8043-8142 Sentence denotes V4, V5, V1/V2 and LE loops were labelled and coloured in red, blue, green, and black, respectively.
T70 8143-8350 Sentence denotes Third, insertions 1 and 2 in 2019-nCoV have 6-AA motifs identical to those in V4 and V5 of certain HIV-1 gp120 isolates, which are structurally close to each other but separated by an LE loop (Figure 1C) [9].
T71 8351-8500 Sentence denotes However, insertion 3 located between insertions 1 and 2 in 2019-nCoV has sequences similar (with deletions) to those in the V1 region of HIV-1 gp120.
T72 8501-8717 Sentence denotes V1 is far away from V4 and V5, on the opposite side of gp120, and should not interact with V4/V5 in gp120 (Figure 1C), but it is now inserted between V4 and V5 in the modelled 2019-nCoV spike protein structure [10].
T73 8718-8804 Sentence denotes Insertion 4 was found in the Gag protein of HIV-1, which is not associated with viral entry.
T74 8805-8964 Sentence denotes This insertion is located too far away to be considered part of the same structural unit as the other three insertions in the 2019-nCoV spike protein (Figure 1C).
T75 8965-9181 Sentence denotes We do not see any selection benefit or rationale for 2019-nCoV to obtain and mix structurally unrelated parts of HIV-1 to generate a unique structure for its enhanced receptor binding as indicated by the authors [8].
T76 9182-9249 Sentence denotes How the three bat CoV viruses obtained those inserts remains unknown.
T77 9250-9458 Sentence denotes For any virus to obtain additional insert sequences from other organisms, it requires that it has direct interactions with other organisms, most likely through homologous or non-homologous recombination [11].
T78 9459-9575 Sentence denotes For bat CoV viruses to gain the gene fragments from HIV-1, it will require both viruses to co-infect the same cells.
T79 9576-9708 Sentence denotes Because the host cells for bat CoV viruses and HIV-1 are different, the chance for both to exchange genetic materials is negligible.
T80 9709-9928 Sentence denotes On the contrary, these motifs are widely present in various mammalian cells and so it will be more likely for bat CoV viruses to gain those motifs from the genomes of their infected cells if recombination indeed occurs.
T81 9929-10044 Sentence denotes However, extensive studies of more CoV viruses in wild and domestic animals are warranted to address this question.
T82 10045-10273 Sentence denotes Identification of the origins of these inserted sequences in three bat CoV viruses and the new epidemic 2019-nCoV strain will be important for us to understand how CoV viruses jump from animals to humans and adapt in the latter.
T83 10274-10347 Sentence denotes Current data showed that RaTG13 is most closely related to 2019-nCoV [7].
T84 10348-10464 Sentence denotes However, the genetic difference between them is too high for RaTG13 to serve as the immediate ancestor of 2019-nCoV.
T85 10465-10622 Sentence denotes Other viruses that are more closely related to 2019-nCoV in intermediate animals, like civets for SARS and camels for MERS [3,12], remain to be identified.
T86 10623-10691 Sentence denotes More studies are necessary to identify the real source of 2019-nCoV.
T87 10692-10811 Sentence denotes It may take a long time to identify the origin of 2019-nCoV by screening a large number of wild and domestic animals.
T88 10812-10961 Sentence denotes In any case, reducing or eliminating direct contact with wild animals will be critical to controlling new epidemic infectious diseases in the future.
T89 10962-11079 Sentence denotes Advanced bioinformatics analysis tools are widely used to easily and rapidly analyse newly obtained sequences.
T90 11080-11236 Sentence denotes However, great care is required for comprehensive and thorough analysis to fully understand the real biological implications of the new genomic information.
T91 11237-11441 Sentence denotes Biased, partial and incorrect analysis can dangerously lead to conclusions that fuel conspiracies and harm the process of true scientific discoveries and the effort to control the damage to public health.
T92 11443-11458 Sentence denotes Acknowledgments
T93 11459-11621 Sentence denotes We greatly appreciate the help of Youyu He (Shanghai Center for Bioinformation Technology) in running BLAST searches of the insertion sequences against the viral sequence database.
T94 11623-11643 Sentence denotes Disclosure statement
T95 11644-11708 Sentence denotes No potential conflict of interest was reported by the author(s).
T96 11710-11715 Sentence denotes ORCID
T97 11716-11763 Sentence denotes Xiaojun Li http://orcid.org/0000-0002-5780-0880
T98 11764-11809 Sentence denotes Feng Gao http://orcid.org/0000-0001-8903-0203
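
The motif comparison in T55 (2019-nCoV inserts versus the corresponding RaTG13 sequences) can be illustrated with a small per-position identity calculation. This is only a sketch: the function name `percent_identity` is hypothetical, the motifs are copied from Table 1 and T55, and counting gap columns as mismatches over the full aligned length is an assumption. The similarity percentages quoted in the article (for example 42% to 88%) come from BLAST searches and may follow a different definition.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of aligned columns where both sequences carry the same residue.

    Assumption: the two sequences are already aligned to equal length, and
    gap columns count as mismatches (the denominator is the full length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * matches / len(a)


# 2019-nCoV insert (Table 1) paired with the RaTG13 sequence described in T55.
pairs = {
    "insertion 1": ("TNGTKR", "TNGIKR"),              # one T -> I substitution
    "insertion 2": ("HKNNKS", "HKNNKS"),              # identical
    "insertion 3": ("RSYLTPGDSSSG", "RSYLTPGDSSSG"),  # identical
    "insertion 4": ("QTNSPRRA", "QTNS----"),          # four C-terminal residues missing
}

for name, (ncov, ratg13) in pairs.items():
    print(f"{name}: {percent_identity(ncov, ratg13):.1f}% identical to RaTG13")
```

Under this convention the two identical inserts score 100%, the single substitution in insertion 1 gives 5 of 6 matching positions, and insertion 4 gives 4 of 8. A different gap convention (for example, ignoring gap columns) would change the last figure, so the numbers are illustrative rather than a reproduction of the article's analysis.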