PMC:1852316 / 13810-27810
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1852316","sourcedb":"PMC","sourceid":"1852316","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1852316","text":"Phylogenic similarity analysis\nPhylogenic similarity analysis, also proposed by [20], is based on the hypothesis that a pair of genes with a large phylogenic similarity score is likely to belong to the same functional operon, regulon, or pathway. Our implementation differs in that we hypothesize that two genes with a high phylogenic similarity score are regulated in the same manner by the same set of TFs. Based on this hypothesis, we extend the preliminary TRN.\nOur calculation of phylogenic similarity for gene-gene pairs follows the methodology proposed by [20] (referred to as 'likelihood of neighboring profiles' in their work). In this analysis, all genome sequence data were downloaded from [24], and all preliminary gene/TF interactions were taken from [14]. Once we have phylogenic similarity scores for all gene pairs, we calculate the gene/TF scores using the methodology described in the section 'From Gene-Gene Scores to Gene/TF Scores'.\n\nCalculation of the phylogenic similarity\nWe first construct a vector for each E. coli gene; its dimension equals the number of genomes used in the analysis (229 in this study). We applied BLASTP to identify probable orthologs of target-genome genes in the 229 reference genomes. The most significant BLASTP hit in each reference species was considered the true ortholog of the target gene if its expectation value was less than 1.0e-10 [25]. If the ith genome contains an ortholog, the ith entry of the vector is set to the order of that ortholog in the ith genome; if no ortholog exists in the ith genome, the entry is set to 0. Once such a vector has been constructed for every E. coli gene, we compute a phylogenic similarity measure for each gene pair. 
Given two vectors $X_i = [x_{i1}, x_{i2}, \\ldots, x_{i229}]$ for gene i and, similarly, $X_j$ for gene j, we use the following phylogenic similarity measure for the gene pair:\n$$S_{ij}^{PHY} = -\\sum_{k=1}^{229} \\log P(x_{ik}, x_{jk}) \\tag{1}$$\nHere $P(x_{ik}, x_{jk})$, the likelihood of observing the entries $x_{ik}$ and $x_{jk}$ for genes i and j in genome k, is calculated as\n$$P(x_{ik}, x_{jk}) = \\begin{cases} (1 - p_{ik})(1 - p_{jk}) & \\text{if } x_{ik} = 0 \\text{ and } x_{jk} = 0 \\\\ p_{ik}(1 - p_{jk}) & \\text{if } x_{ik} \\neq 0 \\text{ and } x_{jk} = 0 \\\\ (1 - p_{ik})\\,p_{jk} & \\text{if } x_{ik} = 0 \\text{ and } x_{jk} \\neq 0 \\\\ p_{ik}\\,p_{jk}\\,\\dfrac{d(x_{ik}, x_{jk})\\,\\bigl(2N_k - d(x_{ik}, x_{jk}) - 1\\bigr)}{N_k(N_k - 1)} & \\text{if } x_{ik} \\neq 0 \\text{ and } x_{jk} \\neq 0 \\end{cases} \\tag{2}$$\nwhere\n$p_{ik}$ is the probability that gene i is present in genome k;\n$N_k$ is the total number of genes in reference genome k;\n$d(x_{ik}, x_{jk}) = |x_{ik} - x_{jk}|$ is the difference in gene order between the two orthologs in genome k.\nThe fraction in the last case of Eq. (2) is the probability that two distinct positions chosen uniformly at random among the $N_k$ gene positions of genome k lie at most $d(x_{ik}, x_{jk})$ apart. Hence, the closer the two orthologs lie in genome k, the smaller $P(x_{ik}, x_{jk})$ and the larger the contribution to $S_{ij}^{PHY}$.\nTo calculate $p_{ik}$, we grouped the 229 reference genomes into subgroups based on information gathered from [26,27] (see Table 1) and assumed that, for each gene, $p_{ik}$ is identical for all genomes within a subgroup. $p_{ik}$ is then taken to be the number of genomes in the subgroup that have an ortholog of gene i divided by the total number of genomes in the subgroup.\nTable 1 Reference genomes (bacteria and archaea) used in the phylogenic similarity analysis.\nSubgroup Genomes\nActinobacteria Bifidobacterium longum NCC2705, Corynebacterium diphtheriae NCTC 13129, Corynebacterium efficiens YS-314, Corynebacterium glutamicum ATCC13032, Corynebacterium glutamicum ATCC 13032, Leifsonia xyli subsp. xyli str. CTCB07, Mycobacterium avium subsp. paratuberculosis str. k10, Mycobacterium bovis AF2122/97, Mycobacterium leprae TN, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CDC1551, Nocardia farcinica IFM 10152, Propionibacterium acnes KPA171202, Streptomyces avermitilis MA-4680, Streptomyces coelicolor A3(2), Symbiobacterium thermophilum IAM 14863, Tropheryma whipplei TW08/27, Tropheryma whipplei str. Twist\nAquificae Aquifex aeolicus VF5\nBacteroidetes Bacteroides fragilis YCH46, Bacteroides fragilis NCTC 9343, Bacteroides thetaiotaomicron VPI-5482, Porphyromonas gingivalis W83\nCyanobacteria Prochlorococcus marinus subsp. marinus str. CCMP1375, Prochlorococcus marinus str. MIT 9313\nChlamydiae Chlamydophila abortus S26/3, Chlamydia muridarum Nigg, Chlamydia trachomatis D/UW-3/CX, Chlamydophila caviae GPIC, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Chlamydophila pneumoniae TW-183, Parachlamydia sp. 
UWE25\nChlorobi Chlorobium tepidum TLS\nChloroflexi Dehalococcoides ethenogenes 195\nCrenarchaeota Aeropyrum pernix K1, Pyrobaculum aerophilum str. IM2, Sulfolobus solfataricus P2, Sulfolobus tokodaii str. 7\nCyanobacteria Gloeobacter violaceus PCC 7421, Nostoc sp. PCC 7120, Prochlorococcus marinus subsp. pastoris str. CCMP1986, Synechococcus elongatus PCC 6301, Synechococcus sp. WH 8102, Synechocystis sp. PCC 6803, Thermosynechococcus elongatus BP-1\nDeinococcus-Thermus Deinococcus radiodurans R1, Thermus thermophilus HB27, Thermus thermophilus HB8\nEuryarchaeota Archaeoglobus fulgidus DSM 4304, Haloarcula marismortui ATCC 43049, Halobacterium sp. NRC-1, Methanothermobacter thermautotrophicus str. Delta H, Methanocaldococcus jannaschii DSM 2661, Methanococcus maripaludis S2, Methanopyrus kandleri AV19, Methanosarcina acetivorans C2A, Methanosarcina mazei Go1, Picrophilus torridus DSM 9790, Pyrococcus abyssi GE5, Pyrococcus furiosus DSM 3638, Pyrococcus horikoshii OT3, Thermococcus kodakaraensis KOD1, Thermoplasma acidophilum DSM 1728, Thermoplasma volcanium GSS1\nFirmicutes Bacillus anthracis str. Ames, Bacillus anthracis str. 'Ames Ancestor', Bacillus anthracis str. Sterne, Bacillus cereus ATCC 14579, Bacillus cereus ATCC 10987, Bacillus cereus ZK, Bacillus clausii KSM-K16, Bacillus halodurans C-125, Bacillus licheniformis ATCC 14580, Bacillus subtilis subsp. subtilis str. 168, Bacillus thuringiensis serovar konkukian str. 97-27, Clostridium acetobutylicum ATCC 824, Clostridium perfringens str. 13, Clostridium tetani E88, Enterococcus faecalis V583, Geobacillus kaustophilus HTA426, Lactobacillus acidophilus NCFM, Lactobacillus johnsonii NCC 533, Lactobacillus plantarum WCFS1, Lactococcus lactis subsp. lactis Il1403, Listeria innocua Clip11262, Listeria monocytogenes EGD-e, Listeria monocytogenes str. 
4b F2365, Mesoplasma florum L1, Mycoplasma gallisepticum R, Mycoplasma genitalium G-37, Mycoplasma hyopneumoniae 232, Mycoplasma mobile 163K, Mycoplasma mycoides subsp. mycoides SC str. PG1, Mycoplasma penetrans HF-2, Mycoplasma pneumoniae M129, Mycoplasma pulmonis UAB CTIP, Oceanobacillus iheyensis HTE831, Onion yellows phytoplasma OY-M, Staphylococcus aureus subsp. aureus COL, Staphylococcus aureus subsp. aureus MW2, Staphylococcus aureus subsp. aureus Mu50, Staphylococcus aureus subsp. aureus N315, Staphylococcus aureus subsp. aureus MRSA252, Staphylococcus aureus subsp. aureus MSSA476, Staphylococcus epidermidis ATCC 12228, Staphylococcus epidermidis RP62A, Streptococcus agalactiae 2603V/R, Streptococcus agalactiae NEM316, Streptococcus mutans UA159, Streptococcus pneumoniae R6, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 GAS, Streptococcus pyogenes MGAS10394, Streptococcus pyogenes MGAS315, Streptococcus pyogenes MGAS8232, Streptococcus pyogenes SSI-1, Streptococcus thermophilus CNRZ1066, Streptococcus thermophilus LMG 18311, Thermoanaerobacter tengcongensis MB4, Ureaplasma parvum serovar 3 str. ATCC 700970\nFusobacteria Fusobacterium nucleatum subsp. nucleatum ATCC 25586\nNanoarchaeota Nanoarchaeum equitans Kin4-M\nPlanctomycetes Rhodopirellula baltica SH 1\nProteobacteria Acinetobacter sp. ADP1, Agrobacterium tumefaciens str. C58, Agrobacterium tumefaciens str. C58, Anaplasma marginale str. St. Maries, Azoarcus sp. EbN1, Bartonella henselae str. Houston-1, Bartonella quintana str. Toulouse, Bdellovibrio bacteriovorus HD100, Candidatus Blochmannia floridanus, Bordetella bronchiseptica RB50, Bordetella parapertussis 12822, Bordetella pertussis Tohama I, Bradyrhizobium japonicum USDA 110, Brucella abortus biovar 1 str. 9–941, Brucella melitensis 16M, Brucella suis 1330, Buchnera aphidicola str. Bp (Baizongia pistaciae), Buchnera aphidicola str. Sg (Schizaphis graminum), Buchnera aphidicola str. 
APS (Acyrthosiphon pisum), Burkholderia mallei ATCC 23344, Burkholderia pseudomallei K96243, Campylobacter jejuni subsp. jejuni NCTC 11168, Campylobacter jejuni RM1221, Caulobacter crescentus CB15, Chromobacterium violaceum ATCC 12472, Coxiella burnetii RSA 493, Desulfotalea psychrophila LSv54, Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough, Ehrlichia ruminantium str. Gardel, Ehrlichia ruminantium str. Welgevonden, Ehrlichia ruminantium str. Welgevonden, Erwinia carotovora subsp. atroseptica SCRI1043, Escherichia coli CFT073, Escherichia coli K12, Escherichia coli O157:H7 EDL933, Escherichia coli O157:H7, Francisella tularensis subsp. tularensis Schu 4, Gluconobacter oxydans 621H, Geobacter sulfurreducens PCA, Haemophilus ducreyi 35000HP, Haemophilus influenzae Rd KW20, Helicobacter hepaticus ATCC 51449, Helicobacter pylori 26695, Helicobacter pylori J99, Idiomarina loihiensis L2TR, Legionella pneumophila str. Lens, Legionella pneumophila str. Paris, Legionella pneumophila subsp. pneumophila str. Philadelphia 1, Mannheimia succiniciproducens MBEL55E, Mesorhizobium loti MAFF303099, Methylococcus capsulatus str. Bath, Neisseria gonorrhoeae FA 1090, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Nitrosomonas europaea ATCC 19718, Pasteurella multocida subsp. multocida str. Pm70, Photobacterium profundum SS9, Photorhabdus luminescens subsp. laumondii TTO1, Pseudomonas aeruginosa PAO1, Pseudomonas putida KT2440, Pseudomonas syringae pv. syringae B728a, Pseudomonas syringae pv. tomato str. DC3000, Ralstonia solanacearum GMI1000, Rhodopseudomonas palustris CGA009, Rickettsia conorii str. Malish 7, Rickettsia prowazekii str. Madrid E, Rickettsia typhi str. Wilmington, Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67, Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150, Salmonella enterica subsp. enterica serovar Typhi str. CT18, Salmonella enterica subsp. 
enterica serovar Typhi Ty2, Salmonella typhimurium LT2, Shewanella oneidensis MR-1, Shigella flexneri 2a str. 301, Silicibacter pomeroyi DSS-3, Sinorhizobium meliloti 1021, Shigella flexneri 2a str. 2457T, Vibrio cholerae O1 biovar eltor str. N16961, Vibrio fischeri ES114, Vibrio parahaemolyticus RIMD 2210633, Vibrio vulnificus CMCP6, Vibrio vulnificus YJ016, Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis, Wolbachia endosymbiont strain TRS of Brugia malayi, Wolbachia endosymbiont of Drosophila melanogaster, Wolinella succinogenes DSM 1740, Xanthomonas campestris pv. campestris str. ATCC 33913, Xylella fastidiosa 9a5c, Xanthomonas axonopodis pv. citri str. 306, Xanthomonas campestris pv. campestris str. 8004, Xanthomonas oryzae pv. oryzae KACC10331, Xylella fastidiosa Temecula1, Yersinia pestis biovar Medievalis str. 91001, Yersinia pestis CO92, Yersinia pestis KIM, Yersinia pseudotuberculosis IP 32953, Zymomonas mobilis subsp. mobilis ZM4\nSpirochaetes Borrelia burgdorferi B31, Borrelia garinii PBi chromosome linear, Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130, Leptospira interrogans serovar Lai str. 56601, Treponema denticola ATCC 35405, Treponema pallidum subsp. pallidum str. 
Nichols\nThermotogae Thermotoga maritima MSB8","divisions":[{"label":"title","span":{"begin":0,"end":30}},{"label":"p","span":{"begin":31,"end":464}},{"label":"p","span":{"begin":465,"end":961}},{"label":"title","span":{"begin":963,"end":1003}},{"label":"p","span":{"begin":1004,"end":1938}},{"label":"p","span":{"begin":1939,"end":2647}},{"label":"p","span":{"begin":2648,"end":2717}},{"label":"p","span":{"begin":2718,"end":5230}},{"label":"p","span":{"begin":5231,"end":5236}},{"label":"p","span":{"begin":5237,"end":5295}},{"label":"p","span":{"begin":5296,"end":5349}},{"label":"p","span":{"begin":5350,"end":5379}},{"label":"p","span":{"begin":5380,"end":5707}},{"label":"label","span":{"begin":5708,"end":5715}},{"label":"caption","span":{"begin":5717,"end":5781}},{"label":"p","span":{"begin":5717,"end":5781}},{"label":"tr","span":{"begin":5782,"end":5801}},{"label":"td","span":{"begin":5782,"end":5791}},{"label":"td","span":{"begin":5792,"end":5801}},{"label":"tr","span":{"begin":5802,"end":6450}},{"label":"td","span":{"begin":5802,"end":5816}},{"label":"td","span":{"begin":5818,"end":6450}},{"label":"tr","span":{"begin":6451,"end":6482}},{"label":"td","span":{"begin":6451,"end":6460}},{"label":"td","span":{"begin":6462,"end":6482}},{"label":"tr","span":{"begin":6483,"end":6625}},{"label":"td","span":{"begin":6483,"end":6496}},{"label":"td","span":{"begin":6498,"end":6625}},{"label":"tr","span":{"begin":6626,"end":6732}},{"label":"td","span":{"begin":6626,"end":6639}},{"label":"td","span":{"begin":6641,"end":6732}},{"label":"tr","span":{"begin":6733,"end":7011}},{"label":"td","span":{"begin":6733,"end":6743}},{"label":"td","span":{"begin":6745,"end":7011}},{"label":"tr","span":{"begin":7012,"end":7044}},{"label":"td","span":{"begin":7012,"end":7020}},{"label":"td","span":{"begin":7022,"end":7044}},{"label":"tr","span":{"begin":7045,"end":7089}},{"label":"td","span":{"begin":7045,"end":7056}},{"label":"td","span":{"begin":7058,"end":7089}},{"label":"tr","span":{"begin"
:7090,"end":7213}},{"label":"td","span":{"begin":7090,"end":7103}},{"label":"td","span":{"begin":7105,"end":7213}},{"label":"tr","span":{"begin":7214,"end":7460}},{"label":"td","span":{"begin":7214,"end":7227}},{"label":"td","span":{"begin":7229,"end":7460}},{"label":"tr","span":{"begin":7461,"end":7561}},{"label":"td","span":{"begin":7461,"end":7480}},{"label":"td","span":{"begin":7482,"end":7561}},{"label":"tr","span":{"begin":7562,"end":8084}},{"label":"td","span":{"begin":7562,"end":7575}},{"label":"td","span":{"begin":7577,"end":8084}},{"label":"tr","span":{"begin":8085,"end":9980}},{"label":"td","span":{"begin":8085,"end":8095}},{"label":"td","span":{"begin":8097,"end":9980}},{"label":"tr","span":{"begin":9981,"end":10046}},{"label":"td","span":{"begin":9981,"end":9993}},{"label":"td","span":{"begin":9995,"end":10046}},{"label":"tr","span":{"begin":10047,"end":10090}},{"label":"td","span":{"begin":10047,"end":10060}},{"label":"td","span":{"begin":10062,"end":10090}},{"label":"tr","span":{"begin":10091,"end":10134}},{"label":"td","span":{"begin":10091,"end":10105}},{"label":"td","span":{"begin":10107,"end":10134}},{"label":"tr","span":{"begin":10135,"end":13691}},{"label":"td","span":{"begin":10135,"end":10149}},{"label":"td","span":{"begin":10151,"end":13691}},{"label":"tr","span":{"begin":13692,"end":13962}},{"label":"td","span":{"begin":13692,"end":13704}},{"label":"td","span":{"begin":13706,"end":13962}},{"label":"td","span":{"begin":13963,"end":13974}}],"tracks":[{"project":"2_test","denotations":[{"id":"17397539-15901854-1689496","span":{"begin":81,"end":83},"obj":"15901854"},{"id":"17397539-15901854-1689497","span":{"begin":573,"end":575},"obj":"15901854"},{"id":"17397539-11160901-1689498","span":{"begin":1421,"end":1423},"obj":"11160901"}],"attributes":[{"subj":"17397539-15901854-1689496","pred":"source","obj":"2_test"},{"subj":"17397539-15901854-1689497","pred":"source","obj":"2_test"},{"subj":"17397539-11160901-1689498","pred":"source","obj":"2_test"}
]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#e993ec","default":true}]}]}}
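The profile construction and Eqs. (1) and (2) described in the annotated text can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`build_profile`, `presence_probabilities`, `pair_likelihood`, `phylogenic_similarity`), the best-hit input format, the toy profiles, the subgroup labels `"A"`/`"B"` and the genome sizes are all hypothetical. The text also does not say how to handle d = 0 (two genes mapped to the same ortholog, which would make P zero). The sketch treats that case as d = 1, which is an assumption.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical input: for one gene, the best BLASTP hit in each reference genome
# as (order of the hit gene in that genome, E-value), or None if there is no hit.
BestHit = Optional[Tuple[int, float]]


def build_profile(best_hits: Sequence[BestHit], evalue_cutoff: float = 1.0e-10) -> List[int]:
    """Entry k is the order of the ortholog in genome k, or 0 if no hit passes the cutoff."""
    return [hit[0] if hit is not None and hit[1] < evalue_cutoff else 0 for hit in best_hits]


def presence_probabilities(profile: Sequence[int], subgroup_of: Sequence[str]) -> List[float]:
    """p_ik: fraction of genomes in genome k's subgroup that contain an ortholog of gene i."""
    total = Counter(subgroup_of)
    present = Counter(g for x, g in zip(profile, subgroup_of) if x != 0)
    return [present[g] / total[g] for g in subgroup_of]


def pair_likelihood(x_i: int, x_j: int, p_i: float, p_j: float, n_k: int) -> float:
    """P(x_ik, x_jk) from Eq. (2) for a single reference genome k."""
    if x_i == 0 and x_j == 0:
        return (1 - p_i) * (1 - p_j)
    if x_i != 0 and x_j == 0:
        return p_i * (1 - p_j)
    if x_i == 0 and x_j != 0:
        return (1 - p_i) * p_j
    # Both orthologs present: weight by the probability that two random distinct
    # gene positions in genome k lie at most d apart.
    # Assumption (not in the source): d = 0 is treated as d = 1 to avoid log(0).
    d = max(abs(x_i - x_j), 1)
    return p_i * p_j * d * (2 * n_k - d - 1) / (n_k * (n_k - 1))


def phylogenic_similarity(
    x_i: Sequence[int],
    x_j: Sequence[int],
    p_i: Sequence[float],
    p_j: Sequence[float],
    genome_sizes: Sequence[int],
) -> float:
    """S_ij^PHY = -sum_k log P(x_ik, x_jk), Eq. (1)."""
    return -sum(
        math.log(pair_likelihood(a, b, pa, pb, n))
        for a, b, pa, pb, n in zip(x_i, x_j, p_i, p_j, genome_sizes)
    )


if __name__ == "__main__":
    subgroups = ["A", "A", "B", "B"]  # toy subgroup label for each reference genome
    sizes = [100, 100, 50, 50]        # toy N_k values
    x_i = build_profile([(3, 1e-40), (7, 1e-5), (10, 1e-30), (12, 1e-50)])
    x_j = build_profile([(4, 1e-35), None, (11, 1e-20), None])
    p_i = presence_probabilities(x_i, subgroups)
    p_j = presence_probabilities(x_j, subgroups)
    print(x_i, x_j)  # [3, 0, 10, 12] [4, 0, 11, 0]
    print(round(phylogenic_similarity(x_i, x_j, p_i, p_j, sizes), 4))  # 11.2898
```

The score sums per-genome logarithms instead of multiplying 229 probabilities, which would quickly underflow in floating point. In the toy run, the second hit of the first gene is discarded because its E-value (1e-5) is above the 1.0e-10 cutoff.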