Posting of laboratory results to the NovelFam3000 system

Selection of sample set of genes

To demonstrate the capacity of the NovelFam3000 system to facilitate the inference of protein domain functions, we selected a set of 39 domain families for targeted experimental studies (Table 1). For 25 genes belonging to the target domain families, we confirmed expression in a panel of cell lines, cloned full-length cDNAs, and performed sub-cellular localization analysis [see Additional file 2].

Sub-cellular localization

The function of a protein is defined, in part, by the cellular compartment in which it resides. Sub-cellular localization can be determined by visualization of recombinant proteins in amenable cell lines [24,25]. We began the sub-cellular localization study by verifying that the predicted human genes were endogenously expressed in human cells. For this purpose, we screened the expression of the 25 selected genes in three human cell lines by reverse transcription polymerase chain reaction (RT-PCR) analysis [see Additional file 3]. The human cell lines, chosen for their suitability for microscopy studies, were the hepatocarcinoma cell line PLC/PRF/5, the glial cell line U333CG/343 MG, and the fibroblast line HF-SV80. Of the 25 candidate genes, 20 were expressed in all three cell lines, three were expressed in two of the three cell lines, and transcripts for two genes were detected in only a single cell line. These observations confirmed the physiological expression of the predicted human genes.

For sub-cellular screening, full-length human cDNAs were amplified from mRNA and cloned in-frame with an N-terminal FLAG tag. The 25 cloned, FLAG epitope-tagged recombinant proteins were analyzed by immunofluorescence microscopy. Each construct was transfected individually into mammalian cells, and expression followed by immunolocalization with monoclonal FLAG-specific antibodies revealed the sub-cellular localization of the fusion proteins.
We performed an initial screen to distinguish between cytoplasmic and nuclear localization. This initial classification was followed by counterstaining experiments with multiple sub-cellular markers. Each marker was specific to one sub-cellular compartment, allowing the coarse staining patterns from the initial screen to be refined. During the primary analysis, we observed six fusion proteins localized to the nucleus, nine localized to the cytoplasm, and six diffusely distributed over the entire cell. Four of the recombinant proteins did not give rise to any detectable staining pattern. All constructs were expressed in the three cell lines to confirm that the localization pattern observed for a given construct was the same irrespective of the cell type. In the second round of screening, this time limited to PLC/PRF/5 cells, we re-transfected those constructs that had previously given rise to distinct cellular localization patterns, and stained with either antibody markers or specific dyes for cellular structures to confirm co-localization. All of the expression data and microscopy images from the sub-cellular localization profiling were posted through the laboratory results service of the NovelFam3000 system.

Inference of potential domain properties

Within the targeted domain families, we sought to identify intra-family consistencies. For protein domain family PF04427 (Brix domain), the human proteins BRIX_HUMAN and IMP4 localized to the nucleoli (Figure 2). The localization was confirmed by complete co-localization with fibrillarin, a nucleolar-specific marker. To test whether additional Brix domain family members localize to the nucleolus, and to confirm consistency in sub-cellular localization across organisms, we isolated three Drosophila homologs from this domain family (CG32253, CG11920, and CG6712). The cDNAs of the fly genes were cloned into both fly and mammalian expression vectors.
All three fly proteins localized to the nucleus in fly cells, displaying a consistent nucleolar staining pattern (CG6712 not shown). The expression of the Drosophila proteins in human HEK 293 cells was monitored by a C-terminal in-frame GFP tag (CG11920 not shown). The Drosophila proteins were found to accumulate in the nucleoli of the human cells, suggesting that the evolutionarily conserved protein domain might be implicated in the targeting of these proteins to nucleoli. These results complement published observations for family members in model organisms [26] and, taken together, suggest that proteins with this domain perform specific functions in the nucleoli.

Intra-family consistency was also observed for protein domain family PF03114 (BAR domain). Member proteins SH3BP1 and SH3GL1 both localized to cytoplasmic vesicles, which appeared to merge with the cellular membrane, forming protrusions (Figure 3). The familial consistency in the observed staining patterns suggests that the BAR domain is linked to vesicle transport and/or metabolism. The conserved domain might be part of a localization signal that directs the proteins to the observed locations.

We also observed an example in which members of the same domain family displayed different, distinct cellular localization patterns (Figure 4). Over-expression of NP_057480 (HSPC129) of domain family PF03031 (NLI interacting factor-like phosphatase domain) in PLC/PRF/5 cells gave a clear and strong staining of the nuclear envelope, which presented budding structures. DULLARD, of the same domain family, displayed a cytoplasmic staining pattern localizing to the endoplasmic reticulum (ER), as confirmed by calnexin counterstaining. Furthermore, family member CTDSPL co-localized with MitoTracker, a mitochondrion-specific cell-permeant fluorescent dye. These results indicate that the function of this domain is not linked to a specific sub-cellular location.
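The intra-family comparison described above can be illustrated with a minimal Python sketch. It is not part of the NovelFam3000 system; the function name `summarize_families` and the short compartment labels are assumptions chosen for illustration, and the records only restate the human-protein localizations reported in the text.

```python
from collections import defaultdict

# Localizations as reported in the text for the human proteins.
# The compartment labels are simplified summaries chosen for this sketch.
observations = [
    ("PF04427", "BRIX_HUMAN", "nucleolus"),
    ("PF04427", "IMP4", "nucleolus"),
    ("PF03114", "SH3BP1", "cytoplasmic vesicles"),
    ("PF03114", "SH3GL1", "cytoplasmic vesicles"),
    ("PF03031", "NP_057480 (HSPC129)", "nuclear envelope"),
    ("PF03031", "DULLARD", "endoplasmic reticulum"),
    ("PF03031", "CTDSPL", "mitochondria"),
]


def summarize_families(records):
    """Label each domain family by whether its members share one compartment."""
    compartments_by_family = defaultdict(set)
    for family, _protein, compartment in records:
        compartments_by_family[family].add(compartment)
    return {
        family: "consistent" if len(compartments) == 1 else "divergent"
        for family, compartments in compartments_by_family.items()
    }


summary = summarize_families(observations)
for family in sorted(summary):
    print(family, summary[family])
```

Under these simplified labels, the Brix and BAR families come out as consistent and PF03031 as divergent, matching the interpretation given in the text. A real comparison would need a controlled vocabulary for compartments rather than free-text labels.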
Combining results from multiple sources via NovelFam3000

NP_055268 (CHMP2A, BC-2), a member of protein family PF03357 (SNF7 domain, previously DUF279), gave rise to a unique cytoplasmic staining pattern (Figure 5). We tested for co-localization with the Golgi, the ER, and mitochondria by counterstaining with the corresponding markers (data not shown), but could not attribute the pattern of NP_055268 (CHMP2A, BC-2) to any previously defined sub-cellular location. Following the link from NovelFam3000 to the Ulysses system, we found that conserved networks in the model organisms suggest that NP_055268 (CHMP2A, BC-2) is involved in pre-vacuolar endosome protein sorting and transport, a hypothesis supported by a previous study [27]. CHMP2A has also been shown to be expressed in the nucleus, possibly having a role in gene silencing [28]. This dual expression pattern is reminiscent of that of a related gene, CHMP1, which has been postulated to have a cytoplasmic role in vesicle trafficking as well as a role within the nuclear matrix [29,30].

In addition to the analysis of paralogous human genes (derived by duplication), similarities between family members can be considered across species (ortholog analysis). For those selected proteins present in yeast, we extracted and reviewed the sub-cellular localization and interacting protein partners. We show in two examples how the integration of functional data from studies of homologous yeast proteins reveals the broad conservation of function.

Yeast proteins containing the Brix domain (PF04427) and their interacting partners have been localized to the nucleolus [31]. Imp4p is a specific component of the U3 snoRNP and is required for pre-18S rRNA processing. Brx1p is implicated in the biogenesis of the 60S ribosomal subunit. The functional differences between the human homologs, BRIX_HUMAN and IMP4, are reflected in their nucleolar yet distinct localization patterns (Figure 2).
Protein localization and interaction data from yeast studies complement the observed localization of human NP_057480 (HSPC129) and DULLARD, both from protein family PF03031 (Figure 4). A yeast homolog containing the NIF domain, nem1, is described as a trans-membrane protein localizing to the membranes of the ER and the nucleus [32]. The specific molecular function of nem1 is unknown. Protein interaction studies with nem1 have identified three interacting partners (nup84, nup85, nup120), all components of the yeast nuclear pore complex (NPC) [33]. Despite the strong links to the NPC and the localization to the nuclear membrane, we are not convinced that NP_057480 (HSPC129) is a direct component of the vertebrate NPC, since its nuclear rim staining does not show a punctate pattern, a general feature of NPC elements [34]. Based on the consistency among yeast network members, we identified the human orthologs of the interacting partners. Human NUP107 (related to yeast nup84) supports the NPC link, as this protein is required for the assembly of a subset of "Nup" proteins into the NPC [35]. From the analysis of NP_057480 (HSPC129), its homologs, and their interacting partners, we hypothesize that this protein is an uncharacterized NPC-associated protein.