PMC:5197943 / 1499-6615
Annotations
2_test
{"project":"2_test","denotations":[{"id":"27669316-27282150-69476981","span":{"begin":177,"end":178},"obj":"27282150"},{"id":"27669316-22585170-69476982","span":{"begin":546,"end":547},"obj":"22585170"},{"id":"27669316-20805560-69476983","span":{"begin":817,"end":818},"obj":"20805560"},{"id":"27669316-25921876-69476984","span":{"begin":980,"end":981},"obj":"25921876"},{"id":"27669316-23035929-69476985","span":{"begin":984,"end":985},"obj":"23035929"},{"id":"27669316-15286789-69476986","span":{"begin":988,"end":989},"obj":"15286789"},{"id":"27669316-16118262-69476987","span":{"begin":1283,"end":1284},"obj":"16118262"},{"id":"27669316-19393097-69476988","span":{"begin":1285,"end":1287},"obj":"19393097"},{"id":"27669316-20537149-69476989","span":{"begin":1288,"end":1290},"obj":"20537149"},{"id":"27669316-12865907-69476990","span":{"begin":2729,"end":2731},"obj":"12865907"},{"id":"27669316-13655060-69476991","span":{"begin":3599,"end":3601},"obj":"13655060"}],"text":"1. Introduction\nThe possibility to integrate clinical data with high-throughput data at the single patient level has gained increasing interest in different fields of medicine [1]. Indeed, clinicians can use this new integrated data to evaluate, in a more comprehensive way, the efficacy of the therapy in a cohort of patients, to draw more detailed conclusion on the benefits of the treatment, or to modify the drug dosage to reduce the side effects by following the genomic features of each patient and improving the efficacy of the treatment [2]. This scenario, however, introduces new challenges from a computational point of view, since the high-throughput technologies such as Next Generation Sequencing (NGS) and Genome Wide-Association Studies (GWAS), are changing clinical medicine in a data-driven science [3]. This event requires the development of new, efficient algorithms and software platforms to analyze this enormous amount of data in the shortest time possible [4,5,6,7,8]. 
Consequently, researchers can obtain a broader and more detailed perspective of the study under investigation, allowing them to link together genomic variants with clinical outcomes, including the response to certain treatments and prognosis of the patients under specific clinical studies [9,10,11]. Researchers focus on the occurrence/observation of an event of interest, aiming to develop predictive models. The outcome for a patient is the occurrence/observation of an event of interest. For example, the event may be: (i) the metastatic spread of a particular type of cancer; (ii) death, from any cause, of patients in a trial study of different treatments for lung cancer; and (iii) death due to lung cancer during the trial study. The events in these examples present differences in terms of certainty that can be attributed to their observation. In (i) it could be very difficult to be certain; in (ii) it is unequivocal; whereas in (iii) it might be surprisingly uncertain. In each of these examples, the patients are usually observed over a specified period of time, focusing on the time at which the event of interest occurs. Time to event is a clinical course duration variable for each patient with a beginning and an end, known as the survival time, or even failure time. For example, it may begin when the subject is enrolled into a study or when the event of interest is reached, or the patient is dropped out of the study for some unknown reasons, known as censored. All incomplete survival information are considered censored, including patients that, at the moment of the analysis, have still not reached the event. Censoring is an important issue in survival analysis, representing a particular type of missing data [12], needing to be handled in a proper way. Survival analysis is composed of a set of statistical methods able to estimate lifetime between two clearly defined events known as time of response or time of failure. 
The more popular survival analysis that estimates the probability of survival is the Kaplan-Meier method (K and M) [13], also called the product-limit estimator. The Kaplan-Meier method handles censored data easily. The K and M estimator is used to obtain univariate descriptive statistics for survival data, including the median survival time, and to compare the survival experience of two or more groups. To compare the overall differences between the estimated survival curves of two or more groups of subjects, such as males versus females, or treated versus untreated (control) groups, several tests are available, including the log-rank test [14]. Comparing survival curves makes it possible to assess, for example, whether a treatment was effective in increasing the overall survival of a group of subjects or whether the observed differences were simply the result of chance.\nIn particular, in this work we focus on comparing the survival analyses of patients and on correlating these analyses with the genomic characteristics of patients analyzed by DMET microarray technology. 
We provide an automatic analysis methodology to compute the overall survival (OS) and the progression-free survival (PFS) from a whole DMET dataset produced by the Affymetrix DMET PLUS platform and subsequently extended with temporal data.\nThe main aim of OSAnalyzer is to automate and simplify the work of biomedical and clinical researchers when interpreting and analyzing the data of observational studies.\nThe rest of the paper is structured as follows: Section 2 introduces the data provided by the DMET platform and the OS-dataset obtained by merging DMET data with clinical data; Section 2.2 reviews the current state of the art in analyzing the correlation between survival data and genomic features; Section 3 presents the OSAnalyzer tool and describes its use through a case study; Section 4 discusses the main results of the proposed methodology for the automatic analysis of genomic data integrated with clinical data; and, finally, Section 5 concludes the paper and outlines future work."}