Clustering Techniques for Revealing Gene Expression Patterns

Gallo, Crescenzio; Capozzi, Vito Giacomo
2014-01-01

Abstract

Modeling and simulation have very broad applications in bioinformatics, ranging from understanding basic metabolic pathways to exploring genetic variability. Molecular biologists need robust computational tools to build models that can learn to recognize DNA and amino acid sequences and assign protein structures to given sequences. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that show similar expression patterns. In this Chapter we describe the main clustering algorithms developed for analyzing gene expression data and compare their results with the classification obtained by applying unsupervised neural networks. Of particular interest in the analysis of gene expression data is the search for correlated patterns, which is typically done by cluster analysis. DNA microarray technologies (Lockhart et al., 1996) allow thousands of genes to be monitored quickly and efficiently. These technologies have introduced a new, genome-wide way of exploring an organism. In particular, studying the gene expression of a complete genome (such as that of Saccharomyces cerevisiae) is now possible. DNA microarrays have also been used in studies (Perou et al., 1999) carried out up to the complete mapping of the human genome. The production of targeted drugs and drug identification are other areas that can benefit significantly from these techniques. One problem inherent in the use of DNA microarray technology is the huge amount of data produced, whose analysis is a significant problem in itself. Several approaches are used in the analysis of gene expression data; they fall into two areas: clustering and classification.
Clustering is a purely data-driven activity that uses only data from the study or experiment to group measurements together. Classification, in contrast, uses additional data, including heuristics, to assign measurements to groups. Among these approaches, the statistical methods commonly applied to microarray data are Hierarchical Clustering (Sneath & Sokal, 1973) and (Unsupervised) Neural Networks (Herrero et al., 2001). The identification of the optimal method for analyzing these data is still a topic of discussion. In this Chapter we examine some methods for gene co-expression analysis, such as "correlation graphs" and supervised and unsupervised clustering methods. The next section briefly presents the background of clustering techniques. We then detail the clustering algorithm based on correlation graphs and examine the application of supervised and unsupervised techniques. The Chapter ends with some final considerations and further research directions.
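
As a rough illustration of the idea of grouping genes by correlated expression patterns, the following minimal Python sketch builds a graph whose edges join genes with strongly correlated profiles and reports each connected component as a cluster. The abstract does not describe the Chapter's correlation-graph algorithm, so everything here is an assumption made for illustration: the use of Pearson correlation, the 0.9 threshold, the connected-components rule, the function names and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_graph_clusters(profiles, threshold=0.9):
    """Cluster genes as connected components of a thresholded correlation graph.

    profiles: dict mapping a gene name to its list of expression values.
    threshold: assumed cut-off; an edge is drawn when correlation >= threshold.
    """
    genes = sorted(profiles)
    # Build the graph: one node per gene, one edge per strongly correlated pair.
    neighbours = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                neighbours[g].add(h)
                neighbours[h].add(g)
    # Depth-first traversal: each connected component becomes one cluster.
    clusters, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, component = [g], []
        seen.add(g)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node] - seen:
                seen.add(nb)
                stack.append(nb)
        clusters.append(sorted(component))
    return clusters


# Made-up expression profiles (four conditions per gene), not real data.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 4.2, 1.9],
}
clusters = correlation_graph_clusters(toy_profiles, threshold=0.9)
print(clusters)
```

With these toy profiles the two rising genes end up in one group and the two falling genes in another. Lowering the threshold merges more genes into fewer, larger clusters, which is one reason the choice of method and parameters remains open to discussion.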

Use this identifier to cite or link to this document: https://hdl.handle.net/11369/238552