Detection of Gene Patterns with an Enhanced Partitioning-Based DBSCAN Algorithm
Abstract: Microarray datasets contain numerous unknown gene expression patterns that may have significant biological meaning. Detecting well-separated gene expression patterns is a critical task in microarray data analysis. The density-based spatial clustering algorithm DBSCAN has been used to detect patterns of different shapes and sizes in many applications. However, DBSCAN is time-consuming on large datasets, and microarray datasets are both large and complex. Therefore, in this study, we modified the DBSCAN algorithm by combining it with a partitioning around medoids (PAM) algorithm based on the normalized and weighted Mahalanobis distance (NWM). The developed algorithm (NWM_PDBSCAN) was tested on selected microarray expression datasets, which were pre-processed prior to analysis. The results revealed an optimal cluster solution containing clusters of different shapes and sizes. We further reduced the dataset sizes using a random sampling technique to improve the performance of the DBSCAN algorithm. The proposed NWM_PDBSCAN algorithm performed well, as evaluated using Dunn’s validity index.
Keywords: Microarray data; Partitioning around medoids; DBSCAN; Normalized weighted Mahalanobis distance; Validity index; Pre-processing; Sampling; Number of clusters
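For readers unfamiliar with the evaluation criterion named in the abstract, the following is a minimal sketch of Dunn's validity index: the smallest distance between points in different clusters divided by the largest cluster diameter, so higher values indicate compact, well-separated clusters. It is an illustration under stated assumptions, not the authors' implementation: the function name `dunn_index` is hypothetical, plain Euclidean distance is used rather than the NWM distance, and the label -1 is assumed to mark noise points (a common convention for DBSCAN output), which are excluded.

```python
import numpy as np


def dunn_index(points, labels):
    """Dunn's index: min inter-cluster distance / max intra-cluster diameter.

    Assumptions (not from the paper): Euclidean distance, and label -1
    marks noise points that are ignored.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # Full pairwise Euclidean distance matrix (fine for small samples).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    clusters = [c for c in np.unique(labels) if c != -1]
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        dist[np.ix_(labels == c, labels == c)].max() for c in clusters
    )
    if max_diameter == 0:
        raise ValueError("all clusters are single points; index is undefined")

    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


# Toy usage: two tight clusters that lie far apart, plus one noise point.
toy_points = [[0, 0], [0, 1], [5, 0], [5, 1], [100, 100]]
toy_labels = [0, 0, 1, 1, -1]
print(dunn_index(toy_points, toy_labels))
```

The dense distance matrix keeps the sketch short but needs memory quadratic in the number of points, so it suits sampled or small datasets only.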