Detection of Genes Patterns with an Enhanced Partitioning-Based DBSCAN Algorithm


Abstract: Microarray datasets contain numerous unknown gene expression patterns that may have significant biological meaning. Detecting well-separated gene expression patterns is a critical task in microarray data analysis. The density-based spatial clustering algorithm DBSCAN has been used in many applications to detect patterns of different shapes and sizes. However, DBSCAN is time-consuming on large datasets, and microarray datasets are typically large and complex. Therefore, in this study, we modified the DBSCAN algorithm by combining it with a partitioning around medoids algorithm based on the normalized and weighted Mahalanobis distance (NWM). The resulting algorithm (NWM_PDBSCAN) was tested on selected microarray expression datasets, which were pre-processed prior to analysis. The results revealed an optimal clustering solution containing clusters of different shapes and sizes. We further reduced the dataset sizes using a random sampling technique to improve the performance of the DBSCAN algorithm. The proposed NWM_PDBSCAN algorithm performed well when evaluated using Dunn's validity index.
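To make the evaluation criterion concrete, the following is a minimal sketch of Dunn's validity index: the smallest distance between points in different clusters divided by the largest within-cluster diameter. It is an illustration only, not the authors' implementation. The function names `euclidean` and `dunn_index` are hypothetical, Euclidean distance stands in for the NWM distance (which could be supplied through the `dist` argument), and points labelled -1 are treated as noise and skipped, following the common DBSCAN labelling convention.

```python
import math
from itertools import combinations


def euclidean(a, b):
    # Stand-in metric (an assumption); the paper uses an NWM distance instead.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(points, labels, dist=euclidean):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        if label == -1:  # assumed convention: -1 marks noise, so skip it
            continue
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter


points = [(0, 0), (0, 1), (5, 0), (5, 1), (100, 100)]
labels = [0, 0, 1, 1, -1]
score = dunn_index(points, labels)
```

Higher values indicate compact, well-separated clusters. In this toy example each cluster has diameter 1 and the closest pair of points across clusters is 5 apart.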

Keywords: Microarray data; Partitioning around medoids; DBSCAN; Normalized weighted Mahalanobis distance; Validity index; Pre-processing; Sampling; Number of clusters

Nwayyin Najat Mohammed, Micheal Cawthorne and Adnan Mohsin Abdulazeez


Zakho University, Iraq


Vol. 4, Issue 1
