Comparing Two Feature Selection Methods for Influenza-A Antiviral Resistance Determination

Open Access

Abstract: This paper analyzes the use of Principal Component Analysis (PCA) in comparison to Information Gain (IG) as a feature selection method for improving the classification of Influenza-A antiviral resistance. Neural networks (NNs) were the classification method of choice with PCA, while decision trees (DTs) were the classification method of choice with IG. The experiment was conducted on cDNA viral segments of Influenza-A belonging to the H1N1 strain. The 7 Influenza-A segments generating the best results were used for comparison. Sequences from each segment were further divided into Adamantane-resistant and non-Adamantane-resistant. Accuracy, sensitivity, specificity, precision and time were used as performance measures. Compared with IG, using PCA for feature selection reduced the average preprocessing time from 1.5 hours to 5 minutes. IG, however, yielded higher accuracy. The best accuracy generated by PCA and NNs on the M1/M2 segment was 96.5%, while that of IG and DTs was 98.2%. Using PCA features with DTs also produced an accuracy of 97.6% on the M1/M2 segment, comparable to that of IG features with DTs. There was an 88% increase in feature selection processing speed when using PCA compared to IG on the M1/M2 segment alone.
Keywords: Influenza-A, Principal component analysis (PCA), Machine learning, Information gain, DNA classification, decision trees, neural networks.
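
As a rough illustration of the IG feature selection idea named in the abstract, the sketch below scores each sequence position by how much knowing the nucleotide at that position reduces the entropy of the resistance label. It is a minimal sketch under stated assumptions, not the authors' pipeline: the per-position encoding, the toy sequences, the label names and the helper names `entropy` and `information_gain` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four aligned sequences with made-up labels.
sequences = ["ACGT", "ACGA", "ATGT", "ATGA"]
labels = ["resistant", "resistant", "non-resistant", "non-resistant"]

# Assumption: each aligned position is treated as one categorical feature.
gains = [information_gain([seq[i] for seq in sequences], labels)
         for i in range(len(sequences[0]))]

# Rank positions from most to least informative.
ranked_positions = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
```

In this toy data only the second position separates the two classes, so it ranks first; a selection step would keep the top-ranked positions and pass them to the classifier.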

Nermin Shaltout, Ahmed Rafea, Mohamed Moustafa, Ahmed Moustafa, Mahmoud ElHefnawi and Mohamed ElHefnawi

Volume 1 Issue 1

48 - 54

25-12-2015
