Comparing Two Feature Selection Methods for Influenza-A Antiviral Resistance Determination
Abstract: This paper analyzes the use of Principal Component Analysis (PCA) in comparison to Information Gain (IG) as a feature selection method for improving the classification of Influenza-A antiviral resistance. Neural networks (NNs) were used as the classifier with PCA, while decision trees (DTs) were used as the classifier with IG. The experiment was conducted on cDNA viral segments of Influenza-A belonging to the H1N1 strain. The 7 Influenza-A segments generating the best results were used for comparison. Sequences from each segment were further divided into Adamantane-resistant and non-Adamantane-resistant. Accuracy, sensitivity, specificity, precision, and time were used as performance measures. Using PCA for feature selection reduced the average preprocessing time from 1.5 hours with IG to 5 minutes. IG, however, achieved higher accuracy. The best accuracy obtained with PCA and NNs on the M1/M2 segment was 96.5%, while that of IG and DTs was 98.2%. Using PCA features with DTs also produced an accuracy of 97.6% on the M1/M2 segment, comparable to that of IG features with DTs. There was an 88% increase in feature selection processing speed when using PCA compared to IG on the M1/M2 segment alone.
Keywords: Influenza-A, Principal component analysis (PCA), Machine learning, Information gain,
DNA classification, decision trees, neural networks.
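
As an illustration of the IG criterion named in the abstract, the following minimal sketch computes the information gain of a single sequence position with respect to the resistance label. The function names, the toy sequences, and the per-position nucleotide encoding are assumptions made for illustration; they are not taken from the pipeline used in this paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one nucleotide per sequence at a given aligned position.
labels = ["resistant", "resistant", "non-resistant", "non-resistant"]
position_a = ["A", "A", "G", "G"]  # separates the two classes perfectly
position_b = ["C", "T", "C", "T"]  # carries no information about the class

ig_a = information_gain(position_a, labels)
ig_b = information_gain(position_b, labels)
```

Under this sketch, positions would be ranked by their gain and the highest-ranked ones kept as features. Because every position is scored separately against the labels, the cost grows with sequence length.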