Genomics-Based Artificial Neural Networks Ensemble for Medical Diagnosis: A Case Study of Non-Small Cell Lung Cancer
Abstract:
The adoption of information and communications technology in cancer research has made it practical to use Deoxyribonucleic Acid (DNA) sequencing to generate complete genomes of different types of human cancer. This usually produces a huge volume of genomic data containing hidden mutation profiles that cannot be accurately mined with molecular approaches alone. Using digital techniques to unravel these mutations, however, holds considerable promise for the accurate and efficient early diagnosis of cancer patients. In this research work, pre-processing and feature extraction algorithms were developed to implement a diagnostic system for non-small cell lung cancer based on genomic data and a bagged ensemble of Artificial Neural Networks (ANNs). The pre-processing algorithm encodes and vectorises the genomic nucleotides using their relative molecular masses and normalizes the resulting vectors for any given patient. In the feature extraction algorithm, two distance metrics were hybridized to extract eight features from the normalized genomic vectors. This helped to eliminate the “curse of dimensionality” for the bagged ANN ensemble that was adopted as the classifier in the system. The performance of the system was evaluated using Mean Square Error (MSE) and a confusion matrix, both when a single ANN was used and when the bagged ANN ensemble was employed. The ensemble not only achieved perfect output stability but also achieved an MSE of 0.0139 and an accuracy of 100%, far better than the single ANN, which had an MSE of 0.0817, an accuracy of 83.3% and serious output instability. The 100% accuracy of the bagged ANN ensemble is not unexpected, given the genomic data on which it is built. The result of the bagged ANN ensemble-based system in this work compares favourably with other similar systems in the literature.
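The pre-processing step described above (encoding nucleotides by molecular mass, then normalizing) might be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the abstract does not list the mass values or the normalization scheme, so the table of approximate nucleobase molar masses, the divide-by-largest-mass scaling, and the function names `encode_sequence` and `normalize` are all assumptions.

```python
# Approximate molar masses (g/mol) of the four DNA nucleobases.
# Assumption: the abstract does not state which mass values were used.
BASE_MASS = {"A": 135.13, "C": 111.10, "G": 151.13, "T": 126.11}


def encode_sequence(sequence):
    """Map each nucleotide letter to its mass; reject unknown symbols."""
    try:
        return [BASE_MASS[base] for base in sequence.upper()]
    except KeyError as exc:
        raise ValueError(f"unsupported nucleotide symbol: {exc.args[0]!r}") from None


def normalize(vector):
    """Scale values into (0, 1] by dividing by the largest mass in the table.

    Assumption: this scaling is a stand-in for the paper's unspecified
    normalization.
    """
    largest = max(BASE_MASS.values())
    return [value / largest for value in vector]


if __name__ == "__main__":
    # Encode a short hypothetical fragment and print its normalized vector.
    print(normalize(encode_sequence("ACGT")))
```

Under these assumptions, every patient sequence becomes a numeric vector with entries between 0 and 1, which is the kind of input from which distance-based features could then be extracted.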