A Voting Ensemble Approach for Hepatitis Disease Detection

  • Samir Kumar Bandyopadhyay
  • Shawni Dutta
Keywords: Hepatitis Disease, Voting Classifier, Machine Learning, Predictive Model, Ensemble Approach


Hepatitis is inflammation of the liver. Viral hepatitis is classified into several types, labelled A to G, each named after the virus that causes it; for example, hepatitis A is caused by the hepatitis A virus and hepatitis G by the hepatitis G virus. Some types do not cause serious problems, while others can become long-lasting (chronic) and lead to scarring of the liver (cirrhosis), loss of liver function and, in some cases, liver cancer. This paper proposes an automated tool that identifies patients with hepatitis using a multi-phase classification approach. In the first phase, five classifiers are implemented: Support Vector Machine, Multi-layer Perceptron, naïve Bayes, k-Nearest Neighbor, and Decision Tree. In the second phase, AdaBoost, Gradient Boost, and Random Forest are implemented. All of these classifiers are evaluated and compared in terms of prediction performance. As the final phase, a voting ensemble is proposed that combines the top two classifier models obtained from the first- and second-phase classification, respectively. The purpose of the proposed classifier is to improve prediction performance so that patients with hepatitis are identified correctly.
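The multi-phase procedure described in the abstract can be illustrated with a minimal scikit-learn sketch. Several choices here are assumptions rather than details taken from the paper: the data are a synthetic stand-in generated with `make_classification`, models are ranked by 5-fold cross-validated accuracy, the "top two" models are read as the best model from each phase, and the ensemble uses soft voting. The helper `top_models` is a hypothetical name introduced only for this example.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the hepatitis data (assumption: not the paper's dataset).
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Phase 1: individual classifiers. Scale-sensitive models get a scaler.
phase1 = {
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "mlp": make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=2000, random_state=0)),
    "naive_bayes": GaussianNB(),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Phase 2: ensemble classifiers.
phase2 = {
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boost": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}


def top_models(candidates, k=1):
    """Return the k best (name, unfitted model) pairs by mean CV accuracy."""
    scores = {name: cross_val_score(model, X_train, y_train, cv=5).mean()
              for name, model in candidates.items()}
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return [(name, clone(candidates[name])) for name in best]


# Final phase: vote over the best model from each phase.
# Soft voting (averaged class probabilities) is an assumption of this sketch.
selected = top_models(phase1) + top_models(phase2)
ensemble = VotingClassifier(estimators=selected, voting="soft")
ensemble.fit(X_train, y_train)

test_accuracy = ensemble.score(X_test, y_test)
print("Selected models:", [name for name, _ in selected])
print("Held-out accuracy:", round(test_accuracy, 3))
```

Calling `top_models(phase1, k=2)` and `top_models(phase2, k=2)` instead would give the alternative reading, in which the two best models from each phase are combined. Soft voting is used so that a two-model ensemble cannot end in a tied vote.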





How to Cite
Samir Kumar Bandyopadhyay, & Shawni Dutta. (2020). A Voting Ensemble Approach for Hepatitis Disease Detection. International Journal for Research in Applied Sciences and Biotechnology, 7(5), 56-62. https://doi.org/10.31033/ijrasb.7.5.6