This is my final project for Module 4 at Flatiron School. Having learned about deep learning in this module, I apply that knowledge here to classify Pneumonia patients from chest X-ray images.

If anyone is interested in my project, please feel free to have a look at this link.

Project Problem

Chest X-ray image analysis is a common and basic diagnostic method in the medical field, and the imaging exam has long been used to assess different pathologies. Automated analysis can reduce workload, improve efficiency, and lower the potential for human error. In this project, I therefore build a CNN model to classify chest X-ray images.


Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care. The dataset is available on Kaggle.

Data Observation

The data comes in three sets: train, test, and validation. Here is the number of normal and Pneumonia X-ray images in each.

  • Train dataset has 5,218 images: 1,342 normal and 3,876 infected
  • Test dataset has 624 images: 234 normal and 390 infected
  • Validate dataset has 18 images: 9 normal and 9 infected
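
The counts above can be summarized with a little arithmetic. The sketch below is not part of the original project code; it only reuses the numbers from the list, and the helper name `summarize` is made up for illustration. It reports the share of Pneumonia images in each split and the accuracy a trivial classifier would reach by always predicting the larger class.

```python
# Image counts copied from the list above: (normal, pneumonia) per split.
counts = {
    "train": (1342, 3876),
    "test": (234, 390),
    "validation": (9, 9),
}


def summarize(normal, pneumonia):
    """Return total images, pneumonia share, and majority-class baseline accuracy."""
    total = normal + pneumonia
    pneumonia_share = pneumonia / total
    # Accuracy of a trivial classifier that always predicts the larger class.
    baseline = max(normal, pneumonia) / total
    return total, pneumonia_share, baseline


summary = {split: summarize(*pair) for split, pair in counts.items()}

for split, (total, share, baseline) in summary.items():
    print(f"{split:<10} total={total:>5}  pneumonia={share:.1%}  baseline={baseline:.1%}")
```

On the test split, always predicting Pneumonia would already score 62.5%, so testing accuracies are best read against that floor.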

As you can see, Pneumonia X-ray images outnumber normal ones in the train and test sets, by nearly three to one in the train set. Here is a preview of each dataset.

Next, I looked at the image information. All images are JPEG files, but their sizes vary, as shown in the figure below.


In this project, I used a Convolutional Neural Network (CNN). I varied the target size of the input images; the accuracy for each target size is below:

  • 64×64: the training accuracy is 94.32% and the testing accuracy is 88.31%.
  • 150×150: the training accuracy is 94.59% and the testing accuracy is 89.10%.
  • 224×224: the training accuracy is 94.65% and the testing accuracy is 88.78%.
  • 256×256: the training accuracy is 93.96% and the testing accuracy is 82.69%.

The size of 224×224 seems to be the best because its training accuracy is the highest and its testing accuracy does not drop much. In the 256×256 case, by contrast, both training and testing accuracy are lower than at the other sizes.
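
To make the comparison easier to read, the accuracies above can be put side by side together with the gap between training and testing accuracy. This is a small sketch that only reuses the numbers from the list; the variable names are my own and it is not taken from the project notebook.

```python
# Accuracies (%) copied from the list above: target size -> (train, test).
results = {
    64: (94.32, 88.31),
    150: (94.59, 89.10),
    224: (94.65, 88.78),
    256: (93.96, 82.69),
}

# Gap between training and testing accuracy, in percentage points.
gaps = {size: round(train - test, 2) for size, (train, test) in results.items()}

best_train = max(results, key=lambda size: results[size][0])
best_test = max(results, key=lambda size: results[size][1])
largest_gap = max(gaps, key=gaps.get)

for size, (train, test) in results.items():
    print(f"{size}x{size}: train={train:.2f}  test={test:.2f}  gap={gaps[size]:.2f}")
```

On these numbers, 224×224 has the highest training accuracy, 150×150 has the highest testing accuracy, and 256×256 shows by far the largest gap between the two.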

After that, I tried the VGG16 deep neural network. The result is below:


The current model distinguishes Pneumonia patients from healthy people very well: accuracy is high on both the training and testing data. In medical diagnosis, a false positive (predicting illness when the patient is healthy) is less critical than a false negative (predicting health when the patient is sick). The number of false negatives obtained with the model presented here is extremely low. Therefore, the model developed here can serve as a reliable ancillary tool for Pneumonia detection.
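
The difference between false positives and false negatives can be made concrete by counting them. The sketch below assumes binary labels where 1 means Pneumonia and 0 means normal; the function names and the toy labels are made up for illustration and are not the project's results. Recall (sensitivity) is the measure that drops when false negatives increase.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = pneumonia, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(tp, fn):
    """Share of sick patients the model catches; lowered by false negatives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of healthy patients correctly cleared; lowered by false positives."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Toy labels for illustration only; these are NOT the project's results.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"recall={recall(tp, fn):.2f} specificity={specificity(tn, fp):.2f}")
```

A model tuned for this use case would favor high recall, accepting some loss of specificity.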

Future Plan

  • Add more training data to ensure performance.
  • Use Grid Search for better optimization.
  • Be able to categorize Pneumonia types.
  • Be able to classify other diseases from X-ray images.
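
As a rough idea of what the grid search item could look like, here is a minimal sketch in plain Python. Everything in it is an assumption: the hyperparameter names and values are hypothetical, and `toy_evaluate` is a stand-in for training the CNN with the given settings and returning its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; none of these values come from the project.
param_grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32],
    "dropout": [0.2, 0.5],
}


def toy_evaluate(params):
    """Stand-in for training the model and returning validation accuracy."""
    # Arbitrary formula so the sketch runs without a dataset or a GPU.
    return (
        -abs(params["learning_rate"] - 1e-3)
        - abs(params["dropout"] - 0.5)
        - params["batch_size"] * 1e-6
    )


best_params, best_score = grid_search(param_grid, toy_evaluate)
print(best_params)
```

In a real run, the evaluation function would train the network for each combination, so the number of combinations (12 here) directly multiplies the training time.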