Classification of x-ray images for detection of childhood pneumonia using pre-trained neural networks

This paper compares three pre-trained neural networks for the classification of chest X-ray images: Xception, Inception V3, and NasNetLarge. The networks were implemented using transfer learning. The database used was the Chest X-Ray data set, which contains a total of 5856 chest X-ray images of pediatric patients aged one to five years, in three classes: Normal, Viral Pneumonia, and Bacterial Pneumonia. The data were divided into three groups: training, testing, and validation. The results were compared with the work of Kermany et al. (2018), who applied the Inception V3 network to two tasks: (Pneumonia X Normal) and (Bacterial Pneumonia X Viral Pneumonia). The networks used achieved good accuracy. NasNetLarge performed best, with 95.35 % in (Pneumonia X Normal) and 91.79 % in (Viral Pneumonia X Bacterial Pneumonia), against 92.80 % and 90.70 %, respectively, in Kermany's work. The Xception network also improved on the accuracy of Kermany's work, with 93.59 % in (Normal X Pneumonia) and 91.03 % in (Viral Pneumonia X Bacterial Pneumonia).


Introduction
According to the World Health Organization, pneumonia is the leading infectious cause of death in children worldwide. Pneumonia killed 920,136 children under five in 2015, accounting for 16% of all deaths of children under five years of age. Pneumonia affects children and families everywhere, but it is more prevalent in South Asia and sub-Saharan Africa (World, 2016).
Chest X-rays are often used to assess cases of pneumonia and are the most commonly used diagnostic tests for chest-related diseases. A very small dose of ionizing radiation is used to produce images of the chest.
Low precision in the diagnosis of pneumonia harms the patient, since it leads to the excessive prescription of antibiotics and to the waste of their stocks. Antibiotics also kill beneficial bacteria, causing unintended health problems in patients (Kurt et al., 2018). Overuse also leads to the proliferation of drug-resistant bacteria, hence the importance of a rapid and precise diagnosis.
Computational systems that aid diagnosis and identify potential diseases in patients are increasingly common (Manogaran et al., 2018). Used as a support tool for specialists, they can minimize errors (Malmir et al., 2017) and can carry out a screening of potential patients.
With the increase in computational power, it became possible to use techniques such as Artificial Neural Networks (ANN) (Esteva et al., 2017), which require high processing power in the training phase. ANNs have their origin in a biophysical analogy to biological neurons (Koçer and Tümer, 2017).
This motivated the development of a classifier for childhood pneumonia images, in order to diagnose patients in an automated and fast way. The chosen method classifies chest X-ray images of patients to determine whether the patient has pneumonia and, if so, which type, bacterial or viral. The classification used the Chest X-Ray data set, with a total of 5856 chest X-ray images of pediatric patients aged one to five years, provided by (Kermany, Zhang and Goldbaum, 2018).
In the prediction stage, the images are classified by Convolutional Neural Networks (CNN) together with the technique known as transfer learning (Douarre et al., 2018). For comparison, two different CNNs were used, the pre-trained neural networks known as NasNetLarge and Xception (Chollet, 2016), and an analysis is made of which network performs better for the proposed system. The proposed method is intended to provide robust coverage in image recognition, in certain aspects that will be clarified throughout the text.
The document is divided into 7 sections. Section 2 places the work in context. Section 3 presents the methodology applied and the validation metrics. Section 4 describes the database, Section 5 the evaluation metrics, Section 6 the results obtained with the proposal, and Section 7 the conclusion.

Related work
In Rajpurkar et al. (2017), an algorithm is proposed to detect pneumonia from chest radiographs, and the authors state that it exceeds the performance of radiologists. The algorithm is a convolutional neural network of 121 layers trained on the ChestX-ray14 image set, which contains more than 100,000 radiographic images with 14 diseases.
In Varoquaux et al. (2017), cross-validation procedures for neuroimaging decoding are reviewed, including a didactic overview of the relevant theoretical considerations. Practical aspects of common decoders are highlighted in predictions across multiple data sets. The experiments describe the large margins of error of cross-validation in neuroimaging configurations.
A clinical diagnostic tool is developed in Kermany et al. (2018), based on a deep learning framework for the screening of patients with treatable blinding retinal diseases and childhood pneumonia. Transfer learning is used, with a neural network trained on a data set of optical coherence tomography images and chest X-ray images. The authors state that the performance of the proposed method is comparable to that of human specialists.
In Zech et al. (2018), a classification system for pneumonia images based on deep neural networks is proposed, and the trained models are analyzed using the combination of three data sets. The validation was performed on data from different hospitals, to evaluate how well the trained models generalize in a pneumonia screening task.
Deep neural network architectures are used to identify glaucomatous optic neuropathy in Christopher et al. (2018), where the performance of the neural networks and the impact of transfer learning are analyzed. The authors state that, in all cases, transfer learning achieved better performance and also reduced training time.
Abiyev and Maaitah (2018) demonstrate the feasibility of classifying chest pathologies in chest X-rays using convolutional neural networks. For comparison, back-propagation neural networks with supervised learning and neural networks with unsupervised learning are also evaluated, with training and testing performed on the same radiographic database.
A model based on stochastic attention is presented in Kermany et al. (2018). This model is able to learn which regions within a chest X-ray should be visually explored in order to conclude whether the radiograph contains a specific radiological abnormality. The proposed model is a recurrent neural network that sequentially examines the entire radiograph and focuses only on the areas that are likely to contain relevant information.

Methodology
According to Shahin et al. (2004), training a machine learning model requires dividing the data into three sets (training, test and validation). According to Krawczyk (2016), the training data set is the sample of data used to fit the model: the model sees this data and learns from it.
The test data set, in turn, is the sample used to provide an unbiased assessment of a model fitted on the training data set while the model hyperparameters are adjusted (Krawczyk, 2016).
The validation data set is the sample used to provide an unbiased assessment of the final model; it provides the gold standard used to evaluate the model and is only used once the model is fully trained (Esteva et al., 2017). Ideally, the model should be evaluated on samples that were not used to build or fit it, so as to provide an unbiased sense of its efficacy. In this work, the validation sets were used for this purpose: the implemented models were validated after being trained (Fig. 1). Fig. 2 shows the division used to carry out the tests. The comparison proposed in this paper follows the structure described in Fig. 2, so the pre-trained networks use the same structure.
The neural networks were implemented using the Keras deep learning API (Gulli and Pal, 2017), which is written in Python and capable of running on top of TensorFlow (Tang, 2016), CNTK (Seide and Agarwal, 2016), or Theano (Bergstra et al., 2010). The networks were run on an NVIDIA Quadro P6000 video card with 24 GB of memory and 3840 CUDA cores, and an Intel Core i7 processor with 12 GB of RAM. Most of the processing is done by the video card, since CNNs can run on GPUs when available.
Table 1 shows the training times of the NasNetLarge and Xception networks. The training times are relatively low, and Xception is approximately up to 55% faster than NasNetLarge.
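The three-way division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `three_way_split`, the 20% test fraction and the random seed are hypothetical, while the class counts and the 100 validation images per class for (Normal X Pneumonia) are the figures reported in this paper. The naming follows the paper, where "validation" is the final held-out set.

```python
import random
from collections import defaultdict

def three_way_split(labels, n_validation_per_class, test_fraction, seed=0):
    """Return (train, test, validation) lists of sample indices.

    Naming follows the paper: 'validation' is the final held-out set,
    'test' is the set monitored while hyperparameters are adjusted.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test, validation = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A fixed number of images per class goes to the validation set.
        validation.extend(indices[:n_validation_per_class])
        remaining = indices[n_validation_per_class:]
        # The rest is divided between test and training (fraction is assumed).
        n_test = round(len(remaining) * test_fraction)
        test.extend(remaining[:n_test])
        train.extend(remaining[n_test:])
    return train, test, validation

# Class counts from the data set description: 1583 normal, 2780 + 1493 pneumonia.
labels = ["normal"] * 1583 + ["pneumonia"] * (2780 + 1493)
train, test, validation = three_way_split(
    labels, n_validation_per_class=100, test_fraction=0.2
)
print(len(train), len(test), len(validation))
```

Splitting per class keeps the validation set balanced, which matches the equal number of images per class set aside in the results section.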

Transfer Learning
Transfer learning provides a structure pre-trained on a knowledge base, which can be from the same or another domain, and takes advantage of the knowledge acquired to solve new problems more quickly and effectively (Weiss et al., 2016, Lu et al., 2015). In the state of the art, transfer learning is increasingly present; some examples are Abidin et al. (2018), Douarre et al. (2018), Khatami et al. (2018), Baltruschat et al. (2018), Chen et al. (2018). The technique consists in using a model pre-trained on classes different from those of the problem to be solved (Wu et al., 2018). This is an advantage when using small data sets (Shallu and Mehra, 2018), because large data sets are difficult to obtain for specific problems (Ramalingam and Garzia, 2018).
In transfer learning, the initial and intermediate layers are kept, while the final layer is replaced and trained again (Ramalingam and Garzia, 2018).
Fig. 3 shows a flowchart that represents transfer learning.
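The idea described above, keeping the earlier layers fixed and retraining only a replaced final layer, can be illustrated with a deliberately small pure-Python sketch. It is a toy stand-in and not the networks used in this work: `frozen_base` is a hypothetical fixed feature function playing the role of the pre-trained layers, the data points are invented, and a single softmax layer is trained here rather than the dense layers added in the paper.

```python
import math

def frozen_base(x):
    """Hypothetical stand-in for pre-trained layers: fixed, never updated."""
    return [math.tanh(0.5 * (x[0] + x[1])), math.tanh(x[0] - x[1]), 1.0]

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(head, features):
    return [sum(w * v for w, v in zip(row, features)) for row in head]

def train_head(inputs, targets, n_classes, epochs=100, learning_rate=0.5):
    """Train only the new output layer; the base is used as-is."""
    features = [frozen_base(x) for x in inputs]  # computed once: base is frozen
    head = [[0.0] * len(features[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for f, target in zip(features, targets):
            probs = softmax(class_scores(head, f))
            for c in range(n_classes):
                # Cross-entropy gradient w.r.t. the class-c score is (p_c - y_c).
                error = probs[c] - (1.0 if c == target else 0.0)
                for j, value in enumerate(f):
                    head[c][j] -= learning_rate * error * value
    return head

def predict(head, x):
    scores = class_scores(head, frozen_base(x))
    return max(range(len(scores)), key=scores.__getitem__)

# Invented toy data: class 0 when x[0] > x[1], class 1 otherwise.
inputs = [(1.0, 0.0), (0.8, 0.1), (0.9, -0.5), (0.2, -0.6),
          (0.0, 1.0), (0.1, 0.8), (-0.5, 0.9), (-0.6, 0.2)]
targets = [0, 0, 0, 0, 1, 1, 1, 1]
head = train_head(inputs, targets, n_classes=2)
predictions = [predict(head, x) for x in inputs]
print(predictions)
```

Because the base never changes, its outputs can be computed once and reused in every epoch, which is one reason why training only the new layers is comparatively cheap.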
For training, all weights of the pre-trained networks are set as non-trainable, since they were already trained on the ImageNet data set. The last layer of each network is removed and four dense layers are added, the last of which has as many neurons as there are classes to be classified. The softmax function is used to activate the last layer of the networks (Fig. 3). The hyperparameters of each network are listed in Table 2. The number of epochs, that is, the number of complete passes over the training data set, was 100 for both networks in both the (Pneumonia X Normal) and the (Viral X Bacterial) classification. With this setting, Xception had an advantage over NasNetLarge in training time, taking about half the time to be trained (Table 1). The batch size (the hyperparameter that defines the number of samples processed before the internal parameters of the model are updated) was 300 for NasNetLarge and 200 for Xception.
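For reference, the softmax activation of the last layer mentioned above can be written in its standard form as follows. The symbols are notation assumed here rather than taken from the paper: $z_k$ is the score of class $k$ before activation and $K$ is the number of classes. For the two-class tasks considered in this work ($K = 2$), softmax reduces to the logistic function of the score difference.

```latex
\sigma(\mathbf{z})_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K

% Special case K = 2: divide numerator and denominator by e^{z_1}
\sigma(\mathbf{z})_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}}
```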
A CNN is described as a sequence of layers; an example is shown in Fig. 3. It is composed of three main types of layer: the convolutional layer, the pooling layer and the fully connected layer (Saraiva et al., 2020). In fully connected layers, ReLU functions, shown in Eqs. (1) and (2), are common. These layers, when placed in sequence, form the architecture of a CNN (Salamon and Bello, 2017). In the output layer, the softmax function, Eq. (3), is used, which is a generalization of the logistic function to multiple dimensions (Zhao, 2017).
$$f(x) = x^{+} = \max(0, x) \qquad (1)$$
Choosing the optimization algorithm for a deep learning model can mean the difference between obtaining good results in minutes, hours, or days (Bottou et al., 2018). This work uses the ADAM (Kingma and Ba, 2014) and SGD (Metz et al., 2018) optimizers, which are widely present in the literature (Anil et al., 2018, Zoph and Le, 2016). Adam is used for NasNetLarge with a learning rate of 0.001, while SGD is used for Xception with a learning rate of 0.002.

NasNetLarge
Zoph et al. (2017) proposed a learning model for image recognition on the CIFAR-10 and ImageNet data sets. The authors present the contribution of their work as the design of a search space, which they call the "NASNet search space" and which, according to them, improves transferability. The network contains about 88.9 million parameters (Arend et al., 2018, Bressan et al., 2018).

Xception
The Xception network, proposed by Chollet (2016), is derived from Google's Inception V3 network (Szegedy et al., 2015) and obtained an improvement in precision over it in the classification of the ImageNet data set. ImageNet is a data set with 15 million labeled images (He et al., 2018). Xception has 22,855,952 parameters, which represents a reduction compared to Inception V3 (Chollet, 2016).

Description of the dataset
The set contains 5856 X-ray images (JPEG) in 3 categories (Viral Pneumonia, Bacterial Pneumonia and Normal), provided by Kermany, Zhang and Goldbaum (2018). Examples are shown in Fig. 5, divided into Normal X Pneumonia (Fig. 5a and b) and into Viral Pneumonia, Bacterial Pneumonia and Normal (Fig. 5c).
Chest X-ray images (anteroposterior) were selected from pediatric patients aged one to five years. The images come from the Guangzhou Women and Children's Medical Center. All chest X-ray imaging was performed as part of the routine clinical care of the patients (Kermany, Zhang and Goldbaum, 2018).
The data set also underwent quality control, in which garbled or low-quality images were removed. The diagnoses were classified by two specialist physicians and checked by a third expert in order to eliminate errors (Kermany, Zhang and Goldbaum, 2018). The data set consists of 5856 images: 1583 of normal patients, 2780 of patients with bacterial pneumonia and 1493 of patients with viral pneumonia.
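The class counts above imply an imbalanced data set, which is worth keeping in mind when reading accuracy figures. The short calculation below is an added illustration, not part of the original study: it checks that the counts add up and computes the accuracy that a trivial classifier always predicting the majority class would reach on the full data set. The variable names are hypothetical.

```python
# Class counts as reported in the data set description.
counts = {"normal": 1583, "bacterial pneumonia": 2780, "viral pneumonia": 1493}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# (Pneumonia X Normal): always answering "pneumonia".
pneumonia = counts["bacterial pneumonia"] + counts["viral pneumonia"]
baseline_pneumonia_vs_normal = pneumonia / total

# (Bacterial X Viral): always answering "bacterial pneumonia".
baseline_bacterial_vs_viral = counts["bacterial pneumonia"] / pneumonia

print(total)
print({name: round(share, 3) for name, share in shares.items()})
print(round(baseline_pneumonia_vs_normal, 3), round(baseline_bacterial_vs_viral, 3))
```

These baselines refer to the full data set only; the validation subsets described in the results contain the same number of images per class, so the corresponding baseline there is 50%.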
Pneumonia causes pulmonary consolidation: the pulmonary alveoli fill with inflammatory fluid, which replaces the air in the alveoli, so that the affected part of the lung does not contain air (Iorio et al., 2018). On the radiograph, pulmonary consolidation corresponds to an opacity (whitish area).
The identification of pneumonia is based on the opacities in the radiograph. The darker region near the spine corresponds to the bronchi (Kunz et al., 2018): the air contained in the bronchi gives this color to the radiograph, while the outer part of the lung appears lighter (opaque) because the alveoli are filled with fluid (Figs. 4 and 5).

Metrics of the evaluation
The confusion matrix is a statistical tool that provides the basis for describing classification accuracy and characterizing errors, helping to refine the classification (Saraiva et al., 2018). It is a square array of numbers arranged in rows and columns that express the number of sample units assigned to a particular category by a decision rule, compared to the actual category verified in the field. The measures derived from the confusion matrix include the total accuracy, which is the one chosen in the present work, the accuracy of individual classes, producer's accuracy, user's accuracy and the Kappa index, among others. The total accuracy is calculated by dividing the sum of the main diagonal of the error matrix, x_ii, by the total number of samples collected, n, according to Eq. (4).
$$\text{Total accuracy} = \frac{\sum_{i} x_{ii}}{n} \qquad (4)$$
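A minimal sketch of these measures is given below, assuming that rows hold the actual class and columns the predicted class. The function names and the five example labels are hypothetical and serve only to make the arithmetic visible; the Kappa index is computed in its usual Cohen form.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(true_labels, predicted_labels):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def total_accuracy(matrix):
    """Sum of the main diagonal divided by the number of samples."""
    n = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / n

def kappa_index(matrix):
    """Agreement corrected for the agreement expected by chance."""
    n = sum(sum(row) for row in matrix)
    observed = total_accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example labels, for illustration only.
actual = ["normal", "normal", "pneumonia", "pneumonia", "pneumonia"]
predicted = ["normal", "pneumonia", "pneumonia", "pneumonia", "normal"]
matrix = confusion_matrix(actual, predicted, classes=["normal", "pneumonia"])
print(matrix, total_accuracy(matrix), kappa_index(matrix))
```

In this example three of the five samples lie on the main diagonal, so the total accuracy is 3/5, while the Kappa index is much lower because a large part of that agreement would be expected by chance.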
The Receiver Operating Characteristic (ROC) curve is a performance measure for classification problems at various threshold settings. It is a probability curve that represents the degree of separability, that is, how well the model is able to distinguish between classes. The curve is drawn using the true positive rate and the false positive rate, and it gives a complete report of sensitivity and specificity.
In a ROC curve, the true positive rate (sensitivity) is plotted against the false positive rate (100% minus specificity) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a given decision threshold.
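The construction of the curve can be sketched as follows: each distinct score is used in turn as the decision threshold, and the resulting pair (false positive rate, true positive rate) gives one point. This is an illustrative sketch with invented scores and labels; the function names are hypothetical, and the area under the curve is an additional summary computed here with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for the positive class (pneumonia), 0 for the negative class.
    One point is produced per distinct threshold, from strictest to loosest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Invented classifier scores and true labels, for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(auc)
```

A curve that rises quickly towards the top-left corner, giving an area close to 1, indicates that the positive samples tend to receive higher scores than the negative ones.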
To evaluate the performance of the classifiers in the present study, the confusion matrix and the accuracy are used, along with the measures given by the ROC curve: sensitivity and specificity.
As an example, consider the (Pneumonia X Normal) classification. True positives (TP) are the pneumonia samples that are correctly classified, and true negatives (TN) are the normal samples that are correctly classified. False positives (FP) are Normal samples wrongly classified as Pneumonia, and false negatives (FN) are Pneumonia samples wrongly classified as Normal.
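From these four counts, sensitivity and specificity follow directly. The sketch below uses their standard definitions; the counts in the example are invented for illustration and are not results from this study.

```python
def sensitivity(tp, fn):
    """True positive rate: share of pneumonia samples that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of normal samples recognized as normal."""
    return tn / (tn + fp)

def false_positive_rate(tn, fp):
    """Complement of specificity, the horizontal axis of the ROC curve."""
    return fp / (tn + fp)

# Invented counts for a (Pneumonia X Normal) example.
tp, fn, tn, fp = 95, 5, 90, 10
print(sensitivity(tp, fn), specificity(tn, fp), false_positive_rate(tn, fp))
```

With these counts, 95 of 100 pneumonia samples are detected and 90 of 100 normal samples are recognized as normal.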

Results
This section presents the performance results of the neural networks: accuracy, ROC curves and confusion matrices. Satisfactory results were obtained compared to the work of Kermany et al. (2018). Two pre-trained neural networks, Xception and NasNetLarge, were implemented with transfer learning, and both showed an improvement in accuracy. The data were divided into three groups: training, testing and validation. Table 3 shows the accuracy obtained by each network, compared to the Inception V3 implemented by Kermany (2018).
Figure 5: Example of radiographic images divided into normal and with pneumonia. Source: (Kermany, Zhang and Goldbaum, 2018)
As described above, a portion of the data set was set aside for validation. For (Bacterial Pneumonia X Viral Pneumonia), 50 images of each class were set aside, and for (Normal X Pneumonia), 100 images of each. Table 4 shows the results obtained by the networks; the best results were obtained in the (Pneumonia X Normal) comparison.
In the (Bacterial Pneumonia X Viral Pneumonia) case, the networks performed worse in validation than in the test. The network with the best capacity to generalize was NasNetLarge, with 96.5% in (Pneumonia X Normal) and 69.00% in (Bacterial Pneumonia X Viral Pneumonia). Both networks generalized well when the task was only to detect pneumonia. Figs. 6 to 9 show further results, represented by confusion matrices, ROC curves and recall curves.
Fig. 6 shows the confusion matrices and ROC curve for the (Normal X Pneumonia) classification with the Xception network. Performance was good on both the test and validation sets, with 99% accuracy in validation; since the test set gave a slightly lower percentage, this indicates that the model is able to generalize well. Fig. 7d shows that validation performed considerably worse, with 60% accuracy, because the model has greater difficulty in distinguishing between the Viral and Bacterial Pneumonia classes. Figs. 8 and 9 show the results of the NasNetLarge network, which had the same difficulty in distinguishing between viral and bacterial pneumonia, but with a slightly better performance, 69% accuracy.
Table 5 presents the sensitivity and specificity, the metrics chosen to evaluate the performance of the classifiers, allowing the networks to be compared on the (Pneumonia X Normal) and (Bacterial Pneumonia X Viral Pneumonia) classifications. The networks performed well in testing and validation, and the proposed method performed better than Kermany (2018).

Conclusion
In this work, a comparison was made between the pre-trained neural networks Xception, Inception V3 and NasNetLarge with transfer learning for the classification of chest X-ray images in the detection of pneumonia. The use of pre-trained neural networks with transfer learning has been shown to be efficient in classification and, as discussed in this work, the state of the art is optimistic about the use of this technique. In the literature, applications for the classification of pneumonia use approaches such as constructing a network, as in the works of Saraiva et al. (2019) and Andika et al. (2019), and transfer learning as a feature extractor, as in the work of Toğaçar et al. (2020), which uses a combination of three pre-trained networks, AlexNet, VGG16 and VGG19, to perform the classification. The main contribution of this work is the detailed analysis of the performance of the Xception and NasNetLarge networks in the classification of three classes, separated in pairs: viral versus bacterial pneumonia, and pneumonia versus normal. The work also presents fundamentals that support the use of transfer learning for the classification of chest X-ray images of pathologies such as pneumonia, with the potential to be extended to other classes.
The NasNetLarge network obtained the best result; even in the validation of (Bacterial Pneumonia X Viral Pneumonia) it performed better than Xception. Both NasNetLarge and Xception generalized well in the (Pneumonia X Normal) comparison, with validation results similar to those of the test, thus demonstrating their capacity. The study of machine learning techniques to assist medicine is promising, with the possibility of improving diagnoses in hospitals.