Swarm intelligence and evolutionary computation approaches for 2D face recognition: a systematic review

A wide range of approaches for 2D face recognition (FR) systems can be found in the literature, owing to the high applicability of these systems and to open issues that still need investigation, which include occlusion, variations in scale, facial expression


Introduction
Face recognition (FR) systems are widely used in different parts of society, including residences, public places, industries, commercial shops and offices. In addition, in recent decades some internet applications have been developed using 2D FR systems. In general, biometric-based human recognition systems are becoming popular because traditional security systems can be breached (Islam et al., 2012). Hence, it is possible to find security systems based on biometric identification such as gait, fingerprint, signature, voice, iris, face recognition (FR), and others (Nebti and Boukerram, 2017). Among these, FR systems stand out due to the need for an identification system for real-world situations with a high flow of people. As an example, it is impractical to stop everyone at the entrance of an airport to record their fingerprints, but placing a camera to record the flow is enough for face capture. We have reached a point where FR systems perform well under controlled settings (Ochoa-Villegas et al., 2015); however, real-time situations require systems that can deal with uncontrolled settings such as occlusion, changing adornments, different scales, facial expressions and illumination variation. Although many techniques perform well under some uncontrolled conditions, it is worth pointing out that, up to this moment, no generic technique has been proposed that guarantees total immunity to these problems (Ochoa-Villegas et al., 2015), leaving a gap for the research community. Among the different kinds of approaches proposed during the last decades, some studies in the literature focus on approaches that employ optimization techniques inspired by nature, i.e. bio-inspired optimization techniques (Bowyer et al., 2006; Abate et al., 2007; Islam et al., 2012; Scheenstra et al., 2005; Kong et al., 2005; Zhao et al., 2003; Alsalibi et al., 2015).
It is possible to classify bio-inspired algorithms into different groups based on their source of inspiration (Fister et al., 2013). Algorithms inspired by physicochemical systems were designed to mimic the behavior and characteristics of certain laws of physics or chemistry, including gravity, electric charges, and pluvial systems. In this group we can cite Big-Bang Big-Crunch Optimization, Black Hole Optimization, and Simulated Annealing, among others (Fister et al., 2013). Algorithms based on biological systems have their source of inspiration in Biology. In this category we have Artificial Neural Networks (McCulloch and Pitts, 1943), Artificial Immune Systems (Dasgupta et al., 2011), Evolutionary Computation (De Jong, 2006) and Swarm Intelligence (Parpinelli and Lopes, 2011); the last two are the focus of this work due to their applicability in FR systems. From now on, the term bio-inspired algorithms refers in this work to any Swarm Intelligence (SI) or Evolutionary Computation (EC) approach. An increasing number of bio-inspired FR systems have emerged due to their intelligent problem-solving ability, scalability, flexibility, and adaptive nature.
SI and EC are the two major branches of biologically inspired algorithms for optimization. SI-based approaches are inspired by the social and collective behavior of insects and other animals, such as ants, termites, bees, flocks of birds and schools of fish. In this branch, we can cite the Particle Swarm Optimization (PSO) algorithm (Eberhart and Shi, 2011), one of the best-known algorithms among researchers, which is inspired by the coordinated movement of fish schools and bird flocks. Among the many versions of PSO, its binary version has been widely used to find the most discriminative set of features in facial images, improving FR systems (Vora et al., 2014; Varun et al., 2015; Varadarajan et al., 2015). Another popular SI-based algorithm is Ant Colony Optimization (ACO) (Dorigo and Stutzle, 2003), which is inspired by the collective behavior of ants in finding the shortest path between the nest and the food source through a substance called pheromone. This algorithm has been used for feature selection and recognition purposes in 2D FR systems (Kanan et al., 2007; Venkatesan and Madane, 2010; Kaur et al., 2013a). Other SI algorithms that have been used in FR systems are inspired by bacterial foraging (BFO) (Passino, 2002), bee foraging (BA, ABC) (Pham et al., 2011; Karaboga, 2005), the brood parasitic behavior of some cuckoo species (CSA) (Yang and Deb, 2009), and the law of gravity and mass interactions (GSA) (Rashedi et al., 2009). Many other SI algorithms can be found in Parpinelli and Lopes (2011). EC-based algorithms are inspired by the evolutionary theory proposed by Darwin. In this branch we may cite the Genetic Algorithm (GA) (Holland, 1973), in which natural selection and genetic operators play the main role, and which has been used for feature selection and classification in FR systems (Fan and Verma, 2004; Zheng et al., 2005; Liu and Wang, 2006). Another algorithm based on Darwinian theory is the Differential Evolution (DE) algorithm (Storn and Price, 1997), which has also been used as an optimization technique in FR systems (Mallipeddi and Lee, 2012; Yoo et al., 2013).
According to Detroz et al. (2015), traditional literature reviews are guided by the researchers' experience, which might lead to biased results. In contrast, a systematic review aims to present a fair evaluation of a research topic, identifying and evaluating all relevant research in a reliable and impartial manner using a trustworthy, rigorous, and auditable methodology (Keele, 2007). It also makes it possible to summarize the benefits and limitations of a specific method (Kitchenham, 2004). Hence, this work presents a Systematic Literature Review (SLR) of 2D face recognition systems using biologically inspired approaches (considering both SI and EC algorithms), providing a well-defined methodology to identify, analyze and interpret all available evidence related to the topic (Keele, 2007). After filtering the results through objective, inclusion and exclusion criteria, seventy-three relevant works are gathered for the SLR. Bio-inspired algorithms are used in 2D FR systems for different purposes, such as template matching (Chidambaram et al., 2012), classification (Nebti and Boukerram, 2017), parameter optimization (Oh et al., 2016; Shen et al., 2015; Fernández-Martínez and Cernea, 2015; Loderer and Pavlovičová, 2014), and feature selection (Khadhraoui et al., 2016; Rao and Rao, 2016; Farag et al., 2016).
It is important to highlight that Alsalibi et al. (2015) present a relevant review with the same scope as ours. However, that review encompasses papers published until 2013 and does not employ an SLR methodology. With the emergence of new bio-inspired algorithms and approaches, along with the application of a rigorous review methodology, the present work brings the following contributions:
i. An up-to-date review of bio-inspired approaches for 2D FR systems using the SLR methodology.
ii. The description of how each algorithm is employed in FR systems.
iii. A brief description of each work that is analyzed.
iv. A summary of employed approaches and algorithms.
v. An exposition of how bio-inspired approaches represent a possible solution and fitness function.
vi. A ranking of databases with their main features.
This review is organized as follows: Section 2 describes the SLR methodology; a brief description of bio-inspired algorithms and their applications to FR systems is reported in Section 4; Section 5 presents a summary and the discussion of the review; and the conclusions and future trends are presented in Section 6.

Research Method
Petersen et al. (2008) present the SLR methodology. The first step is to define the research questions (RQ) in order to identify and evaluate all available relevant works, which are presented as follows:
• RQ 1: What EC and SI algorithms are being applied to 2D FR systems?
• RQ 2: How are EC and SI algorithms being applied in 2D FR systems?
With the scope defined, the SLR is carried out through the following steps: planning and conducting.

Planning the Review
After defining the RQs, the search for relevant works must be carried out. This work performs an automated search, in which a boolean search string with keywords is defined and used as input to Academic Search Engines (ASE). The following words were employed: faces, face, facial, recognition, detection, bioinspired, bioinspiration, bio-inspired, bio-inspiration, bio inspired, bio inspiration, evolutionary, and swarm, as well as their synonyms and variations. The ASEs were chosen according to Navau et al. (2013) and represent the most relevant ones for Computer Science. The ASEs selected were Web of Knowledge (ISI-WoS), SCOPUS and IEEE Xplore.
The following boolean composition was used to perform the search: (FACES OR FACE OR FACIAL) AND (RECOGNITION OR DETECTION) AND (BIO-INSPIR* OR "BIO INSPIR*" OR BIOINSPIR* OR EVOLUTIONARY OR SWARM). However, as each ASE has its own search mechanism, this boolean query underwent some changes that preserve its semantic meaning.
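As an illustration only, the boolean composition above can be assembled from its keyword groups, which makes it easier to adapt the same query to each ASE. The function name `build_query` and the list `GROUPS` are hypothetical names introduced for this sketch; the per-engine syntax adjustments mentioned in the text are not modeled.

```python
# Keyword groups copied from the boolean composition given in the text.
GROUPS = [
    ["FACES", "FACE", "FACIAL"],
    ["RECOGNITION", "DETECTION"],
    ["BIO-INSPIR*", '"BIO INSPIR*"', "BIOINSPIR*", "EVOLUTIONARY", "SWARM"],
]


def build_query(groups):
    """Join the terms of each group with OR and the groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


query = build_query(GROUPS)
print(query)
```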

Conducting the Review
After defining the search engines and the boolean query, closure requirements are necessary, such as objective, inclusion, and exclusion criteria, which are presented as follows:
• Objective Criteria (OC)
Based on the above rules, it was established that if a work fits into any EX or does not fit into any OC, then it should be excluded. The evaluation of each paper returned by the boolean query was done in this order: OC, EX, and IC. Figure 1 shows the relevant works found for each ASE. After removing the duplicated works, seventy-three papers are kept for analysis.

Face Recognition
Traditional security systems based on encryption and passwords have proven vulnerable and easily breakable. Hence, biometric technology has become crucial for many domains (Nebti and Boukerram, 2017). Among biometric technologies, 2D FR systems do not require any user interaction, which is an advantage over the others (Kim et al., 2016). However, some issues arise in image acquisition, such as occlusion, different poses and expressions, and illumination variation. In this context, much interest and research has been focused on the field of FR and, consequently, an increasing number of bio-inspired FR systems have emerged for different purposes, which include feature selection, parameter optimization, template matching and classification.
According to Rao and Rao (2016), the feature selection technique in an FR system consists of extracting the best feature subset from the original image dataset, aiming to improve the recognition rate. Experimental results have shown that it can compensate for illumination and expression variations. Many authors employed bio-inspired algorithms to optimize the intrinsic parameters of their proposed methodologies, such as selecting the parameters G and C of Support Vector Machine (SVM) classification (Valuvanathorn et al., 2012), searching for the optimum Hidden Markov Model (HMM) states and parameters (Farag et al., 2016), or adjusting the parameters of a homomorphic filter (Plichoski et al., 2017). In addition to these, other bio-inspired approaches are employed for template matching, which consists of finding the areas of an image that best match a template image (Chidambaram et al., 2014). As a preprocessing step, template matching can deal with problems such as scale and rotation variations. Finally, works were found for classification purposes, such as Nebti and Boukerram (2017), who employed a bio-inspired approach to classify a decision tree recursively until only one class representing the input face image remains, thus addressing illumination, pose and facial expression variations. Table 1 presents the acronyms used in this review to indicate the specific problem for which bio-inspired algorithms are employed in FR systems.

EC is based on the natural selection theory proposed by Darwin. Individuals of a population compete to survive, and the better adapted ones, which have higher reproductive chances, survive. Genetic Algorithms (GA) (Holland, 1973), Differential Evolution (DE) (Storn and Price, 1997), Genetic Programming (GP) (Koza, 1992), and Memetic Algorithm (MA) (Dawkins, 2016) are some EC-based algorithms that can be found in the literature related to 2D FR systems.
SI algorithms are inspired by the social and collective behavior of insects and other animals, such as ants, termites, bees, flocks of birds and schools of fish. The intelligence found in those systems is the collective and self-organized behavior that arises from local interactions, which is called emergent behavior. Ant Colony Optimization (ACO) (Dorigo and Stutzle, 2003), Artificial Bee Colony (ABC) (Karaboga, 2005), Bacterial Foraging Optimization (BFO) (Passino, 2002), Bees Algorithm (BA) (Pham et al., 2011), Cuckoo Search Algorithm (CSA) (Yang and Deb, 2009), Gravitational Search Algorithm (GSA) (Rashedi et al., 2009), and Particle Swarm Optimization (PSO) (Eberhart and Shi, 2011) are some SI-based algorithms found in the literature that are applied to 2D FR systems.
In the next section, each algorithm is briefly detailed, followed by its related works in the field of 2D FR. The algorithms are displayed in alphabetical order within each branch: the first two belong to EC and the subsequent seven belong to SI. Also, for each algorithm, the works reviewed are grouped according to their specific application, following the order of the acronyms presented in Table 1.

Differential Evolution
Differential Evolution (DE) was proposed by Storn and Price (1997). The algorithm is initialized with randomly generated individuals in the search space. For each individual of the current population, called the target, a mutant individual is generated by adding the weighted difference between two randomly chosen individuals to a third one. This routine is called mutation. Then, the target individual is combined with the mutant, which results in the trial individual; this step represents crossover in DE. If the fitness of the trial individual is worse than the target's fitness value, the trial is discarded. Otherwise, the trial individual replaces the target individual in the next generation, which represents greedy selection. This process repeats until a stop criterion is reached.
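The mutation, crossover and greedy selection steps described above can be sketched as follows. This is a minimal illustration of the common DE/rand/1/bin scheme, not the implementation of any reviewed work; the toy cost function `sphere`, the parameter values (`f_weight`, `cr`, `pop_size`) and all function names are assumptions made for the example, and the problem is posed as minimization.

```python
import random


def sphere(x):
    """Toy cost function (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def differential_evolution(cost, dim, bounds=(-5.0, 5.0), pop_size=20,
                           f_weight=0.5, cr=0.9, generations=100, seed=0):
    """Minimize `cost` with a DE/rand/1/bin sketch (pop_size must be >= 4)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population in the search space.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    costs = [cost(ind) for ind in pop]

    for _ in range(generations):
        next_pop, next_costs = [], []
        for i, target in enumerate(pop):
            # Mutation: weighted difference of two individuals added to a third.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f_weight * (pop[b][d] - pop[c][d])
                      for d in range(dim)]
            # Binomial crossover between target and mutant gives the trial.
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = [mutant[d] if (d == j_rand or rng.random() < cr) else target[d]
                     for d in range(dim)]
            # Greedy selection: the trial is discarded only if it is worse.
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:
                next_pop.append(trial)
                next_costs.append(trial_cost)
            else:
                next_pop.append(target)
                next_costs.append(costs[i])
        pop, costs = next_pop, next_costs

    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


best_vector, best_cost = differential_evolution(sphere, dim=3)
print(best_vector, best_cost)
```

In a feature selection setting, the cost function would be replaced by a measure computed on the selected features, such as a class-separation criterion.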
In our search for DE applications, four works using DE for FS were found in the literature, three of which also employed it for PO. Mallipeddi and Lee (2012) used the DE algorithm in their system to select the optimal Principal Component Analysis (PCA) features. In their DE, each population member encodes the indices of the d features to be selected from the n available features, and the search is guided by maximizing the distance between classes. To validate their methodology, the Yale, Yale B, ORL, and AR databases were used. Oh et al. (2013) used DE to optimize the parameters of a Radial Basis Function Neural Network (RBFNN) and also for feature selection on combined PCA and Linear Discriminant Analysis (LDA) features. Each DE individual is represented as a vector containing the learning rate, momentum coefficient, and fuzzification coefficient parameters, as well as the selected feature subset. In a later experiment, they used only the 2D-LDA features on the Yale and ORL databases (Yoo et al., 2013). Recently, they performed a comparative study of feature extraction methods and their application to RBFNN using the same FR system architecture but with (2D)²LDA features (Oh et al., 2016).

Genetic Algorithm
The Genetic Algorithm was proposed by Holland (1973), and it is inspired by Darwin's theory of evolution. The concepts of evolution and natural selection are used to guide the search for better solutions in a problem search space. The crossover routine simulates reproduction by combining two previously selected individuals to generate a new one; then, the mutation operator is applied to each newly generated individual. Individuals are usually represented as binary strings, so they can be decoded to almost any desired representation.
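A minimal sketch of the loop described above, with binary-string individuals, is given below. Several details are assumptions chosen for the example and are not prescribed by the text: binary tournament selection, one-point crossover, bit-flip mutation, keeping the best individual unchanged (elitism), and the toy fitness `one_max`, which simply counts the bits set to 1. In a feature selection setting, each bit would typically indicate whether a feature is kept.

```python
import random


def one_max(bits):
    """Toy fitness (assumption): number of bits set to 1, to be maximized."""
    return sum(bits)


def genetic_algorithm(fitness, n_bits, pop_size=20, p_mut=0.02,
                      generations=30, seed=0):
    """Maximize `fitness` over binary strings (n_bits must be >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random individuals is selected.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism (assumption): the best individual survives unchanged.
        children = [max(pop, key=fitness)[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # One-point crossover combines two parents into one child.
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation is applied to each newly generated individual.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children

    best = max(pop, key=fitness)
    return best, fitness(best)


best_bits, best_fitness = genetic_algorithm(one_max, n_bits=16)
print(best_bits, best_fitness)
```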
From this survey, we found three works employing the GA. Vignolo et al. (2012) proposed the use of a GA to select the features extracted by means of an Active Shape Model (ASM). The search is guided by a classifier, so that the classification success rate is assigned as the fitness value of each evaluated individual. The experiments were performed using the University of Essex Face Recognition Data. In another work, Vignolo et al. (2013) also used ASM for extracting features, but with a modified version of GA called Multi-Objective Genetic Algorithm (MOGA), in which the rank of an individual is the number of chromosomes in the population by which it is dominated. The authors also proposed an aggregative fitness function, which combines classification accuracy and the number of features in a single equation. Loderer and Pavlovičová (2014) proposed to use a GA to optimize the parameters of Local Binary Patterns (LBP), such as the type of pattern, size of blocks, distance measure, and dimension of histograms. The chromosome is represented as the sequence of values to be optimized. To validate their methodology, the CMU-PIE, Yale B, and ORL databases were used.

Artificial Bee Colony
The Artificial Bee Colony (ABC) algorithm was proposed by Karaboga (2005). The inspiration comes from the natural foraging behavior of honey bees when searching for the optimal food source. Bees estimate the location of the food source by measuring the amount of energy spent as they fly, as well as the direction. The location and quality of the food source are shared with their nest-mates by performing a waggle dance and by trophallaxis (direct contact). In ABC, bees aim to discover the places of food sources (solutions) with a high amount of nectar, which represents a good solution. Many aspects of the behavior of bees and other insects have been explained by the principle of self-organization (Eberhart and Shi, 2011). The foraging behavior depends on the three types of bees in the colony: the scout, which flies randomly in the search space looking for new food sources; the employed bee, which exploits the neighborhood of its location by selecting a random solution to perturb (new food source); and the onlooker bee, which uses the information obtained from the waggle dance, i.e. the population fitness, to probabilistically select a guiding solution and exploit its neighborhood. A greedy selection strategy is employed to determine the new food source: a new source replaces the previous one only if it is better.
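The roles of the employed, onlooker and scout bees described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than a reference implementation: the problem is posed as minimization of a non-negative toy cost (`sphere`), onlookers choose sources with probability proportional to `1 / (1 + cost)`, a source is abandoned after `limit` unsuccessful trials, and all names and parameter values are hypothetical.

```python
import random


def sphere(x):
    """Toy cost function (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def abc_minimize(cost, dim, bounds=(-5.0, 5.0), n_sources=10, limit=20,
                 cycles=100, seed=0):
    """Minimize a non-negative `cost` with a simplified ABC sketch."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources  # unsuccessful attempts per food source
    i_best = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[i_best][:], costs[i_best]

    def try_neighbor(i):
        # Perturb one dimension of source i using a random partner source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        cand[d] = min(max(cand[d], lo), hi)
        cand_cost = cost(cand)
        # Greedy selection: keep the new source only if it is better.
        if cand_cost < costs[i]:
            sources[i], costs[i], trials[i] = cand, cand_cost, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        # Employed bees: each one exploits the neighborhood of its own source.
        for i in range(n_sources):
            try_neighbor(i)
        # Onlooker bees: pick sources probabilistically according to quality.
        weights = [1.0 / (1.0 + c) for c in costs]
        for _onlooker in range(n_sources):
            chosen = rng.choices(range(n_sources), weights=weights)[0]
            try_neighbor(chosen)
        # Scout bees: abandoned sources are replaced by random new ones.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
        # Memorize the best source found so far.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = sources[i_best][:], costs[i_best]

    return best, best_cost


best_source, best_cost = abc_minimize(sphere, dim=3)
print(best_source, best_cost)
```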
In the literature, according to our research scope, three works were found applying the ABC algorithm to 2D FR systems: two for FS and one for TM. Chakrabarty et al. (2013) proposed the use of a Lévy-mutated ABC algorithm to derive optimal Volterra kernels, simultaneously maximizing inter-class distances and minimizing intra-class distances in the feature space. The performance of the proposed methodology was validated on the Yale A and Yale B databases. Khan and Gupta (2016) applied ABC to reduce the number of sub-windows in an extraction algorithm. In this work, the bees select a sub-window and the solution is evaluated by the sub-window average. Experiments were carried out on the University of Essex Face Recognition Data and VITM datasets. In addition to the previous two works, Chidambaram et al. (2014) proposed a multiple face recognition approach using an improved ABC algorithm to search local features extracted by Speeded Up Robust Features (SURF). The individual is represented as a four-dimensional vector containing the horizontal and vertical coordinates, scale factor and rotation angle; an image patch is then cut from the still image and its interest points are identified for matching with the target face image. The distance measures between the images were used to guide the search. Experiments were carried out on the BIO-INFO database.

Ant Colony Optimization
Ant Colony Optimization (ACO) was proposed by Dorigo and Stutzle (2003). Its source of inspiration is the collective behavior of ants in finding the shortest path from the nest to the food source using a substance called pheromone. Ants drop pheromone on the ground as they travel and tend to follow the path with more pheromone; thus, after several trips to the food source, the shortest path retains more pheromone, which leads more ants to it.
Only one work using ACO was found in this review, and it is an ensemble with the ABC algorithm; thus, it is described later in Subsection 4.10.

Bacterial Foraging Optimization
Bacterial Foraging Optimization (BFO) was proposed by Passino (2002), and it is inspired by the foraging behavior of bacteria. The E. coli bacterium moves by rotating its flagella. When the flagella rotate counterclockwise, they propel the bacterium along a trajectory (run or swim); otherwise, they pull on the bacterium in different directions (tumble). Alternating properly between these two modes keeps the bacteria in places with a higher concentration of nutrients. In the BFO algorithm, the bacteria represent the solutions, and their health represents the quality of the solution.
At first, bacteria tumble and swim (chemotaxis) in the search space; then, half of the population is killed based on health and the other half is duplicated (reproduction). After a certain number of reproduction steps, some bacteria are probabilistically chosen to be killed, and new ones are randomly generated (elimination-dispersal).
Two relevant works employed the BFO algorithm, focusing on FS and PO. In the first, Panda and Naik (2015) proposed a modified version of BFO called Adaptive Crossover Bacterial Foraging Optimization Algorithm (ACBFOA) to find the optimal subset of features reduced by LDA and PCA. ACBFOA incorporates adaptive chemotaxis and also inherits the crossover mechanism of the genetic algorithm. Experiments were performed on the Color FERET, Yale A and UMIST data sets. In the second, Yadav et al. (2013) mitigated the effect of facial changes by combining the global features of LBP and local facial regions at match score level by means of the BFO algorithm. The objective function is defined for learning the weights to be employed in a weighted sum rule fusion. Experimental results are presented using the FG-Net and IIITDelhi face aging databases.

Bees Algorithm
The Bees Algorithm (BA) was proposed by Pham et al. (2011), and it is inspired by the foraging behavior of bees, as in ABC but with different internal routines. In this algorithm, a bee represents a possible solution, and a solution represents a visited site. The scout bees are randomly placed in the search space; the bees with the highest fitness become selected bees, and the sites visited by them are chosen for local search.
Only one work using the BA was found in this review, and it was developed together with the PSO algorithm. Hence, it is described in Subsection 4.10, which specifically discusses ensemble approaches.

Cuckoo Search Algorithm
The Cuckoo Search Algorithm (CSA) was proposed by Yang and Deb (2009), and it is inspired by the brood parasitic behavior of some cuckoo species, in combination with Lévy flight behavior. These cuckoos lay their eggs in communal nests for other bird species to hatch. The CSA is based on three main rules: 1) each cuckoo lays one egg at a time and dumps it in a randomly chosen nest; 2) the best nests, with high-quality eggs, are carried over to the next generations; 3) the number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a given probability. In this case, the host bird can either throw the egg away or abandon the nest and build a completely new one. In the algorithm, each egg in a nest represents a solution and a cuckoo egg represents a new solution. The aim is to use the new and potentially better solutions (cuckoos) to replace not-so-good solutions in the nests.
Only one work was found in the literature that employed the CSA for FS. Naik and Panda (2016) proposed an adaptive version of CSA (ACS) to find the optimal feature vectors for classification in the Intrinsic Discriminant Analysis (IDA) feature space. In ACS, the step size is made adaptive based on the fitness function value and the current position in the search space. For performance analysis, the Yale, ORL, and Color FERET databases were used.

Gravitational Search Algorithm
The Gravitational Search Algorithm (GSA) was proposed by Rashedi et al. (2009). This algorithm is inspired by the law of gravity and mass interactions, in which particles are considered as objects and their performance is measured by their masses. All these objects attract each other, which causes their movement in the search space. A better solution has a heavier mass and therefore naturally moves more slowly than lighter ones, which leads to more local search. The main difference between GSA and PSO is the local communication between objects, which uses a gravitational factor.
During the SLR process, only one work using GSA was found, and it addresses FS. Chakraborti and Chatterjee (2014) proposed to use a binary variation of GSA with dynamic adaptive inertia weight (BAW-GSA) to select the relevant features extracted by LBP, Modified Census Transform (MCT) and Local Gradient Pattern (LGP). The fitness function was implemented as the ratio of the within-class distance (intra-class) to the between-class distance (inter-class). The experiments were carried out on the Yale A, Yale B, ORL, LFW and AR databases.
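A fitness of the intra-class to inter-class ratio kind mentioned above can be illustrated for a binary feature mask as follows. The exact distance definitions used in the reviewed work are not given here, so this sketch assumes Euclidean distances, with the intra-class term taken as the mean distance of each sample to its class mean and the inter-class term as the mean pairwise distance between class means. The function names and the toy data are hypothetical. Lower values indicate compact, well-separated classes.

```python
import math


def select(vector, mask):
    """Keep only the features whose mask bit is 1."""
    return [v for v, m in zip(vector, mask) if m]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def intra_inter_ratio(samples, labels, mask):
    """Ratio of within-class to between-class distance (to be minimized).

    Assumes at least two classes and a mask that separates their means;
    otherwise the inter-class distance is zero and the ratio is undefined.
    """
    reduced = [select(s, mask) for s in samples]
    by_class = {}
    for vector, label in zip(reduced, labels):
        by_class.setdefault(label, []).append(vector)
    means = {label: mean_vector(vs) for label, vs in by_class.items()}

    # Intra-class: mean distance of each sample to its own class mean.
    intra = sum(math.dist(v, means[lab])
                for v, lab in zip(reduced, labels)) / len(reduced)

    # Inter-class: mean pairwise distance between class means.
    class_ids = sorted(means)
    pairs = [(a, b) for i, a in enumerate(class_ids) for b in class_ids[i + 1:]]
    inter = sum(math.dist(means[a], means[b]) for a, b in pairs) / len(pairs)
    return intra / inter


# Toy data (assumption): feature 0 separates the classes, feature 1 is noise.
samples = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 0.0]]
labels = [0, 0, 1, 1]
print(intra_inter_ratio(samples, labels, [1, 0]))
print(intra_inter_ratio(samples, labels, [0, 1]))
```

A binary optimizer would evaluate candidate masks with such a function and prefer masks with a lower ratio.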

Particle Swarm Optimization
Particle Swarm Optimization (PSO) was proposed by Eberhart and Shi (2011) in 1995. The PSO algorithm is inspired by the coordinated behavior of a flock of birds or a school of fish. The search mechanism is based on the acceleration of the particles, which are attracted by the global best position (social component) and by their personal best position (cognitive component) found so far. The algorithm starts with a random population of particles, each with its own velocity, which is responsible for moving the particle around the search space. At each iteration the velocities are updated, until a predefined stop criterion is reached. Its canonical version was used for continuous optimization problems; however, several adaptations for different types of optimization problems, e.g. discrete optimization problems (Kennedy and Eberhart, 1997), can now be found in the literature. Among the wide range of works in which the PSO algorithm is used, we describe in the present section the forty-three works developed for FS, followed by the eleven works that used it for PO.
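The velocity and position updates described above can be sketched for a continuous minimization problem as follows. This is a minimal global-best PSO under stated assumptions: an inertia weight `w` and acceleration coefficients `c1` (cognitive) and `c2` (social) with illustrative values, positions clamped to the search bounds, and a toy cost function `sphere`. None of these choices is taken from a specific reviewed work.

```python
import random


def sphere(x):
    """Toy cost function (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def pso_minimize(cost, dim, bounds=(-5.0, 5.0), n_particles=15,
                 w=0.7, c1=1.5, c2=1.5, iterations=100, seed=0):
    """Minimize `cost` with a global-best PSO sketch."""
    rng = random.Random(seed)
    lo, hi = bounds
    span = hi - lo
    # Random initial positions and velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1 * span, 0.1 * span) for _ in range(dim)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # personal best positions
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # global best

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive attraction + social attraction.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            current = cost(pos[i])
            if current < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], current
                if current < gbest_cost:
                    gbest, gbest_cost = pos[i][:], current

    return gbest, gbest_cost


best_position, best_cost = pso_minimize(sphere, dim=3)
print(best_position, best_cost)
```

Binary variants used for feature selection replace the position update with a probabilistic bit assignment derived from the velocity; that step is not shown here.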
In ThPSO, the recurrence of selected features is considered based on a threshold set by the user. Sattiraju et al. (2013) (Divya et al., 2012; Gagan et al., 2012), moreover an accelerated version of BPSO (ABPSO) was employed on the DCT feature space (Aneesh et al., 2012). In ABPSO, the velocity is updated at each iteration by summing it with the previous positional values of each particle. Ensemble approaches were also tested with DCT: Deepa et al. (2012) used BPSO to select features extracted by DFT and DCT, and Ajaya et al. (2014) proposed an approach to select features extracted by the Contourlet Transform (CT) and DCT. On the CT feature space, PCA was employed for dimensionality reduction and BPSO for FS. Last, Darestani et al. (2013) extracted features using only CT and employed BPSO for feature selection. Then, in the classification stage, a classifier based on an artificial neural network was used, with PSO-optimized hidden layer size and learning rate. The databases used in the works mentioned here are, in descending order of use: ORL, Color FERET, Yale B, UMIST, CMU-PIE, PHPD, JAFFE and CAS-PEAL.
A considerable number of works used DWT and DCT along with PSO (D'Cunha et al., 2013; Nischal et al., 2013; Rao et al., 2014; Babu, Shreyas, Manikantan and Ramachandran, 2014; Kodandaram et al., 2015). Also, Soumya et al. (2013) proposed an approach using BPSO to select features extracted by DWT and DCT. However, besides class separation, the authors used the DCT trace in the fitness relation. Similarly, Rao and Rao (2016) used BPSO to select features extracted by DWT and the Slope-form Triangular Discrete Cosine Transform (STDCT). In this work, the fitness function was based on preserving maximum precision in order to represent the original feature set. The works mentioned in this paragraph used the following databases, in descending order of use: Color FERET, CMU-PIE, PHPD, Yale B, UMIST, ORL, FEI, GT, IFD and JAFFE.
Other approaches for FS using BPSO and the distance between classes were also proposed, such as the work developed by Shanbhag et al. (2014), who used BPSO to search for the optimal subset of the features extracted by Wavelet Transform Feature Extraction (WTFE). Shetty et al. (2013) proposed a similar work, but selected features extracted by a technique based on the Stationary Wavelet Transform (SWT), namely Shift Invariance based Feature Extraction (SIFE). The authors claimed to have used a modified version of BPSO called Weighted Binary Particle Swarm Optimizer (WBPSO), but they only included in the fitness computation the number of times a particular feature is selected. Also, Babu, Birajdhar and Tambad (2014) used the recurrence of a feature in the fitness evaluation, in a variant called Conservative BPSO, to select features from the SWT feature space. Abhishree et al. (2015) proposed to use BPSO to select features extracted by the Gabor filter technique. In the same way, Kishore et al. (2014) used the Gabor filter, but combined with the Fast Fourier Transform (FFT). Differently, Cheng et al. (2014) searched for the optimal subset in the Self Quotient Image (SQI) feature space, while Nema and Thakur (2015) proposed to select features extracted by LDA. In their work, the algorithm was adapted with a deterministic parameter control technique which decreases C1 and increases C2 exponentially with time. Like others, Vora et al. (2014) applied FS to the Gamma Ray Burst Rhombus Star (GRBRS) feature mask space and Varun et al. (2015) to the Block-wise Hough Transform (HT) feature space. The final work which used the distance between classes for fitness evaluation was proposed by Varadarajan et al. (2015). In this work, the authors tested a modified version of BPSO called Exponential Binary Particle Swarm Optimization (EBPSO) to select the features extracted using Block based Additive Fusion. The group of articles mentioned in the present paragraph used the following databases: Color FERET, CMU-PIE, Yale B, PHPD, FEI, ORL, IFD, LFW, GT and CAS-PEAL.
Different approaches for FS using PSO and its variants were also employed. Lei et al. (2012) proposed a variation of PSO called Fast Static Particle Swarm Optimization (FSPSO). It treats the whole initial feature set as a static particle swarm in which no new particle is generated in the high-dimensional space, and it uses a filter and wrapper strategy to pick out the most discriminative feature particle subset. Shieh et al. (2014) proposed to use PSO to select features extracted by PCA, using SVM as the fitness function. Yin, Qiao, Fu and Xia (2014) proposed to use BPSO for feature selection on the DCT coefficient feature space. In this approach, the fitness function is based on classification rate and dimensionality. In another work, Yin, Fu and Sun (2014) used the same approach, but reduced the search space of the BPSO algorithm by preselecting DCT coefficients according to a separability criterion. In addition to modified versions of PSO, the use of modified versions of BPSO is becoming common.
As an example, we can mention Sah et al. (2015), who proposed a modified version of BPSO called Logarithmic Binary Particle Swarm Optimization (LBPSO) to select features extracted by the Entropic Gabor Wavelet Transform (GWT). In LBPSO, the global solution and particle best positions are weighted instead of their current positions. The feature vector is optimized by sampling the vector and choosing the one with the highest entropy. The work proposed by Mollaee and Moattar (2016) is somewhat different from the others mentioned, since they proposed to use the PSO algorithm to aid discriminant ICA in finding multivariate data with lower dimension and independent features by maximizing negentropy as well as the Fisher criterion. Zhang and Peng (2016) proposed to use PSO to find the optimum combination of basis and variety from a basis plus variety model on a high-dimensional unit sphere, in terms of the minimum L2 distance relative to the query image. In their model, the query image is approximately a linear combination of the basis image and the variety image. The basis images are the neutral images of subjects, and the variety images are generalized from multi-sample subjects. The basis of the optimum point gives the identity information for classification. The databases used by the works mentioned in this paragraph were: Yale B, ORL, Color FERET, CMU-PIE, AR and MIT.
Bio-inspired approaches for PO also employed PSO algorithms. Here, we comment on the works related to the optimization of SVM parameters. Valuvanathorn et al. (2012) aided SVM classification by selecting the parameters G and C automatically (PSO-SVM). A modified version of PSO, called Opposition Particle Swarm Optimization (OPSO), was presented by Hasan et al. (2013) to optimize the SVM parameters when training and testing on features extracted by PCA. In OPSO, two populations are generated: the first is random, and the second is an opposition population based on the values of the first. Similarly, Xiao et al. (2014) used PSO combined with grid search to optimize the parameters of a Radial Basis Function (RBF) kernel in SVM. Zou and Zhang (2016) likewise presented a PO work that uses the recognition rate to calculate the fitness of each particle. The works mentioned in this paragraph mostly used the following databases: ORL, Color FERET, Yale A and BioID.
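The two-population idea behind OPSO can be sketched as follows. This is only an illustration under stated assumptions: the opposite of a value x in [lo, hi] is taken as lo + hi - x (the usual opposition-based learning definition), and keeping the fittest half of the merged populations is likewise an assumption, not a detail confirmed by the reviewed work. The bounds for C and G and the function toy_score are hypothetical.

```python
import random


def opposition_init(bounds, n_particles, fitness, seed=0):
    """Build a random population and its opposite, then keep the fittest."""
    rng = random.Random(seed)
    random_pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(n_particles)]
    # Opposite point: mirror each coordinate inside its own interval.
    opposite_pop = [[lo + hi - x for x, (lo, hi) in zip(p, bounds)]
                    for p in random_pop]
    merged = random_pop + opposite_pop
    merged.sort(key=fitness, reverse=True)  # higher fitness first
    return merged[:n_particles]


# Hypothetical search ranges for the SVM parameters (C, G).
BOUNDS = [(0.1, 100.0), (0.001, 1.0)]


def toy_score(params):
    """Placeholder for a validation recognition rate (assumed peak at C=10, G=0.1)."""
    c, g = params
    return -((c - 10.0) / 100.0) ** 2 - (g - 0.1) ** 2


swarm = opposition_init(BOUNDS, n_particles=10, fitness=toy_score)
print(swarm[0])
```

The returned list can then serve as the initial swarm of an ordinary PSO run.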
Besides optimizing SVM parameters, some works that optimize other parameters were also found. Pan et al. (2013) proposed to replace the exhaustive search used in the AdaBoost framework with PSO. The authors described PSO as an FS procedure; however, since each particle encodes a feature parameter set (type, xs, ys, θ, width, height, sampling_points, radius), the work is treated here as PO. The fitness function was defined as the normalized classification error rate. Banerjee and Datta (2013) proposed to use PSO for parameter optimization of both the constrained and unconstrained types, in which particle vectors are treated as correlated parameters to be optimized. The false acceptance rate was used as the objective function to be minimized. Trinh et al. (2014) proposed to use the PSO algorithm to find optimal weights for fusing global and local Fourier-Mellin Transform (FMT) features at score level. The fitness function is evaluated by calculating the recognition rate obtained with a specific set of weights. Fernández-Martínez and Cernea (2015) proposed to use a modified version of BPSO called RR-PSO to optimize the parameters of SCAV1. SCAV1 is a supervised ensemble learning algorithm built from six nearest-neighbor classifiers based on histogram, variogram, texture analysis, edges, DWT and Zernike moments. In RR-PSO, global and local search are balanced by adopting regressive discretization in acceleration and in velocity of the continuous PSO model. Farag et al. (2016) proposed to apply PSO to search for the optimum HMM states and parameters. In their approach, maximum accuracy and minimum feature dimension were used to guide the search. Kim et al. (2016) proposed to use the PSO algorithm to optimize the parameters of an RBFNN, such as the number of nodes and the fuzzification coefficient. The classification rate is used as the fitness value. Finally, Plichoski et al. (2017) proposed to use PSO to optimize the parameters of a homomorphic filter (HF), namely the high- and low-frequency factors, the cut-off frequency and the filter order. The recognition rate is used as the fitness function to guide the algorithm's search. The databases most used by the presented works were ORL, CMU-PIE, Yale A, PHPD, MIT, AR, Korean Face Database, PUT, UCSD and IC&CI.
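Most of the PO works above share the same loop: a continuous PSO proposes a parameter vector and a recognition or classification rate scores it. A minimal sketch of that loop is given below. The parameter ranges in HF_BOUNDS, the point ASSUMED_OPT and the function surrogate_rate are hypothetical placeholders standing in for a real recognition-rate evaluation; none of them come from the reviewed works, and the filter order is treated as continuous only for simplicity.

```python
import random


def pso_maximize(fitness, bounds, n_particles=15, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box-constrained continuous space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into its range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical ranges: high-frequency gain, low-frequency gain, cut-off, order.
HF_BOUNDS = [(1.0, 3.0), (0.1, 1.0), (1.0, 50.0), (1.0, 5.0)]
ASSUMED_OPT = [2.0, 0.5, 20.0, 2.0]  # made-up optimum for the placeholder


def surrogate_rate(params):
    """Placeholder for a recognition rate in [0, 1]; peaks at ASSUMED_OPT."""
    err = sum(((p - o) / (hi - lo)) ** 2
              for p, o, (lo, hi) in zip(params, ASSUMED_OPT, HF_BOUNDS))
    return 1.0 - err / len(params)


best_params, best_rate = pso_maximize(surrogate_rate, HF_BOUNDS)
print([round(v, 3) for v in best_params], round(best_rate, 4))
```

In practice each fitness call would run the full FR pipeline on a validation set, so the swarm size and iteration count directly set the computational cost.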

Ensemble-based approaches
This section covers applications of ensemble bio-inspired approaches, in which more than one optimization algorithm is used in the same FR system. In our review, some hybrid approaches were found: the first work embedded the algorithms in its system for FS and CL, while the other works employed bio-inspired approaches for FS, followed by PO and CL, respectively. Kaur et al. (2013b) used ABC to select the features extracted by DWT, with the optimization driven by a correlation value for pattern recognition. For the CL task, ACO is used by measuring the distances between the selected features. The authors used their own image sets to test their methodology. Khan et al. (2015) proposed to use BPSO along with a GA for feature selection. Global and local features were extracted from the image using DCT and LBP, respectively. The system was evaluated using the ORL and LFW face databases. Kallianpur et al. (2016) proposed to use a modified discrete version of ABC called the Hybrid-Discrete Artificial Bee Colony algorithm (H-DABC) to select features extracted by DWT. This ABC version contains certain elements of PSO, which makes it a hybrid algorithm. Their methodology was validated using LFW, Color FERET and ORL. Dora et al. (2017) proposed to use a hybrid algorithm of PSO and GSA (PSO-GSA) to optimize the parameters of an evolutionary single Gabor kernel (ESGK) filter used for feature extraction. The fitness function used for selecting the optimal filter is the Gabor energy vector. The experiments were performed on the Color FERET, ORL, UMIST, GT and LFW databases. The last work reviewed was proposed by Nebti and Boukerram (2017), who used BA and a decision-tree-based PSO as classifiers. The tree is represented by the classes, and BPSO classifies them recursively until only one class remains, representing the input face image. BA searches for the training faces that are most similar to the faces being recognized. The fitness function for both algorithms is the sum of the Euclidean distances between the current particle and the testing sample features. The two are combined with a decision-tree-based fuzzy SVM using majority vote. The experiments were conducted on ORL, YALE, FERET and UMIST.

Summary and Discussions
As seen in Section 4, most bio-inspired algorithms in 2D FR systems have been applied to FS, PO, CL and TM. Seventy-three scientific articles were analyzed, focusing on the problem addressed, which bio-inspired algorithms are used, how candidate solutions are represented, how the fitness function is modeled, and which databases are employed.
Figure 2 presents the distribution of bio-inspired algorithms among the analyzed works. A large gap can be noticed between the use of the PSO algorithm (73.41%) and that of all other algorithms (26.59%).
The popularity of the PSO algorithm in 2D FR systems might be explained by its good performance, its simplicity of implementation, and the small number of parameters to be tuned. Also, as pointed out by Alsalibi et al. (2015), the PSO algorithm requires less training time and has good scalability and a high convergence rate. According to an extensive review by Zhang et al. (2015), the number of publications related to PSO is the highest among such algorithms, reaching around 1,000 per year. However, the small number of works using other bio-inspired algorithms leaves room for exploiting their features. Among the four applications, FS reaches the highest share with 76.315%, followed by PO with 19.74%. In our review, we found only two works for classification and one work for template matching. Figure 3 shows the distribution of applications per algorithm, in which the dominance of FS applications can be clearly observed. The high usage of FS can be justified because its use as a dimensionality reduction approach is attractive and sometimes required. Additionally, most works that deal with images and videos require FS as an important step, and it is an important component of many pattern recognition tasks with very high-dimensional data (Gui et al.;2017).
In every work applied to FS, the candidate solution was represented as a vector containing the selected features. Also, in most works the search was driven by the aim of maximizing the distance between classes. The most used feature extraction techniques are presented in Figure 4. For the CL task, distinct solution representations are employed. For example, one work used the image pixels as the solution representation (Kaur et al.;2013b) and another used the database classes (Nebti and Boukerram;2017). For the TM task, the individual is represented as a four-dimensional vector containing the horizontal and vertical coordinates, the scale factor and the rotation angle. All works used the Euclidean distance measure to drive the search and to calculate the distance between the images (Chidambaram et al.;2014).
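The four-dimensional TM representation can be made concrete with a short sketch. It assumes, purely for illustration, that a candidate (tx, ty, scale, angle) maps template points into the image by a rotation, a uniform scaling and a translation, and that the search minimizes the Euclidean distance between template intensities and the image intensities sampled at the mapped points. The names transform, tm_distance and toy_image and the synthetic data are hypothetical and are not taken from the cited work.

```python
import math


def transform(point, candidate):
    """Map a template point into image coordinates for a 4-D candidate."""
    px, py = point
    tx, ty, scale, angle = candidate
    c, s = math.cos(angle), math.sin(angle)
    return (tx + scale * (c * px - s * py),
            ty + scale * (s * px + c * py))


def tm_distance(candidate, template, image):
    """Euclidean distance between template and sampled image intensities."""
    total = 0.0
    for point, intensity in template:
        x, y = transform(point, candidate)
        total += (image(x, y) - intensity) ** 2
    return math.sqrt(total)


def toy_image(x, y):
    """Synthetic intensity function standing in for a real image lookup."""
    return x + 2.0 * y


# Build a template as it would appear under a known (assumed) pose.
TRUE_POSE = (5.0, 3.0, 2.0, math.pi / 2)
POINTS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
TEMPLATE = [(p, toy_image(*transform(p, TRUE_POSE))) for p in POINTS]

print(tm_distance(TRUE_POSE, TEMPLATE, toy_image))             # matching pose
print(tm_distance((0.0, 0.0, 1.0, 0.0), TEMPLATE, toy_image))  # wrong pose
```

A bio-inspired optimizer would treat tm_distance as the objective to minimize over the four pose parameters.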
Figure 5 shows the distribution of the most used bio-inspired algorithms per year. The number of works employing PSO far exceeds that of all other algorithms, while ABC and the pair DE and GA rank second and third, respectively.
When analyzing the works presented in this review, we realize that the experimental methodology

Conclusion and Future Work
Nowadays, in different parts of society, we can find many 2D FR systems that perform well under controlled settings. However, real-time situations require systems that can deal with uncontrolled settings, which leaves an open research gap. The major challenges in 2D FR systems are the high dimensionality of the datasets, the wide range of image variations, and the useless information in the images that leads to misclassification. Such complexity makes it necessary to use optimization methods such as bio-inspired algorithms to construct robust FR systems. Hence, researchers around the world have proposed many 2D FR approaches using bio-inspired algorithms. In this context, we proposed the present Systematic Literature Mapping to map these approaches.
As discussed in this review, bio-inspired algorithms have been applied to many tasks, such as template matching, classification, feature selection and parameter optimization. The last two applications, FS and PO, are the most used to improve 2D FR systems, with 76.315% and 19.74% of the works, respectively. Among the bio-inspired algorithms discussed in this work, the use of PSO far exceeds that of the others, and it is present in 73.41% of the analyzed works. The reasons for this might be its good performance, its simplicity of implementation and the few parameters to tune. Furthermore, bio-inspired 2D FR systems are trending towards the PSO algorithm, which suggests it is a reliable choice for future applications, as are the DWT technique for feature extraction and SVM for classification. A complete overview of the most used databases is also presented in this work. Among them, the top three are the FERET, ORL and Yale B databases. Possible future work can be conducted on analyzing the performance of bio-inspired 2D FR systems.

Figure 1: Relevant works found for each ASE.

Figure 2: Bio-inspired algorithms distribution in relation to the analyzed works

Figure 4: Most used feature extraction techniques

Table 1: Acronyms for the applications employed

4 Bio-inspired Algorithms
EC and SI algorithms are the focus of this review due to their large number of applications in 2D face recognition. Besides that, the application of Neural Networks in FR systems has attracted the attention of the research community, giving rise to a significant number of works. Therefore, we believe that Neural Network applications deserve a review of their own.
. The DWT technique is the most employed (41.82%) mainly because of its

Table 2 .
The complexity of a database is defined by its size and by categories such as pose, illumination, expression and occlusion. The most used database is FERET (representing 20.21% of the works), which contains 14,126 images of 512x768 pixels from 1,199 individuals, varying in pose, illumination and expression (Alsalibi et al.;2015). It is followed by the ORL database with 17.55% and Yale B with 13.83% of the works. Yale B is a well-known database because of its several degrees of illumination and its cropped version of the face images, which is important for investigating the illumination variation problem. Information on the remaining databases can be found in the references listed in Table 2.