Hybrid deep learning approach for financial time series classification

This paper proposes a combined approach of two machine learning techniques for financial time series classification. Restricted Boltzmann Machines (RBM) were used as the latent feature extractor and Support Vector Machines (SVM) as the classifier. Tests were performed with real data of five assets from the Brazilian Stock Market. The combined RBM + SVM technique performed better than the isolated SVM, which suggests that the proposed approach can be suitable for the considered application.


Introduction
Predicting the future is certainly one of the greatest ambitions of human beings. No system manages to do that perfectly, but the literature contains many attempts in several different contexts (Barrymore, John; 2017). Considering that financial markets are environments that provide great opportunities, many studies have been developed in order to carry out predictions for this kind of application (Tkac and Verner; 2016; Cavalcante et al.; 2016). Most of these studies focus on prediction error minimization, supporting the task of predicting the best moment for buying or selling an asset. For that, models trained with historical data are used to predict future behavior.
Many researchers (Meesad and Rasel; 2013; Nametala et al.; 2016; Persio and Honchar; 2016; Patel et al.; 2015; Pimenta et al.; 2014) have worked towards the improvement of already existing predictors. Many of these studies are based on recent machine learning techniques, such as Support Vector Machines (SVM) (Vapnik; 1995), Artificial Neural Networks (ANNs) (McCulloch and Pitts; 1943) and Genetic Programming (GP) (Koza; 1992). However, it is well known in the machine learning community that high-dimensional inputs usually reduce both computational performance and the precision of classification and regression results (Sá and Albertini; 2014). Intuitively, the larger the number of attributes, the more information would supposedly be available to the algorithm. However, as the number of attributes grows, the data tend to become more sparse, generating situations that considerably impair training (local optima of the error function, for example). This difficulty resulting from high-dimensional spaces is usually called the curse of dimensionality (Liu and Motoda; 2007).
In order to improve the precision of classification algorithms, it is recommended that a prior selection of features be carried out. Nevertheless, this step usually depends strongly on a specialist in the studied application. This study proposes the use of a recent artificial neural network, called Restricted Boltzmann Machine (RBM) (Hinton and Salakhutdinov; 2006), to carry out the selection of features for the later application of a machine learning technique (SVM) in the prediction of financial time series. This deep neural network approach is a powerful tool in the area of machine learning and has been used to extract features, thus supporting the reduction of dimensions and enabling the improvement of data classification (Hrasko et al.; 2015). This kind of ANN is characterized by its capacity for learning internal representations and solving complex combinatorial problems (Cai et al.; 2012).
The main contribution of this work is a combination of the RBM and SVM machine learning techniques to predict stock market asset trends. Five real data sets from BM&FBOVESPA were used to validate the study. Comparisons were also made between the combined techniques and SVM by itself, with the proposed approach generally presenting higher accuracy.
The remainder of this text is organized as follows: Section 2 describes the problem and some correlated studies; Section 3 presents a description of the RBM technique used; Section 4 describes the methodology that is applied in Section 5, in which the results of the prediction of five real and different financial time series are presented. Finally, Section 6 presents our conclusion.

Problem Definition & Related Work
The base problem of this study, in general terms, is the prediction of variations in stock market asset prices. More specifically, historical data on price and volume were assessed using technical analysis (Kirkpatrick and Dahlquist; 2006) with the purpose of predicting price changes with the highest precision possible.
This section presents a general view of the basic concepts of the financial market and some correlated studies.

Financial Market
The financial market is the place where people can negotiate (buy or sell) assets such as real estate, goods and exchange. The purpose of the market is to gather many sellers in one single place, making them accessible to interested buyers. Markets are considered a vital part of any economy since, as their movement increases, there are more opportunities for buyers to apply their resources and contribute to heating the economy (Neto; 2009).
The stock exchange is the negotiation environment in which investors may buy or sell securities through direct negotiation, with or without the support of negotiation correspondents. In the case of the Brazilian stock exchange, the negotiation is done through brokers (Bússola do Investidor; 2017). In Brazil, the role of the stock exchange is played by BM&FBOVESPA (2017), which owns two stock exchanges: BM&F, which focuses on the negotiation of agriculture and livestock products and financial instruments; and BOVESPA, which focuses on the negotiation of stocks and stock options.
In the stock market, the investor gains profit by buying undervalued stocks and selling them at moments of higher value. The profit of the investment is determined by the difference between the buying and the selling price, adding benefits such as dividends and discounting transaction fees (Bússola do Investidor; 2017).
Usually, in order to predict whether the value of a stock will increase, analysts use two diagnostic mechanisms, fundamental analysis and technical analysis (Anghel; 2013):
• Fundamental Analysis: in this kind of analysis, the references for the investor are parameters that define the financial situation of the company, such as net profit, level of indebtedness and distribution of dividends, among others. In summary, fundamental analysis assumes that stocks have an intrinsic value which corresponds to their fair price. This price, in turn, is determined by the income stream measured for the stock and effectively distributed throughout a given period of time, discounted to present value.
• Technical Analysis: this analysis focuses on information regarding stock price and buying/selling movement in a given period of time. This information enables the projection of a trajectory or of probable future changes in stock prices. Since this approach is applied in this study, it will receive greater focus.
Investors who use technical analysis seek to identify possible trends, since this analysis assumes that these trends follow a cyclical pattern (Noronha; 2003). This identification is usually based on chart patterns.
After some time, these chart patterns were translated into numerical or logical indicators that facilitate the automatic processing of time series in order to identify possible opportunities. Considering the innate complexity and dynamism of financial market predictions, there is a constant debate regarding the possibility of predicting stock price changes. According to Moura (2006), the traditional analysis methods (technical and fundamental) are not capable of identifying the non-linear relations between the many variables that comprise the price of a stock and its movement upwards or downwards, leading to the need for more advanced techniques.

Related Studies
Considering the uncertainties of the stock market, many studies have been or are being developed to support the prediction of trends in this market. Nelson and Pereira (2016) applied neural networks of the type Long Short-Term Memory (LSTM) for the prediction of stock price trends. A prediction model was created and a series of experiments were carried out using assets of BM&FBOVESPA. The results found were considered satisfactory, with an average accuracy of up to 55.9% in predicting whether the price of a given stock would rise or not in the immediate future. The model was also assessed from a financial perspective, showing promising results regarding its return.
Franco and Steiner (2014) compared neural networks of the types Multi Layer Perceptron (MLP), Radial Basis Function (RBF) and Layer Recurrent Network (LRN) as techniques for predicting the future value of certain stocks in BM&FBOVESPA. A total of 496 closing prices in reverse auction for four assets were used in the data set, in the period ranging from February 27th, 2012, to February 25th, 2014. The accuracy measure used for validation was the mean squared error between the values predicted by the ANNs and the real values. The best prediction technique was LRN, with error values of the order of $10^{-11}$.
Zhu et al. (2014) implemented a decision-making support system for the buying and selling of assets. This system used Deep Belief Networks (DBN) to predict stock prices. The experiment used a set of 400 stocks from the S&P 500. This data set comprised 12 financial indicators. The authors stated that the proposed system was capable of predicting stock prices and attaining high financial performance. However, they showed that DBNs require a lot of time to be trained with historical data. For that reason, speed was an obstacle to the system.
Takeuchi and Lee (2014) developed an algorithm based on Restricted Boltzmann Machines (RBM) to extract latent features of Nasdaq assets. The authors used a data set covering the period from 1990 to 2009. The results showed that the use of RBMs enabled the reduction of input data dimensionality and the extraction of important features to support the prediction of future prices.
Based on the survey by Li and Ma (2010), it is possible to observe that many studies in the literature address the theme of predicting trends in financial series, but few of them use RBM. No studies could be found that combine SVM, RBM and technical analysis. Additionally, none of the studies related to RBM were applied to BM&FBOVESPA.
The next section presents the theoretical basis proposed in this study.

Theoretical Basis
This section addresses the main concepts required for the understanding of the proposed tool.

Restricted Boltzmann Machines (RBM)
Restricted Boltzmann Machines (Hinton and Salakhutdinov; 2006) are unsupervised learning neural networks. They are mainly characterized by their ability to learn internal representations and to solve complex combinatorial problems.
In general, the RBM is a stochastic network comprising two layers: a visible layer and a hidden layer. The layer of visible units represents the observed data and is connected to the hidden layer, which, in turn, must learn to extract features from these data. Originally, the RBM was developed for binary data, both in the visible and the hidden layers. Considering that there are problems for which it is necessary to process other data types, Hinton and Salakhutdinov (2006) proposed a Gaussian-Bernoulli RBM (CRBM), which uses a normal distribution to model neurons in the visible layer. This study describes the basic concepts of the CRBM approach, considering that the inputs in this study are continuous data.
In the RBM, the connections between neurons are bidirectional and symmetrical, which means that information flows in both directions of the network. Besides, in order to simplify inference procedures, neurons of the same layer are not connected to each other. Therefore, only neurons of different layers are connected, which explains why it is called restricted.
The RBM is an energy-based probabilistic model. That means that the joint probability distribution of the configuration (v, h) is obtained using Equations 1 and 2.
Figure 1: RBM with 4 visible units and 3 hidden units.
The probability that the network assigns to a visible vector v is given by the sum of the probabilities of all hidden vectors h, calculated by Equation 3:

$$p(v;\theta) = \sum_{h} p(v,h;\theta) \qquad (3)$$

As the RBM is restricted, the probability distributions of h given v and of v given h are described by Equations 4 and 5. In the CRBM version (Hinton and Salakhutdinov; 2006), in which the visible layer is continuous and the hidden layer is binary, the conditional distributions are described by Equations 6 and 7, in which $\phi(x) = \frac{1}{1+e^{-x}}$ (logistic function) and $\mathcal{N}$ is a normal distribution, with mean $v$ and variance $\sigma^2$, usually 1.
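For reference, the quantities named above can be written in the form commonly used in the RBM literature. This is a sketch under stated assumptions, not a transcription of Equations 1, 2 and 4 to 7: it assumes θ = (W, a, b) with a the visible biases and b the hidden biases (the roles of a and b are an assumption), and σ = 1 in the Gaussian-Bernoulli case.

```latex
\begin{align*}
% Joint distribution of an energy-based model
p(v,h;\theta) &= \frac{e^{-E(v,h;\theta)}}{Z(\theta)}, \qquad
Z(\theta) = \sum_{v}\sum_{h} e^{-E(v,h;\theta)} \\
% Energy of a binary RBM
E(v,h;\theta) &= -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i\sum_j v_i w_{ij} h_j \\
% Energy of a Gaussian-Bernoulli RBM with \sigma = 1
E_{\mathrm{GB}}(v,h;\theta) &= \sum_i \frac{(v_i - a_i)^2}{2} - \sum_j b_j h_j - \sum_i\sum_j v_i w_{ij} h_j \\
% No intra-layer connections, so the conditionals factorize
p(h \mid v;\theta) &= \prod_j p(h_j \mid v;\theta), \qquad
p(v \mid h;\theta) = \prod_i p(v_i \mid h;\theta) \\
% Gaussian-Bernoulli conditionals with \sigma = 1
p(h_j = 1 \mid v;\theta) &= \phi\Big(b_j + \sum_i v_i w_{ij}\Big) \\
p(v_i = v \mid h;\theta) &= \mathcal{N}\Big(v \;\Big|\; a_i + \sum_j w_{ij} h_j,\; 1\Big)
\end{align*}
```

The factorized conditionals are what makes one step of Gibbs sampling cheap: all hidden units can be sampled at once given v, and all visible units at once given h.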
The purpose of RBM training is to estimate the values of the components of vector θ that cause the energy level of the network to decrease. Since p(v; θ) is the input data distribution, θ can be estimated by maximizing p(v; θ) or, equivalently, log p(v; θ). Therefore, the gradient of log p(v; θ) with respect to θ is calculated by Equation 8, written here for a weight $w_{ij}$ of W:

$$\frac{\partial \log p(v;\theta)}{\partial w_{ij}} = \langle v_i h_j \rangle_d - \langle v_i h_j \rangle_m \qquad (8)$$
in which the components $\langle v_i h_j \rangle_d$ and $\langle v_i h_j \rangle_m$ represent the expectations computed under the data and the model, respectively.
The estimate of $\langle v_i h_j \rangle_d$ is obtained in a simple way by means of the conditional probabilities $p(h_j = 1|v; \theta)$ and $p(v_i = v|h; \theta)$. However, obtaining an estimate of $\langle v_i h_j \rangle_m$ is much harder. This can be done by means of Gibbs sampling (Geman and Geman; 1984), using random data to feed the visible layer.
Still, this procedure may take a lot of time to achieve an adequate result. Fortunately, a quicker procedure called contrastive divergence (CD) was proposed by Hinton (2006). The idea behind this method is to feed the visible layer with training data and execute Gibbs sampling only once, a step that has been called reconstruction.
To apply the CD algorithm, the first step is to set the visible layer $v_0$ to the input data and then estimate the hidden layer $h_0$ using the conditional probability $p(h_j = 1|v; \theta)$. With that, $\langle vh^T \rangle_d = v_0 h_0^T$. Then, based on $h_0$, $v_1$ is estimated using the conditional probability $p(v_i = v|h; \theta)$. Similarly, based on $v_1$, $h_1$ is estimated, again by $p(h_j = 1|v; \theta)$. With that, $\langle vh^T \rangle_m = v_1 h_1^T$. Finally, the set of parameters θ is updated, considering that (W, a, b) are randomly initialized. The pseudocode of the CD algorithm is presented in Algorithm 1. The parameters η, ρ and α are known as learning rate, weight decay and momentum. Hinton (2010) suggests η = 0.01, ρ in the range [0.0001, 0.01], and α = 0.5 for iteration numbers below 5 and α = 0.9 otherwise.
The RBM has four hyperparameters: the number of neurons in the visible layer (v), the number of neurons in the hidden layer (h), the learning rate (lr) and the number of cycles (ep). If the learning rate is too low, network learning is too slow; if it is too high, it generates oscillations in the training and prevents the convergence of the learning process. Usually, its value varies from 0.1 to 1.0. The number of cycles is the number of times the training set is presented to the network. An excessive number of cycles can cause the network to lose its generalization power (overfitting). On the other hand, with a small number of cycles, the network may not be able to model the general behavior of the system (underfitting) (Haykin; 1998).

Algorithm 1: Contrastive divergence (CD)
  ...
  Estimate the hidden layer $h_0$ using the conditional probability p(h|v)
  Estimate, based on $h_0$, the visible layer $v_1$ using p(v|h)
  Estimate, based on $v_1$, the hidden layer $h_1$ using p(h|v)
  Update θ using the updating equations described above
  Return: θ
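The contrastive divergence procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes σ = 1, uses the Gaussian mean as the reconstruction $v_1$ (no sampling noise), initializes the biases to zero, and omits weight decay and momentum. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function phi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """p(h_j = 1 | v) for every hidden unit j (b: hidden biases, assumed role)."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_mean(h, W, a):
    """Mean of the Gaussian p(v_i | h) with sigma = 1 (a: visible biases)."""
    return [a[i] + sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(a))]


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update for a single training vector; modifies W, a, b in place."""
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = hidden_probs(v0, W, b)
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Reconstruction: the Gaussian mean stands in for v1 (a simplification).
    v1 = visible_mean(h0, W, a)
    # Negative phase: hidden probabilities given the reconstruction.
    ph1 = hidden_probs(v1, W, b)
    # Update: data statistics minus reconstruction statistics.
    for i in range(len(v0)):
        for j in range(len(ph0)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    return v1


def train_rbm(data, n_hidden, lr=0.01, epochs=50, seed=0):
    """Train on a list of equally sized real-valued vectors; returns (W, a, b)."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    a = [0.0] * n_visible
    b = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            cd1_step(v0, W, a, b, lr, rng)
    return W, a, b
```

After training, `hidden_probs(v, W, b)` gives the reduced representation of an input vector `v`, with one value per hidden unit; inputs are assumed to be standardized beforehand.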

Support Vector Machines (SVM)
Support Vector Machines (SVM) are based on statistical learning theory, developed by Vapnik (1995) from studies initiated in Vapnik and Chervonenkis (1971). This theory establishes a series of principles that should be followed in order to obtain classifiers with good generalization, which is defined as their capacity to correctly predict the class of new data from the same domain in which the learning took place. SVM learning algorithms aim to determine decision boundaries that produce an optimal separation between classes through the minimization of errors. SVMs stand out due to at least two characteristics: solid theoretical foundation and high performance in practical applications (Santos; 2002).
In their basic form, SVMs are linear classifiers that separate data into two classes by means of a separating hyperplane. An optimal hyperplane separates data with the maximum margin possible, which is defined by the sum of the distances between the hyperplane and the positive and negative points closest to it. These points are called support vectors and are circled in Figure 2.
The hyperplane is constructed based on prior training using a finite data set.
Assume the training set $\{x_i, y_i\}$, $y_i \in \{-1, 1\}$, $x_i \in \mathbb{R}^n$, where $x_i$ is the i-th input element and $y_i$ is its respective class value, $i = 1, \ldots, l$. The hyperplane with optimal margin is obtained by minimizing $\|w\|^2$ subject to restrictions on the training examples, where w is the normal to the hyperplane. This is a quadratic optimization problem and may be converted to a dual problem, which depends only on the Lagrange multipliers $\alpha_i$, subject to a linear equality restriction and to inequality restrictions, with the solution expressed over the N training examples. The elements that are closest to the hyperplane are called support vectors and are located in planes H1 and H2, as seen in Figure 2. These are the most important points, since they are the ones that define the classification margin of the SVM (Burges; 1998).
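For reference, the hard-margin problem described above is usually written as follows. This is the standard textbook formulation, given as a sketch rather than a transcription of the original equations. The bias term $b$ of the hyperplane is a symbol introduced here, the factor 1/2 is a common convention that does not change the minimizer of $\|w\|^2$, and N is used for the number of training examples.

```latex
\begin{align*}
% Primal problem
\min_{w,\,b}\ & \tfrac{1}{2}\,\|w\|^2
  \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, N \\
% Dual problem, depending only on the Lagrange multipliers
\max_{\alpha}\ & \sum_{i=1}^{N} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \\
\text{subject to}\ & \sum_{i=1}^{N} \alpha_i y_i = 0
  \quad \text{and} \quad \alpha_i \ge 0, \quad i = 1, \ldots, N \\
% Solution: only support vectors have nonzero multipliers
w &= \sum_{i=1}^{N} \alpha_i\, y_i\, x_i
\end{align*}
```

Because the data enter the dual only through the inner products $x_i \cdot x_j$, replacing them with a kernel function yields non-linear decision boundaries without changing the optimization procedure.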
For most real problems, the data set is not separable by a linear hyperplane, and the calculation of support vectors using the formulation described above would not be applicable (Platt; 1999). This problem may be solved by introducing slack variables $\xi_i$, which relax the restrictions of the linear SVM, allowing some margin violations while penalizing them through the control variable C. The transformation of this optimization problem into its dual form only changes the restriction on the multipliers to $0 \le \alpha_i \le C$. The SVM has some hyperparameters to be chosen: kernel function, gamma and cost. The kernel

Methodology
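The methodology adopted in this work comprises five steps: data extraction, transformation, dimensionality reduction with feature extraction, classification, and analysis of results.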

Data Extraction
A historical data set of all assets of BM&FBOVESPA was extracted for the period from August 2014 to August 2015. These data were composed using daily candles. A candle represents the variation in the prices of a given asset in a given time unit (e.g., daily, weekly, monthly) (GrafBolsa; 2017). Figure 4 shows the time series of the five assets over the period.
The MetaTrader (2018) toolbox was used to extract the data set. This toolbox is a multi-asset platform that allows trading Forex, stocks and futures. It offers tools for comprehensive price analysis, use of algorithmic trading applications (trading robots) and copy trading.

Transformation
Based on the input candles, it is possible to compute the technical indicators. These indicators aim to support the prediction of future market movements (Kirkpatrick and Dahlquist; 2006). The computation was made using Java code, built by the authors, that communicates with the TA-Lib (Technical Analysis Library) API. This API is capable of generating more than 100 technical indicators based on the candle set presented.
Although the technical indicators are of utmost importance, it was possible to observe that the number of indicators generated by the TA-Lib API considerably increases the dimensionality of the data, which brings more complexity to the learning problem. Therefore, it was essential to adopt an approach to reduce dimensionality, besides generating latent features.
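To make the transformation step concrete, the sketch below derives two simple indicators from a list of closing prices. It is only an illustration of how each indicator adds one input column per candle. It does not use TA-Lib or the authors' Java code, and the two indicators and the function names are chosen here as assumptions.

```python
def simple_moving_average(closes, window):
    """Average of the last `window` closes; None until enough history exists."""
    out = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(closes[t + 1 - window:t + 1]) / window)
    return out


def momentum(closes, lag):
    """Price change over `lag` periods: close[t] - close[t - lag]."""
    return [None if t < lag else closes[t] - closes[t - lag]
            for t in range(len(closes))]


# Each indicator contributes one feature column per candle.
closes = [10.0, 11.0, 12.0, 13.0]
features = list(zip(simple_moving_average(closes, 2), momentum(closes, 1)))
```

With more than 100 indicators computed per candle, each input row grows to more than 100 columns, which is the dimensionality growth discussed above.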

Dimensionality Reduction
Two dimensionality reduction approaches for learning problems stand out in the literature: selection of features and extraction of features (Campos; 2001). Selection, as the name implies, selects, according to a given criterion, the best subset within the original set of features. Extraction, in general terms, creates new features through transformations or combinations of the original set of features (Campos; 2001).
In this step, the RBM is used to reduce the dimensionality of the data. The implementation used in this study was the Restricted Boltzmann Machine with continuous-valued inputs (CRBM), from the library Deep learning library for node.js. The tool was adapted in order to provide, in addition to the output of the reduced technical indicators, the label (class) that indicates whether the asset's price increased or not at that moment in time.
The policy of class attribution is based on the closing price at the following moment.
According to Fayyad et al. (1996), classification is the process of finding a model (or set of functions) that describes and differentiates classes or data concepts. With this model, it is possible to identify objects whose class is still not known. The model derived from the data is based on the analysis of the training set, that is, the set of data whose classification is previously known. This model can be represented in many ways. Usually, classification is used to infer which class an object belongs to.
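The class attribution policy stated at the start of this subsection can be sketched as follows. The exact rule is an assumption: a moment is labeled "increased" (1) when the next closing price is strictly higher than the current one, and "did not increase" (0) otherwise, so a tie counts as no increase. The function name is hypothetical.

```python
def label_next_close(closes):
    """Label each time step t with 1 if close[t + 1] > close[t], else 0.

    The last time step has no following close, so it receives no label.
    """
    return [1 if closes[t + 1] > closes[t] else 0
            for t in range(len(closes) - 1)]


labels = label_next_close([10.0, 10.5, 10.5, 9.8, 10.1])
```

Each label is paired with the feature vector of the same time step, so the classifier learns to predict the next movement from information available at time t.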

Classification
In this step, Support Vector Machines (SVM) were used. The implementation was LibSVM, from the library Support Vector Machine for nodejs.
The case study that applies this methodology is presented in the next section.

Results
In order to validate our proposed approach, we apply the methodology to actual data from the Brazilian Stock Market. The experiments used five real data sets on assets of BM&FBOVESPA. The assets used are described in Table 1. The column As describes the asset, the column NI presents the distribution of the class "(did) not increase", the column I presents the distribution of the class "increased" and, finally, the column Dim presents the dimension of the input data set. It is possible to observe that there is no great imbalance between the classes. Thus, the accuracy measure is adequate to estimate performance (Pereira; 2012). Besides, it can be seen that the dimension of each set is high, showing that solving this classification problem can become computationally complex and expensive.
The five tested assets were:
• VALE3 is the ticker symbol of the common stocks of Vale S/A, world leader in the production of iron
• ABEV3 is the ticker symbol at Bovespa of the common stocks of Ambev S/A, the world's greatest beer manufacturer.

Configuration of Algorithm Parameters
The hyperparameters of the algorithms were defined using empirical tests. For the training, tests were performed with the combinations for RBM and SVM given in Equations 9 and 10, respectively.
Since the data are temporal, the results were validated with a training and test structure based on the concept of a sliding window, according to which the training and test sets move with time. Therefore, at each step, the tool was trained with
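The sliding-window validation mentioned above can be illustrated with the sketch below. The window sizes used in the study are not stated here, so `train_size`, `test_size` and `step` are hypothetical parameters, and advancing the window by one test block per step is an assumption.

```python
def sliding_windows(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs that move forward in time.

    The test block always follows the training block, so no future
    observation is used to fit the model.
    """
    step = test_size if step is None else step
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size,
                          start + train_size + test_size))
        yield train, test
        start += step


splits = list(sliding_windows(n_samples=10, train_size=4, test_size=2))
```

Unlike shuffled cross-validation, every split keeps the chronological order of the samples, which matters for time series.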

Results Analysis
Accuracy was adopted in this work for performance assessment. It is computed as the number of positive and negative samples correctly classified divided by the total number of samples, as shown in Equation 11:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (11)$$

in which TP is the proportion of positive cases that were correctly identified, FP is the proportion of negative cases that were incorrectly classified as positive, TN is the proportion of negative cases that were classified correctly, and FN is the proportion of positive cases that were incorrectly classified as negative.
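The accuracy measure defined above can be computed from predicted and true labels as in the short sketch below. The function names and the encoding of the "increased" class as 1 are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred, positive=1):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + tn + fp + fn)


acc = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Accuracy is a reasonable summary here only because the two classes are roughly balanced; with strongly skewed classes it would reward always predicting the majority class.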
The results found for the combination of RBM + SVM were compared to those achieved with SVM only, as presented in Table 4. The proposed method (RBM + SVM) led to accuracies ranging from 0.54 (VALE3) to 0.66 (USIM5), which is higher than the range obtained with the isolated SVM (0.51 to 0.61). The RBM + SVM combination outperformed the isolated SVM in four of the five assets and was equivalent in the other one (ABEV3). These results support the assumption that RBM could improve classification results through adequate selection of problem features.
No statistical tests were carried out to check whether there is a significant difference between the approaches, as there are no points of randomness in the methods: the results found for all of the executions performed on each of the assets are the same.

Conclusion
This study aimed to explore the capability of a deep neural network, specifically a Restricted Boltzmann Machine (RBM), to support the prediction of trends at BM&FBOVESPA. The results showed that this machine learning approach has the potential to reduce the dimensionality of the input data and extract latent features to be considered by the main classifier. This enables the generation of additional information and, consequently, supports the process of data classification. Four of the five data sets used in this study presented better results with the combination of RBM and SVM, so that, in general, the proposed approach (RBM + SVM) achieves better results than the isolated classifier (SVM). Therefore, it can be concluded that the proposed approach is promising and may contribute to future studies on this type of application.
Another important contribution was the testing of this solution with real BM&FBOVESPA data. Of more than 100 articles studied, only three used data from the Brazilian market. Compared with those three papers, the proposed approach presented better performance, with an average accuracy rate of 59.8%.
An immediate recommendation for future work would be to incorporate the developed solution in a negotiation model to assess financial strategies and, thus, verify whether the gains in accuracy rate lead to monetary gains.

Figure 2: Classification of a data set using a linear SVM.

Figure 3: Proposed methodology: [a] extraction of historical data on assets; [b] transformation; [c] reduction of dimensionality and feature extraction; [d] classification; and [e] analysis of results.
Figure 3 shows each step of the proposed method. The flow of these steps is detailed in the following subsections.

Figure 4: Evolution of the daily prices of the assets from August 2014 to August 2015.

Table 1: Data sets of BM&FBOVESPA

Table 4: Results of the accuracy of the experiments