A Matlab code to fit periodic data

This paper presents a computer method to find the best sine-based function, in the sense of least squares, to fit periodic data. Even though the least squares method is not a novelty, there is a void in the literature about its use to find trigonometric functions, particularly when it gives rise to nonlinear systems, as it is done in this article. The respective code, implemented in the Matlab programming language, is detailed and analyzed exploring experimental data from the air and soil temperatures measured along the year in earth air heat exchangers (EAHE) built in the facilities of a case study house in the Brazilian state of Rio Grande do Sul. The fitting curves have been employed by the authors in different works to define boundary conditions to study new computer models of EAHE.


Introduction
To face the increasing energy consumption, it is necessary to consider not only alternative/renewable energy sources and devices, but also to review the building construction models. As a remarkable example, in the summer of 2014, there was a change in the electrical energy demand peak in Brazil. It occurred at noon, instead of late afternoon, due to a heavy use of air conditioners during a heat wave [1].
In this context, the Earth Air Heat Exchangers (EAHE) stand out as a natural solution to improve air conditioning at low energy cost. In general, EAHE are composed by a system of fans and buried ducts. The fans blow the air inside the ducts, where it exchanges heat with the soil and returns at a milder temperature [2,3,4].
All this happens because the soil acts as a seasonal thermal reservoir, storing heat during summer and releasing it to the atmosphere along the winter. Hence, in the winter, the soil is warmer than the external air, what allows heating the air inside buried ducts. On the contrary, in the summer, the soil is colder than the environment air, reducing the air temperature inside the ducts [2].
On the other hand, to develop new computer models for EAHE it is necessary to know experimentally the air and soil temperatures of a given place. In particular, such states as the Rio Grande do Sul (RS) in Brazil, where the four seasons are well defined, it is important to measure this data for at least one year, as it is done in [2].
After plotting the results in a graph, it is noticeable that the annual temperatures in RS are periodic following the pattern of a translated sine (or cosine) curve in the interval I = [0, 2π]. Thus, the year begins in summer when the environment air temperatures reach their peak. Between March and June, the temperatures drop slowly (which characterizes the fall) to reach a minimum between June and September (in the winter). From September to December the temperatures go rising again (in the spring), and the year ends as it started with high summer temperatures. The soil temperatures follow an analogous behavior to the air ones, although there is a lag between them which depends on the soil depth.
Given that, it is very natural to look for the "best" sine-based function which represents (models) the air or soil temperatures along the year. To find it, this work employed the experimental data from [2], where the temperatures were measured daily, hour by hour. However, choosing to adopt the term "best" in the sense of least squares, the authors found a gap in classical literature [5][6][7][8][9][10] concerning the expected trigonometric fitting.
To be more specific, there is a lack of texts exploiting the problem of finding a real continuous function f(x) = a sen(bx+c)+d (with a, b, c, and d being real constants) to fit, by the least squares method, a set of n ordered pairs U={(x i , y i ), i = 1, 2, …, n}. Despite using a trigonometric function, instead of a high order polynomial, brings the drawback of solving a non-linear system of equations to find the coefficients a, b, c, and d, this research found the Newton method very effective to circumvent this matter in all the simulated cases.
In general, this paper addresses two objectives: fill a void in the literature about trigonometric fitting; illustrate its application based on a computer code which can help the study and simulation of EAHE.

Least Squares Method
Considering a set of ordered pairs {(x i , y i ), i = 1, 2, …, n}, relative to the temperatures values y i measured at n regular time intervals x i , one can find a representative continuous function f to fit this data, as in Figure 1, by minimizing the sum of the squared differences between y i and f(x i ). From the discussions presented in [2,3], to solve the differential equations that model the temperature inside EAHE, it is very convenient to use sine based functions in the form f(x) =asen(bx+c)+d, where a, b, c and d are real constants, to represent boundary conditions. It is natural to use these functions because they are periodic and their pattern is similar to the analytical solutions for the equations in simplified forms [3].
Therefore, by the least squares method, the constants are chosen minimizing the error function: ( As it is demonstrated in [11], the minimum value of this function occurs in the critical point (a, b, c, d) where all partial derivatives are zero. Computing these derivatives, one finds: Therefore, assuming a ≠ 0, to discover the minimum value of the error function, it is necessary to find a, b, c, and d satisfying the non-linear system of equations: ∑ [a sen(bx i +c)+d - cos(bx i +c) = 0, To solve these equations in this work, it was adopted the Newton method, which is reviewed next.

Newton Method
Given a differentiable vector field it is possible [12,13] to approximate F, on the neighborhood of a point ξ 0 = (a 0 , b 0 , c 0 , d 0 ), by the linear vector field: Therefore, one can approximate the solution of F(a, b, c, d) = 0 (which composes a non-linear system of equations) by solving alternatively L(a,b,c,d) = 0, which is equivalent to solve the linear system: In matrix form, the equations (12)- (15), can be written as: ( The matrix of partial derivatives, which is repeated in both sides of equation (16), is known as the Jacobian matrix, J, of the vector field F. Thus, the equation (16) can be written simply as: where ξ = (a, b, c, d), and, as before, ξ 0 = (a 0 , b 0 , c 0 , d 0 ).
Following these formulations, an approximation to a root, x r , of a vector field F can be found iteratively by choosing a suitable initial approximation ξ 0 and computing the successive approximations: for k = 0, 1, 2, …, etc. These resolutions should continue until reaching a maximum number of iterations or a convergence is achieved, that is, the difference in a chosen norm between successive solutions ξ k and ξ k+1 is under a specified tolerance. For this work, it was adopted the infinity norm [6].
This presented technique is called the Newton Method [5,6]. In fact, to reduce computational costs, the matrix-vector multiplications in equation (18) can be avoided by introducing the variable t = ξ k+1 -ξ k . In this way, the algorithm for the Newton method is divided in two parts: 1. Find t by solving equation J(ξ k ) t = -F(ξ k ), 2. Compute ξ k+1 = ξ k + t.
Under certain conditions the Newton method can converge "very fast" or not converge at all. Particularly, the convergence is very dependent on the initial approximation, which must be chosen carefully. Following the guidelines proposed in Section 3, for all the cases simulated in this research, the Newton method converged in less than five iterations.
Before proceeding, it is important to present the 4 by 4 Jacobian matrix, J, relative to the set of equations (6)-(9), which were obtained from the least squares methods. The elements, J ij , corresponding to the i-th row and j-th column of J, are given by: [2a sin(bx i +c)+d-y i ], J 22 = ∑ x i 2 {a cos 2 (bx i +c)-[a sen(bx i +c)+d-y i ] sen(bx i +c) J 23 = ∑ x i {a cos 2 (bx i +c)-[a sen(bx i +c)+d-y i ] sen(bx i +c) J 32 = J 23 , J 33 = ∑ {a cos 2 (bx i +c)-[a sen(bx i +c)+d-y i ] sen(bx i +c)

Commented Matlab Code
To show how this methodology was implemented computationally, this subsection presents a corresponding code written in the Matlab (R2008a) programming language [14]. For those not used to Matlab, it is worth to read the following points.
 Anything written to the right of the symbol % is taken as a commentary.
 A very long command statement can be continued to the next lines using ellipsis (…).  It is possible to make operations for all the elements of an array at once. For instance, if x and y are two n×1 arrays, and k is a constant, the following statements: abs(x), k*x, x.^2, x.*y, return n×1 arrays with, respectively, the absolute value of all elements of x, all the elements of x multiplied by k, all the squares of x, an element by element product of x and y.  A linear system Ax=b, where A and b are, respectively, n×n and n×1, arrays, can be solved simply by the command statement x=A\b;  As usual, Matlab counts with a conditional if-else construction and also a loop while. Even though the later has a syntax while(condition) do commands end, the loop terminates not only when the specified condition holds false, but also when a break statement is executed.
Besides, it is worth to say that there is a high correlation coefficient, of 0.8693, between the fitted curve and the experimental data. Figure 2: Air temperature at the inlet of an EAHE duct As it was mentioned in the methodology, the Newton method converged fast, that is, in less than five iterations for all cases tested. However, it should be pointed that this depends on the quality of the initial approximation (a 0 , b 0 , c 0 , d 0 ) which can be fairly estimated from a previous analysis of the experimental data.
In this regard, computing an average of the temperatures along the summer, A s , and along the winter, A w , it is possible to estimate the amplitude term by a 0 = (A s -A w )/2. Certainly, there are other valid ways to estimate the same term. For instance, taking a look at the Figure 2, one can note the peaks of temperature during summer, in average, are around 28 o C. In the same way, the average minimum temperatures during winter are close to 13 o C. Therefore, it is reasonable to assume an amplitude a 0 ≈ (28-13)/2 = 7.5.
As for the term d 0 , it represents the annual average temperatures. Thus, it can be calculated taking the mean value of the experimental data. A simple way to do it in Matlab is using the function mean(), whose usual syntax is mean(A), which computes the average or mean value of a given array A. It is worth to mention that one estimate can also be deduced from Figure 2. The value of d 0 is approximately an average between the maximum summer temperatures and the minimum winter ones, in other words, d 0 ≈ (28+13)/2 = 20.5.
Finally, to estimate the terms b 0 and c 0 , it was adopted the following procedure. For the external air, it is known empirically that the peak of summer temperatures occurs, usually, at the beginning of February (about forty days after the year start). On the other hand, the minimum winter temperatures occur in July (about two hundred and ten days after the year start). Since the sine function achieves its maximum value at the point π/2 and its minimum at 3π/2, it is sufficient to make a correspondence among, respectively, the days x = 40 and x = 210, and the points π/2 and 3π/2. Thus, b 0 and c 0 are found solving the linear system: whose solution is b 0 ≈ 0.0185 and c 0 ≈ 0.8316.
All these values are presented in the Matlab code above. In fact, running the program, it was obtained the following function to represent the temperature at the air inlet of the EAHE duct: T(x) = 20.4967 + 5.6634 sin( 0.0178x + 0.9787 ).
(37) Therefore, except for the amplitude term, all the initial approximations were fairly precise.
On the other hand, one should highlight that the least squares and Newton methods are robust enough to compute the coefficients even when the initial approximations are rather rough. For instance, using the same previous estimates, it was obtained the following function to represent the temperature at the air inlet of other EAHE duct: T(x) = 21.7895 -5.9614 sin( -0.0184x + 5.3991 ), which is very different than the one in (37).
The Figure 3 displays a graph comparing the fitting curve with the experimental data for the temperature at this second inlet. As before, it is possible to see that the sine curve also matches the behavior of the temperature along the year. Besides, for this case, the correlation between the fitted curve and the experimental data is 0.8788, which is higher than the previous result.
This methodology was also employed to obtain the temperatures in the soil for different depths. These results were obtained from numerical data available using the EAHE model validated in [15]. In the Figure 4, there is a graph comparing the temperature curves, at the depth of 4 m, for the numerical model and the fitted one. In this case, the correlation between them is 0.9976. In fact, the curves fitted from the least squares method were used by the authors to construct new EAHE models presented in the articles [3], [4] and [15].  Before concluding this section, it may be worth to note that all simulations were run in a computer with the following specifications: 64-bit processor, Intel Core i5, 2.53 GHz, and 3GB of RAM.

Conclusions
This paper aimed to present a computer methodology to find fitting curves which can help in the development of new EAHE models. For this purpose, it was developed a computer code following a methodology based on the least squares and Newton methods. Such code was written in the Matlab programming language.
For this research, it was used several empirical data relative to the temperature of air and soil measured in the facilities of an experimental house in the Brazilian state of Rio Grande do Sul. As it was verified, the method allowed finding a sine-based function which naturally modeled the phenomena.
Even though the proposed methodology is relatively more complex than using polynomial fitting, due to the need of solving non-linear systems of equations, it was observed a fast convergence in the solutions. This was the result of a previous analysis to find suitable initial approximations. In fact, even facing poor initial approximations, the method was robust enough to converge for highly correlated results.
Finally, this work also helps to fill a void in the literature about the use of the least squares methods with trigonometric functions.