Sieve methods for non-stationary time series
Xiucai Ding (xcading@ucdavis.edu, xiucaiding89@gmail.com) and Chenyang Zhang
Department of Statistics, UC Davis
This package provides tools for analyzing non-stationary time series using sieve methods. We use three popular families of sieve bases: Fourier series, orthogonal polynomials, and orthogonal wavelets. We consider both linear AR approximations and nonlinear AR regressions. For the linear AR approximation, we expand the time-varying AR coefficients in the sieve basis. We provide visualization of the local partial autocorrelation function (LPACF) and cross-validation methods for parameter tuning and model selection, and we propose a bootstrap method to test the stability of the series, i.e., the constancy of the coefficients. For the nonlinear AR regression, we expand the regression functions in a tensor sieve basis over both time and the lagged series. We again provide visualization and cross-validation methods for parameter tuning and model selection, and we propose analogous bootstrap methods for several inference problems on the structure of the regression functions: the exact-form test, the simultaneous confidence region (SCR), the time-homogeneity test, and the separability test. The last two tests are built on the SCR.
For a time series $x_{1}, x_{2},\dots, x_{n} \in \mathbb{R}$, denote by $\Omega_n:=[\text{Cov}(x_{1},\dots, x_{n})]^{-1}$ the precision matrix; the estimation and inference of $\Omega_n$ are central tasks in time series analysis. Since $\Omega_n$ is an $n \times n$ matrix, it is not convenient to estimate the covariance matrix and then invert it when $n$ is large. Therefore, we use the Cholesky decomposition technique to represent the series through a sequence of least squares linear regressions, $$x_i = \sum_{j=1}^{i-1}\phi_{ij}x_{i-j} + \epsilon_i, \ i=2,\dots, n.$$ The precision matrix can then be written as $$\Omega_n = \Phi^* \widetilde{D} \Phi,$$ where the diagonal matrix $\widetilde{D} = \text{diag}\{\sigma_1^{-2}, \dots, \sigma_n^{-2}\}$, $\sigma_i^{2}$ is the variance of the error $\epsilon_i$, and $\Phi$ is a lower triangular matrix with ones on its diagonal and $-\phi_{ij}$ as its $(i, i-j)$-th entry for $j < i$. Hence we only need to estimate the coefficients and error variances of these OLS regressions. The remaining difficulty is that there are too many parameters (in total $\frac{n(n+1)}{2}$) to estimate, so we use sieve methods for dimension reduction and estimation.
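To make the decomposition concrete, here is a minimal numpy sketch (not the package's implementation) of how $\Omega_n$ would be assembled once the coefficients $\phi_{ij}$ and residual variances $\sigma_i^2$ have been estimated; the function name and the input layout are illustrative only.
import numpy as np

def precision_from_cholesky(phi, sigma2):
    # phi[i] lists the coefficients phi_{i+1,1}, ..., phi_{i+1,i} of the regression of
    # x_{i+1} on its past (0-based rows); sigma2[i] is the residual variance sigma_{i+1}^2
    n = len(sigma2)
    Phi = np.eye(n)
    for i in range(1, n):
        for j in range(1, i + 1):
            Phi[i, i - j] = -phi[i][j - 1]                   # -phi_{ij} at entry (i, i-j)
    D = np.diag(1.0 / np.asarray(sigma2, dtype=float))       # D = diag(sigma_i^{-2})
    return Phi.T @ D @ Phi                                   # Omega_n = Phi^* D Phi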
Under the assumptions of local stationarity and short-range dependence of $x_{1}, x_{2},\dots, x_{n}$, two facts simplify the estimation of $\phi_{ij}$. For some small constant $b$: first, there exist smooth functions $\phi_j(\cdot)$ on $[0, 1]$ such that $\sup_{i>b}|\phi_{ij} - \phi_j(\frac{i}{n})| = o(n^{-1/2})$; second, $\phi_{ij}$ can be treated as $0$ whenever $j > b$. We then use sieve methods to estimate $\phi_j(\frac{i}{n})$ as $$\hat{\phi}_j(\tfrac{i}{n}) := \sum_{k=1}^c a_{jk} \alpha_k(\tfrac{i}{n}),$$ where $\{\alpha_k(t)\}$ is a set of pre-chosen orthogonal basis functions on $[0, 1]$ and $c$ is the number of basis functions. In this project we mainly use three families of basis functions: Fourier series, orthogonal polynomials, and orthogonal wavelets.
The Fourier basis of order $J_n$ can be expressed as $$\text{Four}(J_n) = \{a_0 + \sum_{k=1}^{J_n}[a_k \cos(2k\pi x) + b_k \sin(2k\pi x)]\}, \ x \in [0, 1].$$ The classical Legendre polynomials of order $J_n$, one of the most frequently used families of orthogonal polynomials, can be expressed as $$\text{Legendre}(J_n) = \{a_0 + \sum_{k=1}^{J_n} a_k P_k(x)\}, \quad P_k(x) = \frac{1}{2^k k!}\frac{\mathrm{d}^k}{\mathrm{d} x^k}(x^2-1)^k, \ x \in [-1, 1].$$ The wavelet basis of order $2^{J_n}$ can be expressed as $$\text{Wav}(2^{J_n}) = \{\sum_{k=0}^{2^{j_0}-1}\alpha_{j_0 k}\phi^{*}_{j_0 k}(x) + \sum_{j=j_0}^{J_n-1}\sum_{k=0}^{2^j-1}\beta_{j k}\psi^{*}_{j k}(x)\}, \ x \in [0, 1],$$ where $$\phi^{*}_{j k}(x) = 2^{j/2} \sum_{l \in \mathbb{Z}}\phi(2^j x + 2^j l - k), \ \psi^{*}_{j k}(x) = 2^{j/2} \sum_{l \in \mathbb{Z}}\psi(2^j x + 2^j l - k), \ x \in [0, 1],$$ and $\phi$ is a father wavelet and $\psi$ is a mother wavelet.
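As a small illustration (not the package API), the Fourier and rescaled Legendre bases on $[0,1]$ can be evaluated with numpy as follows; the function names are ours, and the number of returned basis functions is simply truncated at $c$.
import numpy as np
from numpy.polynomial import legendre as npleg

def fourier_basis(t, c):
    # first c Fourier basis functions on [0, 1]: 1, cos(2*pi*t), sin(2*pi*t), cos(4*pi*t), ...
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < c:
        cols.append(np.cos(2 * np.pi * k * t))
        if len(cols) < c:
            cols.append(np.sin(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def legendre_basis(t, c):
    # first c Legendre polynomials P_0, ..., P_{c-1}, rescaled from [-1, 1] to [0, 1]
    x = 2 * np.asarray(t, dtype=float) - 1
    return np.column_stack([npleg.legval(x, np.eye(c)[k]) for k in range(c)])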
We can then rewrite the best-linear-forecasting regression as $$x_i = \sum_{j=1}^b \sum_{k=1}^c a_{jk} \alpha_k(\tfrac{i}{n}) x_{i-j} + \epsilon_i, \ i = b + 1, \dots, n.$$ Moreover, define $\boldsymbol x = (x_{b+1},\dots,x_n)^* \in \mathbb{R}^{n-b}$, $\boldsymbol x_i = (x_{i-1},\dots,x_{i-b})^* \in \mathbb{R}^{b}$, and $\boldsymbol X = (\boldsymbol x_{b+1},\dots,\boldsymbol x_{n}) \in \mathbb{R}^{b \times (n-b)}$. We also define $\boldsymbol b(t) = (\alpha_1(t), \dots, \alpha_c(t))^* \in \mathbb{R}^{c}$ and $E_i \in \mathbb{R}^{(n-b) \times (n-b)}$ such that $(E_i)_{st} = 1$ when $s=t=i-b$ and $(E_i)_{st} = 0$ otherwise. Then the transpose of the design matrix, $Y^*$, can be written as $$Y^* = \sum_{i=b+1}^n \Big(\boldsymbol X \otimes \boldsymbol b(\tfrac{i}{n})\Big) E_i,$$ and the OLS estimator of $\boldsymbol \beta \in \mathbb{R}^{bc}$ is $$\hat{\boldsymbol\beta} = (Y^*Y)^{-1}Y^* \boldsymbol x.$$ We decompose $\hat{\boldsymbol\beta}$ into $b$ blocks by writing $\hat{\boldsymbol\beta} = (\hat{\beta}_1^*, \dots, \hat{\beta}_b^*)^*$ with each $\hat{\beta}_j \in \mathbb{R}^{c}$, so the estimate $\hat{\phi}_j(\frac{i}{n})$ can be expressed as $$\hat{\phi}_j(\tfrac{i}{n}) = \hat{\beta}_j^* \boldsymbol b(\tfrac{i}{n}).$$
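The following sketch (illustrative only, not the SieveOLS implementation used below) assembles the design matrix row by row and computes $\hat{\boldsymbol\beta}$ and $\hat{\phi}_j(\frac{i}{n})$ by ordinary least squares; it reuses the legendre_basis helper from the previous sketch.
def sieve_ar_design(x, b, c, basis=legendre_basis):
    # rows correspond to i = b+1, ..., n; columns are blocks j = 1..b of alpha_k(i/n) * x_{i-j}
    x = np.asarray(x, dtype=float)
    n = len(x)
    B = basis(np.arange(b + 1, n + 1) / n, c)                          # (n-b) x c basis values at i/n
    lags = np.column_stack([x[b - j:n - j] for j in range(1, b + 1)])  # (n-b) x b lagged values
    Y = np.einsum('ij,ik->ijk', lags, B).reshape(n - b, b * c)         # design matrix
    return Y, x[b:], B

def fit_sieve_ar(x, b, c, basis=legendre_basis):
    Y, target, B = sieve_ar_design(x, b, c, basis)
    beta, *_ = np.linalg.lstsq(Y, target, rcond=None)                  # OLS estimate of beta
    phi_hat = B @ beta.reshape(b, c).T                                 # column j holds phi_hat_j(i/n)
    return beta, phi_hat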
A remaining question is how to choose the parameters $b$ and $c$. We propose an intuitive visualization method to choose $b$: we try $b = 1, 2, \dots, 10$, and for each candidate $b$ we plot $\hat{\phi}_b(\frac{i}{n}), i = b + 1, \dots, n$. For $b > b^*$, where $b^*$ is the most appropriate order, the plot should fluctuate around $0$ without an obvious pattern, so we should see a clear difference between the plots for $b > b^*$ and those for $b \leq b^*$.
For choosing the parameter $c$, we use a cross-validation method. For a given integer $l$, say $l = \lfloor 3 \log_2 n \rfloor$, we divide the series into the training part $\{x_i\}_{i=1}^{n-l}$ and the validation part $\{x_i\}_{i=n-l+1}^{n}$. For candidate pairs $\{(b_i, c_j)\}, \ i=1, 2,\dots, u, \ j = 1, 2, \dots, v$ (chosen around an initial guess), we fit the sieve model with $b=b_i$ and $c=c_j$ on the training set and use the fitted model to predict the validation part. Denoting the predictions by $\hat{x}_{n-l+1,ij}, \hat{x}_{n-l+2,ij}, \dots, \hat{x}_{n,ij}$ for each pair $(b_i, c_j)$, the sample prediction $MSE$ of the pair is $$MSE_{ij} = \frac{1}{l}\sum_{k=n-l+1}^n (x_k - \hat{x}_{k,ij})^2.$$
We could select the pair $(b_i, c_j)$ with the minimum sample $MSE$.
Moreover, our module provides a visualization method to help users: we plot a line chart of the $MSE$ against $c$ for each $b = 1, 2, \dots, 10$. In practice, we combine these results with the previous visualization method. There usually exists a clear elbow point $c^*$ for $b=b^*$, and we recommend choosing $c^*$ or $c^* + 1$ by Occam's razor.
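A bare-bones version of this cross-validation loop might look as follows; it reuses fit_sieve_ar and legendre_basis from the sketches above and only shows the computation of $MSE_{ij}$, whereas the package's cross_validation_bc method also produces the plots described here.
def cv_mse(x, pairs, l=None, basis=legendre_basis):
    # sample prediction MSE over the last l observations for each candidate (b, c) pair
    x = np.asarray(x, dtype=float)
    n = len(x)
    if l is None:
        l = int(np.floor(3 * np.log2(n)))
    train = x[:n - l]
    out = {}
    for b, c in pairs:
        # time is rescaled within the training sample here; a more careful
        # implementation keeps i/n relative to the full series
        beta, _ = fit_sieve_ar(train, b, c, basis)
        beta = beta.reshape(b, c)
        sse = 0.0
        for i in range(n - l + 1, n + 1):                   # 1-based validation times
            B_i = basis(np.array([i / n]), c)[0]            # b(i/n)
            pred = sum((B_i @ beta[j]) * x[i - j - 2] for j in range(b))  # lag j+1 term
            sse += (x[i - 1] - pred) ** 2
        out[(b, c)] = sse / l
    return out
The selected pair is then min(mse_table, key=mse_table.get) for mse_table = cv_mse(series, pairs).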
We also want to make further inferences on the structure of $\phi_j(\cdot)$. One such inference is a test of stability of our estimates, i.e., the null hypothesis $$\boldsymbol H_0: \phi_j(\cdot) \ \text{is a constant function on} \ [0,1], \ j = 1, 2, \dots, b^*.$$ Based on the previous regression model, we propose the statistic $$T = \sum_{j=1}^{b^*} \int_{0}^{1}(\hat{\phi}_j(t) - \overline{\hat{\phi}_j})^2 dt, \quad \overline{\hat{\phi}_j} = \int_{0}^{1}\hat{\phi}_j(t)dt.$$ The heuristic is that, under $\boldsymbol H_0$, $\phi_j(t)$ equals its mean for $j = 1, 2, \dots, b^*$, so $T$ should be small. However, it is difficult to estimate the distribution of $T$ directly, so we use a high-dimensional multiplier bootstrap statistic to mimic the distribution of $n T$.
For some positive integer $m$, we propose the multiplier bootstrap statistic $$\Phi = \frac{1}{\sqrt{n-m-b^*+1}\sqrt{m}} \sum_{i=b^*+1}^{n-m}\Big[\Big(\sum_{j=i}^{i+m}\boldsymbol h_j\Big)\otimes\Big(\boldsymbol b(\tfrac{i}{n})\Big)\Big]R_i,$$ where the $R_i$ are i.i.d. standard Gaussian random variables and $\boldsymbol h_i = \boldsymbol x_i \hat{\epsilon}_i \in \mathbb{R}^{b^*}$ ($\boldsymbol x_i$ and $\boldsymbol b(t)$ are defined as in the estimation part above).
We also define $\overline{\boldsymbol b} = \int_0^1 \boldsymbol b(t) dt$, $w = I - \overline{\boldsymbol b}\ \overline{\boldsymbol b}^* \in \mathbb{R}^{c \times c}$, $\boldsymbol W$ as the $b^*c \times b^*c$ block-diagonal matrix with diagonal blocks $w$, and $\hat{\Gamma}:= \hat{\Sigma}^{-1}\boldsymbol W \hat{\Sigma}^{-1}$ with $\hat{\Sigma} = \frac{1}{n}Y^*Y$. With these notations, the bootstrap quadratic form is $$\hat{\tau} := \Phi^* \hat{\Gamma} \Phi,$$ whose distribution mimics that of $n T$ asymptotically.
Based on the previous discussion, the bootstrap procedure for testing $H_0$ is to generate $M$ (say $1000$) i.i.d. copies $\{\Phi^{(k)}\}_{k=1}^M$ and compute the corresponding $\{\hat{\tau}^{(k)}\}_{k=1}^M$. We then sort them to obtain the order statistics $\hat{\tau}_{(1)} \leq \hat{\tau}_{(2)} \leq \dots \leq \hat{\tau}_{(M)}$ and reject $H_0$ at level $\alpha$ if $n T > \hat{\tau}_{(\lfloor M(1-\alpha) \rfloor)}$, where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$. Letting $M^* = \max\{r: \hat{\tau}_{(r)} \leq n T\}$, the $p$-value is $1- \frac{M^*}{M}$.
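Given the observed value of $nT$ and the bootstrap draws $\{\hat{\tau}^{(k)}\}$, the decision rule and $p$-value above amount to a few lines; a sketch (generating the draws themselves is done inside the package's stability_test method):
def bootstrap_decision(nT, taus, alpha=0.05):
    # decision and p-value from bootstrap copies of the quadratic form tau_hat
    taus = np.sort(np.asarray(taus, dtype=float))
    M = len(taus)
    critical = taus[int(np.floor(M * (1 - alpha))) - 1]        # tau_(floor(M(1-alpha)))
    M_star = np.searchsorted(taus, nT, side='right')           # max{r : tau_(r) <= nT}
    decision = 'reject' if nT > critical else 'not reject'
    return decision, 1.0 - M_star / M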
To determine the parameter $m$ (a positive integer), we use the minimum volatility (MV) method to choose the window size. We define the covariance structure $\hat{\Omega}$ as $$\hat{\Omega} := \frac{1}{(n-m-b+1)m} \sum_{i=b+1}^{n-m}\Big[\Big(\sum_{j=i}^{i+m}\boldsymbol h_j\Big)\otimes\Big(\boldsymbol b(\tfrac{i}{n})\Big)\Big] \times \Big[\Big(\sum_{j=i}^{i+m}\boldsymbol h_j\Big)\otimes\Big(\boldsymbol b(\tfrac{i}{n})\Big)\Big]^*$$ and seek the $m$ that minimizes the volatility of this covariance structure over a suitable range of candidates. In practice, we choose a sequence of window sizes $m_{-h_0 + 1} < \dots < m_1 < m_2 < \dots < m_{n_0} < \dots < m_{n_0 + h_0}$ for some given large value $m_{n_0}$ and neighborhood control parameter $h_0 > 0$ (we usually select $h_0 = 3$). Then for each $m_j, j = 1, 2, \dots, n_0$, we calculate the matrix norm error of $\hat{\Omega}_{m_j}$ in its $h_0$-neighborhood, i.e., $$se(m_j) := se(\{\hat{\Omega}_{m_{j+k}}\}_{k=-h_0}^{h_0}) = \Big[\frac{1}{2h_0}\sum_{k=-h_0}^{h_0}||\bar{\hat{\Omega}}_{m_j}-\hat{\Omega}_{m_{j+k}}||^2\Big]^{1/2},$$ where $\bar{\hat{\Omega}}_{m_j} = \sum_{k=-h_0}^{h_0} \hat{\Omega}_{m_{j+k}} /(2h_0 + 1)$. We then choose $$\hat{m} := \arg\min_{m_1 \leq m \leq m_{n_0}} se(m).$$
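A sketch of the minimum volatility rule, where omega_hat is a hypothetical callable returning $\hat{\Omega}_m$ for a window size $m$ and m_grid already contains the $h_0$ extra candidates on each side:
def choose_m_by_mv(omega_hat, m_grid, h0=3):
    # minimum volatility choice of the window size m
    omegas = [omega_hat(m) for m in m_grid]
    se = []
    for j in range(h0, len(m_grid) - h0):
        window = omegas[j - h0:j + h0 + 1]                    # h0-neighborhood of m_j
        mean = sum(window) / len(window)                      # Omega-bar_{m_j}
        se.append(np.sqrt(sum(np.linalg.norm(mean - O, 'fro') ** 2 for O in window) / (2 * h0)))
    return m_grid[h0 + int(np.argmin(se))]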
Now we use the following simple time-varying AR(2) model to illustrate our methods: $$x_i = 0.6\sin(2\pi\tfrac{i}{n}) x_{i-1} + 0.4\cos(2\pi\tfrac{i}{n}) x_{i-2} + \epsilon_{i}$$
import math                      # math and numpy are used directly below
import numpy as np
from Time_Series_Sieve import *
from tqdm import trange

def series_generator_example(n):
    # time-varying AR(2): x_i = 0.6*sin(2*pi*i/n)*x_{i-1} + 0.4*cos(2*pi*i/n)*x_{i-2} + eps_i
    np.random.seed(2020)
    x = list(np.random.randn(2))
    for i in range(2, n):
        x.append(0.6 * math.sin(2 * np.pi * (i + 1) / n) * x[-1]
                 + 0.4 * math.cos(2 * np.pi * (i + 1) / n) * x[-2]
                 + np.random.normal())
    return x
n = 1000
b = 2
c = 5
true_series = [[0.6 * math.sin(2 * np.pi * (i + 1) / n) for i in range(b, n)], [0.4 * math.cos(2 * np.pi * (i + 1) / n) for i in range(b, n)]]
series = np.array(series_generator_example(n))
First, we use Legendre polynomials and try to choose appropriate $b$ and $c$.
sieve_obj = SieveOLS(series, 'legendre', b, c)
sieve_obj.choose_b_intuitive()
sieve_obj.cross_validation_bc()
From the first two plots, we can easily see that we should choose $b = 2$, and from the third plot the elbow point is $c=4$. Therefore, we fit the model using $b = 2$ and $c = 4$ or $5$. We use the function in our module to plot the comparison of the estimates $\hat{\phi}_j(\frac{i}{n})$ with the true coefficients $\phi_{ij}$.
b = 2
c = 4
sieve_obj = SieveOLS(series, 'legendre', b, c)
sieve_obj.versus_plot_true(true_series)
b = 2
c = 5
sieve_obj = SieveOLS(series, 'legendre', b, c)
sieve_obj.versus_plot_true(true_series)
The estimates are close to the true coefficients. We then try different basis functions and repeat the previous process.
Fourier series:
c = 5
sieve_obj = SieveOLS(series, 'triangle', b, c)
sieve_obj.choose_b_intuitive()
sieve_obj.cross_validation_bc()
Choose $b = 2, c = 3$
c = 3
sieve_obj = SieveOLS(series, 'triangle', b, c)
sieve_obj.versus_plot_true(true_series)
Daubechies wavelets:
c = 2
sieve_obj = SieveOLS(series, 'db14', b, c)
sieve_obj.choose_b_intuitive()
sieve_obj.cross_validation_bc()
Choose $b = 2, c = 2$
c = 2
sieve_obj = SieveOLS(series, 'db14', b, c)
sieve_obj.versus_plot_true(true_series)
Symlets wavelets:
sieve_obj = SieveOLS(series, 'sym14', b, c)
sieve_obj.choose_b_intuitive()
sieve_obj.cross_validation_bc()
Choose $b = 2, c = 1, 2$
c = 1
sieve_obj = SieveOLS(series, 'sym14', b, c)
sieve_obj.versus_plot_true(true_series)
c = 2
sieve_obj = SieveOLS(series, 'sym14', b, c)
sieve_obj.versus_plot_true(true_series)
Coiflets wavelets:
sieve_obj = SieveOLS(series, 'coif5', b, c)
sieve_obj.choose_b_intuitive()
sieve_obj.cross_validation_bc()
Choose $b = 2, c = 1, 2$
c = 1
sieve_obj = SieveOLS(series, 'coif5', b, c)
sieve_obj.versus_plot_true(true_series)
c = 2
sieve_obj = SieveOLS(series, 'coif5', b, c)
sieve_obj.versus_plot_true(true_series)
We use a simple AR(2) model with constant coefficients to check the stability test: $$x_i = 0.3 x_{i-1} + 0.1 x_{i-2} + \epsilon_{i}$$
def series_generator_example2(n):
    np.random.seed(2023)
    x = list(np.random.randn(2))
    for i in range(2, n):
        x.append(0.3 * x[-1] + 0.1 * x[-2] + np.random.normal())
    return x
true_series2 = [[0.3 for i in range(b, n)], [0.1 for i in range(b, n)]]
series2 = np.array(series_generator_example2(n))
c = 5
sieve_obj2 = SieveOLS(series2, 'triangle', b, c)
sieve_obj2.stability_test(b, c)
('not reject', 0.992)
sieve_obj2 = SieveOLS(series2, 'legendre', b, c)
sieve_obj2.stability_test(b, c)
('not reject', 0.984)
c = 2
sieve_obj2 = SieveOLS(series2, 'db16', b, c)
sieve_obj2.stability_test(b, c)
('not reject', 0.995)
Now we run 1000 trials at level $\alpha = 0.05$.
def series_generator_single(data, n):
    x_1, x_2 = data[0], data[1]
    yield x_1
    yield x_2
    for i in range(2, n):
        x_1, x_2 = x_2, 0.3 * x_2 + 0.1 * x_1 + data[i]
        yield x_2

def series_generator_matrix(n, m):
    np.random.seed(2022)
    x = np.random.randn(m, n)
    return np.array([list(series_generator_single(x[i], n)) for i in range(m)])
n = 1000
m = 1000
series_matrix = series_generator_matrix(n, m)
def bootstrap_trial(data, func_type, b, c, alpha=0.05):
    test_list = []
    with trange(len(data)) as pbar:
        for i in pbar:
            test_list.append(SieveOLS(data[i], func_type, b, c).stability_test(b, c))
            simulated_type_I_error = sum([_[0] == 'reject' for _ in test_list]) / len(test_list)
            pbar.set_description("Simulated Type I Error={}".format(simulated_type_I_error))
    return test_list
c = 5
test_list1 = bootstrap_trial(series_matrix, 'legendre', b, c)
Simulated Type I Error=0.066: 100%|██████████████████████████████████████████████| 1000/1000 [1:04:57<00:00, 3.90s/it]
test_list2 = bootstrap_trial(series_matrix, 'triangle', b, c)
Simulated Type I Error=0.048: 100%|██████████████████████████████████████████████| 1000/1000 [4:27:14<00:00, 16.03s/it]
c = 2
test_list3 = bootstrap_trial(series_matrix, 'db16', b, c)
Simulated Type I Error=0.043: 100%|██████████████████████████████████████████████| 1000/1000 [5:53:34<00:00, 21.21s/it]
We find that all the simulated Type I errors are around $\alpha = 0.05$.
In many scientific endeavors, researchers are interested in the relationships between multiple variables, and these relationships may follow some specific, possibly nonlinear, functional form. Therefore, we extend the previous model to a time-inhomogeneous nonlinear time series regression, $$Y_i = \sum_{j=1}^r m_j(t_i, X_{j, i}) + \epsilon_i,$$ where $r>0$ is a given fixed integer, the $\{X_{j, i}\}$ are covariates (usually $X_{j,i} = X_{i-j}$), $t_i = \frac{i}{n}$, and the $\{\epsilon_i\}$ are the noise terms.
Under suitable smoothness conditions, we can use a similar sieve basis to approximate $m_j(t, x), 1\leq j\leq r$; we write $$m_{j,c,d}(t, x) = \sum_{l_1 = 1}^c\sum_{l_2=1}^d \beta_{j, l_1, l_2} b_{l_1, l_2}(t, x),$$ where $\{b_{l_1, l_2}(t, x)\}$ are sieve basis functions similar to those in the previous model and $\{\beta_{j, l_1, l_2}\}$ are the coefficients. However, the bases above are all defined on $[0, 1]$, while the domain of $x$ in $m_j(t, x)$ is usually unbounded. Therefore, we use mapping methods to map the value of $x$ into $[0, 1]$. We list the four mappings used in our methods below (a small sketch of them follows the list); the first two map $\mathbb{R}$ to $[0,1]$, the last two map $\mathbb{R}^+$ to $[0,1]$, and all of them contain a positive scaling factor $s$.
1. The algebraic mapping from $\mathbb{R}$ to $[0,1]$: $$u(x, s) = \frac{x}{2\sqrt{x^2 + s^2}} + \frac{1}{2}$$ $$u'(x, s) = \frac{s^2}{2(x^2 + s^2)^{3/2}} $$
2. The logarithmic mapping from $\mathbb{R}$ to $[0,1]$: $$u(x, s) = \frac{\tanh (\frac{x}{s})+1}{2} = \frac{e^{s^{-1}x} - e^{-s^{-1}x}}{2(e^{s^{-1}x} + e^{-s^{-1}x})} + \frac{1}{2}$$ $$u'(x, s) = \frac{1 - \tanh^2 (\frac{x}{s})}{2s}$$
3. The algebraic mapping from $\mathbb{R}^+$ to $[0,1]$: $$u(x, s) = \frac{x-s}{2(x+s)} + \frac{1}{2}$$ $$u'(x, s) = \frac{s}{(x + s)^2} $$
4. The logarithmic mapping from $\mathbb{R}^+$ to $[0,1]$: $$u(x, s) = \tanh (\frac{x}{s})$$ $$u'(x, s) = \frac{1 - \tanh^2 (\frac{x}{s})}{s}$$
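These four mappings and their derivatives are elementary to code; a numpy sketch (illustrative names, not the package's internals) follows.
import numpy as np

def algebraic_map_R(x, s):
    # algebraic mapping from R to [0, 1] and its derivative
    u = x / (2.0 * np.sqrt(x ** 2 + s ** 2)) + 0.5
    return u, s ** 2 / (2.0 * (x ** 2 + s ** 2) ** 1.5)

def logarithmic_map_R(x, s):
    # logarithmic (tanh) mapping from R to [0, 1] and its derivative
    th = np.tanh(x / s)
    return (th + 1.0) / 2.0, (1.0 - th ** 2) / (2.0 * s)

def algebraic_map_Rplus(x, s):
    # algebraic mapping from R+ to [0, 1] and its derivative
    return x / (x + s), s / (x + s) ** 2

def logarithmic_map_Rplus(x, s):
    # logarithmic (tanh) mapping from R+ to [0, 1] and its derivative
    th = np.tanh(x / s)
    return th, (1.0 - th ** 2) / s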
We still use $\{\phi_i(\cdot)\}$ to denote a sequence of orthonormal sieve basis functions on $[0, 1]$, such as Fourier series, Legendre polynomials, or wavelets. A basis on $\mathbb{R}$ or $\mathbb{R}^+$ can then be constructed through the mapping functions above, $$\varphi_i(x) = u'(x, s)\phi_i(u(x, s)).$$ The basis functions $\{b_{l_1, l_2}(t, x)\}$ can then be expressed as $\{\phi_{l_1}(t)\} \otimes \{\varphi_{l_2}(x)\}$, so estimating $m_j(t_i, X_{j, i})$ reduces to estimating the coefficients $\{\beta_{j, l_1, l_2}\}$. We estimate them by OLS. For convenience, we collect all $\{\beta_{j, l_1, l_2}\}$ into the coefficient vector $\boldsymbol \beta = (\beta_1, \dots, \beta_{rcd})^T \in \mathbb{R}^{rcd}$, ordered according to the sieve basis, i.e., for $1 \leq k \leq rcd$, $$\beta_k = \beta_{l+1, l_1+1, l_2}, \quad l = \Big\lfloor\frac{k-1}{cd}\Big\rfloor, \quad l_1 = \Big\lfloor\frac{k-1-lcd}{d}\Big\rfloor, \quad l_2 = k - lcd - l_1 d.$$ We denote $Y = (Y_i)_{r+1\leq i \leq n} \in \mathbb{R}^{n-r}$ and the entries of the design matrix $W \in \mathbb{R}^{(n-r) \times rcd}$ as $$W_{ik} = \phi_{l_1+1}(t_i)\varphi_{l_2}(X_{l+1, i}).$$ Then the OLS estimator of $\boldsymbol \beta$ is $$\hat{\boldsymbol \beta} = (W^T W)^{-1}W^T Y.$$
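A sketch of how the tensor-product design matrix $W$ and the OLS estimate could be assembled, reusing the legendre_basis and algebraic_map_R helpers from the sketches above (the SieveOLS2D class used later is the package's actual implementation):
def mapped_basis(x, d, s=1.0, basis=legendre_basis, mapping=algebraic_map_R):
    # varphi_k(x) = u'(x, s) * phi_k(u(x, s)) for k = 1, ..., d
    u, du = mapping(np.asarray(x, dtype=float), s)
    return basis(u, d) * du[:, None]

def nonlinear_sieve_design(y, X_lags, c, d, s=1.0):
    # y: responses Y_i (length n - r); X_lags: (n - r) x r matrix, column j holds X_{j, i}
    n_eff, r = X_lags.shape
    t = (np.arange(n_eff) + 1) / n_eff                       # rescaled times (alignment is a sketch)
    time_B = legendre_basis(t, c)                            # phi_{l1}(t_i)
    blocks = []
    for j in range(r):
        space_B = mapped_basis(X_lags[:, j], d, s)           # varphi_{l2}(X_{j, i})
        blocks.append(np.einsum('ik,il->ikl', time_B, space_B).reshape(n_eff, c * d))
    W = np.hstack(blocks)                                    # (n - r) x rcd design matrix
    beta, *_ = np.linalg.lstsq(W, np.asarray(y, dtype=float), rcond=None)
    return W, beta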
We use $\boldsymbol b \in \mathbb{R}^{cd}$ to denote the basis vector $\{\phi_{l_1}(t)\} \otimes \{\varphi_{l_2}(x)\}$. For any $1 \leq j \leq r$, denote by $L_j \in \mathbb{R}^{rcd \times rcd}$ the diagonal matrix $L_j = \oplus_{k=1}^r \delta_{k}(j)I_{cd}$, where $\oplus$ is the direct sum, $\delta_{k}(j) = 1$ when $k = j$ and $0$ otherwise, and $I$ is the identity matrix. Then the estimator of $m_j(t, x)$ is $$\hat{m}_{j, c, d}(t, x) = (\hat{\boldsymbol \beta} L_j)^T(\boldsymbol b \otimes I_{r}).$$ Under the local stationarity, short-range dependence, and smoothness assumptions on the series and the functions, $\hat{m}_{j, c, d}(t, x)$ is a consistent estimator of $m_j(t, x)$.
For choosing the parameters $c$ and $d$, we can use a cross-validation method similar to the previous part: for candidate pairs $\{(c_i, d_j)\}, \ i=1, 2,\dots, u, \ j = 1, 2, \dots, v$, we select the pair with the minimum sample $MSE$. In practice we also prefer the model with fewer parameters when the sample $MSE$s show no significant difference, by Occam's razor.
Similar to the previous part, we also want to make further inferences on the structure of $m_j(t, x)$. Two such problems are to construct a $(1-\alpha)$ simultaneous confidence region (SCR) for $m_j(t, x)$ based on the estimator $\hat{m}_{j, c, d}(t, x)$ and to test whether $m_j(t, x)$ has some exact form.
For the first problem, we denote the $(1-\alpha)$ SCR by $\{\Upsilon_{\alpha}(t, x), 0\leq t\leq 1, x\in \mathbb{R}\}$, which satisfies $$\lim_{n \to \infty} \mathbb{P}\{m_j(t, x) \in \Upsilon_{\alpha}(t, x), \forall \ 0\leq t\leq 1, \forall \ x\in \mathbb{R}\} = 1 - \alpha.$$
For $0\leq t\leq 1$ and $x\in \mathbb{R}$, denote $$h_j(t, x) = \sqrt{\boldsymbol l_j^T \Omega \boldsymbol l_j}, \ 1 \leq j \leq r,$$ where $\boldsymbol l_j = \Sigma^{-1}(\boldsymbol b \otimes I_{r})L_j$. We define the statistic $T_{1j}$ as $$T_{1j}(t, x) = \hat{m}_{j, c, d}(t, x) - m_j(t, x).$$ Then $h_j^2(t, x)$ is the asymptotic variance of $\sqrt{n} T_{1j}$, and the $(1-\alpha)$ SCR can be constructed as $$\hat{m}_{j, c, d}(t, x) \pm \frac{c_{\alpha}}{\sqrt{n}}h_j(t, x), \ 0\leq t\leq 1, \ x\in \mathbb{R},$$ where $c_{\alpha}$ is the critical value. Hence constructing the $(1-\alpha)$ SCR reduces to computing $c_{\alpha}$ and estimating $h_j(t, x)$. Because $\Sigma$ and $\Omega$ are hard to compute or estimate directly, we use bootstrap sampling to mimic them and our statistics.
We now introduce some quantities used in this bootstrap method. The $\{\hat{\epsilon}_i\}$ are the fitted residuals, $$\hat{\epsilon}_i = Y_i - \sum_{j=1}^r \hat{m}_{j, c, d}(t_i, X_{j, i}).$$ We denote $\boldsymbol w_i = (w_{ik})_{1\leq k \leq rd} \in \mathbb{R}^{rd}, \ r+1 \leq i \leq n$, with $$w_{ik} = \varphi_{l_2}(X_{l_1+1, i}),$$ where $l_1 = \lfloor \frac{k-1}{d} \rfloor$ and $l_2 = k - l_1 d$. Similarly we define $\hat{\boldsymbol z}_i \in \mathbb{R}^{rd}, \ r+1 \leq i \leq n$, as $$\hat{\boldsymbol z}_i = \hat{\epsilon}_i \boldsymbol w_i.$$
For some positive integer $m$, we propose a sample statistic $\Xi$, $$\Xi: = \frac{1}{\sqrt{n-m-r}\sqrt{m}} \sum_{i=r+1}^{n-m}\Big[\Big(\sum_{j=i}^{i+m} \hat{\boldsymbol z}_j\Big)\otimes\Big(\boldsymbol a(t_i)\Big)\Big]R_i,$$ where the $R_i$ are i.i.d. standard Gaussian random variables and $\boldsymbol a(t) = (\phi_1(t), \cdots, \phi_c(t))^T$.
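The multiplier bootstrap draw $\Xi$ mirrors the linear case; a sketch of one draw, assuming the rows $\hat{\boldsymbol z}_i$ and $\boldsymbol a(t_i)$ have already been stacked into arrays (names illustrative, not the package API):
def bootstrap_xi(z_hat, time_B, m, rng=None):
    # one multiplier-bootstrap copy of Xi; z_hat is (n_eff, r*d), time_B is (n_eff, c)
    rng = np.random.default_rng() if rng is None else rng
    n_eff = z_hat.shape[0]
    R = rng.standard_normal(n_eff - m)                       # i.i.d. standard Gaussian multipliers
    xi = np.zeros(z_hat.shape[1] * time_B.shape[1])
    for idx in range(n_eff - m):
        block = z_hat[idx:idx + m + 1].sum(axis=0)           # sum_{j=i}^{i+m} z_hat_j
        xi += np.kron(block, time_B[idx]) * R[idx]
    return xi / (np.sqrt(n_eff - m) * np.sqrt(m))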
Using the previous notation, we propose the statistic $$\hat{T}_{1j}: = \Xi^T \hat{\Sigma}^{-1} (\boldsymbol b \otimes I_{r})L_j, \quad \hat{\Sigma} = \frac{1}{n}W^T W,$$ where $W$ is the design matrix defined above.
The distribution of $\hat{T}_{1j}$ mimics that of $T_{1j}$. We can now state the algorithm based on the previous definitions. For any given pair $(t, x)$, generate $M$ (say $1000$) i.i.d. copies $\{\Xi^{(k)}\}_{k=1}^M$ and compute the corresponding $\hat{T}_{1j, k}, k=1,2, \dots, M$; the sample standard deviation of $\{\hat{T}_{1j, k}\}$ serves as the estimate $\hat{h}_{j}(t, x)$ of $h_j(t, x)$. We construct a uniform grid on $[0, 1] \times [0, 1]$, denoted $(t_{i_1}, y_{i_2}), \ 1\leq i_1 \leq C_1, \ 1\leq i_2 \leq C_2$, where $C_1$ and $C_2$ are large integers such as $1000$. Using the inverse of the mapping $y = u(x, s)$ defined previously, we obtain $x_{i_2} = u^{-1}(y_{i_2}, s)$ and calculate $\hat{h}_{j}(t_{i_1}, x_{i_2})$ for each pair $(t_{i_1}, x_{i_2})$. Finally, we generate another $B$ (say $1000$) i.i.d. copies $\{\Xi^{(k)}\}_{k=1}^B$ and compute the corresponding $\{\hat{T}_{1j, k}(t_{i_1}, x_{i_2})\}, 1 \leq k\leq B$, for every pair $(t_{i_1}, x_{i_2})$. For each $k$ we calculate $$\Gamma_k = \sup_{i_1, i_2}\Big|\frac{\hat{T}_{1j, k}(t_{i_1}, x_{i_2})}{\hat{h}_{j}(t_{i_1}, x_{i_2})}\Big|.$$ Sorting them into the order statistics $\Gamma_{(1)} \leq \Gamma_{(2)} \leq \dots \leq \Gamma_{(B)}$, we set $\hat{c}_{\alpha} = \Gamma_{(\lfloor B(1-\alpha)\rfloor)}$. The $(1-\alpha)$ SCR is then $\hat{m}_{j, c, d}(t, x) \pm \frac{\hat{c}_{\alpha}}{\sqrt{n}}\hat{h}_j(t, x)$.
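The last two steps of this algorithm (estimating $\hat{h}_j$ on the grid and extracting $\hat{c}_\alpha$) reduce to simple array operations once the bootstrap copies have been evaluated on the grid; a sketch, with h_draws of shape (M, C1, C2) and T1_draws of shape (B, C1, C2), both assumed to be produced elsewhere:
def scr_critical_value(h_draws, T1_draws, alpha=0.05):
    h_hat = h_draws.std(axis=0, ddof=1)                              # pointwise s.d. -> h_hat_j(t, x)
    gammas = np.sort(np.max(np.abs(T1_draws / h_hat), axis=(1, 2)))  # sup over the grid per copy
    B = len(gammas)
    return h_hat, gammas[int(np.floor(B * (1 - alpha))) - 1]         # (h_hat grid, c_hat_alpha)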
For the second problem, the null hypothesis is $$\boldsymbol H_0: m_j(t, x) = m_{j,0}(t, x),$$ where $m_{j,0}(t, x)$ is a pre-specified function. We propose the statistic $$T_{2j} = \int_{0}^{1}\int_{\mathbb{R}}\big[\hat{m}_{j, c, d}(t, x) - m_{j,0}(t, x)\big]^2 dxdt.$$ Under the null hypothesis, $T_{2j}$ should be small. Although the asymptotic distribution of $T_{2j}$ can be derived, it involves the true values of $\epsilon_i$ and the covariance of the series, which cannot be observed directly. Therefore, as previously, we use a bootstrap method to mimic these quantities and thus the distribution of $n T_{2j}$.
We define $\boldsymbol B_j = \int_{0}^{1}\int_{\mathbb{R}} [(\boldsymbol b \otimes I_{r})L_j][(\boldsymbol b \otimes I_{r})L_j]^T dxdt$ and $\hat{\Lambda}:= \hat{\Sigma}^{-1}\boldsymbol B_j \hat{\Sigma}^{-1}$. Our bootstrap statistic is $$\hat{T}_{2j} := \Xi^T \hat{\Lambda} \Xi,$$ whose distribution mimics that of $n T_{2j}$ asymptotically.
Based on the previous discussion, the bootstrap procedure for testing $H_0$ is to generate $M$ (say $1000$) i.i.d. copies $\{\Xi^{(k)}\}_{k=1}^M$ and compute the corresponding $\{\hat{T}_{2j}^{(k)}\}_{k=1}^M$. We then sort them to obtain the order statistics $\hat{T}_{2j, (1)} \leq \hat{T}_{2j, (2)} \leq \dots \leq \hat{T}_{2j, (M)}$ and reject $H_0$ at level $\alpha$ if $n T_{2j} > \hat{T}_{2j,(\lfloor M(1-\alpha) \rfloor)}$, where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$. Letting $M^* = \max\{r: \hat{T}_{2j,(r)} \leq n T_{2j}\}$, the $p$-value is $1- \frac{M^*}{M}$.
To determine the parameter $m$ (a positive integer), we again use the minimum volatility (MV) method to choose the window size. We define the covariance structure $\hat{\Omega}$ as $$\hat{\Omega} := \frac{1}{(n-m-r+1)m} \sum_{i=r+1}^{n-m}\Big[\Big(\sum_{j=i}^{i+m}\hat{\boldsymbol z}_j\Big)\otimes\Big(\boldsymbol a(t_i)\Big)\Big] \times \Big[\Big(\sum_{j=i}^{i+m}\hat{\boldsymbol z}_j\Big)\otimes\Big(\boldsymbol a(t_i)\Big)\Big]^*$$ and seek the $m$ that minimizes the volatility of this covariance structure over a suitable range of candidates. In practice, we choose a sequence of window sizes $m_{-h_0 + 1} < \dots < m_1 < m_2 < \dots < m_{n_0} < \dots < m_{n_0 + h_0}$ for some given large value $m_{n_0}$ and neighborhood control parameter $h_0 > 0$ (we usually select $h_0 = 3$). Then for each $m_j, j = 1, 2, \dots, n_0$, we calculate the matrix norm error of $\hat{\Omega}_{m_j}$ in its $h_0$-neighborhood, i.e., $$se(m_j) := se(\{\hat{\Omega}_{m_{j+k}}\}_{k=-h_0}^{h_0}) = \Big[\frac{1}{2h_0}\sum_{k=-h_0}^{h_0}||\bar{\hat{\Omega}}_{m_j}-\hat{\Omega}_{m_{j+k}}||^2\Big]^{1/2},$$ where $\bar{\hat{\Omega}}_{m_j} = \sum_{k=-h_0}^{h_0} \hat{\Omega}_{m_{j+k}} /(2h_0 + 1)$. We then choose $$\hat{m} := \arg\min_{m_1 \leq m \leq m_{n_0}} se(m).$$
We now present two important examples of structure tests.
(1) Testing time-homogeneity: $$\boldsymbol H_0: m_j(t, x) = m_{j}(x), \ \forall t \in [0,1].$$ For this test, we estimate the function using only $\varphi(x)$ (the basis functions of $x$) and then compare $\hat{m}_{j}(x)$ with the SCR of $\hat{m}_{j}(t, x)$.
If the plot of $\hat{m}_{j}(x)$ does not lie inside the SCR of $\hat{m}_{j}(t, x)$, we reject the null hypothesis and conclude that the function is time-inhomogeneous.
(2) Testing separability: $$\boldsymbol H_0: m_j(t, x) = f_{j}(t)g_j(x), \ \forall t \in [0,1], x \in \mathbb{R}.$$ For this test, we estimate $f_j(t)$ by $$\hat{f}_j(t) = \int_{\mathbb{R}} \hat{m}_{j}(t, x)dx$$ and $g_j(x)$ by $$\hat{g}_j(x) = \int_{0}^{1} \hat{m}_{j}(t, x)dt.$$ We then compare $\frac{\hat{f}_j(t)\hat{g}_j(x)}{\iint \hat{m}_{j}(t, x)dtdx}$ with the SCR of $\hat{m}_{j}(t, x)$.
If the plot of $\frac{\hat{f}_j(t)\hat{g}_j(x)}{\iint \hat{m}_{j}(t, x)dtdx}$ does not lie inside the SCR of $\hat{m}_{j}(t, x)$, we reject the null hypothesis and conclude that the function is not separable.
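Given $\hat{m}_j$ evaluated on a $(t, x)$ grid, the separable surrogate $\hat{f}_j(t)\hat{g}_j(x)/\iint \hat{m}_j$ can be approximated with trapezoidal integration; a sketch (not the package's separablity method):
def separable_approximation(m_grid, t_grid, x_grid):
    # m_grid[i1, i2] = m_hat_j(t_grid[i1], x_grid[i2])
    f_hat = np.trapz(m_grid, x_grid, axis=1)                  # integrate over x -> f_hat_j(t)
    g_hat = np.trapz(m_grid, t_grid, axis=0)                  # integrate over t -> g_hat_j(x)
    total = np.trapz(f_hat, t_grid)                           # double integral of m_hat_j
    return np.outer(f_hat, g_hat) / total                     # compare this surface with the SCR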
We use a nonlinear AR(1) model, with $Y_i = X_i$ and $X_{j, i} = X_{i-j}$, to illustrate: $$X_i = \sin(2\pi\tfrac{i}{n}) e^{-X_{i-1}^2} + \epsilon_{i}$$
def series_generator_example(n):
    np.random.seed(2022)
    x = list(np.random.randn(1))
    for i in range(1, n):
        x.append((math.sin(2 * np.pi * (i + 1) / n)) * math.exp(- x[-1] ** 2) + np.random.normal())
    return x

def true_function_series(t, x):
    return (math.sin(2 * np.pi * t)) * math.exp(- x ** 2)
n = 4000
series = np.array(series_generator_example(n))
b = 1
c1 = 5
c2 = 4
sieve_obj = SieveOLS2D(series, series, 'legendre', 'legendre', 'algebraic', b, c1, c2)
To demonstrate the convergence of $\hat{m}_{j, c, d}(t, x)$ to $m_{j}(t, x)$, we draw several 2-d plots, fixing one of $t, x$ and varying the other, as well as a 3-d plot. The estimated function and the true function are close to each other.
flexible_sequence_time = np.arange(1, n) / n
flexible_sequence_series = np.arange(1, 5 * n) / n
b_index = 1
for fixed_value in np.arange(1, 10) / 10:
    sieve_obj.mimic_function_compara(true_function_series, 'time', fixed_value, flexible_sequence_series, b_index)
    sieve_obj.mimic_function_compara(true_function_series, 'series', fixed_value, flexible_sequence_time, b_index)
sieve_obj.mimic_function_compara_3d(true_function_series, 1, np.arange(0, 1.001, 0.01), np.arange(0, 3.001, 0.01))
sieve_obj.cross_validation()
From the cross-validation results, we prefer $c = 5$ and $d = 2$, since this choice attains a relatively small $MSE$ with few parameters.
We still use the previous nonlinear AR(1) model to simulate where $Y_i = X_i$, $X_{j, i} = X_{i-j}$: $$X_i = \sin(2\pi\frac{i}{n}) e^{-X_{i-1}^2} + \epsilon_{i}$$
def null_function_1(t, x):
    return (math.sin(2 * np.pi * t)) * math.exp(- x ** 2)

def null_function_2(t, x):
    return (math.sin(2 * np.pi * t) + 1) * math.exp(- x ** 2)
for fixed_value in np.arange(1, 10) / 10:
    sieve_obj.simultaneous_confidence_region_compara(null_function_1, 'time', fixed_value, flexible_sequence_series, b_index)
    sieve_obj.simultaneous_confidence_region_compara(null_function_1, 'series', fixed_value, flexible_sequence_time, b_index)
for fixed_value in np.arange(1, 10) / 10:
    sieve_obj.time_homogeneity_compara(flexible_sequence_series, any_time = fixed_value)
The plots indicate we should reject the null hypothesis.
We also try another AR(1) model, again with $Y_i = X_i$ and $X_{j, i} = X_{i-j}$: $$X_i = e^{-X_{i-1}^2} + \epsilon_{i}$$
def series_generator_example2(n):
    np.random.seed(2022)
    x = list(np.random.randn(1))
    for i in range(1, n):
        x.append(math.exp(- x[-1] ** 2) + np.random.normal())
    return x
n = 4000
series2 = np.array(series_generator_example2(n))
sieve_obj2 = SieveOLS2D(series2, series2, 'legendre', 'legendre', 'algebraic', b, c1, c2)
for fixed_value in np.arange(1, 10) / 10:
    sieve_obj2.time_homogeneity_compara(flexible_sequence_series, any_time = fixed_value)
The plots indicate we can not reject the null hypothesis for the new AR(1) series.
Next we try the separability test. We slightly modify the model to avoid $\iint \hat{m}_{j}(t, x)dtdx=0$; we use $$X_i = \Big(\sin(2\pi\tfrac{i}{n}) + 2\Big) e^{-X_{i-1}^2} + \epsilon_{i}$$
def series_generator_example3(n):
    np.random.seed(1)
    x = list(np.random.randn(1))
    for i in range(1, n):
        x.append((math.sin(2 * np.pi * (i + 1) / n) + 2) * math.exp(- x[-1] ** 2) + np.random.normal())
    return x
series3 = np.array(series_generator_example3(n))
fixed_value_sequence = np.arange(1, 10) / 10
sequence_time = np.arange(1, 100) / 100
sequence_series = np.arange(1, 500) / 100
sieve_obj3 = SieveOLS2D(series3, series3, 'legendre', 'legendre', 'algebraic', b, c1, c2)
sieve_obj3.separablity(fixed_value_sequence, sequence_time, sequence_series)
The plots indicate we cannot reject the null hypothesis, i.e., the series is separable.
Finally, we perform the exact form test. We still use the previous series, where $Y_i = X_i$ and $X_{j, i} = X_{i-j}$: $$X_i = \sin(2\pi\tfrac{i}{n}) e^{-X_{i-1}^2} + \epsilon_{i}$$ (1) $\boldsymbol H_0: m_{j,0}(t, x) = \sin(2\pi t) e^{-x^2}$; (2) $\boldsymbol H_0: m_{j,0}(t, x) = \big(\sin(2\pi t) + 1\big) e^{-x^2}$.
sieve_obj.exact_form_test(null_function_1, 1)
('not reject', 0.18999999999999995)
sieve_obj.exact_form_test(null_function_2, 1)
('reject', 0.0)
The results show that the second null hypothesis is rejected. Now we run 1000 trials at level $\alpha = 0.1$.
def series_generator_single(data, n):
    x_1 = data[0]
    yield x_1
    for i in range(1, n):
        x_1 = (math.sin(2 * np.pi * (i + 1) / n)) * math.exp(- x_1 ** 2) + data[i]
        yield x_1

def series_generator_matrix(n, m):
    np.random.seed(1)
    x = np.random.randn(m, n)
    return np.array([list(series_generator_single(x[i], n)) for i in range(m)])
n = 1000
m = 1000
series_matrix = series_generator_matrix(n, m)
def bootstrap_trial(data, func_type_time, func_type_series, mapping_type, alpha=0.05):
    test_list = []
    with trange(len(data)) as pbar:
        for i in pbar:
            sieve_obj2 = SieveOLS2D(data[i], data[i], func_type_time, func_type_series, mapping_type, b, c1, c2)
            test_list.append(sieve_obj2.exact_form_test(null_function_1, 1, alpha))
            simulated_type_I_error = sum([_[0] == 'reject' for _ in test_list]) / len(test_list)
            pbar.set_description("Simulated Type I Error={}".format(simulated_type_I_error))
    return test_list
test_list1 = bootstrap_trial(series_matrix, 'triangle', 'legendre', 'algebraic', alpha=0.1)
Simulated Type I Error=0.154: 100%|████████████████████████████████████████████████| 1000/1000 [40:06<00:00, 2.41s/it]
sum([_[1] < 0.05 for _ in test_list1])
75
The simulated type I error is 0.154 for $\alpha = 0.1$ and 0.075 for $\alpha = 0.05$
# Real-data example: ABML series; we fit the second-order differenced series
ABML_df = pd.read_csv('series-250822.csv')
ABML_time_series = ABML_df[ABML_df.columns[1]][81: 304].to_numpy().astype(float)
ABML_second_diff = np.diff(np.diff(ABML_time_series))
plt.plot(np.arange(len(ABML_time_series)), ABML_time_series)
plt.title('ABML time series')
plt.show()
plt.plot(np.arange(len(ABML_second_diff)), ABML_second_diff)
plt.title('ABML second order diff')
plt.show()
use orthogonal polynomials
for c_index in range(2, 6):
    ABML_obj = SieveOLS(ABML_second_diff, 'legendre', 2, c_index)
    ABML_obj.choose_b_intuitive()
use Fourier series
for c_index in range(2, 7):
    ABML_obj = SieveOLS(ABML_second_diff, 'triangle', 2, c_index)
    ABML_obj.choose_b_intuitive()
use orthogonal wavelets
for c_index in range(1, 5):
    ABML_obj = SieveOLS(ABML_second_diff, 'db16', 2, c_index)
    ABML_obj.choose_b_intuitive()
with open('PEAS.txt') as f:
    lines = f.readlines()
PEAS_time_series = np.array([_.split() for _ in lines[1:]]).astype(float).flatten()
plt.plot(np.arange(len(PEAS_time_series)), PEAS_time_series)
plt.title('PEAS time series')
plt.show()
use orthogonal polynomials
for c_index in range(2, 7):
    PEAS_obj = SieveOLS(PEAS_time_series, 'legendre', 2, c_index)
    PEAS_obj.choose_b_intuitive()
use Fourier series
for c_index in range(2, 7):
    PEAS_obj = SieveOLS(PEAS_time_series, 'triangle', 2, c_index)
    PEAS_obj.choose_b_intuitive()
use orthogonal wavelets
for c_index in range(1, 6):
    PEAS_obj = SieveOLS(PEAS_time_series, 'db16', 2, c_index)
    PEAS_obj.choose_b_intuitive()
exchange_df = pd.read_csv('ei_mfrt_m.tsv', sep='\t', header=0)
Euro_Dollar_time_series = exchange_df.iloc[37,58:].to_numpy().astype(float)[::-1]
Euro_Dollar_time_series_log = np.diff(np.log(Euro_Dollar_time_series))
plt.plot(np.arange(len(Euro_Dollar_time_series_log)), Euro_Dollar_time_series_log)
plt.title('Eu-USD-log time series')
plt.show()
use orthogonal polynomials
for c_index in range(2, 7):
    EuDO_obj = SieveOLS(Euro_Dollar_time_series_log, 'legendre', 2, c_index)
    EuDO_obj.choose_b_intuitive()
use Fourier series
for c_index in range(2, 7):
    EuDO_obj = SieveOLS(Euro_Dollar_time_series_log, 'triangle', 2, c_index)
    EuDO_obj.choose_b_intuitive()
use orthogonal wavelets
for c_index in range(1, 6):
    EuDO_obj = SieveOLS(Euro_Dollar_time_series_log, 'db16', 2, c_index)
    EuDO_obj.choose_b_intuitive()