This package provides estimation and inference for time-inhomogeneous nonlinear time series using the tools and methods from [1,2,3]. The estimation procedure is nonparametric and based on sieve methods as in [4]. A unified simultaneous inferential theory for structural and exact form tests [6] is utilized. In addition, the package provides functions for data visualization and builds on functions from the R package “Sie2nts”.
library(ggplot2)
library(Sie2nts)
library(plotly)
library(SIMle)
library(splines)
library(stringr)
library(RCurl)
The mapping idea is popular in numerical PDEs for handling boundary value problems [6]. In our approach, we apply suitable mappings to transform an unbounded domain into a compact one, mapping the values of \(x\) into the interval \([0,1]\). Two such mappings are used in this guide: the algebraic mapping function maps \(\mathbb{R}\) to \([0,1]\), and the logarithmic mapping function maps \(\mathbb{R}^+\) to \([0,1]\). Each of these mapping functions includes a positive scaling factor \(s\) (a short R transcription of both mappings appears after the basis plots below). If all values in the time series are either positive or negative, we recommend first subtracting the median of the time series; the centered data can then be treated as taking values in \(\mathbb{R}\) and handled with the methods above. After the estimation and inference process, the median must be added back.
\[ u(x, s) = \frac{x}{2\sqrt{x^2 + s^2}} + \frac{1}{2}, \qquad u'(x, s) = \frac{s^2}{2(x^2 + s^2)^{3/2}} \]
bs.plot.trans("Legen", "algeb", 5, title = "Mapped Legendre basis")
bs.plot("Legen", 5, title = "Legendre basis")
\[ u(x, s) = \frac{\tanh(x/s) + 1}{2}, \qquad u'(x, s) = \frac{1-\tanh^2(x/s)}{2s} \]
bs.plot.trans("Legen", "logari", 5, title = "Mapped Legendre basis")
bs.plot("Legen", 5, title = "Legendre basis")
We consider time-inhomogeneous nonlinear time series models to check our results. This user’s guide contains 7 functions for generating 7 different types of time series, named generate_nAR1(), generate_nAR1.1(), generate_nAR1.2(), generate_nAR1.s(), generate_nAR2.1(), generate_nAR2.2(), and generate_nAR2.3(). For each function, the input n indicates the sample size and \(v\) is a variance parameter for adjusting the noise level. The output is a single vector containing a time series of length n generated from the corresponding model below; a minimal sketch of the first model follows the equations. Setting \(Y_i = X_i\) and \(X_{j,i} = X_{i-j}\), the AR(1) and AR(2) models are given by,
\[ X_i = \sin(2\pi \frac{i}{n})e^{-X_{i-1}^2} + \epsilon_i \]
\[ X_i = \sin(2\pi \frac{i}{n})\frac{1}{X_{i-1}^2+1} + \epsilon_i \]
\[ X_i = \left(\sin(2\pi \frac{i}{n})+2 \right)e^{-X_{i-1}^2} + \epsilon_i \]
\[ X_i = e^{-X_{i-1}^2} + \epsilon_i \]
\[ X_i = \sin(2\pi \frac{i}{n}) e^{- \frac{X_{i-1}^2}{2}} + \cos(2\pi \frac{i}{n})e^{-X_{i-2}^2} + \epsilon_i \]
\[ X_i = \sin(2\pi \frac{i}{n}) e^{- \frac{X_{i-2}^2}{2}} + \epsilon_i \]
\[ X_i = (\sin(2\pi \frac{i}{n})+1) e^{- \frac{X_{i-1}^2}{2}} + (\cos(2\pi \frac{i}{n})+2)e^{-\frac{X_{i-2}^2}{2}} + \epsilon_i \]
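As a reference point, here is a minimal sketch of how the first AR(1) model above can be simulated, assuming i.i.d. Gaussian innovations whose variance is controlled by v; the packaged generate_nAR1() may differ in details such as burn-in:

generate_nAR1_sketch <- function(n, v) {
  # X_i = sin(2*pi*i/n) * exp(-X_{i-1}^2) + eps_i, with eps_i ~ N(0, v)
  x <- numeric(n)
  eps <- rnorm(n, sd = sqrt(v))
  for (i in 2:n) x[i] <- sin(2 * pi * i / n) * exp(-x[i - 1]^2) + eps[i]
  x
}
ts_sim <- generate_nAR1_sketch(4000, 1)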
This package contains 2 functions for estimation. fix.fit() caters to user-specified settings: it requires users to select the number of basis functions for both components, the lag of the time series, the mapping function type, and the type of estimation. auto.fit() estimates the sieve least squares coefficients based on a data-driven rule. This package provides 4 rules. The first choice, “CV”, uses Cross-Validation. The second choice, “Kfold”, uses k-fold Cross-Validation for time series; it takes a validation set of size \(3\log_2(n)\), where \(n\) is the total number of observations. The third choice is “AIC” and the fourth choice is “BIC”, the criteria commonly used for OLS. The function chooses the tuning parameters that yield the smallest criterion value. Users can use fit.plot() (for both fix.fit() and auto.fit()) to plot the estimation results. cv.res() and cv.plot() present the Cross-Validation result table and plot.
Given the time-inhomogeneous nonlinear time series regression, \[ Y_{i,n} =\sum_{j=1}^r m_j (t_i, X_{j,i,n} )+\epsilon_{i,n}, \quad i=1,2,\ldots,n, \] we set \(X_{j,i} = Y_{i-j}\) and assume that both \(X_i\) and \(\epsilon_i\) are locally stationary time series. We then use the sieve least squares approach to obtain the estimators. Each function \(m_j\) admits the sieve expansion,
\[ m_j(t,x ) =\sum_{l_1=1}^c \sum_{l_2=1}^d \beta_{j,l_1,l_2} b_{l_1,l_2} (t, x), \]
where \(\{b_{l_1,l_2 } (t, x)\}\) are sieve basis functions and \(\{\beta_{j,l_1,l_2}\}\) are the coefficients. Generally, sieve basis functions are chosen from wavelets, Legendre polynomials and other orthogonal systems, and we apply suitable mappings to map the unbounded domain to a compact one, say \([0, 1]\). In our case, let \(\{\phi_i \}\) and \(\{\varphi_i \}\) be the orthonormal basis functions; the hierarchical sieve basis functions are then given by, \[\{\phi_i(t) \} \otimes \{\varphi_i(x) \}. \]
In practice, we select \(x\) from \([-10, 10]\) to mimic \(x \in \mathbb{R}\). Then, we use ordinary least squares (OLS) to obtain the estimators. These basis functions form the design matrix \(W\), whose entries satisfy \(W_{ik}= \phi_{l_1}(t_i) \varphi_{l_2+1} (X_{i-(l+1)} )\), \(1\le k\le rcd\), \(l=\lfloor k/cd \rfloor\). Finally, the estimator of \(\beta_{j,l_1,l_2}\) is given by, \[\hat \beta =(W^T W)^{-1}W^TY.\]
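To make the construction concrete, here is a minimal sketch of the design matrix and OLS step for a single lag (\(r = 1\)) with Legendre bases under the algebraic mapping; legendre_basis() and algeb_map() are hypothetical helpers written for this guide, not the package’s internals:

legendre_basis <- function(u, k) {
  # First k shifted Legendre polynomials on [0, 1] via the three-term recurrence.
  v <- 2 * u - 1                       # shift [0, 1] to [-1, 1]
  P <- matrix(0, length(u), k)
  P[, 1] <- 1
  if (k > 1) P[, 2] <- v
  if (k > 2) for (j in 3:k) {
    m <- j - 2
    P[, j] <- ((2 * m + 1) * v * P[, j - 1] - m * P[, j - 2]) / (m + 1)
  }
  P
}
algeb_map <- function(x, s) x / (2 * sqrt(x^2 + s^2)) + 1 / 2  # maps R to [0, 1]
sieve_ols <- function(ts, c, d, s = 1) {
  n <- length(ts)
  Y  <- ts[2:n]                         # responses Y_i
  Bt <- legendre_basis((2:n) / n, c)    # phi_{l1}(t_i) on rescaled time
  Bx <- legendre_basis(algeb_map(ts[1:(n - 1)], s), d)  # varphi_{l2} on mapped lags
  # Row i of W is the tensor (Kronecker) product of the two basis evaluations.
  W <- t(sapply(seq_along(Y), function(i) kronecker(Bt[i, ], Bx[i, ])))
  qr.solve(W, Y)                        # beta_hat = (W^T W)^{-1} W^T Y
}
beta_hat <- sieve_ols(generate_nAR1(4000, 1), c = 5, d = 2)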
This demo uses a simulated AR(1) time series to demonstrate estimation.
ts = generate_nAR1(4000, 1)
res_esti = SIMle::fix.fit(ts, 5, 2, "Legen", "Legen", "algeb", "fixx", 0.6)
fit.plot(res_esti[[1]], "fixx", "algeb")
res_esti = SIMle::fix.fit(ts, 5, 2, "Legen", "Legen", "algeb", "fixt", 0.2)
fit.plot(res_esti[[1]], "fixt", "algeb")
After the sieve least squares estimation, users can also use the series.predict() function to obtain predictions. The inputs are the time series data, the number of basis functions for both components, the lag of the time series, and the mapping function type. Based on the results of [6], given a time-inhomogeneous nonlinear time series \(\{x_1,...,x_n \}\), for the \(1\)-step ahead prediction we recall the sieve estimate \(\hat m_{j,c,d}(t, x) = \sum_{l_1=1}^{c}\sum_{l_2=1}^{d}\hat \beta_{j,l_1,l_2} b_{l_1,l_2}(t,x)\), where \(b_{l_1,l_2}(t,x)= \phi_{l_1}(t)\varphi_{l_2}(x)\). The \(1\)-step ahead prediction can then be written as,
\[ \hat x_{n+1} = \sum_{l_1=1}^{c}\sum_{l_2=1}^{d} \hat \beta_{j,l_1,l_2} b_{l_1,l_2}(1, x_n). \]
Iterating the formula \(\hat x_{i} = \sum_{l_1=1}^{c}\sum_{l_2=1}^{d} \hat \beta_{j,l_1,l_2} b_{l_1,l_2}(t_i, x_{i-1})\), \(i = 2,...,n\), we can obtain the \(h\)-step ahead prediction. We can apply this prediction to obtain Cross-Validation results based on different criteria; in the table returned by cv.res(), the first two columns list the candidate numbers of basis functions and the third column gives the criterion value.
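As an illustration, a 1-step ahead prediction can be computed directly from the fitted coefficients. This sketch reuses the hypothetical legendre_basis() and algeb_map() helpers from the estimation sketch above:

predict_1step <- function(ts, beta_hat, c, d, s = 1) {
  bt <- legendre_basis(1, c)                             # phi_{l1}(1)
  bx <- legendre_basis(algeb_map(ts[length(ts)], s), d)  # varphi_{l2}(x_n), mapped
  sum(kronecker(bt[1, ], bx[1, ]) * beta_hat)            # sum_{l1,l2} beta * b(1, x_n)
}
x_next <- predict_1step(ts, beta_hat, c = 5, d = 2)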
cv.res(ts, 5, 5, "Legen", "Legen", "algeb", "AIC")
## [,1] [,2] [,3]
## [1,] 1 1 663.4004957
## [2,] 1 2 653.9552634
## [3,] 1 3 651.9009871
## [4,] 1 4 653.7586447
## [5,] 1 5 655.7168432
## [6,] 2 1 305.2514456
## [7,] 2 2 303.2324380
## [8,] 2 3 283.3356004
## [9,] 2 4 285.4447059
## [10,] 2 5 281.7214419
## [11,] 3 1 306.1121249
## [12,] 3 2 301.9022212
## [13,] 3 3 283.3177628
## [14,] 3 4 284.8761138
## [15,] 3 5 282.8990329
## [16,] 4 1 42.0831994
## [17,] 4 2 38.4779913
## [18,] 4 3 7.2912585
## [19,] 4 4 6.7345130
## [20,] 4 5 -0.5099485
## [21,] 5 1 42.5422733
## [22,] 5 2 39.3322684
## [23,] 5 3 11.2931813
## [24,] 5 4 8.8381524
## [25,] 5 5 2.9608136
cv.plot(cv.res(ts, 3, 3, "Legen", "Legen", "algeb", "AIC"))
After obtaining the estimates, we infer the nonlinear functions \(m_j(t,x)\), \(1 \le j \le r\), based on our proposed sieve least squares estimator. The initial step involves constructing a \((1 - \alpha)\) simultaneous confidence region (SCR) for \(m_j(t,x)\). This package offers two functions, fix.SCR() and auto.SCR(), to obtain simultaneous confidence regions. fix.SCR() is tailored to user-specific requirements and necessitates the selection of the number of basis functions for both components, the lag of the time series, the mapping function type, and the type of confidence region. auto.SCR() generates a simultaneous confidence region based on rules identical to those used in the estimation part. Additionally, scr.plot() can visualize the simultaneous confidence region together with the estimated function.
Given the nominal level \(\alpha\) (we take \(j = 1\) in the demos below), the SCR can be constructed as follows,
\[ \hat m_{j,c,d}(t, x) \pm \frac{c_{\alpha}}{\sqrt{n}}h_j(t, x), \quad t \in [0, 1],\ x\in \mathbb{R},\ 1\le j \le r, \]
where \(h_j(t, x) = \sqrt{l_j^T\Omega l_j}\) and \(l_j = \Sigma^{-1}(b \otimes I_r)L_j\); \(I_r\) and \(L_j\) can be found in [6]. This package uses Algorithm 1 (multiplier bootstrapping for constructing the SCR) of [6]. In step one, we compute \(\hat \Sigma = \frac{1}{n}W^TW\) and its inverse \(\hat\Sigma^{-1}\), estimate \(\hat m_{j,c,d}(t, x)\) by the sieve method, and obtain the residuals \(\hat \epsilon_i = Y_i - \sum_{j=1}^{r} \hat m_{j,c,d}(t_i, X_{i-j})\).
In step two, given a pair \((t, x)\), we generate \(B=1000\) i.i.d. copies \(\{\Xi^{(k)}\}_{k=1}^B\), where
\[ \Xi = \frac{1}{\sqrt{n-m-r}\sqrt{m}} \sum_{i=r+1}^{n-m} \left[\left(\sum_{j = i}^{i+m} \hat x_j \right) \otimes a(t_i) \right]R_i, \]
the \(R_i\) are i.i.d. standard normal variables, \(a(t_i) = \left(\phi_{1}(t_i),...,\phi_{c}(t_i) \right)^T\), \(\hat x_i=(\hat x_{ik})_{1\le k \le rd}\), \(\hat x_{ik}=\hat \epsilon_i w_{ik}\), and \(w_{ik} = \varphi_{l_2}(X_{i-l_1 -1})\).
Then we compute \(\hat T_{1,k} = \Xi^T \hat \Sigma^{-1}(b \otimes I_r)L_j\), \(k=1,2,...,B\), calculate the sample standard deviation of \(\{\hat T_{1,k}\}\), and denote it by \(\hat h(t,x)\). Letting \(C_1 = C_2 = 2000\), in step three we construct a uniform grid of \([0,1] \times [0,1]\), denoted \((t_i, y_j)\), \(1\le i\le C_1\), \(1\le j \le C_2\); \(x_j\) is obtained from the inverse mapping function (\(x_j= g(2y_j-1; s)\)). For each \((t_i, x_j)\) we calculate \(\hat h(t_i,x_j)\). Finally, for each \((t_i, x_j)\), we generate \(M = 1000\) i.i.d. copies of \(\hat T_{1,k}(t_i, x_j)\), \(1 \le k \le M\), and for each \(k\) calculate,
\[ \tau_k = \displaystyle \sup_{i,j} \left| \frac{\hat T_{1,k}(t_i, x_j)}{\hat h(t_i, x_j)} \right| \]
Let \(\tau_{(1)} \le \cdots \le \tau_{(M)}\) and set \(\hat c_\alpha = \tau_{(\lfloor M(1-\alpha) \rfloor)}\). The output is then, \[ \hat m_{j,c,d}(t, x) \pm \frac{\hat c_{\alpha}}{\sqrt{n}}\hat h_j(t, x). \]
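The final quantile step translates directly into R. The sketch below assumes an array T1 with T1[k, i, j] \(= \hat T_{1,k}(t_i, x_j)\) and a matrix h of \(\hat h(t_i, x_j)\) values have already been computed; the names are hypothetical:

bootstrap_critical_value <- function(T1, h, alpha = 0.05) {
  M <- dim(T1)[1]
  # tau_k = sup over the grid of |T_hat_{1,k}(t_i, x_j) / h_hat(t_i, x_j)|
  tau <- apply(T1, 1, function(Tk) max(abs(Tk / h)))
  sort(tau)[floor(M * (1 - alpha))]    # c_hat_alpha = tau_(floor(M(1 - alpha)))
}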
ts = generate_nAR1(4000, 1)
res = fix.SCR(ts, 5, 2, m = "MV", "Legen", "Legen", "algeb", "fixx", 0.6, n_point = 1000)
scr.plot(res[[1]], "fixx")
ts = generate_nAR1(4000, 1)
res = fix.SCR(ts, 5, 2, m = "MV", "Legen", "Legen", "algeb", "fixt", 0.1, n_point = 1000)
scr.plot(res[[1]], "fixt")
Simultaneous confidence regions (SCRs) can be employed for time-homogeneity tests and separability tests; the package also provides an exact form test, which is based on an \(L^2\) statistic rather than an SCR. This package includes functions for all three applications. The process is outlined in the following sections.
If only a locally stationary time series is given, we can establish an AR (auto-regressive) approximation theory using only one set of basis functions. The methodology does not change, but the estimated coefficient functions then depend only on the state \(x\) and not on time. So we estimate the functions using only the basis functions in \(x\) and compare \(\hat m_j(x)\) with \(\hat m_j(t, x)\). The null hypothesis is given by,
\[ \textbf{H}_0: m_j(t, x) = m_j(x), \forall t \in [0, 1] \]
The functions homo.test() and auto.homo.test() are employed for conducting time-homogeneity tests. Specifically, homo.test() accommodates user-specific requirements, while auto.homo.test() generates test results based on rules identical to those used in the estimation process. test.plot() shows the result of the test; we reject the null hypothesis, concluding that the model is time-inhomogeneous, if the plot of \(\hat m_j(x)\) is not contained in the SCR of \(\hat m_j(t, x)\).
ts = generate_nAR1.s(4000, 1)
res = homo.test(ts, 5, 2, m = "MV", "Legen", "Legen", "algeb", fix_num = 0.1, n_point = 1000)
test.plot(res[[1]], "homot")
ts = generate_nAR1(4000, 1)
res = homo.test(ts, 5, 2, m = "MV", "Legen", "Legen", "algeb", fix_num = 0.1, n_point = 1000)
test.plot(res[[1]], "homot")
Another application of the SCR is the separability test. This test examines whether the function can be separated into a pure-time component and a time-independent component: for functions \(f_j(t)\) and \(g_j(x)\), the null hypothesis is given by,
\[ \textbf{H}_0: m_j(t, x) = f_j(t)g_j(x), \quad \forall t \in [0, 1],\ x \in \mathbb{R}. \] We assume that \(\int_{\mathbb{R}} \hat m_j(t, x) \, dx\) and \(\int_{0}^{1} \hat m_j(t, x) \, dt\) are nonzero. Then the estimates of \(f_j(t)\) and \(g_j(x)\) are,
\[ \hat f_j(t) = \left(\int_{\mathbb{R}}^{} \hat m_j(t, x) \; dx\right) \\ \hat g_j(x) =\left(\int_{0}^{1} \hat m_j(t, x) \; dt\right) \]
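As a rough illustration, these factors can be approximated by Riemann sums on a grid; m_hat below is a hypothetical vectorized estimate of \(\hat m_j(t, x)\), and the \(x\)-domain is truncated to \([-10, 10]\) as in the estimation section:

n_grid <- 200
t_grid <- seq(0, 1, length.out = n_grid)
x_grid <- seq(-10, 10, length.out = n_grid)
dt <- 1 / n_grid; dx <- 20 / n_grid
f_hat <- sapply(t_grid, function(t) sum(m_hat(t, x_grid)) * dx)  # integral over x
g_hat <- sapply(x_grid, function(x) sum(m_hat(t_grid, x)) * dt)  # integral over t
const <- sum(outer(t_grid, x_grid, m_hat)) * dt * dx             # double integral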
If the plot of \(\frac{\hat f_j(t)\hat g_j(x)}{const}\), where \(const = \int \int \hat m_j(t, x) \, dt\, dx\), does not fall within the simultaneous confidence region (SCR) of \(\hat m_j(t, x)\), we reject the null hypothesis, indicating that the function is not separable. The functions sep.test() and auto.sep.test() are utilized for performing the separability test. Specifically, sep.test() is designed to cater to user-specific requirements, while auto.sep.test() generates test results based on rules identical to those used in the estimation process. The function test.plot() again displays the test results.
ts = generate_nAR1.2(4000, 1)
res = sep.test(ts, 5, 2, m = "MV", "Legen", "Legen", "algeb", "fixx", fix_num = 0.6, n_point = 1000)
test.plot(res[[1]], "separa", "fixx", lower = 0, upper = 2)
ts = generate_nAR1.2(4000, 1)
res = sep.test(ts, 5, 2, m = "MV", "Legen", "Legen", "algeb", "fixt", fix_num = 0.1, n_point = 1000)
test.plot(res[[1]], "separa", "fixt", lower = 0, upper = 3)
Instead of testing structural properties as discussed above, we can also test the exact forms of the functions. For the exact form test, however, we utilize an \(L^2\)-type test instead of an SCR. Given the null hypothesis,
\[ H_0: m_j(t,x) = m_{j,0}(t,x), \] where \(m_{j,0}(t,x)\) is a given function, we apply Algorithm 2 (multiplier bootstrapping for inference) of [6]. In step one, we compute \(\hat \Sigma = \frac{1}{n}W^TW\) and its inverse \(\hat\Sigma^{-1}\), estimate \(\hat m_{j,c,d}(t, x)\), and obtain the residuals \(\hat \epsilon_i = Y_i - \sum_{j=1}^{r} \hat m_{j,c,d}(t_i, X_{i-j})\). In step two, we generate \(B=1000\) i.i.d. copies \(\{\Xi^{(k)}\}_{k=1}^B\) and compute \(\hat T_{2,k} = \Xi^T \hat W_j \Xi\), \(k=1,2,...,B\), with \(\hat W_j = \hat \Sigma^{-1} B_j \hat \Sigma^{-1}\), where
\[ B_j = \int_{[0,1]} \int_{\mathbb{R}} [(b\otimes I_r) L_j][(b\otimes I_r) L_j]^T \, dt\, dx. \] In step three, let \(\hat T_{2,(1)} \le \cdots \le \hat T_{2,(B)}\) be the order statistics. We reject \(H_0\) at level \(\alpha\) if \(nT_2 > \hat T_{2, (\lfloor B(1-\alpha) \rfloor)}\); with \(B^* = \max\{k : \hat T_{2,(k)} \le nT_2 \}\), the p-value is given as \(1-\frac{B^*}{B}\), where \(T_{2}\) is given by
\[ T_2 = \sum_{j=1}^{r} \int_{[0,1]} \int_{\mathbb{R}} [\hat m_{j,c,d}(t,x) - m_{j,0}(t,x)]^2 dt dx \]
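For intuition, the statistic \(T_2\) (for \(r = 1\)) can be approximated by a Riemann sum on a grid, truncating the \(x\)-domain to \([-10, 10]\) as in the estimation section; m_hat and m_null below are hypothetical vectorized functions of \((t, x)\):

T2_grid <- function(m_hat, m_null, x_lim = c(-10, 10), n_grid = 200) {
  t_grid <- seq(0, 1, length.out = n_grid)
  x_grid <- seq(x_lim[1], x_lim[2], length.out = n_grid)
  g <- expand.grid(t = t_grid, x = x_grid)       # all (t_i, x_j) pairs
  cell <- (1 / n_grid) * (diff(x_lim) / n_grid)  # area of one grid cell
  sum((m_hat(g$t, g$x) - m_null(g$t, g$x))^2) * cell
}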
This package provides the functions exact.test() and auto.exact.test() for the exact form test. exact.test() is designed to accommodate user-specific requirements, while auto.exact.test() generates test results based on rules identical to those used in the estimation process.
The tables below present the results of tests conducted by the authors using Legendre, trigonometric, and Daubechies wavelet basis functions. Specifically, the nonlinear AR(1) and AR(2.1) time series were employed to simulate the probabilities of type I and type II errors, in order to validate the accuracy and power of the test. Here, n is the data size, while c, d, and m denote tuning parameters, and the mapping function column gives the chosen type of mapping. Exact form test results are shown; in the Basis column, the first entry indicates the type of basis in time \(t\) and the second the type of basis for the time series values.
Basis | Mapping function | n | c | d | m | Rejections/Tests |
---|---|---|---|---|---|---|
Legendre Legendre | algeb | 4000 | 4 | 3 | MV method | 24/500 |
trigonometric Legendre | algeb | 4000 | 3 | 3 | MV method | 19/500 |
db10 db10 | algeb | 4000 | 2 | 2 | MV method | 31/500 |
trigonometric db10 | algeb | 4000 | 2 | 1 | MV method | 26/500 |
Basis | Mapping function | n | c | d | m | Rejections/Tests |
---|---|---|---|---|---|---|
Legendre Legendre | algeb | 6000 | 5 | 2 | MV method | 29/500 |
Legendre Legendre | algeb | 6000 | 5 | 2 | MV method | 33/500 |
[1] Ding, Xiucai, and Zhou, Zhou. “Estimation and inference for precision matrices of nonstationary time series.” The Annals of Statistics, 48(4): 2455–2477, 2020.
[2] Ding, Xiucai, and Zhou, Zhou. “Auto-regressive approximations to non-stationary time series, with inference and applications.” Available online, 2021.
[3] Ding, Xiucai, and Zhou, Zhou. “On the partial autocorrelation function for locally stationary time series: characterization, estimation, inference and applications.” Available online, 2022.
[4] Chen, Xiaohong. “Large Sample Sieve Estimation of Semi-Nonparametric Models.” Handbook of Econometrics, 6(B): 5549–5632, 2007.
[5] Zhou, Zhou. “Heteroscedasticity and Autocorrelation Robust Structural Change Detection.” Journal of the American Statistical Association, 108(502): 726–740, 2013.
[6] Ding, Xiucai, and Zhou, Zhou. “Simultaneous Sieve Inference for Time-Inhomogeneous Nonlinear Time Series Regression.” Available online, 2021.
UC Davis, xcading@ucdavis.edu, xiucaiding89@gmail.com
UC Davis, mrcqian@ucdavis.edu