mbreaks
packageThis vignette is intended for users who use mbreaks to test, estimate and conduct inference for linear regression models in presence of structural breaks. The library provides different estimation procedures to handle both pure change and partial change models, including estimating number of breaks via information criteria (cite) and sequentially estimating number of breaks(cite). For testing the packages provides diagnostic tests including Sup F test of 0 versus m breaks (cite), Double Max tests (cite) (R-base?), and sequential Sup F test of l versus l+1 breaks. A built-in function to visualize a structural break model with m breaks and a comprehensive call to conduct all of the procedures contained in package are provided.
The package mbreaks
provides R users with minimum code
but comprehensive functions to analyze structural breaks in linear
regression models. The framework based on the econometric model of the
following form:
$$ y_t = x_t^{\prime}\beta + z_t^{\prime}
\delta_j + u_t; \\
t=T_{j-1}+1,...,T_j; j\in\{1,...,m+1\}$$ where $\underset{(p \times 1)}{x_t}$ is vector of
regressors with coefficients unchanged over the sample and $\underset{(q \times 1)}{z_t}$ is vector of
regressors with coefficients changed under each regime j ∈ {1, 2, ..., m + 1}. If
p = 0, the model is equivalent
to pure structural change model, and if p > 0, the model is equivalent to
a partial structural change model. The above model implies there are
m breaks. Users can use
dofix()
function to estimate known (or user-specified)
m breaks. When the number of
breaks m is unknown, various
data-dependent procedures are provided in the package to test and select
number of breaks:
dotest()
(cite)dotest()
(cite)doseqtests()
(cite)doorder()
(cite)doorder()
(cite)dosequa()
(cite)dorepart()
(cite)There are 3 classes in the package, corresponding to 2 types of
diagnostic tests provided and structural change model estimation based
on m breaks. These S3 classes
have their modified print()
functions to summarize key
information. In summary:
sbtests
: S3 class returned from dotest()
function. It includes summary of SupF test and DMax test with test
statistics, critical values and summary tables which could be view in
console via print()
method or open in separated tab by
View()
seqtests
: S3 class returned from
doseqtests()
function. It includes summary of sequential
SupF test of l + 1 versus
l breaks with test statistics,
critical values and a summary table which could be view in console via
print()
method or open in separated tab by
View()
model
: S3 class returned from estimation procedures
above including dofix()
, doorder()
,
dosequa()
, and dorepart()
. The class
model
contains numerous information which are summarized
comprehensively into 3 main tables: i) break date estimates (estimated
date and corresponding asymptotic confidence intervals), ii)
regime-specific coefficients (estimates in each regime and corrected
standard errors (SEs) based on assumptions made on errors structure) and
iii) full-sample coefficient if p > 0 (with estimates and
corrected SEs). Besides the 3 main tables that include information about
the estimated structural change model with m breaks, model
is a
list contains fields (with majority of them are class
matrix
) which users can access by using operator
$
for further analysis in R (included information
summarized in aforementioned above and other information such as fitted
values, residuals and name of procedure used).mbreaks
The previous section introduces the framework that
mbreaks
package built on and summarize the classes of
procedures available to users. In this section, we will illustrate the
syntax of high-level functions and cover the arguments that users might
want to customize to match the structural break model with data and
desirable analysis.
The mbreaks
package designs high-level functions to have
identical arguments with default values recommended by the literature to
save users the burden. Users can use mdl()
, a comprehensive
function that invoked all high-level functions explained in previous
section 1:
#load US ex-post exchange rate data
USrate = data(real)
#carry out all testing and estimating procedures via a single call
rate = mdl('rate',data=real)
#display the results from the procedures
rate
#> -----------------------------------------------------
#> The number of breaks is estimated by KT
#> Pure change model with 2 estimated breaks.
#> Minimum SSR = 455.950
#>
#> Estimated date:
#> Break1 Break2
#> Date 47 79
#> 95% CI (37,50) (77,82)
#> 90% CI (40,49) (78,81)
#>
#> Estimated regime-specific coefficients:
#> Regime 1 Regime 2 Regime 3
#> (SE) 1.355 (0.155) -1.796 (0.511) 5.643 (0.603)
#>
#> No full sample regressors
#> -----------------------------------------------------
#> a) SupF tests against a fixed number of breaks
#>
#> 1 break 2 breaks 3 breaks 4 breaks 5 breaks
#> Sup F 57.906 43.014 33.323 24.771 18.326
#> 10% CV 7.040 6.280 5.210 4.410 3.470
#> 5% CV 8.580 7.220 5.960 4.990 3.910
#> 2.5% CV 10.180 8.140 6.720 5.510 4.340
#> 1% CV 12.290 9.360 7.600 6.190 4.910
#>
#> b) UDmax tests against an unknown number of breaks
#>
#> UDMax 10% CV 5% CV 2.5% CV 1% CV
#> 1 57.906 7.460 8.880 10.390 12.370
#> -----------------------------------------------------
#> supF(l+1|l) tests using global optimizers under the null
#>
#> supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
#> Seq supF 57.906 33.927 14.725 0.033 0.000
#> 10% CV 7.040 8.510 9.410 10.040 10.580
#> 5% CV 8.580 10.130 11.140 11.830 12.250
#> 2.5% CV 10.180 11.860 12.660 13.400 13.890
#> 1% CV 12.290 13.890 14.800 15.280 15.760
#> -----------------------------------------------------
#> To access additional information about specific procedures
#> (not included above), type stored variable name + '$' + procedure name
Users should find the syntax minimal (similar to lm()
in
stats
package), with slight difference where users are
required to specifically mention the name of y variable in the data frame used.
If no z and x regressors (if any) name
specified, the program assumes this is a mean-shift model (as constant
is included by default). The names of regressors must match the name
used in data frame, otherwise errors will be displayed and execution
halted. All of the arguments are set at default value but users are free
to specify them to fit the analysis they are after. The section below
will explain the arguments/options for the main functions to help users
customize mbreaks
as desirable:
eps1 = s
where
0.05 ≤ s ≤ 0.5. 3. If the user input
value for eps1
is invalid, it will be set to default value
eps1=0.15
.m
specify maximum number of breaks considered
in the model. This argument is automatically matched with
eps1
argument. If the program finds that m
is
invalid (non-positive or larger than allowed by sample size given
trimming level eps1
), it will be set automatically to
maximal breaks allowed by sample size and specified trimming level
eps1
. The default value is m=5
.robust
: Allow for heteroskedastic and autocorrelated in
et. The
default value is robust=1
. If set to 0
, the
errors are assumed to be martingale difference sequence.hetvar
: Allow the variance of the errors et to be
different across segments/regimes. This option imposed stricter
assumptions on errors than robust
, and is not allowed to be
0
when robust=1
.prewhit
: Set to 1
if users want to
prewhiten the residuals prior to estimating the long-run covariance
matrix with an autoregressive (AR) process of order 1. The default value
is prewhit=1
.hetdat
: Set to 1
to allow the second
moment matrices of zt and xt (if any) to
be different across segments. Set to 0
otherwise. It is
recommended to set hetdat=1
for p > 0. The default value is
hetdat=1
hetq
: Set to 1
to allow moment matrices of
zt and
xt (if
any) to be different across segments. This is used in construction of
the confidence interval for the break dates. If hetq=0
, the
moment matrices of the regressors are assumed to be identical across
segments. The default value is hetq=1
.hetomega
: Set to 1
to allow the long-run
covariance matrix of ztut
to be different across segments. Set to 0
otherwise. This
is used in construction of the confidence interval for the break dates.
The default value is hetomega=1
.maxi
: Maximum number of iterations if no convergence
attained when running the iterative procedure to estimate partial change
model. The default is maxi=20
eps
: Criterion for convergence of the iterative
procedure. The default value is eps=0.0001
fixb
: Set to 1 if users intend to provide initial
values for β, where the
initial values are supplied as matrix betaini
of size (p × 1). If betaini
is
invalid, the program will automatically set fixb=0
and use
OLS estimate for the initial values. The default value is
fixb=0
There are two options available to all high-level functions: -
const
: Set to 1
to include constant in z regressors. The default value is
const=1
. Users can turn off the constant for z regressors by setting
const=0
. - printd
: Set to 1
to
print intermediate outputs from estimation procedures of the program to
console. The default value is printd=0
Besides, there is an additional specific option for
doorder()
to specify which information criteria to use,
bic_opt
. If bic_opt=1
, the BIC will be used as
criterion to select the number of breaks, while bic_opt=0
,
the LWZ will be used in place.
The vignette replicates 2 empirical exercises: i) US real interest rate (cite) and ii) New Keynesian Phillips curve (cite)
The empirical model examined in the papers (cite) is a linear
regression: yt = μi + et, t ∈ Ti − 1 + 1, ..., Ti
which is equivalent to a pure mean-shift model. We allow for
heteroskedastic and serially correlated errors et by using
default settings (robust=1
) and using AR(1) to prewhiten
the residuals before testing and estimation by default setting
(prewhit=1
). Here, instead of invoking mdl()
,
we demonstrate specific syntax to obtain BIC estimated model
#estimating the mean-shift model with BIC (the default option bic_opt=1, which use BIC as criterion)
rate_BIC = doorder('rate',data=real)
#NOTE: equivalent to rate$BIC; type rate$BIC to compare with new result
# visualization of estimated model with BIC (in the argument, we can replace rate$BIC with rate_BIC for exact same graph; recall that `$` is the operator to refer to field BIC in return list from mdl())
plot_model(rate$BIC, title = 'US Exchange rate', CI=0.90)
The plot_model()
function takes any estimated structural
break model class model
estimated by mbreaks
package and graph the following properties: - The observed y, fitted ŷm values from a
model with m breaks (estimated
or pre-specified depending on function called) and fitted ŷ from a no break model (null model)
for easy comparison - The estimated break dates with labels in
chronological order and confidence interval depends on CI
argument (which is either 0.9 or 0.95) for respective dates. - The
confidence interval (also depends on argument CI
) for
fitted ŷm
which visualizes the corrected standard SEs for coefficient
estimates.
To show flexibility of class model
in the package, we
can reproduce similar graph using information returned from stored
variable rate_BIC
.
#collect model information
m = rate_BIC$nbreak #number of breaks
y = rate_BIC$y #vector of dependent var
zreg = rate_BIC$z #matrix of regressors with changing coefs
date = rate_BIC$date #estimated date
fity = rate_BIC$fitted.values #fitted values of model
bigT = length(y)
#compute the null model
fixb = solve((t(zreg) %*% zreg)) %*% t(zreg) %*% y
fity_fix = zreg%*%fixb #fitted values of null model
#plots the model
tx = seq(1,bigT,1)
range_y = max(y)-min(y);
plot(tx,y,type='l',col="black", xlab='time',ylab="y",
ylim=c(min(y)-range_y/10,max(y)+range_y/10),lty=1)
#plot fitted values series for break model
lines(tx, fity,type='l', col="blue",lty=2)
#plot fitted values series for null model
lines(tx, fity_fix,type='l', col="dark red",lty=2)
#plot estimated dates + CIs
for (i in 1:m){
abline(v=date[i,1],lty=2)
if (rate_BIC$CI[i,1] < 0){rate_BIC$CI[i,1] = 0}
if(rate_BIC$CI[i,2]>bigT){ rate_BIC$CI[i,2]=bigT}
segments(rate_BIC$CI[i,1],min(y)*(12+i/m)/10,rate_BIC$CI[i,2],min(y)*(12+i/m)/10,lty=1,col='red')
}
legend(0,max(y)+range_y/10,legend=c("observed y",paste(m,'break y'),"0 break y"),
lty=c(1,2,2), col=c("black","blue","red"), ncol=1)
### New Keynesian Phillips Curve Perron and Yamamoto (Cite) investigates
the stability of New Keynesian Phillips curve proposed by Gali and
Gertler (cite) via linear model: πt = μ + γπt − 1 + κxt + βEtπt + 1 + ut
where πt
is inflation at time t, Et is
expectation operator conditional on information available up to t, and
xt is
determinant of inflation. In this example, we will reproduce specific
test results of the paper with ready-to-use dataset
nkpc
:
#x_t is GDP gap
z_name = c('inflag','ygap','inffut')
#we can invoke each test separately by using dotest() and doseqtests()
supF_ygap = dotest('inf',z_name,data=nkpc,prewhit = 0)
#z regressors' names are passed in the argument as an array, which equivalent to above argument call with z_name
seqF_ygap = doseqtests('inf',c('inflag','ygap','inffut'),data=nkpc,prewhit = 0)
#see test results
supF_ygap
#>
#> a) SupF tests against a fixed number of breaks
#>
#> 1 break 2 breaks 3 breaks 4 breaks 5 breaks
#> Sup F 22.220 8.513 12.456 18.481 16.292
#> 10% CV 14.260 12.600 11.210 9.970 8.370
#> 5% CV 16.190 13.770 12.170 10.790 9.090
#> 2.5% CV 18.130 14.990 13.060 11.550 9.660
#> 1% CV 20.230 16.550 14.260 12.420 10.530
#>
#> b) UDmax tests against an unknown number of breaks
#>
#> UDMax 10% CV 5% CV 2.5% CV 1% CV
#> 1 22.220 14.580 16.370 18.240 20.390
seqF_ygap
#>
#> supF(l+1|l) tests using global optimizers under the null
#>
#> supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
#> Seq supF 22.220 12.559 22.134 9.979 0.000
#> 10% CV 14.260 16.110 17.310 18.000 18.450
#> 5% CV 16.190 18.110 18.930 19.640 20.190
#> 2.5% CV 18.130 19.700 20.660 21.460 21.970
#> 1% CV 20.230 21.970 22.800 23.060 23.760
#x_t is labor income share
#or invoke all tests using mdl()
nkpc_lbs = mdl('inf',c('inflag','lbs','inffut'),data=nkpc,prewhit = 0)
#>
#> There are no breaks selected by BIC and estimation is skipped
#>
#> There are no breaks selected by LWZ and estimation is skipped
#>
#> There are no breaks selected by KT and estimation is skipped
nkpc_lbs$sbtests
#>
#> a) SupF tests against a fixed number of breaks
#>
#> 1 break 2 breaks 3 breaks 4 breaks 5 breaks
#> Sup F 30.592 8.960 7.456 12.061 10.705
#> 10% CV 14.260 12.600 11.210 9.970 8.370
#> 5% CV 16.190 13.770 12.170 10.790 9.090
#> 2.5% CV 18.130 14.990 13.060 11.550 9.660
#> 1% CV 20.230 16.550 14.260 12.420 10.530
#>
#> b) UDmax tests against an unknown number of breaks
#>
#> UDMax 10% CV 5% CV 2.5% CV 1% CV
#> 1 30.592 14.580 16.370 18.240 20.390
nkpc_lbs$seqtests
#>
#> supF(l+1|l) tests using global optimizers under the null
#>
#> supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
#> Seq supF 30.592 11.408 5.447 26.837 2.657
#> 10% CV 14.260 16.110 17.310 18.000 18.450
#> 5% CV 16.190 18.110 18.930 19.640 20.190
#> 2.5% CV 18.130 19.700 20.660 21.460 21.970
#> 1% CV 20.230 21.970 22.800 23.060 23.760
Users can call independent functions to carry out
specific procedures as outlined above instead of conducting all 6 main
procedures provided by package via mdl()
↩︎
The high level functions are mdl()
,
dofix()
, dosequa()
, dorepart()
,
doorder()
, dotest()
,
doseqtests()
↩︎
This argument is different from eps
where
eps
sets the convergence criterion for iterative scheme
when estimating partial change model.↩︎
These options are unaffected the pure change model↩︎