---
title: "Alaska"
subtitle: "NDVI analysis {remotePARTS}"
author: "Clay Morrow"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Alaska}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r chunk_setup, include = FALSE, purl=FALSE}
library(knitr)
knitr::opts_chunk$set(collapse = TRUE, comment = "##", dpi = 50)
```


# Introduction 

This vignette will demonstrate the main functionality of `remotePARTS` by working
through a real remote sensing data set. The example follows the more-detailed
development of the methods in Ives et al. (2021, Remote Sensing of Environment, 
https://doi.org/10.1016/j.rse.2021.112678).

First, install/update `remotePARTS` from github if needed:  

```
devtools::install_github("morrowcj/remotePARTS")
```

Then, ensure that the package is loaded into your library:

```{r load_remotePARTS}
library(remotePARTS)
```

This vignette will use `dplyr` and `ggplot2` for visualizing the data:

```{r load_tidyverse}
library(dplyr)
library(ggplot2)
```

```{r set_ggtheme, echo = FALSE}
## set default ggplot theme
theme_set(theme(panel.grid = element_blank(), 
                strip.background = element_blank(),
                panel.background = element_blank(), 
                text = element_text(size = 10),
                legend.text = element_text(size = 10), 
                axis.text = element_text(size = 10)
                ))
```

# Alaska datasets

`remotePARTS` ships with one spatial data object. This dataset contains NDVI 
values derived from NASA's MODIS satellite for the US State of Alaska. 
The first object, `ndvi_AK10000`, due to package size limitations, this dataset is a 
random sampling of 10,000 pixels from the full Alaska dataset. It is important 
to note, though that `remotePARTS` can handle the full map (and much larger maps).

For this vignette, we well also create a smaller 3000 pixel subsample `ndvi_AK3000`
for demonstrative purposes:

```{r load_data}
data("ndvi_AK10000")
ndvi_AK3000 <- ndvi_AK10000[seq_len(3000),] # first 3000 pixels from the random 10K
```

`ndvi_AK10000` is a `data.frame` with 37 columns. `lng` and `lat` are longitude and
latitude, respectively. `AR_coef` and `CLS_coef` are pre-calculated coefficient
estimates of the time trends in ndvi from pixel-level time series analyses
via AR REML and conditional least squares, respectively. These coefficient
estimates have been standardized by the mean ndvi value for the pixel over that
time period. `land` is a factor representing land-cover classes, 
The remaining 32 columns, of the form `ndviYYYY`, contain the NDVI values from 
1982 to 2013. These data sets already have rare land classes, that occur in less
than 2% of pixels, removed. Additionally, both sets have
no missing data. `remotePARTS` can **not**
handle any missing data. It is essential that **all** missing data are removed 
prior to conducting any analyses. 

```{r data_structure}
str(ndvi_AK10000)
```

For this demonstration, we are interested in asking the following questions 
using these data: "Is NDVI in Alaska increasing over time?"; "Are Alaska's 
NDVI time trends associated with land-cover classes?"; and "Do Alaska's 
NDVI time trends differ with latitude?"

The figure below shows a temporal cross-section of these data for 1982, 1998, 
and 2013. 

```{r plot_ndvi_time, fig.width = 6.5, fig.asp = .4}
reshape2::melt(ndvi_AK10000, measure = c("ndvi1982", "ndvi1998", "ndvi2013")) %>% 
  ggplot(aes(x = lng, y = lat, col = value )) + 
  geom_point(size = .1) +
  labs(col = "ndvi") +
  facet_wrap(~ gsub("ndvi", "", variable), ncol = 3) +
  scale_color_viridis_c(option = "magma") +
  labs(x = "Longitude", y = "Latitude")
```

The following figure shows how Alaska's three primary land-cover classes are distributed.

```{r plot_land, fig.width = 4.5, fig.asp = .8}
ndvi_AK10000 %>%  
ggplot(aes(x = lng, y = lat, col = land)) + 
  geom_point(size = .1) + 
  scale_color_viridis_d(direction = -1, end = .9) +
  labs(y = "Latitude", x = "Longitude", col = "Land cover", fill = "Land cover")
```

Use `help("ndvi_AK)` to see documentation for these 
datasets.

# Model

When using `remotePARTS`, the data are assumed to follow the general stochastic 
process of the form

$$y(t) = X(t)\beta + \varepsilon(t)$$ where

* $y(t)$ is a response variable of interest at time $t$

* $\beta$ is a vector of coefficients for the predictor variables $X(t)$ on $y(t)$

* $\varepsilon(t)$ is an temporal autoregressive process: 
$\varepsilon(t) = \rho \varepsilon(t - 1) + \delta(t)$

* at any time step, $\delta(t)$ is spatially autocorrelated according to covariance 
matrix $\Sigma$: $\delta(t) \sim N(0, \Sigma)$

* there is no temporal dependence in $\delta(t)$ (i.e., $\delta(t_i)$ is 
independent of $\delta(t_j)$ except when $t_i$ = $t_j$)

# Time-series analysis

The first step in a typical `remotePARTS` workflow is to obtain pixel-level
estimates of time-series coefficients. In our example, we are interested in 
estimating the time trends in NDVI for each pixel $i$, represented by $\beta_1$ 
in the regression model

$$y_i(t) = \beta_0 + \beta_1 t + \varepsilon_i(t)$$

where the random errors $\varepsilon_i(t)$ follow an AR(1) process: 

$$\varepsilon_i(t) = b\varepsilon_i(t - 1) + \delta_i(t)$$
$$\delta_i(t) \sim N(0 , \sigma)$$

We will use `fitAR_map()` to estimate $\beta_1$, which fits pixel-level AR(1) 
models to a map of pixels and estimates coefficients using restricted maximum 
likelihood (REML). To do so, we must extract only our NDVI columns as the matrix 
`Y`. We'll do this by matching all column names containing "ndvi" and slicing the
data.frame:

```{r extract_Y}
ndvi.cols <- grep("ndvi", names(ndvi_AK3000), value = TRUE)
Y <- as.matrix(ndvi_AK3000[, ndvi.cols])
```

We also need a 2-column coordinate matrix `coords`:

```{r extract_coords}
coords <- as.matrix(ndvi_AK3000[, c("lng", "lat")])
```

`Y` and `coords` are then passed to `fitAR_map()` with default settings:

```{r time_series_regression}
ARfit <- fitAR_map(Y = Y, coords = coords)
```

Coefficient estimates can be obtained from `ARfit` with `coefficients()`. The 
first column is the estimate of $\beta_0$, $\hat{\beta_0}$, and the second is
$\hat{\beta_1}$.

```{r coefficient_slice}
head(coefficients(ARfit))
```

These time-series analyses calculate the time trend in the raw NDVI data. 
In most situations it makes sense to ask if there are time trends in the 
relative NDVI values, that is, changes in NDVI relative to the mean value of 
NDVI in a pixel. Scaling the trend in NDVI relative to the mean gives assessments 
of the **proportional** change in NDVI. These trends in the proportional NDVI are 
calculated be dividing $\hat{\beta_1}$ by the mean. The values of the trend coefficients are 
contained in `ARfit$coefficients`, and since the coefficients for the trend are in 
the column of the coefficients matrix named `t`, the scaling is performed as

```{r standardize_coefficients}
ARfit$coefficients[, "t"] <- ARfit$coefficients[,"t"]/rowMeans(ndvi_AK3000[, ndvi.cols])
ndvi_AK3000$AR_coef <- coefficients(ARfit)[, "t"] # save time trend coefficient
```

These scaled values of the time trend are then stored in the `ndvi_AK3000` data 
frame.

Below is an image of the estimated coefficients (pre-calculated and scaled) for 
the full `ndvi_AK10000`. From this, it appears that northern latitudes may be 
greening faster than more southern latitudes.

```{r plot_ndvi_trend, fig.width = 4.5, fig.asp = .8}
ndvi_AK10000 %>% 
  ggplot(aes(x = lng, y = lat, col = AR_coef)) + 
  geom_point(size = .1) + 
  scale_color_gradient2(high = "red", low = "blue", 
                        mid = "grey90", midpoint = 0) + 
  guides(fill = "none") + 
  labs(y = "Latitude", x = "Longitude", col = expression(beta[1]))
```

`fitAR_map` and its conditional least-squares counterpart, `fitCLS_map`, are
wrappers for the functions `fitAR` and `fitCLS` which conduct individual time series 
analysis. If the user wants, these can be applied on a pixel-by-pixel basis to the 
data to allow greater flexibility. Both AR REML and CLS methods account for temporal 
autocorrelation in the time series. See function documentation for more details (i.e., `?fitAR_map()`).

# Spatial relationships

Now that we've reduced the temporal component of each time series to a single value 
(i.e., estimates of $\beta_1$) while accounting for temporal autocorrelation, we 
can focus on the spatial component of our problem. 

## Distance

The first step is to calculate the distances among pixels as a distance matrix 
`D`. Here, we'll calculate relative distances with `distm_scaled()` from our 
coordinate matrix.

```{r calc_distance}
D <- distm_scaled(coords)
```

`distm_scaled()` scales distances across the spatial domain so that the
greatest distance between two pixels is 1. Note that because 
`distm_scaled()` uses `geosphere::distm()`, it treats coordinates as degrees 
longitude and latitude on a sphere and calculates distances accordingly.

## Covariance

Next, we need to estimate the expected correlation between the random errors
of our spatial response variable (estimates of $\beta_1$) based on their distances.
To do so, we need a spatial covariance function. In this example, we will use an 
exponential covariance function to estimate correlations: 
$V = \exp\big(\frac{-D}{r}\big)$ where $r$ is a parameter that dictates
the range of spatial autocorrelation. The function `covar_exp()` corresponds to 
this covariance function. 

```{r visualize_range, fig.asp = 1, fig.width = 4.5, include = FALSE}
curve(covar_exp(x, r = .1), xlab = "distance", ylab = "covar_exp(d, r)")
curve(covar_exp(x, r = .2), add = TRUE, col = "red")
legend("topright", legend = c("0.1", "0.2"), title = "r", col = c("black", "red"), lty = 1)
```

`V` represents the correlation among points if all variation is accounted for.
However, it is safest to assume that there is some additional source of unexplained
and unmeasured variance (a nugget $\eta$). Therefore, we assume that the covariance
structure among pixels is given by $\Sigma = \eta I + (1-\eta)V$ where 
$I$ is the identity matrix.

```{r visualize_nugget, fig.asp = 1, fig.width = 4.5, include = FALSE}
nugget <- .3
curve(covar_exp(x, r = .1), xlab = "distance", ylab = "covar_exp(d, r)")
curve((1-nugget)*covar_exp(x, r = .1), add = TRUE, col = "red")
legend("topright", legend = c(0, .2), title = expression(eta), col = c("black", "red"), lty = 1)
```

If we know the range parameter $r$, we can calculate `V` from `D` with `covar_exp()`:

```{r calc_covariance}
r <- 0.1
V <- covar_exp(D, r)
```

```{r visualize_V, eval = FALSE, echo = FALSE}
image(V)
```

And we could add a known nugget to `V` to obtain `Sigma`:

```{r add_nugget}
nugget <- 0.2 
I <- diag(nrow(V)) # identity matrix
Sigma <- nugget*I + (1-nugget)*V
```

See `?covar_exp()` for a description of the covariance functions provided 
by `remotePARTS` and for more information regarding their use.

# GLS: 3000-pixel subset

To test spatial hypotheses with `remotePARTS`, we use a generalized 
least-squares regression model (GLS):  

$$\theta = X\alpha_{gls} + \gamma $$ where

* $\theta$ is a vector of response values

* $\alpha_{gls}$ is a vector of the effects of predictor variables $X$ on $\theta$

* the error term $\gamma$ is spatially autocorrelated according to $\Sigma_\gamma$: $\gamma \sim N(0, \Sigma_\gamma)$

$\theta$ will usually be a regression parameter. For example, if we're interested
in understanding trends in NDVI over time, we would use the pixel-level regression
coefficient for the effect of time on NDVI (i.e., $\theta = \hat\beta$)

## Known parameters

If the parameters that govern the spatial autocorrelation $\Sigma_\gamma$ are known,
a GLS can be fit with `fitGLS()`. Here, we will fit the GLS by providing (i) a model 
formula, (ii) a data source, (iii) our `V` matrix, which was pre-calculated with a 
spatial parameter $r = 0.1$ and (iv) a nugget of $\eta = 0.2$. 

The specific task in this examples is to estimate the effect of land-cover class on our time trend.
Because `land` is a factor, we'll also specify a no-intercept model. 

Note that the GLS fitting process requires an inversion of `V`. This means that 
even with only the 3000-pixel subset, it will take a few minutes to finish the
computations on most computers. 

```{r fit_land_GLS}
GLS.0 <- fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V, nugget = nugget)
```

Note also that `fitGLS` adds the nugget to `V` internally. If we wanted to do this
ourselves, we could pass the covariance matrix `Sigma` which already contains a
nugget component and then set the `nugget` argument of `fitGLS` to 0:

```{r alt_fit_land_GLS, eval = FALSE}
fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = Sigma, nugget = 0) # equivalent
```

The estimates for our land class effects can be extracted with `coefficients()`.

```{r extract_GLS_coefs}
coefficients(GLS.0)
```

The full model fit is given by

```{r print_GLS}
GLS.0
```

Note that, although none of the three land-cover classes shows a statistically 
significant time trend in (proportional) NDVI, the F-test shows that there is a 
statistically significant difference among land-cover classes, because the model 
including land-cover classes gives a statistically significantly better fit to 
the data than the intercept-only model.

## Parameter estimation

In practice, we rarely know the values of the parameters that govern 
spatial autocorrelation (e.g., the range and nugget) in advance. Therefore,
these parameters will need to be estimated for most data.

### Spatial parameters

The spatial parameters of a covariance function (e.g., `covar_exp`) can be 
estimated from residuals of pixel-level time-series models (see Ives et al. RSE, 2021). Although we conducted
the time-series analyses as though each pixel was independent (with `fitAR_map()`),
they are, in fact, dependent. Specifically, the correlation of the residuals from
the pixel-level analyses is roughly proportional to the spatial autocorrelation 
of the residuals of the spatial model, $\gamma$, if all of the variation in
$\gamma$ is due to the spatiotemporal variation produced by $\varepsilon_i(t)$. 
Therefore, we can estimate the range parameter for **V** as 
$V_{ij} \approx \text{cor}\big(\varepsilon_i(t), \varepsilon_j(t)\big)$ 

The function `fitCor()` performs the estimation of spatial parameters. We will 
pass to this function (i) the time-series residuals for our map, extracted from the 
time-series analysis output object with `residuals()`, (ii) the coordinate 
matrix `coords`, (iii) the covariance function `covar_exp()`, and (iv) a list
specifying that we should start the search for the optimal range parameter 
at 0.1. For this example, we will also specify `fit.n = 3000`, which ensures 
that all pixels are used to estimate spatial parameters. 

```{r fit_range_parameter}
corfit <- fitCor(resids = residuals(ARfit), coords = coords, covar_FUN = "covar_exp", 
                 start = list(range = 0.1), fit.n = 3000)
(range.opt = corfit$spcor)
```

By default, `fitCor()` uses `distm_scaled()` to calculate distances
from the coordinate matrix, but any function that returns a distance matrix can be
specified with the `distm_FUN` argument. It is important to scale the parameter 
values appropriately, accounting for your distances. For example, if we instead use 
`distm_km()` to calculate distance in km instead of relative distances, we would
need to scale our starting range parameter by the maximum distance in km of our 
map:

```{r fit_range_km, eval = FALSE}
max.dist <- max(distm_km(coords))
corfit.km <- fitCor(resids = residuals(ARfit), coords = coords, covar_FUN = "covar_exp", 
                    start = list(range = max.dist*0.1), distm_FUN = "distm_km", fit.n = 3000)
```

Note that, depending on the covariance function used, not all parameters will 
need scaling. For example, `covar_exppow()` is an exponential-power covariance
function and takes a range and shape parameter, but only the range parameter 
should scale with distance. See `?fitCor()` for more details.

After we've obtained our range parameter estimate, we can use it to re-calculate 
the `V` matrix:

```{r optimized_covariance}
V.opt <- covar_exp(D, range.opt)
```

### Nugget

Similar to finding the optimal spatial parameters, the nugget can be 
estimated by selecting a nugget that maximizes the likelihood of the GLS given 
the data. `fitGLS()` will find this maximum-likelihood nugget when `nugget = NA` 
is specified. Note that this type of optimization requires fitting multiple
GLS models, which means it will be much slower than our call to `fitGLS()` 
with a known nugget. 

In addition to our original arguments, we'll also explicitly set `no.F = FALSE` 
so that F-tests are calculated. For the F-tests, the default reduced model is the intercept-only model, although 
it is also possible to specify alternative reduced models as a formula in the `formula0` option.

```{r optimized_nugget}
GLS.opt <- fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V.opt, nugget = NA, no.F = FALSE)
(nug.opt = GLS.opt$nugget)
coefficients(GLS.opt)
```

Let's compare our GLS from earlier with this one with optimized parameters:

```{r compare_GLS}
rbind(GLS.0 = c(range = r, nugget = GLS.0$nugget, logLik = GLS.0$logLik, MSE = GLS.0$MSE),
      GLS.opt = c(range = range.opt, nugget = GLS.opt$nugget, logLik = GLS.opt$logLik, MSE = GLS.opt$MSE))
```

Note that in this example, `logLik` for `GLS.opt` is not functionally different
than `logLik` for `GLS.0`. This indicates that using the values of `range` = 0.1 
and `nugget` = 0.2 gives a similar likelihood than the optimal model when 
`range` is constrained to be the value calculated from `covar_exp()`, `GLS.opt`.

### Simultaneous parameter estimation

It is also possible to simultaneously estimate spatial parameters and the
nugget without using the time-series residuals. This is done by finding the
set of parameters describing spatial autocorrelation (e.g., `range` and 
`nugget`) that maximizes the likelihood of a GLS given the data. 
This task is computationally slower than optimizing `nugget` alone with
`fitGLS()` and therefore will take some time to run.

```{r optimized_GLS, eval = FALSE}
fitopt <- fitGLS_opt(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, 
                     coords = ndvi_AK3000[, c("lng", "lat")], 
                     covar_FUN = "covar_exp", 
                     start = c(range = .1, nugget = .2),
                     method = "BFGS", # use BFGS algorightm (see ?stats::optim())
                     control = list(reltol = 1e-5)  # lower the convergence tolerance (see ?stats::optim()) 
                     )
fitopt$opt$par
#      range     nugget 
# 0.02497874 0.17914929 
fitopt$GLS$logLik
# [1] 12824.77
fitopt$GLS$MSE
# [1] 2.475972e-05
```

Note that, because `fitGLS_opt()` does not require time series residuals, 
it is possible to use `fitGLS_opt()` for statistical problems involving 
only spatial variables. In other words, rather than $\theta$ being a limited 
to a time trend, it can be a purely spatial variable as well.

When time-series residuals are available, we recommend that you estimate
spatial parameters with `fitCor()` and `fitGLS()`, rather than `fitGLS_opt()`. 
In simulation studies, using `fitCor()` with `fitGLS()` often has better statistical 
performance than using `fitGLS_opt()`. See `?fitGLS_opt()` for more 
information about this function and its use.

# Hypothesis testing

The purpose of the tools provided by `remotePARTS` is to test map-level 
hypotheses about spatiotemporal data sets. In this example, we will test 3 
hypotheses using 3 different GLS models. Note that these hypothesis are framed
as regression-style problems; indeed, `fitGLS()` is essentially regression 
with spatially autocorrelated random errors.

## Intercept-only model

If we want to test the hypothesis that "there was a trend in Alaska NDVI Alaska 
from 1982-2013", we can regress the AR coefficient on an intercept-only GLS 
model:

```{r GLS_intercept_only}
(GLS.int <- fitGLS(AR_coef ~ 1, data = ndvi_AK3000, V = V.opt, nugget = nug.opt, no.F = TRUE))
```

We can see from the t-test that the intercept is not statistically different
from zero. In other words, there is no map-level temporal trend in NDVI across
the entire data set. We have not performed an F-test, because the full model is
the intercept-only model and is therefore the same as the reduced model.

## Land-cover effects

If we want to test the hypothesis that "trends in Alaskan NDVI differ by land-
cover class", we can use `GLS.opt()` from earlier:

```{r GLS_land_optimized}
GLS.opt
```

The t-tests show that the trend in NDVI, for all land-cover classes, was not 
statistically different from zero, meaning that NDVI did not show a statistically
significant trend in any land-cover class. The F-test (ANOVA table), however, 
shows that time trends in NDVI differ among the land-cover classes. The better fit
of the model with land-cover classes can also be seen in the increase in the likelihood (`logLik`)
compared to the intercept-only model.

## Latitude effects

Finally, to test the hypothesis that "temporal trends in NDVI differ with 
latitude", we can regress the AR coefficient on latitude in our GLS model:

```{r fit_GLS_latitude}
(GLS.lat <- fitGLS(AR_coef ~ 1 + lat, data = ndvi_AK3000, V = V.opt, nugget = nug.opt, no.F = FALSE))
```

The t-tests show that temporal trends in NDVI did not differ with latitude. 
Note that the p-value from the F-test is equivalent to that of the t-test p-value
for effect of latitude. 

## Conclusions (ndvi_AK3000)

We can see from these hypothesis tests that, at least among the 3000-pixel 
sub-sample of Alaska, the answer to all three questions that we posed is no: there is
no statistical evidence for an overall greening in Alaska, nor differences among
land-cover classes or latitude.

# GLS: Full dataset

Until now, we have limited our analyses to the 3000-pixel subset of Alaska, 
`ndvi_AK3000`. Calls to `fitGLS()` involve inverting `V`, and the computational 
complexity scales with $N^3$ where $N$ is the number of pixels in
the map. We have used the data set `ndvi_AK3000` up to now because the computation time
for the analyses is reasonable. However, 3000 pixels means dealing with 
distance and covariance matrices that each contain 
$3,000 \times 3,000 = 9,000,000$ elements. This is approaching the
upper size limit for obtaining relatively fast results.

In contrast, the covariance matrix for the full `ndvi_AK10000` data set would have 
$10,000 \times 10,000 = 100,000,000$ elements which creates a computationally 
infeasible problem for a normal computer.

For these reasons, `fitGLS_paritition` may be the most useful function in the
`remotePARTS` package. This function can perform the GLS analysis on the full 
`ndvi_AK10000` data set. In fact, `ndvi_AK10000` is quite small in comparison to many remote
sensing data sets that could be analyzed with `fitGLS_partition()`.

## Partitioned GLS

`fitGLS_parition()` conducts GLS analyses on partitions of the data and then
combines the results from the partitions to give overall statistical results. 
Specifically, this process (1) breaks the data
into smaller and more manageable pieces (partitions), (2) conducts GLS on each
partition, (3) calculates cross-partition statistics from pairs of partitions, 
and (4) summarizes the results with statistical tests that account for correlations
among partitions. We will use the full `ndvi_AK10000` data set to demonstrate 
`fitGLS_parition()`.

We have already performed the time-series analyses on the full data set so 
you don't have to. These are in the `AR_coef` column of `ndvi_AK10000`. However, we 
used a complete data set, so you will need to remove rare land-cover classes. 

```{r remove_rare_land}
df = ndvi_AK10000
```

Step (1) is to divide pixels up into partitions, which is done with the function 
`sample_parition()`. Passing `sample_partitions`, the number
of pixels in our map, and the argument `partsize = 1500` 
will result in a partition matrix with 1,500 rows and 20 columns. 
Columns of the resulting partition matrix `pm` each contain a random sample 
of 1,500 pixels. Each of these 20 samples (partitions) are non-overlapping, 
containing no repeats. Setting `npart = NA` will automatically give the maximum
number of partitions possible (i.e., 20).

```{r partition_matrix}
pm <- sample_partitions(npix = nrow(df), partsize = 1500, npart = NA)
dim(pm)
```

Once we have our partition matrix, `fitGLS_parititon()` performs steps 
(2)-(4) of the analyses. The input is similar to `fitGLS`. For this example, we 
specify (i) a formula, (ii) the data as a data.frame (`df`), (iii) the partition 
matrix (partmat `pm`), (iv) the covariance function (`covar_FUN`), (v) a list 
of spatial parameters `covar.pars` including our optimized range parameter, 
and a `nugget`. If `nugget` is specified, this value is used for the calculations, 
while if `nugget` = `NA` (the default) it is estimated for each partition separately.

Note that, although the compute time
is much faster than if we needed to invert the full covariance matrix `V`, this example
still takes a long time to fit. Therefore, we have saved the output of 
this code as an R object `partGLS.ndviAK` so that you can look at its output 
without having to execute the function. 

The model was fit with this code:

```{r partitioned_GLS, eval = FALSE}
partGLS_ndviAK <- fitGLS_partition(formula = AR_coef ~ 0 + land, data = df, 
                                   partmat = pm, covar_FUN = "covar_exp", 
                                   covar.pars = list(range = range.opt),
                                   nugget = nug.opt, ncores = 4)
```

```{r save_partitioned_GLS, eval = FALSE, echo = FALSE, purl = FALSE}
save(partGLS_ndviAK, file = "data/partGLS_ndviAK.rda", compress = "xz")
```

and can be loaded with 

```{r load_partitioned_GLS}
data(partGLS_ndviAK)
```

Here are the t-tests, that show that land cover class does not significantly
affect NDVI trend:

```{r partitioned_t_test}
partGLS_ndviAK$overall$t.test
```

It is **highly** recommended that users read the full documentation 
(`?fitGLS_parition()`) before using `fitGLS_partition` to analyze any data.

<!-- ##### Chisqr-test problem  -->

Here is the p-value for the chisqr test of the partitioned GLS
<!-- it seems very wrong: -->

```{r chisqr_test}
partGLS_ndviAK$overall$pval.chisqr
```

This again, indicates that the model which includes land cover classes better
explains variation in NDVI trends than the intercept-only model. Note that the
p-value is much lower than the p-value from the F-test conducted by `fitGLS()`.
This is likely due to outliers in the data, which should be removed before 
conducting any real analysis. One simple way to filter potential trend outliers 
would be to remove any pixels whose time-series coefficient standard errors are 
unusually large (e.g., `SE > 4*mean(SE)`).

## Conclusions

Our parititoned GLS has returned the same conclusions as our standard GLS analysis.
No map-level trend in NDVI was found within any of the land cover classes. 

This was only one of our three hypotheses tested earlier. The remaining two 
can easily be tested with `fitGLS_partition()`, as they were with `fitGLS`, by 
changing the formula argument. 

<!-- Here's a manual test: -->

<!-- ```{r} -->
<!-- (stats = with(partGLS_ndviAK$overall, c(Fmean = Fstat, rSSR = rSSR, dfs[1], partdims["npart"]))) -->

<!-- ## use function -->
<!-- remotePARTS:::part_chisqr(stats["Fmean"], stats["rSSR"], stats["df1"], stats["npart"]) -->

<!-- ## test manually -->
<!-- (rZ = (stats["rSSR"]/stats["df1"])^0.5) -->
<!-- (v.MSR = diag(stats["df1"]) - rZ) -->
<!-- V.MSR = kronecker(diag(stats["npart"]), v.MSR) + rZ -->
<!-- (lambda = eigen(V.MSR)$values) -->
<!-- (pval = CompQuadForm::imhof(q = stats["npart"] * stats["df1"] * stats["Fmean"], lambda = lambda)$Qq) -->
<!-- (pval.print = ifelse(pval < 1e-6, 1e-6, pval)) -->
<!-- ``` -->
