# What’s nearly-isotonic regression?

Let’s say we have data $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^2$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$’s for simplicity.) Isotonic regression gives us a monotonic fit $\beta_1 \leq \dots \leq \beta_n$ for the $y_i$’s by solving the problem

$$\begin{aligned} \text{minimize}_{\beta_1, \dots, \beta_n} \quad& \sum_{i=1}^n (y_i - \beta_i)^2 \\ \text{subject to} \quad& \beta_1 \leq \dots \leq \beta_n. \end{aligned}$$

(See this earlier post for more details.)
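For comparison, here is a minimal sketch of computing the plain isotonic fit with base R’s `isoreg()` on simulated data:

```
# Base R's isoreg() solves the isotonic regression problem above
set.seed(1)
x <- 1:20
y <- 0.1 * x + rnorm(20, sd = 0.5)   # roughly increasing data

fit <- isoreg(x, y)
fit$yf       # the fitted values beta_1 <= ... <= beta_n
plot(fit)    # data points with the monotonic step-function fit
```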

Nearly-isotonic regression, introduced by Tibshirani et al. (2011) (Reference 1), generalizes isotonic regression by solving the problem

$$\begin{aligned} \text{minimize}_{\beta_1, \dots, \beta_n} \quad \frac{1}{2}\sum_{i=1}^n (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} (\beta_i - \beta_{i+1})_+, \end{aligned}$$

where $x_+ = \max (0, x)$ and $\lambda \geq 0$ is a user-specified hyperparameter.

It turns out that, due to properties of the optimization problem, the nearly-isotonic regression fit can be computed for all $\lambda$ values in $O(n \log n)$ time, making it a practical method to use. See Section 3 and Algorithm 1 of Reference 1 for details. (More precisely, we can determine the nearly-isotonic regression fit for a critical set of $\lambda$ values: the fit for any other $\lambda$ value will be a linear interpolation of fits from this critical set.)
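The interpolation property is easy to see with toy numbers (the values below are made up, purely for illustration): given fits at two consecutive critical $\lambda$ values, the fit at any $\lambda$ in between is their weighted average.

```
# Toy illustration of the linear-interpolation property (made-up numbers)
lam_lo <- 1; lam_hi <- 2               # consecutive critical lambda values
beta_lo <- c(1.0, 1.40, 1.30, 2.0)     # hypothetical fit at lam_lo
beta_hi <- c(1.0, 1.35, 1.35, 2.0)     # hypothetical fit at lam_hi

lam <- 1.5
w <- (lam - lam_lo) / (lam_hi - lam_lo)
(1 - w) * beta_lo + w * beta_hi        # fit at the intermediate lambda
```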

How is nearly-isotonic regression a generalization of isotonic regression? The term $(\beta_i - \beta_{i+1})_+$ is positive if and only if $\beta_i > \beta_{i+1}$, that is, if there is a monotonicity violation. The larger the violation, the larger the penalty. Instead of insisting on no violations at all, nearly-isotonic regression trades off the size of the violation against the improvement in goodness of fit to the data. Nearly-isotonic regression gives us a sequence of fits that range from interpolation of the data (when $\lambda = 0$) to the isotonic regression fit (when $\lambda = \infty$). (Actually, you will get the isotonic regression fit once $\lambda$ is large enough that any decrease in the penalty can no longer be offset by the improvement in goodness of fit.)
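To make the trade-off concrete, here is a small worked example (mine, not from Reference 1). Take $n = 2$ with $y_1 = 1$ and $y_2 = 0$, so the problem is

$$\begin{aligned} \text{minimize}_{\beta_1, \beta_2} \quad \frac{1}{2}(1 - \beta_1)^2 + \frac{1}{2}(0 - \beta_2)^2 + \lambda (\beta_1 - \beta_2)_+. \end{aligned}$$

For $\lambda < 1/2$ the solution is $\beta_1 = 1 - \lambda$, $\beta_2 = \lambda$: the monotonicity violation shrinks as $\lambda$ grows but does not vanish. Once $\lambda \geq 1/2$, the solution is $\beta_1 = \beta_2 = 1/2$, which is exactly the isotonic regression fit.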

Why might you want to use nearly-isotonic regression? One possible reason is to check whether the assumption of monotonicity is reasonable for your data. To do so, run nearly-isotonic regression with cross-validation on $\lambda$ and compute the CV error for each $\lambda$ value. If the CV error achieved by the isotonic regression fit (i.e. the largest $\lambda$ value) is close to or statistically indistinguishable from the minimum, that gives assurance that monotonicity is a reasonable assumption for your data.
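Here is a rough sketch of what such a cross-validation might look like (my own helper, not part of any package; it assumes `neariso()`, introduced below, accepts a user-supplied vector of $\lambda$ values, and it predicts held-out points by linear interpolation, ignoring the unequal-spacing caveat in Note 1):

```
library(neariso)

# K-fold CV over a grid of lambda values (sketch only)
cv_neariso <- function(x, y, lambda, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = length(y)))
  err <- matrix(NA_real_, nrow = K, ncol = length(lambda))
  for (k in seq_len(K)) {
    tr <- folds != k
    fit <- neariso(y[tr], lambda = lambda)   # fit$beta: one column per lambda
    for (j in seq_along(lambda)) {
      pred <- approx(x[tr], fit$beta[, j], xout = x[!tr], rule = 2)$y
      err[k, j] <- mean((y[!tr] - pred)^2)
    }
  }
  colMeans(err)   # CV error for each lambda value
}
```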

You can perform nearly-isotonic regression in R with the `neariso` package. The `neariso()` function returns fits for an entire path of $\lambda$ values.
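A minimal usage sketch (the component names below are from my reading of the package documentation; check `?neariso` for the exact return structure):

```
library(neariso)

set.seed(42)
y <- sin(seq(0, 3, length.out = 50)) + rnorm(50, sd = 0.2)

fit <- neariso(y)   # fits at the critical lambda values by default
fit$lambda          # the critical lambda values
dim(fit$beta)       # one column of fitted values per lambda value

plot(y)
lines(fit$beta[, ncol(fit$beta)], col = "blue")   # largest lambda
```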

The animation below shows how the fit changes as $\lambda$ gets larger and larger (code available here).

Note 1: The formulation for nearly-isotonic regression above assumes that the points $x_1, \dots, x_n$ are equally spaced. If they are not, one should replace the penalty with

$$\begin{aligned} \lambda \sum_{i=1}^{n-1} \frac{(\beta_i - \beta_{i+1})_+}{x_{i+1} - x_i} \end{aligned}$$

to account for the different-sized gaps. The `neariso` package only seems to handle the case where the $x_i$’s are equally spaced.
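For unequally spaced points, the modified objective is straightforward to evaluate directly; here is a direct transcription of the formula above (my own helper function, not from the `neariso` package):

```
# Nearly-isotonic objective with gap-weighted penalty, for unequally spaced x
nio_objective <- function(x, y, beta, lambda) {
  viol <- pmax(head(beta, -1) - tail(beta, -1), 0)   # (beta_i - beta_{i+1})_+
  0.5 * sum((y - beta)^2) + lambda * sum(viol / diff(x))
}
```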

Note 2: The animation above was created by generating separate .png files for each value of $\lambda$, then stitching them together using the `magick` package. My initial hope was to create an animation that would transition smoothly between the different fits using the `gganimate` package, but the transitions were not as smooth as I would have imagined them to be. Does anyone know how this issue can be fixed? Code for the animation is below; full code available here.

```
library(ggplot2)
library(gganimate)

# df holds the path of fits (one value of iter per lambda); truth_df holds
# the original data points (see the full code linked above)
p <- ggplot(df, aes(x = x, y = beta)) +
  geom_path(col = "blue") +
  geom_point(data = truth_df, aes(x = x, y = y), shape = 4) +
  labs(title = "Nearly-isotonic regression fits",
       subtitle = paste("Lambda = ", "{lambda[as.integer(closest_state)]}")) +
  transition_states(iter, transition_length = 1, state_length = 2) +
  theme_bw() +
  theme(plot.title = element_text(size = rel(1.5), face = "bold"))
animate(p, fps = 5)
```
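For reference, the png-stitching route mentioned in Note 2 might look something like this with `magick` (file names are hypothetical placeholders):

```
library(magick)

# Read one png per lambda value and stitch them into a gif
frames <- image_read(sprintf("frame_%02d.png", 1:20))
anim <- image_animate(frames, fps = 5)
image_write(anim, "neariso.gif")
```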

References:

1. Tibshirani, R. J., Hoefling, H., and Tibshirani, R. (2011). Nearly-isotonic regression.