I want to minimize a simple linear function Y = x1 + x2 + x3 + x4 + x5 using ordinary least squares, with the constraint that the sum of all coefficients has to equal 5. How can I accomplish this in R? All of the packages I've seen seem to allow constraints on individual coefficients, but I can't figure out how to set a single constraint that affects several coefficients at once. I'm not tied to OLS; if this requires an iterative approach, that's fine as well.
Y ~ x1+x2+x3+x4+x5, how do I indicate to the minimizing function that I want to keep the parameter for x5 set to 5-sum(x[1:4])? I can't just solve for Y ~ x1+x2+x3+x4, because that appears to me to be a completely different optimization problem - eykanal 2012-04-03 19:55
n=3 and sum(p)=C. The original linear problem (without constraints) is ill-posed, because we can make a1*x1+a2*x2+a3*x3 as small as we want by setting the coefficients to large negative numbers if x is positive and vice versa. Putting the constraint on (a1+a2+a3=C) transforms this into a lower-dimensional, but still ill-posed, problem, i.e. minimizing a1*(x1-x3)+a2*(x2-x3)+C*x3. Care to clarify the problem ... ? (Perhaps you mean you want to fit a linear least-squares problem?) - Ben Bolker 2012-04-03 20:05
optim or nlminb or something, though, that's fine as well. So yes, that's what I'm looking for, and clarification will be rewarded with cupcakes (actual cupcakes not included) - eykanal 2012-04-03 20:10
The basic math is as follows: we start with
mu = a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4
and we want to find a0 through a4 to minimize the sum of squares (SSQ) between mu and our response variable y.
If we replace the last parameter (say a4) with (say) C-a1-a2-a3 to honour the constraint, we end up with a new set of linear equations:
mu = a0 + a1*x1 + a2*x2 + a3*x3 + (C-a1-a2-a3)*x4
= a0 + a1*(x1-x4) + a2*(x2-x4) + a3*(x3-x4) + C*x4
(note that a4 has disappeared ...)
Something like this (untested!) implements it in R.
Original data frame:
d <- data.frame(y=runif(20),
                x1=runif(20),
                x2=runif(20),
                x3=runif(20),
                x4=runif(20))
Create a transformed version where all but the last column have the last column "swept out", e.g. x1 -> x1-x4; x2 -> x2-x4; ...
dtrans <- data.frame(y=d$y,
                     sweep(d[,2:4],
                           1,
                           d[,5],
                           "-"),
                     x4=d$x4)
Rename to tx1, tx2, ... to minimize confusion:
names(dtrans)[2:4] <- paste("t",names(dtrans[2:4]),sep="")
Sum-of-coefficients constraint:
constr <- 5
Now fit the model with an offset:
lm(y~tx1+tx2+tx3,offset=constr*x4,data=dtrans)
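The fit above reports only a0 through a3, since a4 was eliminated. A minimal sketch of recovering the full coefficient vector follows; it rebuilds the same kind of random data (the seed and the names fit and a_full are assumptions added for illustration) and backs a4 out of the constraint.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
d <- data.frame(y = runif(20), x1 = runif(20), x2 = runif(20),
                x3 = runif(20), x4 = runif(20))
constr <- 5

# Sweep x4 out of x1..x3, as in the transformation described above
dtrans <- data.frame(y = d$y,
                     tx1 = d$x1 - d$x4,
                     tx2 = d$x2 - d$x4,
                     tx3 = d$x3 - d$x4,
                     x4 = d$x4)

fit <- lm(y ~ tx1 + tx2 + tx3, offset = constr * x4, data = dtrans)

# The coefficients on tx1..tx3 are a1..a3; a4 is what remains of the constraint
a <- unname(coef(fit)[c("tx1", "tx2", "tx3")])
a_full <- c(a0 = unname(coef(fit)["(Intercept)"]),
            a1 = a[1], a2 = a[2], a3 = a[3],
            a4 = constr - sum(a))
print(a_full)
print(sum(a_full[-1]))  # equals constr by construction
```

The slope coefficients sum to constr because a4 is defined as the remainder, which is exactly the substitution used in the algebra.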
It wouldn't be too hard to make this more general.
This requires a little more thought and manipulation than simply specifying a constraint to a canned optimization program. On the other hand, (1) it could easily be wrapped in a convenience function; (2) it's much more efficient than calling a general-purpose optimizer, since the problem is still linear (and in fact one dimension smaller than the one you started with). It could even be done with big data (e.g. biglm). (Actually, it occurs to me that if this is a linear model, you don't even need the offset, although using the offset means you don't have to compute a0=intercept-C*x4 after you finish.)
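One way such a convenience function might look is sketched below. The name fit_sum_constrained and its arguments are hypothetical (not from any package); it assumes an intercept is wanted and that the constraint applies to the slope coefficients only.

```r
# Hypothetical helper: least-squares fit of response on predictors (with an
# intercept), subject to sum(slope coefficients) == total.
fit_sum_constrained <- function(data, response, predictors, total) {
  stopifnot(length(predictors) >= 2)
  X <- as.matrix(data[predictors])
  y <- data[[response]]
  k <- ncol(X)
  # Sweep the last predictor out of the others: x_i -> x_i - x_k
  Z <- X[, -k, drop = FALSE] - X[, k]
  # Ordinary linear fit, one dimension smaller, with total * x_k as the offset
  fit <- lm(y ~ Z, offset = total * X[, k])
  free <- unname(coef(fit)[-1])
  # Back out the eliminated coefficient from the constraint
  coefs <- c(unname(coef(fit)[1]), free, total - sum(free))
  names(coefs) <- c("(Intercept)", predictors)
  coefs
}

set.seed(42)  # assumption: simulated data, only to exercise the helper
d <- data.frame(y = runif(20), x1 = runif(20), x2 = runif(20),
                x3 = runif(20), x4 = runif(20))
est <- fit_sum_constrained(d, "y", paste0("x", 1:4), total = 5)
print(est)
print(sum(est[-1]))  # 5, up to floating-point error
```

Which predictor is eliminated does not change the fitted coefficients; the last one is used here only to mirror the derivation.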
x4 be equal to 5-x1-x2-x3; I'm looking to constrain the coefficients, not the variables themselves. How would I set up the constraint a4=5-a1-a2-a3? - eykanal 2012-04-04 11:31
y=a0+a1*(x1-x4)+a2*(x2-x4)+a3*(x3-x4)+C*x4 be y=a0 + (a1-a4)*x1 + ... + (a3-a4)*x3 - eykanal 2012-04-04 12:41
a4 doesn't appear in the equation any more; it has been transformed out. Have you worked through the algebra yourself? - Ben Bolker 2012-04-04 13:18
Since you said you are open to other approaches, this can also be posed as a quadratic programming (QP) problem:
Minimize a quadratic objective: the sum of the squared errors,
subject to a linear constraint: your weights must sum to 5.
Assuming X is your n-by-5 matrix and Y is a vector of length n, this would solve for your optimal weights:
library(limSolve)
lsei(A = X,
     B = Y,
     E = matrix(1, nrow = 1, ncol = 5),
     F = 5)
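For a self-contained run, the call can be tried on simulated data, as in the sketch below. The data, the seed, and the names res and w are assumptions for illustration, and the call is guarded so the script still runs where limSolve is not installed. Note that this formulation has no intercept; one option for adding one is a column of ones in X with a matching 0 in E.

```r
set.seed(123)  # assumption: simulated data, only for illustration
n <- 50
X <- matrix(runif(n * 5), nrow = n, ncol = 5,
            dimnames = list(NULL, paste0("x", 1:5)))
# True weights sum to 5, plus a little noise
Y <- as.vector(X %*% c(2, 1, 0.5, 1, 0.5) + rnorm(n, sd = 0.1))

if (requireNamespace("limSolve", quietly = TRUE)) {
  # Minimize ||X w - Y||^2 subject to the equality constraint E w = F
  res <- limSolve::lsei(A = X, B = Y,
                        E = matrix(1, nrow = 1, ncol = 5), F = 5)
  w <- res$X       # fitted weights
  print(w)
  print(sum(w))    # should equal 5 up to solver tolerance
}
```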
5-sum(p[1:4]) ... You could conceivably do the calculus yourself and get a closed-form expression .. - Ben Bolker 2012-04-03 19:48
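The closed-form route mentioned in that comment can be sketched with a Lagrange multiplier: take the unconstrained least-squares solution and shift it along (X'X)^-1 1 until the coefficients sum to the target. The sketch below assumes a no-intercept model and simulated data; the names total, b_ols, and b_con are made up for illustration.

```r
set.seed(7)  # assumption: simulated data, only for illustration
n <- 40
X <- matrix(runif(n * 5), n, 5)
y <- runif(n)
total <- 5

XtX_inv <- solve(crossprod(X))
b_ols <- XtX_inv %*% crossprod(X, y)   # unconstrained least squares
ones <- rep(1, ncol(X))
v <- XtX_inv %*% ones
# Lagrange-multiplier correction that moves b_ols onto sum(b) == total
b_con <- b_ols - v * (sum(b_ols) - total) / sum(v)

print(as.vector(b_con))
print(sum(b_con))  # equals total up to floating-point error
```

This should agree with the elimination approach applied to the same no-intercept model, since both solve the same equality-constrained least-squares problem.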