I am trying to calculate a rolling mean using plyr. The data is at the industry-country-year level, with repeated observations for each industry-country pair. The panel is unbalanced, but most industry-country pairs have approximately 15 observations.
For example, the data looks like this:
country ISIC Year Value
Algeria 1 1990 400
Algeria 1 1991 450
Algeria 1 1992 460
Algeria 2 1990 450
Algeria 2 1991 500
Algeria 2 1992 450
Argentina 1 1990 400
Argentina 1 1991 450
Argentina 1 1992 460
Argentina 2 1990 450
Argentina 2 1991 500
Argentina 2 1992 450
. . . .
. . . .
If I subset the data to a specific industry and country, I am able to calculate the rolling mean like this:
rollmean(subdata$Value, 3)
However, I've been unable to get it to work with plyr, so as to calculate the rolling mean for each industry-country group. I've tried:
roll <- ddply(data, .(country, ISIC), summarize, rollmean(data$Value, 3))
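A rolling mean necessarily shortens the data, which is part of why you get the error. Note also that data$Value inside summarize refers to the whole data frame, not to the current group's column. Returning a data frame from a function applied to each group works: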
ddply(dat, .(country, ISIC), function(df) data.frame(country=unique(df$country),
ISIC=unique(df$ISIC),
rolled=rollmean(df$Value, 3)))
country ISIC rolled
1 Algeria 1 436.6667
2 Algeria 2 466.6667
3 Argentina 1 436.6667
4 Argentina 2 466.6667
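The shortening is easy to see on a plain vector. The following is a minimal sketch with made-up values (not the asker's data), assuming the zoo package is installed: with a window of width k, rollmean returns length(x) - k + 1 values unless padding is requested.

```r
library(zoo)  # provides rollmean

# Made-up values for illustration only
x <- c(400, 450, 460, 500, 450)
k <- 3

r <- rollmean(x, k)

length(x)  # number of input values
length(r)  # number of complete windows: length(x) - k + 1
```

This is why each 3-row group above collapses to a single row.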
However, if you're doing a rolling mean on 3 samples and your data only has 3 samples, you're just calculating the mean:
ddply(dat, .(country, ISIC), summarise, mean(Value))
country ISIC ..1
1 Algeria 1 436.6667
2 Algeria 2 466.6667
3 Argentina 1 436.6667
4 Argentina 2 466.6667
UPDATED FOR COMMENTS:
To return the dates, you can use the na.pad argument to rollmean:
ddply(dat, .(country, ISIC), function(df) {df$rolled <- rollmean(df$Value, 3, na.pad=TRUE); return(df)})
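Because the real data is unbalanced, some groups may have fewer rows than the window width. Here is a sketch of one way to handle that, assuming plyr and zoo are installed. The helper name add_rolled and the short Brazil group are hypothetical, added only for illustration. It uses fill = NA, which newer zoo versions document in place of na.pad, and it skips rollmean for groups that are too short.

```r
library(plyr)  # ddply
library(zoo)   # rollmean

# Toy data mirroring the question; Brazil is a hypothetical short group
dat <- data.frame(
  country = c(rep("Algeria", 6), rep("Argentina", 6), rep("Brazil", 2)),
  ISIC    = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1),
  Year    = c(rep(1990:1992, 4), 1990:1991),
  Value   = c(400, 450, 460, 450, 500, 450,
              400, 450, 460, 450, 500, 450, 300, 320)
)

k <- 3  # window width

# Hypothetical helper: add a centred rolling mean to one group.
# Groups with fewer than k rows get NA instead of calling rollmean.
add_rolled <- function(df) {
  df <- df[order(df$Year), ]  # make sure rows are in time order
  df$rolled <- if (nrow(df) < k) {
    NA_real_
  } else {
    rollmean(df$Value, k, fill = NA)
  }
  df
}

rolled <- ddply(dat, .(country, ISIC), add_rolled)
```

The result keeps every original row, including Year, so the rolled values stay aligned with their dates. Note that the window runs over consecutive rows, not calendar years, so gaps in Year within a group are not accounted for.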
rollmean(1:5, 6) gives the same error - Justin 2012-04-04 20:04