Rolling Mean with Plyr


2

I am trying to calculate a rolling mean using plyr. The data is at the industry-country-year level, with repeated observations for each industry-country pair. The panel is unbalanced, but most industry-country pairs have approximately 15 observations.

For example the data looks like this:

country       ISIC      Year      Value
Algeria        1        1990       400
Algeria        1        1991       450
Algeria        1        1992       460
Algeria        2        1990       450
Algeria        2        1991       500
Algeria        2        1992       450
Argentina      1        1990       400
Argentina      1        1991       450
Argentina      1        1992       460
Argentina      2        1990       450
Argentina      2        1991       500
Argentina      2        1992       450
.              .        .          .
.              .        .          .

If I subset the data to a specific industry and country, I am able to calculate the rolling mean like this:

rollmean(subdata$Value, 3)

However, I've been unable to get it to work with plyr, so as to calculate the rolling mean for each industry-country group. I've tried:

roll <- ddply(data, .(country, ISIC), summarize, rollmean(data$Value, 3))
2012-04-04 19:45
by user1288578


4

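A rolling mean necessarily shortens the data, which is part of why you get the error. Note also that your call uses data$Value, which refers to the whole column of the full data frame rather than the Value column of each group.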

library(plyr)  # ddply
library(zoo)   # rollmean

ddply(dat, .(country, ISIC), function(df) data.frame(country=unique(df$country),
                                                     ISIC=unique(df$ISIC),
                                                     rolled=rollmean(df$Value, 3)))
    country ISIC   rolled
1   Algeria    1 436.6667
2   Algeria    2 466.6667
3 Argentina    1 436.6667
4 Argentina    2 466.6667

However, if you're doing a rolling mean on 3 samples and your data only has 3 samples, you're just calculating the mean:

ddply(dat, .(country, ISIC), summarise, mean(Value))

    country ISIC      ..1
1   Algeria    1 436.6667
2   Algeria    2 466.6667
3 Argentina    1 436.6667
4 Argentina    2 466.6667
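
The shortening is easier to see on a series longer than the window. The following is a minimal sketch with made-up values (not the asker's data): a window of 3 over 5 values returns 3 means, while passing fill = NA to rollmean pads the ends so the result has the same length as the input.

```r
library(zoo)  # provides rollmean

x <- c(400, 450, 460, 500, 450)  # made-up values for illustration only

# Default behaviour: a window of 3 over 5 values yields 5 - 3 + 1 = 3 means
short <- rollmean(x, 3)
length(short)

# fill = NA pads both ends (centred window), so the result lines up with x
padded <- rollmean(x, 3, fill = NA)
length(padded)
```

The padded version is the one that can be stored as a new column next to the original values.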

UPDATED FOR COMMENTS:

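To keep the Year column (and every other original column) in the result, you can use the na.pad argument to rollmean, which pads the rolled values with NA so that each group keeps its original number of rows: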

ddply(dat, .(country, ISIC), function(df) {df$rolled <- rollmean(df$Value, 3, na.pad=TRUE); return(df)})
2012-04-04 19:53
by Justin
I'm still getting the following error: Error: k <= n is not TRUE. Could this be because some of my industry-country groups have fewer than 3 observations? - user1288578 2012-04-04 20:02
That's because some of your groups have fewer than 3 observations: rollmean(1:5, 6) gives the same error - Justin 2012-04-04 20:04
I dropped all of the groups that had fewer than 3 observations and now the code is working. But how do I merge the new rolling mean with the original dataset? When I try data$rollingmean <- ddply.... it gives me an error that the lengths of the two data sets are not the same. I think the confusion may have come from my example, where I only included 3 observations. Most groups actually have 15 observations, so I need a way for the ddply output to keep the year information so that I can merge the rolling mean. Any advice? - user1288578 2012-04-04 20:14
See my edited answer - Justin 2012-04-04 20:22
Got it - Thanks - user1288578 2012-04-04 20:32
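
The two issues raised in the comments (groups shorter than the window, and keeping Year alongside the rolling mean) can be handled in one pass. What follows is only a sketch under stated assumptions: the toy data frame is made up, safe_rollmean is a hypothetical helper rather than part of zoo or plyr, and fill = NA is used in place of na.pad = TRUE, which current zoo documentation lists as the preferred spelling.

```r
library(plyr)  # ddply
library(zoo)   # rollmean

# Toy data (made up): Argentina has only 2 observations, fewer than the window
dat <- data.frame(
  country = c(rep("Algeria", 4), rep("Argentina", 2)),
  ISIC    = 1,
  Year    = c(1990:1993, 1990:1991),
  Value   = c(400, 450, 460, 500, 400, 450)
)

# Hypothetical helper: return all NA when the group is shorter than the
# window, instead of letting rollmean stop with "k <= n is not TRUE"
safe_rollmean <- function(x, k) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  rollmean(x, k, fill = NA)
}

rolled <- ddply(dat, .(country, ISIC), function(df) {
  df <- df[order(df$Year), ]               # windows should follow time order
  df$rolled <- safe_rollmean(df$Value, 3)  # same length as the group
  df
})
```

Because each group is returned with all of its original rows and columns, the result already carries Year next to the rolling mean, so no separate merge step is needed and short groups do not have to be dropped beforehand.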