Filling in a data set using three "puzzle pieces"

Go To StackoverFlow.com

2

I have three data frames, the first (with column headers, but no row numbering) looks like

ID    1   2   3
 A   12  NA  NA
 B   NA   7  NA
 C   NA  NA  22

The second may look like

ID    1   2   3
 A   NA   6  NA
 B   NA  NA  29
 C   43  NA  NA

Lastly, the third looks like

ID    1   2   3
 A   NA  NA  32
 B    5  NA  NA
 C   NA   2  NA  

The first column is an ID column and the same for all three data frames. The final three columns represent the same variables (1, 2, and 3). The record for observation A, variable 1 is only in one of the data sets. So is the record for observation A, variable 2, but it's in a different data set.

How can I merge these data sets together to get something like

ID    1   2   3
 A   12   6  32
 B    5   7  29
 C   43   2  22

I apologize that I didn't have a better way of describing this problem. If someone could share the terminology for it, that would be great.

2012-04-03 23:07
by Charlie


1

Nice title! This is quite similar to R - Vector/ Array Addition

You can turn your data into a multi-dimensional array then sum or take the mean across the "puzzle piece" dimension:

df1 <- read.table(text="ID    1   2   3
A   12  NA  NA
B   NA   7  NA
C   NA  NA  22", header = TRUE)

df2 <- read.table(text="ID    1   2   3
A   NA   6  NA
B   NA  NA  29
C   43  NA  NA", header = TRUE)

df3 <- read.table(text="ID    1   2   3
A   NA  NA  32
B    5  NA  NA
C   NA   2  NA", header = TRUE)

# gather inputs and remove common ID column
lists  <- list(df1, df2, df3)
pieces <- lapply(lists, '[', , -1)

# turn data into a multi-dimensional array
a <- array(unlist(pieces), dim = c(nrow(df1),
                                   ncol(df1) - 1,
                                   length(pieces)))

# compute sums across pieces
rowSums(a, na.rm = TRUE, dims = 2)
# [,1] [,2] [,3]
# [1,]   12    6   32
# [2,]    5    7   29
# [3,]   43    2   22

Then you're only left with pasting the ID column back.

2012-04-04 00:51
by flodel


2

I didn't come up with it but:

merge.new<-function(...,col.ID){
    inter<-merge(...)
    inter<-inter[order(inter[col.ID]),] #merged data sorted by ID

    #total columns and rows for the target dataframe
    total.row<-length(unique(inter[[col.ID]]))
    total.col<-dim(inter)[2]
    row.ID<-unique(inter[[col.ID]])
    target<-matrix(NA,total.row,total.col)
    target<-as.data.frame(target)
    names(target)<-names(inter)

    for (i in 1:total.row){
        inter.part<-inter[inter[col.ID]==row.ID[i],] #select all rows with the same ID
        for (j in 1:total.col){
            if (is.na(inter.part[1,j])){
                if(is.na(inter.part[2,j])) {target[i,j]=NA}
                else {target[i,j]=inter.part[2,j]}
            }
            else {target[i,j]=inter.part[1,j]}

        }
    }
print(paste("total rows=",total.row))
print(paste("total columns=",total.col))
return(target)
}

if your data is named one, two and three:

> one
  ID  1  2  3
2  A 12 NA NA
3  B NA  7 NA
4  C NA NA 22
> two
  ID  1  2  3
2  A NA  6 NA
3  B NA NA 29
4  C 43 NA NA
> three
  ID  1  2  3
2  A NA NA 32
3  B  5 NA NA
4  C NA  2 NA
> merge.new(merge.new(one, two, all=TRUE, col.ID=1), three, all=TRUE, col.ID=1)
[1] "total rows= 3"
[1] "total columns= 4"
[1] "total rows= 3"
[1] "total columns= 4"
  ID  1 2  3
1  A 12 6 32
2  B  5 7 29
3  C 43 2 22
> 
2012-04-03 23:28
by Justin


2

I'm not sure if you can do this directly with data frames, but it is very easy to convert them to matrices first if not:

x <- matrix(c(12,NA,NA,NA,7,NA,NA,NA,22),3,3)
y <- matrix(c(NA,NA,43,6,NA,NA,NA,29,NA),3,3)
z <- matrix(c(NA,5,NA,NA,NA,2,32,NA,NA),3,3)
b <- matrix(0,3,3)
b[!is.na(x)] <- x[!is.na(x)]
b[!is.na(y)] <- y[!is.na(y)]
b[!is.na(z)] <- z[!is.na(z)]
b
     [,1] [,2] [,3]
[1,]   12    6   32
[2,]    5    7   29
[3,]   43    2   22
2012-04-04 00:57
by Thomson Comer
Ads