I have three data frames, the first (with column headers, but no row numbering) looks like
ID 1 2 3
A 12 NA NA
B NA 7 NA
C NA NA 22
The second may look like
ID 1 2 3
A NA 6 NA
B NA NA 29
C 43 NA NA
Lastly, the third looks like
ID 1 2 3
A NA NA 32
B 5 NA NA
C NA 2 NA
The first column is an ID column and the same for all three data frames. The final three columns represent the same variables (1, 2, and 3). The record for observation A, variable 1 is only in one of the data sets. So is the record for observation A, variable 2, but it's in a different data set.
How can I merge these data sets together to get something like
ID 1 2 3
A 12 6 32
B 5 7 29
C 43 2 22
I apologize that I didn't have a better way of describing this problem. If someone could share the terminology for it, that would be great.
Nice title! This is quite similar to R - Vector/ Array Addition
You can turn your data into a multi-dimensional array then sum or take the mean across the "puzzle piece" dimension:
df1 <- read.table(text="ID 1 2 3
A 12 NA NA
B NA 7 NA
C NA NA 22", header = TRUE)
df2 <- read.table(text="ID 1 2 3
A NA 6 NA
B NA NA 29
C 43 NA NA", header = TRUE)
df3 <- read.table(text="ID 1 2 3
A NA NA 32
B 5 NA NA
C NA 2 NA", header = TRUE)
# gather inputs and remove common ID column
lists <- list(df1, df2, df3)
pieces <- lapply(lists, '[', , -1)
# turn data into a multi-dimensional array
a <- array(unlist(pieces), dim = c(nrow(df1),
ncol(df1) - 1,
length(pieces)))
# compute sums across pieces
rowSums(a, na.rm = TRUE, dims = 2)
# [,1] [,2] [,3]
# [1,] 12 6 32
# [2,] 5 7 29
# [3,] 43 2 22
Then you're only left with pasting the ID column back.
merge.new<-function(...,col.ID){
inter<-merge(...)
inter<-inter[order(inter[col.ID]),] #merged data sorted by ID
#total columns and rows for the target dataframe
total.row<-length(unique(inter[[col.ID]]))
total.col<-dim(inter)[2]
row.ID<-unique(inter[[col.ID]])
target<-matrix(NA,total.row,total.col)
target<-as.data.frame(target)
names(target)<-names(inter)
for (i in 1:total.row){
inter.part<-inter[inter[col.ID]==row.ID[i],] #select all rows with the same ID
for (j in 1:total.col){
if (is.na(inter.part[1,j])){
if(is.na(inter.part[2,j])) {target[i,j]=NA}
else {target[i,j]=inter.part[2,j]}
}
else {target[i,j]=inter.part[1,j]}
}
}
print(paste("total rows=",total.row))
print(paste("total columns=",total.col))
return(target)
}
if your data is named one, two and three:
> one
ID 1 2 3
2 A 12 NA NA
3 B NA 7 NA
4 C NA NA 22
> two
ID 1 2 3
2 A NA 6 NA
3 B NA NA 29
4 C 43 NA NA
> three
ID 1 2 3
2 A NA NA 32
3 B 5 NA NA
4 C NA 2 NA
> merge.new(merge.new(one, two, all=TRUE, col.ID=1), three, all=TRUE, col.ID=1)
[1] "total rows= 3"
[1] "total columns= 4"
[1] "total rows= 3"
[1] "total columns= 4"
ID 1 2 3
1 A 12 6 32
2 B 5 7 29
3 C 43 2 22
>
I'm not sure if you can do this directly with data frames, but it is very easy to convert them to matrices first if not:
x <- matrix(c(12,NA,NA,NA,7,NA,NA,NA,22),3,3)
y <- matrix(c(NA,NA,43,6,NA,NA,NA,29,NA),3,3)
z <- matrix(c(NA,5,NA,NA,NA,2,32,NA,NA),3,3)
b <- matrix(0,3,3)
b[!is.na(x)] <- x[!is.na(x)]
b[!is.na(y)] <- y[!is.na(y)]
b[!is.na(z)] <- z[!is.na(z)]
b
[,1] [,2] [,3]
[1,] 12 6 32
[2,] 5 7 29
[3,] 43 2 22