I'm importing a csv file that consists of a crosstab with column names in a two-row hierarchy. When I get the table in R, the result looks like this:
alpha X.1 X.2 beta X.1 X.2 X.3 gamma X.1
var1 var2 var3 var1 var2 var3 var4 var1 var4
1 21 50 5 22 48 6 8 25 8
2 27 50 5 24 48 6 8 33 8
3 26 50 5 28 48 6 8 33 8
4 25 50 5 28 48 6 8 20 8
Here, alpha, beta
, and gamma
are all one level of the hierarchy, while var1
, var2
, var3
, and var4
are the second level.
What I would like it do is get output like the following, where the row names are concatenated but also keeping in mind the structure of the data.
alpha_var1 alpha_var2 alpha_var3 beta_var1 beta_var2 beta_var3 beta_var4 gamma_var1 gamma_var4
1 21 50 5 22 48 6 8 25 8
2 27 50 5 24 48 6 8 33 8
3 26 50 5 28 48 6 8 33 8
4 25 50 5 28 48 6 8 20 8
Any ideas here? Haven't been able to find anything to deal with this issue. Thanks in advance.
read.table()
taking advantage of the skip
and nrows
parameters - Chase 2012-04-03 23:28
This seems to work, though the use of xts
seems a bit heavy handed for the na.locf()
function, but I know it works and use it frequently, so that's what I used.
library(xts)
#Read in data without headers
x <- read.delim("Book1.txt", skip = 2, header = FALSE)
#Read in header files transposing them into columns
headers <- data.frame(t(read.delim("Book1.txt", nrows = 2, header = FALSE)), stringsAsFactors = FALSE)
#Create a now column with the value of alpha, beta, gama or NA
headers$vals <- with(headers, ifelse(grepl("[abg]", X1), X1, NA))
#Fill down the values above
headers$vals <- na.locf(headers$vals)
#Paste column names together
colnames(x) <- with(headers, paste(vals, X2, sep = "_"))
#Resulting object
x
alpha_var1 alpha_var2 alpha_var3 beta_var1 beta_var2 beta_var3 beta_var4 gamma_ var1 gamma_var4
1 21 50 5 22 48 6 8 25 8
2 27 50 5 24 48 6 8 33 8
3 26 50 5 28 48 6 8 33 8
4 25 50 5 28 48 6 8 20 8
!(grepl("X\\.\\d", x))
. It's finding the pattern "X." followed by a digit...then taking the negative of that - Chase 2012-04-04 01:32
X.1
, etc are auto-generated when you import empty header cells from a csv. If you say header=FALSE
, they appear as <NA>
. Another issue is that if any of the columns are <NA>
in both the first and second level, this fails - user1202761 2012-04-04 01:42
dput()
to represent your R objects in the futre). Transpose the two rows of data, use the na.locf()
function I showed you, paste them together, and stick it on top of the other data that you read in with the skip=...
parameter. All of the pieces you need to solve this are here, just need to connect the dots now - Chase 2012-04-04 01:45
dput()
in the future. As I should have done earlier, the help for the na.locf()
function solved my problem of NA
observations (set na.rm=FALSE
of course). Dots connected, problems solved, thank you again - user1202761 2012-04-05 01:37