I understand the answer in R to repetitive things is usually "apply()" rather than a loop. Is there a better R design pattern for a nasty bit of code I create frequently?
When pulling tabular data from HTML, I usually need to change the data types, and end up running something like this to convert the first column to date format (from decimal) and columns 2-4 from character strings with comma thousands separators, like "2,400,000", to numeric, like 2400000:
X[,1] <- decYY2YY(as.numeric(X[,1]))
X[,2] <- as.numeric(gsub(",", "", X[,2]))
X[,3] <- as.numeric(gsub(",", "", X[,3]))
X[,4] <- as.numeric(gsub(",", "", X[,4]))
I don't like that I have X[,number] repeated on both the left and right sides here, or that I have basically the same statement repeated for columns 2-4.
Is there a very R-style way of making X[,2] less repetitive but still loop-free? Something that sort of says "apply this to columns 2,3,4---a function that reassigns the current column to a modified version in place?"
I don't want to create a whole, repeatable cleaning function, really, just a quick anonymous function that does this with less repetition.
Assuming X is a data frame, I would do:
X[2:4] <- lapply(X[2:4], function (x) as.numeric(gsub(",", "", x)))
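For anyone who wants to try this, here is a minimal sketch of the one-liner on a toy data frame. The data and the column names (year, sales, costs, profit) are made up for illustration and are not from the question; the date conversion of column 1 is left out because decYY2YY is the asker's own function.

```r
# Hypothetical sample data standing in for a table scraped from HTML.
# Column names and values are invented for this example.
X <- data.frame(
  year   = c("2010.0", "2011.5"),
  sales  = c("2,400,000", "1,250"),
  costs  = c("1,000", "12,345,678"),
  profit = c("500", "3,000"),
  stringsAsFactors = FALSE
)

# lapply() returns a list with one cleaned vector per column, and
# assigning that list to X[2:4] replaces columns 2-4 in a single step.
X[2:4] <- lapply(X[2:4], function(x) as.numeric(gsub(",", "", x)))

str(X)
```

Because lapply() works column by column, the first column is untouched and the data frame is never converted to a matrix along the way.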
Something like
comma2numeric <- function(x) { as.numeric(gsub(",","",x)) }
X[,2:4] <- apply(X[,2:4],2,comma2numeric)
is a start. transform() is a good modify-in-place idiom, but it operates on column names rather than column numbers.
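To make the point about names concrete, here is a rough sketch of what the transform() version might look like. The data frame and its column names (sales, costs) are hypothetical; comma2numeric is the helper defined above, written here with the closing parenthesis in place.

```r
# Helper from the answer: strip commas, then convert to numeric.
comma2numeric <- function(x) { as.numeric(gsub(",", "", x)) }

# Hypothetical sample data; column names are invented for this example.
X <- data.frame(
  year  = c("2010.0", "2011.5"),
  sales = c("2,400,000", "1,250"),
  costs = c("1,000", "12,345,678"),
  stringsAsFactors = FALSE
)

# transform() refers to columns by name, so each column is still listed
# once, but the X[, n] on both sides of the assignment goes away.
X <- transform(X,
               sales = comma2numeric(sales),
               costs = comma2numeric(costs))

str(X)
```

This removes the repeated indexing but not the per-column repetition, so it is probably most useful when only one or two named columns need converting.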
edited: missing close-parenthesis in line 1
comma2numeric <- function(x) { as.numeric(gsub(",","",x) }
I get the error message Error: unexpected '}' in "comma2numeric <- function(x) { as.numeric(gsub(",","",x) }"
which is fixed when I change the function definition to 3 lines, with the '}' alone on the last line - Mittenchops 2012-04-03 20:01
I included the {} (although unnecessary) because I think it adds a bit of precision (coding styles & tastes differ). The missing close-parenthesis was the real problem - Ben Bolker 2012-04-03 20:10