With the help of flodel I found a way to replace numeric codes with value labels from a lookup table.
Ambitious as I am, I now want to put that into a function. Also, I have a lot of lookup tables I need to swoop onto my data so a function would be handy.
But first some sample data, starting with a data frame,
df <- data.frame(id = c(1:6),
profession = c(1, 5, 4, NA, 0, 5))
df
# id profession
# 1 1
# 2 5
# 3 4
# 4 NA
# 5 0
# 6 5
and a lookup table with human readable information about the profession codes,
profession.lookuptable <- c(Optometrists=1, Accountants=2, Veterinarians=3,
`Financial analysts`=4, Nurses=5)
flodel showed me how to replace numeric codes with value labels from a lookup table. Like this,
match.idx <- match(df$profession, profession.lookuptable)
df$profession <- ifelse(is.na(match.idx),
df$profession, names(profession.lookuptable)[match.idx])
df
# id profession
# 1 Optometrists
# 2 Nurses
# 3 Financial analysts
# 4 <NA>
# 5 0
# 6 Nurses
I now want to put this into a function where I can state the data frame df and the name of the variable profession and have the function take care of the rest.
I define my function like this,
ADDlookup <- function(orginalDF, orginalVAR) {
DF.VAR <- paste(orginalDF, "$", orginalVAR, sep="")
lookup.table <- paste(orginalVAR, ".lookuptable")
match.idx <- match(DF.VAR, lookup.table)
DF.VAR <- ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
}
but apparently that is not working
ADDlookup(df, profession)
I get the error message
Error in paste(orginalDF, "$", orginalVAR, sep = "") :
object 'profession' not found
Now, this is where I get stuck.
Can anyone please tell me which manual page I need to read, or maybe give a friendly hint on how to solve this?
Thank you for reading.
It's because you're passing profession into the ADDlookup function as a bare name, but no object called profession exists; it is only a column inside df.
The way you've written your function, you have to distinguish between a character vector containing the name of a variable and the variable itself.
For example, your first few lines, paste(originalDF,'$',originalVAR,sep='') etc., appear to expect originalDF and originalVAR to be strings, so DF.VAR ends up being the string 'df$profession'. However, when you call match it looks like you want DF.VAR to be the variable df$profession.
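To make the string-versus-object distinction concrete, here is a small sketch using the df from the question. The names as.text and as.column are made up for illustration.

```r
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))

# paste() only builds text; it does not look anything up
as.text <- paste("df", "$", "profession", sep = "")
is.character(as.text)    # TRUE: this is just the string "df$profession"

# the column itself is a numeric vector
as.column <- df[, "profession"]
is.numeric(as.column)    # TRUE

# match() on the string compares the text with the codes, so nothing matches
text.match <- match(as.text, c(Optometrists = 1, Nurses = 5))
is.na(text.match)        # TRUE
```

So the function body needs the column itself, not text that merely looks like the expression you would type at the console.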
This is how I suggest you get around it:
- pass in originalDF as an object, being df
- pass in originalVAR as a string, being 'profession' (it's a column name and hence a string)
Then, retrieve the column named in originalVAR from the data frame via:
DF.VAR <- originalDF[,originalVAR] # e.g. df[,'profession']
Now your next line, where you look for the object profession.lookuptable, is a little trickier: you construct the string 'profession.lookuptable', and then you want to look up the object that has that name.
For this, you can use get (see ?get). For example, get('df') will return the df data frame. So:
lookup.table <- get(paste(originalVAR, "lookuptable",sep='.'))
This will retrieve the object called 'profession.lookuptable'. It follows the same scoping rules as if you'd typed profession.lookuptable directly, so you have to make sure that the function can "see" that object (in your case you should be fine).
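Here is a minimal sketch of the get step on its own, assuming the lookup table lives in the workspace as in the question. The names var.name and table.name are made up for illustration.

```r
profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)

var.name <- "profession"

# build the name of the lookup table as a string ...
table.name <- paste(var.name, "lookuptable", sep = ".")

# ... and fetch the object that has that name
lookup.table <- get(table.name)
identical(lookup.table, profession.lookuptable)   # TRUE

# exists() lets you check first; get() stops with an error if nothing is found
table.found <- exists("no.such.lookuptable")      # FALSE
```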
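Next, it looks like you want to return the originalDF data frame where the originalVAR column has been substituted with the lookup values.
Your match line, match.idx <- match(DF.VAR, lookup.table), now works as written, since DF.VAR and lookup.table are real objects rather than strings.
I'll just modify the originalDF[,originalVAR] column by replacing it with the lookup values: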
originalDF[,originalVAR] <-
ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
NOTE that we are not actually modifying the df data frame that you passed in as an argument to ADDlookup; R makes a copy of the data frame within the function. So, your original df is preserved.
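If you want to convince yourself of the copy behaviour, here is a small sketch. overwrite.column is a made-up helper, not part of the solution.

```r
df <- data.frame(id = 1:3, profession = c(1, 5, 4))

# hypothetical helper: overwrites one column and returns the modified copy
overwrite.column <- function(originalDF, originalVAR) {
  originalDF[, originalVAR] <- "changed"
  originalDF
}

result <- overwrite.column(df, "profession")
result$profession   # "changed" "changed" "changed"
df$profession       # 1 5 4: the df outside the function is untouched
```

To keep the change, assign the result back, e.g. df <- overwrite.column(df, "profession").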
Finally, we have to return the data frame:
return(originalDF)
All together now:
ADDlookup <- function(originalDF, originalVAR) {
  # retrieve the originalVAR column of originalDF
  DF.VAR <- originalDF[,originalVAR]
  # find the variable called {originalVAR}.lookuptable
  lookup.table <- get(paste(originalVAR, "lookuptable",sep='.'))
  # look up the values
  match.idx <- match(DF.VAR, lookup.table)
  # replace the originalVAR column with the looked-up values
  originalDF[,originalVAR] <-
    ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
  # return the modified data frame
  return(originalDF)
}
And now to test it:
> ADDlookup(df,'profession')
id profession
1 1 Optometrists
2 2 Nurses
3 3 Financial analysts
4 4 <NA>
5 5 0
6 6 Nurses
Note that the original df is unmodified; in general R functions do not modify the arguments that are passed in to them. To keep the result, assign it back, e.g. df <- ADDlookup(df,'profession').
As another improvement: it is generally a bit dangerous to rely on the profession.lookuptable table having been created before you call the ADDlookup function.
Instead of the whole lookup.table <- get('profession.lookuptable') shebang (which can pick up the wrong object if you have multiple 'profession.lookuptable' tables in various scopes), I'd strongly recommend you just pass in the lookup table as a parameter:
ADDlookup <- function( originalDF, originalVAR, lookup.table )
Then you can avoid that entire get(xxxx) line (and all the scoping problems that go with it).
Then you'd call the function via:
ADDlookup( df, 'profession', profession.lookuptable )
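Since you mentioned having a lot of lookup tables, here is a sketch of how the three-argument version could be written out in full and applied to several columns in a loop. The region column, its lookup table, and the names lookups and labelled are made up for illustration.

```r
ADDlookup <- function(originalDF, originalVAR, lookup.table) {
  DF.VAR <- originalDF[, originalVAR]
  match.idx <- match(DF.VAR, lookup.table)
  # keep the original code where no label is found
  originalDF[, originalVAR] <-
    ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
  originalDF
}

df <- data.frame(id = 1:6,
                 profession = c(1, 5, 4, NA, 0, 5),
                 region = c(2, 1, 9, 1, 2, NA))   # hypothetical second column

# one named list holding all lookup tables, keyed by column name
lookups <- list(
  profession = c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                 `Financial analysts` = 4, Nurses = 5),
  region = c(North = 1, South = 2)                # hypothetical table
)

labelled <- df
for (v in names(lookups)) {
  labelled <- ADDlookup(labelled, v, lookups[[v]])
}
labelled
```

Keeping the tables in one list means nothing depends on object naming conventions, and adding another column is just one more list entry.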
get
either; I've edited my post to mention that - mathematical.coffee 2012-04-04 04:48
Error in ADDlookup(df, "profession") : object 'originalDF' not found
. I'll take another look at it in the morning, it's late where I am. Once again, thank you for taking the time. I'm definitely going to reread your post (several times). Thanks - Eric Fail 2012-04-04 05:05
There are certainly more complicated ways to get this to work the way you originally envisioned, but it will be much, much simpler to slightly reorganize how this function works:
ADDLookup <- function(originalDF,var,varLookup){
  match.idx <- match(originalDF[,var], varLookup)
  originalDF[,var] <- ifelse(is.na(match.idx),
                             originalDF[,var], names(varLookup)[match.idx])
  originalDF
}
ADDLookup(df,"profession",profession.lookuptable)
id profession
1 1 Optometrists
2 2 Nurses
3 3 Financial analysts
4 4 <NA>
5 5 0
6 6 Nurses
Note that now I'm passing the data frame, df, the name of the variable in question, var, as a literal character string, and the lookup table itself as arguments.
Also, you've now learned why $ is used more in interactive use than in programming: it doesn't mesh well with arguments passed in to functions. For that type of task you need the [ syntax.
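A quick sketch of the difference, using a cut-down df and a variable var holding the column name:

```r
df <- data.frame(id = 1:3, profession = c(1, 5, 4))
var <- "profession"

# $ takes the name literally: this looks for a column called "var"
dollar.result <- df$var
is.null(dollar.result)    # TRUE, there is no such column

# [ and [[ evaluate var first, so they find the profession column
bracket.result <- df[, var]
double.result <- df[[var]]
identical(bracket.result, double.result)   # TRUE
```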
df
's inside the function be replaced with originalDF
- Eric Fail 2012-04-04 05:33
I would define the lookup table as a factor.
df[,"profession"] <- profession.lookuptable[df[,"profession"]]
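One way the factor idea could be sketched is with factor(), using the codes as levels and the names as labels. This is one reading of the suggestion, not necessarily what was intended, and the name labels.as.text is made up for illustration.

```r
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)

# codes become the factor levels, the names become the labels
df$profession <- factor(df$profession,
                        levels = profession.lookuptable,
                        labels = names(profession.lookuptable))

labels.as.text <- as.character(df$profession)
labels.as.text
# "Optometrists" "Nurses" "Financial analysts" NA NA "Nurses"
```

Note the difference from the match/ifelse version: a code with no entry in the table (the 0 here) becomes NA instead of being kept as is.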
get
- joran 2012-04-04 04:46