paste variable names inside R function

2

With the help of flodel I found a way to replace numeric codes with value labels from a lookup table.

Ambitious as I am, I now want to put that into a function. Also, I have a lot of lookup tables I need to swoop onto my data so a function would be handy.

But first some sample data, starting with a data frame,

df <- data.frame(id = c(1:6),
                 profession = c(1, 5, 4, NA, 0, 5))

df
#  id profession
#  1          1
#  2          5
#  3          4
#  4         NA
#  5          0
#  6          5

and a lookup table with human readable information about the profession codes,

profession.lookuptable <- c(Optometrists=1, Accountants=2, Veterinarians=3, 
                            `Financial analysts`=4,  Nurses=5)

flodel showed me how to replace numeric codes with value labels from a lookup table. Like this,

match.idx <- match(df$profession, profession.lookuptable)
df$profession <- ifelse(is.na(match.idx), 
                 df$profession, names(profession.lookuptable)[match.idx])

df
#  id         profession
#  1        Optometrists
#  2              Nurses
#  3  Financial analysts
#  4                <NA>
#  5                   0
#  6              Nurses
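
For readers following along, the two steps can be inspected separately. This is only a sketch on the same sample values (the vector names profession and value.labels are chosen here for illustration): match() gives each code's position in the lookup table, and ifelse() falls back to the original code where there is no match.

```r
# Same sample values and lookup table as above
profession <- c(1, 5, 4, NA, 0, 5)
profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)

# Position of each code in the lookup table; NA where the code is absent
match.idx <- match(profession, profession.lookuptable)
match.idx
# [1]  1  5  4 NA NA  5

# Turn positions into labels; keep the original code where nothing matched
value.labels <- ifelse(is.na(match.idx),
                       profession, names(profession.lookuptable)[match.idx])
value.labels
```

Both the NA and the 0 are unmatched, which is why they come through unchanged (the 0 as the string "0", since the result is a character vector).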

I now want to put this into a function where I can state the data frame df and the name of the variable profession and have the function take care of the rest.

I define my function like this,

ADDlookup <- function(orginalDF, orginalVAR) {
   DF.VAR <- paste(orginalDF, "$", orginalVAR, sep="")
   lookup.table <- paste(orginalVAR, ".lookuptable")
   match.idx <- match(DF.VAR, lookup.table)
   DF.VAR <- ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
}

but apparently that is not working

ADDlookup(df, profession)

I get the error message

Error in paste(orginalDF, "$", orginalVAR, sep = "") : 
  object 'profession' not found

Now, this is where I get stuck.

Can anyone please tell me which manual page I need to read, or maybe give a friendly hint on how to solve this?

Thank you for reading.

2012-04-04 04:28
by Eric Fail


4

It's because you're passing profession into the ADDlookup function as a bare name, but no object called profession exists (it is only a column of df).

The way you've written your function, you have to distinguish between using the character vector containing the name of the variable, and the variable itself.

For example, your first few lines paste(originalDF,'$',originalVAR,sep='') etc appear to expect originalDF and originalVAR to be strings, and you'll have DF.VAR being the string 'df$profession'. However, when you do match it looks like you want DF.VAR to be the variable df$profession.
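
A small sketch of that difference, using the question's sample data (the name col is just for illustration): paste() only builds a string, it never evaluates it.

```r
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))

# paste() builds a character string; it does not evaluate df$profession
DF.VAR <- paste("df", "$", "profession", sep = "")
DF.VAR
# [1] "df$profession"
is.character(DF.VAR)   # TRUE: a single string, not the column

# The column itself is reached by indexing the data frame with the name
col <- df[, "profession"]
length(col)            # 6 values, one per row
```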

This is how I suggest you get around it:
- pass in originalDF as an object, namely df
- pass in originalVAR as a string, namely 'profession' (it's a column name and hence a string)

Then retrieve the column named by originalVAR from the data frame via:

DF.VAR <- originalDF[,originalVAR] # e.g. df[,'profession']

Now your next line where you look for the object profession.lookuptable is a little trickier: you construct the string 'profession.lookuptable', and then you want to look up the object that has that name.

For this, you can use get (?get). get('df') will return the df data frame:

lookup.table <- get(paste(originalVAR, "lookuptable",sep='.'))

This will retrieve the object called 'profession.lookuptable'. It follows the same rules as if you'd typed profession.lookuptable directly, so you have to make sure that the function can "see" that object (in your case you should be fine).
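
A rough sketch of what "can see" means here, assuming a shortened lookup table in the workspace; find.lookup and shadowed are hypothetical helper names used only for illustration. get() searches the calling function's environment first and then the enclosing ones, so a local object with the same name wins.

```r
profession.lookuptable <- c(Optometrists = 1, Nurses = 5)

# Hypothetical helper: builds the name and fetches the object with get()
find.lookup <- function(var.name) {
  get(paste(var.name, "lookuptable", sep = "."))
}
names(find.lookup("profession"))
# [1] "Optometrists" "Nurses"

# A local object with the same name shadows the one defined outside
shadowed <- function(var.name) {
  profession.lookuptable <- c(Other = 99)
  get(paste(var.name, "lookuptable", sep = "."))
}
names(shadowed("profession"))
# [1] "Other"
```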

Next, it looks like you want to return the originalDF data frame where the originalVAR column has been substituted with the lookup values.

First find each code's position in the lookup table, then modify the originalDF[,originalVAR] column by replacing it with the lookup values:

match.idx <- match(DF.VAR, lookup.table)
originalDF[,originalVAR] <- 
   ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])

NOTE that we are not actually modifying the df data frame that you passed in as an argument to ADDlookup; R makes a copy of the data frame within the function. So, your original df is preserved.
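
A quick way to convince yourself of this copy behaviour (relabel is a hypothetical function, not part of the answer's code):

```r
df <- data.frame(id = 1:3, profession = c(1, 5, 4))

# Hypothetical function that changes its argument internally
relabel <- function(d) {
  d[, "profession"] <- "changed"
  d
}

result <- relabel(df)
df$profession       # still the original numeric codes
result$profession   # the modified copy returned by the function
```

So to keep the change you have to assign the returned value, for example df <- ADDlookup(df, 'profession').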

Finally, we have to return the data frame:

return(originalDF)

All together now:

ADDlookup <- function(originalDF, originalVAR) {
   # retrieve the originalVAR column of originalDF
   DF.VAR <- originalDF[,originalVAR] 
   # find the variable called {originalVAR}.lookuptable
   lookup.table <- get(paste(originalVAR, "lookuptable",sep='.'))
   # look up the values
   match.idx <- match(DF.VAR, lookup.table)
   # replace the originalVAR column with the looked-up values
   originalDF[,originalVAR] <- 
       ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
   # return the modified data frame
   return(originalDF)
}

And now to test it:

> ADDlookup(df,'profession')
  id         profession
1  1       Optometrists
2  2             Nurses
3  3 Financial analysts
4  4               <NA>
5  5                  0
6  6             Nurses

Note that the original df is unmodified; in general R functions do not modify the parameters that are passed in to them.


As another improvement: it is generally a bit dangerous to rely on the profession.lookuptable object having been created before you call the ADDlookup function.

Instead of the whole lookup.table <- get('profession.lookuptable') shebang (which can pick up the wrong object if you have several profession.lookuptable objects in different scopes), I'd strongly recommend you just pass in the lookup table as a parameter:

ADDlookup <- function( originalDF, originalVAR, lookup.table )

Then you can avoid that entire get(xxxx) line (and all associated scoping problems that go with it).

Then you'd call the function via:

ADDlookup( df, 'profession', profession.lookuptable )
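
Since the question mentions having many lookup tables, here is a sketch of how the three-argument version could be reused across several columns. The region column, region.lookuptable, the lookups list and the name labelled are all hypothetical, made up for illustration; only profession and profession.lookuptable come from the question.

```r
# Three-argument version: the lookup table is passed in, so no get() is needed
ADDlookup <- function(originalDF, originalVAR, lookup.table) {
   DF.VAR <- originalDF[, originalVAR]
   match.idx <- match(DF.VAR, lookup.table)
   originalDF[, originalVAR] <-
       ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
   originalDF
}

df <- data.frame(id = 1:6,
                 profession = c(1, 5, 4, NA, 0, 5),
                 region = c(2, 1, 1, 2, 3, NA))    # hypothetical second column

profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)
region.lookuptable <- c(North = 1, South = 2)      # hypothetical lookup table

# Named list mapping each column name to its lookup table
lookups <- list(profession = profession.lookuptable,
                region = region.lookuptable)

# Apply every lookup in turn, carrying the result forward
labelled <- df
for (v in names(lookups)) {
  labelled <- ADDlookup(labelled, v, lookups[[v]])
}
labelled
```

Keeping the tables in a named list means adding another variable only requires one more list entry, not another function.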
2012-04-04 04:45
by mathematical.coffee
+1 Ok, so it wasn't that much more complicated, but I was trying to avoid using get - joran 2012-04-04 04:46
Yeah, I don't like the use of get either; I've edited my post to mention that - mathematical.coffee 2012-04-04 04:48
@mathematical.coffee Whooo, This is super informative. Thank you for taking the time to explain it all so thoroughly. It's probably me, but when I run your code I do get an error message when running the final function Error in ADDlookup(df, "profession") : object 'originalDF' not found. I'll take another look at it in the morning, it's late where I am. Once again, thank you for taking the time I'm definitely going to reread your post (several times). Thanks - Eric Fail 2012-04-04 05:05


3

There are certainly more complicated ways to get this to work the way you originally envisioned, but it will be much simpler to slightly reorganize how the function works:

ADDLookup <- function(originalDF,var,varLookup){
    match.idx <- match(originalDF[,var], varLookup)
    originalDF[,var] <- ifelse(is.na(match.idx), 
                        originalDF[,var], names(varLookup)[match.idx])
    originalDF
}                            

ADDLookup(df,"profession",profession.lookuptable)
  id         profession
1  1       Optometrists
2  2             Nurses
3  3 Financial analysts
4  4               <NA>
5  5                  0
6  6             Nurses

Note that I'm now passing three arguments: the data frame df, the name of the variable in question (var) as a character string, and the lookup table itself.

Also, you've now learned why $ is used more in interactive work than in programming: it doesn't mesh well with column names passed in as function arguments. For that type of task you need the [ syntax.
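
A minimal sketch of the problem, using a cut-down version of the sample data: $ takes the text after it literally, whereas [ and [[ evaluate their argument.

```r
df <- data.frame(id = 1:3, profession = c(1, 5, 4))
var <- "profession"

df$var       # NULL: $ looks for a column literally named "var"
df[[var]]    # the profession column, because var is evaluated first
df[, var]    # same column via the [ syntax
```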

2012-04-04 04:45
by joran
Thank you for taking the time and fixing my mess. I'm not sure I fully understand it all, but it gets the job done. Thanks - Eric Fail 2012-04-04 05:09
Shouldn't all the df's inside the function be replaced with originalDF? - Eric Fail 2012-04-04 05:33
@EricD.Brean Yep! Sorry about that.. - joran 2012-04-04 13:48


0

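I would define the variable as a factor, using the lookup table for its levels and labels. Note that codes missing from the lookup table (such as 0) then become NA.

df[,"profession"] <- factor(df[,"profession"], levels = profession.lookuptable, labels = names(profession.lookuptable))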
2012-04-04 10:42
by Wojciech Sobala