Recoding over multiple data frames in R
(Edited to reflect help... I'm not doing great with formatting, and I appreciate the feedback.)
I'm a bit stuck on what I suspect is an easy enough problem. I have multiple different data sets that I have loaded into R, all of which have different numbers of observations, but all of which have three variables named "a1", "a2", and "a3". I want to create a new variable in each of the 3 data frames that contains the value held in "a1" if a3 contains a value greater than zero, and the value held in "a2" if a3 contains a value less than zero. Seems simple enough, right?
My attempt at the code, using faux data:
    set.seed(1)
    a1 = seq(1, 100, length = 100)
    a2 = seq(-100, -1, length = 100)
    a3 = runif(100, -1, 1)
    df1 = cbind(a1, a2, a3)
    a3 = runif(100, -1, 1)
    df2 = cbind(a1, a2, a3)
I'm a thousand percent sure R has functionality for creating the same named variable in multiple data frames, so I have tried doing this with lapply:
    mylist = list(df1, df2)
    lapply(mylist, function(x) {
      x$newvar = x$a1
      x$newvar[x$a3 > 0] = x$a2[x$a3 > 0]
      return(x)
    })
But newvar is not available to me once I leave the lapply loop. For example, if I ask for the mean of the new variable:

    mean(df1$newvar)
    [1] NA
    Warning message:
    In mean.default(df1$newvar) :
      argument is not numeric or logical: returning NA
Any help is appreciated.
Thank you.
Well, first of all, df1 and df2 are not data.frames but matrices (the dollar syntax doesn't work on matrices).
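As a quick check of that point, here is a minimal sketch (the objects m, d, and dollar_result are made up for illustration, they are not from the question) showing that cbind() on plain vectors gives a matrix, and that the dollar syntax only works after converting it:

```r
# cbind() on plain numeric vectors returns a matrix, not a data.frame
m <- cbind(a1 = 1:3, a2 = -(3:1), a3 = c(0.5, -0.2, 0.1))
is.matrix(m)       # TRUE
is.data.frame(m)   # FALSE

# `$` on a matrix raises an error; catch it so the script keeps running
dollar_result <- tryCatch(m$a1, error = function(e) "error")

# Matrix columns are reached by name with [ , ] instead
m[, "a1"]

# After conversion to a data.frame, `$` works
d <- as.data.frame(m)
d$a1
```

Calling is.data.frame() on an object before using the dollar syntax is a cheap way to catch this early.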
In fact, if you do:
    set.seed(1)
    a1 = seq(1, 100, length = 100)
    a2 = seq(-100, -1, length = 100)
    a3 = runif(100, -1, 1)
    df1 = as.data.frame(cbind(a1, a2, a3))
    a3 = runif(100, -1, 1)
    df2 = as.data.frame(cbind(a1, a2, a3))
    mylist = list(df1, df2)
    lapply(mylist, function(x) {
      x$newvar = x$a1
      x$newvar[x$a3 > 0] = x$a2
    })
the code almost works, but it gives warnings. In fact, there's still an error in the last line of the function passed to lapply. If you change it like this, it works as expected:
    lapply(mylist, function(x) {
      x$newvar = x$a1
      x$newvar[x$a3 > 0] = x$a2[x$a3 > 0]  # you need to subset x$a2, otherwise it's too long
      return(x)  # better to state explicitly what the return value is
    })
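To make the reason for the warning concrete, here is a small sketch on a made-up four-row data frame (x, idx, wrong, and right are illustrative names only). It compares the unsubsetted assignment with the subsetted one:

```r
x <- data.frame(a1 = 1:4, a2 = -(4:1), a3 = c(0.5, -0.5, 0.2, -0.1))
idx <- x$a3 > 0   # TRUE FALSE TRUE FALSE: two rows are selected

# Wrong: two slots on the left, four values on the right. R fills the slots
# with the FIRST two values of a2 (rows 1 and 2) and warns that the number of
# items to replace is not a multiple of the replacement length.
wrong <- x$a1
wrong[idx] <- x$a2
wrong   # -4  2 -3  4  (row 3 received the a2 value of row 2)

# Right: subset both sides with the same index so the rows line up
right <- x$a1
right[idx] <- x$a2[idx]
right   # -4  2 -2  4
```

So the unsubsetted version is not only noisy: it silently puts values from the wrong rows into newvar.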
Edit (as per comment):
As usually happens in R, functions do not mutate existing objects but return brand new objects.
So, in this case, df1 and df2 are still the same, but lapply returns a list containing, as expected, 2 new data.frames, i.e.:
    resultlist <- lapply(mylist, function(x) {
      x$newvar = x$a1
      x$newvar[x$a3 > 0] = x$a2[x$a3 > 0]
      return(x)
    })
    newdf1 <- resultlist[[1]]
    newdf2 <- resultlist[[2]]
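If there are many data frames, pulling each element out by hand gets tedious. One possible variation, sketched here with smaller made-up data and a hypothetical helper called add_newvar, is to name the list elements and either keep working with the list or write the results back with list2env():

```r
set.seed(1)
df1 <- data.frame(a1 = 1:5, a2 = -(5:1), a3 = runif(5, -1, 1))
df2 <- data.frame(a1 = 1:5, a2 = -(5:1), a3 = runif(5, -1, 1))

# Hypothetical helper: same recoding as in the lapply call above
add_newvar <- function(x) {
  x$newvar <- x$a1
  x$newvar[x$a3 > 0] <- x$a2[x$a3 > 0]
  x
}

# Name the list elements so results can be matched back to the originals
mylist <- list(df1 = df1, df2 = df2)
resultlist <- lapply(mylist, add_newvar)

# Option 1: keep working with the list
mean(resultlist$df1$newvar)

# Option 2: overwrite df1 and df2 in the current environment
invisible(list2env(resultlist, envir = environment()))
mean(df1$newvar)
```

Keeping everything in the list is usually the tidier choice. Also note that the code as posted puts a2 where a3 > 0; if the rule in the question's text (a1 where a3 > 0, a2 otherwise) is what is wanted, swap a1 and a2 in the helper.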