Centering data frame by factor

Centering data frame by factor

ronny
Hi,

I would like to center P1 and P2 of the following data frame by the factor "Experiment", i.e. subtract from each value the average of its experiment, and keep the original data structure, i.e. the experiment and the group of each value.

RAW= data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(10,12,14,5,3,4),"P2"=c(8,12,16,2,3,4))

Desired result:

NORMALIZED= data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))

I tried using "by", but then I lose the original order and the "Group" variable. Can you help?

> RAW
  Experiment Group P1 P2
         2     A 10  8
         2     A 12 12
         2     B 14 16
         1     A  5  2
         1     A  3  3
         1     B  4  4

NOT.OK<- within (RAW, {P1<-do.call(rbind,by(RAW$P1,RAW$Experiment,scale,scale=F))})

> NOT.OK
  Experiment Group P1 P2
          2     A  1  8
          2     A -1 12
          2     B  0 16
          1     A -2  2
          1     A  0  3
          1     B  2  4
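Why the by() attempt scrambles the rows: by() returns one piece per level of Experiment in sorted level order (1, then 2), and do.call(rbind, ...) simply stacks those pieces, so the centered values no longer line up with the rows of RAW. A minimal base-R sketch, assuming the only goal is to restore the original order: split() the column by Experiment, center each piece, and let unsplit() write the values back to the rows they came from. The helper name center_split() is hypothetical, not from the thread.

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# Hypothetical helper: center x within the levels of g, keeping row order.
# split() orders the pieces by level ("1" before "2"), the same order
# by() uses; unsplit() undoes that by writing each piece back in place.
center_split <- function(x, g) {
  pieces <- split(x, g)
  unsplit(lapply(pieces, function(p) p - mean(p)), g)
}

NORMALIZED <- RAW
NORMALIZED$P1 <- center_split(RAW$P1, RAW$Experiment)
NORMALIZED$P2 <- center_split(RAW$P2, RAW$Experiment)
NORMALIZED$P1
## [1] -2  0  2  1 -1  0
```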

Re: Centering data frame by factor

Daniel Malter

P1-tapply(P1,Experiment,mean)[Experiment]

HTH,
Daniel
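Daniel's one-liner can be dropped into within() so that both columns are centered in place while Experiment, Group, and the row order are kept. A minimal sketch, assuming the RAW data from the first post; the as.vector() call is an addition that strips any names or array attributes carried over from tapply(). As pointed out later in the thread, the [Experiment] indexing is positional and relies on Experiment being coded 1 and 2.

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# within() evaluates the block with the columns visible as variables and
# returns a modified copy; untouched columns and the row order are kept.
# Caveat: [Experiment] selects by position, which is only correct here
# because the experiments are numbered 1 and 2.
NORMALIZED <- within(RAW, {
  P1 <- as.vector(P1 - tapply(P1, Experiment, mean)[Experiment])
  P2 <- as.vector(P2 - tapply(P2, Experiment, mean)[Experiment])
})
NORMALIZED
```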


Re: Centering data frame by factor

ronny
Perfect! Made my day!

Re: Centering data frame by factor

David Winsemius
In reply to this post by Daniel Malter

On Jul 19, 2011, at 4:50 AM, Daniel Malter wrote:

>
> P1-tapply(P1,Experiment,mean)[Experiment]

Another way would be with ave(), but I discovered that it does not
accept subsidiary arguments for FUN and does not issue warnings either,
so the extra argument has to go inside a wrapper function. This works:

> with(RAW, ave(P1, Experiment, FUN=function(x) scale(x, scale=FALSE)))
[1] -2  0  2  1 -1  0


But this doesn't behave "as directed" ... by my pre-operational R-brain.

> with(RAW, ave(P1, Experiment, FUN=scale, scale=FALSE))
[1] -1  0  1  1 -1  0

(It silently applies both of scale()'s defaults, center=TRUE and
scale=TRUE: the scale=FALSE is swallowed by ave's ... and treated as
one more grouping variable, so no warning about an unused argument is
issued. Many (well, some anyway) functions like this pass subsidiary
arguments on to FUN through ..., but `ave` uses that construction to
gather its grouping factors rather than expecting them in a list or
vector, as tapply, aggregate, and by do. Some functions, such as
mapply, offer a MoreArgs argument for this, but ave does not.)
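Following that point, a minimal sketch of the ave() route on the RAW data from the first post: with the default FUN = mean, ave() returns each row's group mean in the original row order, so subtracting it centers a column without any reordering. The last line simply reproduces the gotcha described above, where scale = FALSE is taken as an extra grouping variable rather than passed on to scale().

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# ave() returns a vector the same length and order as its input;
# with FUN = mean (the default) each element becomes its group mean.
cols <- c("P1", "P2")
NORMALIZED <- RAW
NORMALIZED[cols] <- lapply(RAW[cols], function(v) v - ave(v, RAW$Experiment))

# Extra arguments for FUN must be bound inside a wrapper function ...
wrapped <- with(RAW, ave(P1, Experiment,
                         FUN = function(x) scale(x, scale = FALSE)))

# ... because anything else given to ave() after x lands in its ...,
# which collects grouping variables: scale = FALSE becomes a constant
# "group", and scale() runs with center = TRUE and scale = TRUE.
surprising <- with(RAW, ave(P1, Experiment, FUN = scale, scale = FALSE))
```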

--
David.


David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Centering data frame by factor

William Dunlap
In reply to this post by Daniel Malter

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Daniel Malter
> Sent: Tuesday, July 19, 2011 1:51 AM
> To: [hidden email]
> Subject: Re: [R] Centering data frame by factor
>
>
> P1-tapply(P1,Experiment,mean)[Experiment]

Note that the above solution works in this example
because Experiment takes the values 1 and 2.  If
Experiment were coded as, say, 101 and 102 the above
would not work.  This is a case where converting
Experiment to a factor would avoid problems.  E.g.,
  > RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
  > RAW$E <- RAW$Experiment + 100 # relabeled Experiment
  > with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
   2  2  2  1  1  1
  -2  0  2  1 -1  0
  > with(RAW, P1-tapply(P1,E,mean)[E]) # bad
  <NA> <NA> <NA> <NA> <NA> <NA>
    NA   NA   NA   NA   NA   NA
  > RAW$E <- factor(RAW$E) # convert to factor
  > with(RAW, P1-tapply(P1,E,mean)[E]) # good
  102 102 102 101 101 101
   -2   0   2   1  -1   0
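The failure comes from how `[` reads the index: a numeric vector selects by position, a factor is used through its integer codes, and a character vector selects by name. Note that the transcript above reuses the already-centered NORMALIZED numbers as input, where every group mean is 0. A minimal sketch on the original RAW values from the first post, with the same 101/102 relabeling, comparing the three readings; indexing by as.character() names is shown as a base-R alternative to converting to a factor, not as something proposed in the thread.

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))
RAW$E <- RAW$Experiment + 100   # relabeled: 102 and 101

means <- with(RAW, tapply(P1, E, mean))   # one mean per label, named "101", "102"

by_position <- means[RAW$E]                 # asks for elements 101/102: all NA
by_code     <- means[factor(RAW$E)]         # integer codes 2/1: correct means
by_name     <- means[as.character(RAW$E)]   # names "102"/"101": correct means

centered <- RAW$P1 - as.vector(by_name)
```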

Another way to approach the problem is to think of
your normalized data as the residuals from a linear model:
  > residuals(lm(data=RAW, cbind(P1,P2) ~ E))
               P1            P2
  1 -2.000000e+00 -4.000000e+00
  2  4.385598e-17  8.771196e-17
  3  2.000000e+00  4.000000e+00
  4  1.000000e+00 -1.000000e+00
  5 -1.000000e+00  8.771196e-17
  6  4.385598e-17  1.000000e+00
  > zapsmall(.Last.value) # make reading easier
    P1 P2
  1 -2 -4
  2  0  0
  3  2  4
  4  1 -1
  5 -1  0
  6  0  1
That approach can make generalizations to more factors,
or to smoothing approaches, easier.
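A minimal sketch of the residual approach on the original RAW values from the first post. The first model repeats the per-Experiment centering; the second is a hypothetical extension to more factors, assuming one wanted to center within each Experiment-by-Group cell instead, where the residuals from a cell-means model are exactly the within-cell deviations.

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# One mean per Experiment; factor() makes the codes categories
# rather than a numeric slope. zapsmall() hides ~1e-17 rounding noise.
by_exp <- zapsmall(residuals(lm(cbind(P1, P2) ~ factor(Experiment),
                                data = RAW)))

# Hypothetical extension: one mean per Experiment:Group cell, so each
# value is centered within its own cell instead of its whole experiment.
by_cell <- zapsmall(residuals(lm(cbind(P1, P2) ~ interaction(Experiment, Group),
                                 data = RAW)))
by_cell
```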

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


Re: Centering data frame by factor

David Winsemius

On Jul 19, 2011, at 11:58 AM, William Dunlap wrote:

>
>> -----Original Message-----
>> From: [hidden email] [mailto:[hidden email]] On Behalf Of Daniel Malter
>> Sent: Tuesday, July 19, 2011 1:51 AM
>> To: [hidden email]
>> Subject: Re: [R] Centering data frame by factor
>>
>>
>> P1-tapply(P1,Experiment,mean)[Experiment]
>
> Note that the above solution works in this example
> because Experiment takes the values 1 and 2.  If
> Experiment were coded as, say, 101 and 102 the above
> would not work.  This is a case where converting
> Experiment to a factor would avoid problems.

I checked to see if my ave solution was subject to the same caveats,
and it is not: ave() hands its grouping variables to interaction(),
which turns them into factors. The help page is less categorical about
what the grouping variables' structure should be, saying only that
they are "typically factors".

>  E.g.,
>> RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
>> RAW$E <- RAW$Experiment + 100 # relabeled Experiment
>> with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
>   2  2  2  1  1  1
>  -2  0  2  1 -1  0
>> with(RAW, P1-tapply(P1,E,mean)[E]) # bad
>  <NA> <NA> <NA> <NA> <NA> <NA>
>    NA   NA   NA   NA   NA   NA

with(RAW, ave(P1, E, FUN=function(x) scale(x,  scale=FALSE) ) )
# [1] -2  0  2  1 -1  0   good


>> RAW$E <- factor(RAW$E) # convert to factor
>> with(RAW, P1-tapply(P1,E,mean)[E]) # good
>  102 102 102 101 101 101
>   -2   0   2   1  -1   0

And take note that Bill made his variable a factor before calling
tapply, so the same factor is used for grouping and for indexing. If,
with E still numeric, he had just called factor() inside tapply (as I
often do ... possibly unwisely in light of this gotcha), it would fail:

 > with(RAW, P1-tapply(P1, factor(E), mean)[E])
<NA> <NA> <NA> <NA> <NA> <NA>
   NA   NA   NA   NA   NA   NA

... that is unless you also use factor(E) as the index:

 > with(RAW, P1-tapply(P1, factor(E), mean)[factor(E)])
102 102 102 101 101 101
  -2   0   2   1  -1   0
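One way to avoid this mismatch is to build the factor once and use that same object both for grouping and for indexing. A minimal sketch; center_by() is a hypothetical helper name, not something from the thread, applied here to the original RAW values with the numeric 101/102 labels from the example above.

```r
# Hypothetical helper: convert the grouping variable to a factor once,
# then use that same factor for tapply() and for the [ ] lookup, so the
# integer codes and the order of the group means always agree.
center_by <- function(x, g) {
  g <- factor(g)
  as.vector(x - tapply(x, g, mean)[g])
}

RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))
RAW$E <- RAW$Experiment + 100   # numeric labels 102/101

# lapply() passes g = RAW$E on to center_by() for each column
RAW[c("P1", "P2")] <- lapply(RAW[c("P1", "P2")], center_by, g = RAW$E)
```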

Thanks, Bill. I've learned a lot of R from you.

--
David.


David Winsemius, MD
West Hartford, CT
