Z score

Z score

Vedant Sharma
Hi,

I need to find the z-scores of the data in a spreadsheet. The values
need to be calculated for each gene across the samples (refer to the
example). It should be a simple thing, but I am unable to do it right
now!

An example of the structure of the spreadsheet is:

# Example:

MyFile <- read.csv( text=
"Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'", na.strings="" )

And I think the formula that can be used for the z-score is:

(x-mean(x))/sd(x)

The apply() function over rows should work, but the bottom line is that I am
unable to do it correctly.

Could you show me how, using apply() or some alternative function?

Thank you.

Cheers,
Ved

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Z score

Rui Barradas
Hello,

Try the following.

apply(MyFile, 1, scale)

Hope this helps,

Rui Barradas
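
A note on the one-liner above: apply() over rows hands its results back as columns, so the genes end up across the top. Below is a minimal sketch of an equivalent that keeps genes in rows. It assumes the example data from the question, rebuilt here as a matrix so the block is self-contained. scale() standardises columns and skips NAs when it computes the mean and standard deviation, hence the two transposes.

```r
# Example data from the question, rebuilt as a numeric matrix (genes x samples).
MyFile <- matrix(c(87, 77, 88,
                   98, 22, 34,
                   33, 43, 33,
                   78, NA, 81),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene_", 1:4), paste0("Sample_", 1:3)))

# scale() works column by column, so put the genes in columns,
# standardise, and transpose back to genes x samples.
z <- t(scale(t(MyFile)))

print(round(z, 4))
```

The NA in Gene_4 stays an NA in the result; the other two values in that row are standardised against each other.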
On 24-10-2012 07:17, Vedant Sharma wrote:

> [...]

Re: Z score

arun kirshna
In reply to this post by Vedant Sharma
Hi,
Try this:
 res<-do.call(rbind,lapply(lapply(apply(MyFile,1,function(x) x[!is.na(x)]),function(x) (x-mean(x))/sd(x)),function(x) x[c("Sample_1","Sample_2","Sample_3")]))
 res
#         Sample_1   Sample_2   Sample_3
#Gene_1  0.4931970 -1.1507929  0.6575959
#Gene_2  1.1421818 -0.7179429 -0.4242390
#Gene_3 -0.5773503  1.1547005 -0.5773503
#Gene_4 -0.7071068         NA  0.7071068
A.K.
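
For anyone who wants to check the first row of that output by hand, here is a small sketch using only the Gene_1 values from the example. The one thing worth remembering is that sd() divides by n - 1, not n.

```r
# Gene_1 from the example in the question.
x <- c(Sample_1 = 87, Sample_2 = 77, Sample_3 = 88)

m <- mean(x)   # (87 + 77 + 88) / 3 = 84
s <- sd(x)     # sqrt((3^2 + (-7)^2 + 4^2) / (3 - 1)) = sqrt(37)

# z-score: distance from the row mean in units of the row standard deviation.
z <- (x - m) / s
print(round(z, 7))
```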






Re: Z score

arun kirshna
In reply to this post by Vedant Sharma
Hi,

In cases with more sample columns, you could also use this:
 res2<-t(sapply(lapply(apply(MyFile,1,function(x) x[!is.na(x)]),function(x) (x-mean(x))/sd(x)),function(x) x[colnames(MyFile)] ))
res2
#         Sample_1   Sample_2   Sample_3
#Gene_1  0.4931970 -1.1507929  0.6575959
#Gene_2  1.1421818 -0.7179429 -0.4242390
#Gene_3 -0.5773503  1.1547005 -0.5773503
#Gene_4 -0.7071068         NA  0.7071068
A.K.
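
An alternative sketch, which is a different technique from the one above and not the poster's method: leave the NAs where they are and ask mean() and sd() to skip them with na.rm = TRUE. Nothing has to be dropped and re-aligned by column name, so it does not depend on how many NAs each row has. The helper name zscore is made up for this illustration.

```r
# Example data exactly as given in the question.
MyFile <- read.csv(text =
"Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header = TRUE, row.names = 1, as.is = TRUE, quote = "'", na.strings = "")

# Hypothetical helper: z-score of one row, ignoring missing values.
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# apply() returns one column per gene, so transpose to get genes in rows.
res3 <- t(apply(MyFile, 1, zscore))
print(round(res3, 7))
```

The missing Sample_2 value for Gene_4 simply stays NA in the result.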




Re: Z score

arun kirshna


Hi Ved,

Sorry, I didn't test it well enough at that time. 

In your example file,
 #there were NAs
MyFile1 <- read.csv( text=
"Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'", na.strings="" )


#Here, apply() returns a list, because dropping the NA leaves the last row shorter than the others.
 apply(MyFile1,1,function(x) x[!is.na(x)]) #outputs a list
#$Gene_1
#Sample_1 Sample_2 Sample_3
 #     87       77       88

#$Gene_2
#Sample_1 Sample_2 Sample_3
 #     98       22       34

#$Gene_3
#Sample_1 Sample_2 Sample_3
 #     33       43       33

#$Gene_4
#Sample_1 Sample_3
 #     78       81

# Without NAs
MyFile2 <- read.csv( text=
"Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,48,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'", na.strings="" )

apply(MyFile2,1,function(x) x[!is.na(x)]) # the output is a matrix
#         Gene_1 Gene_2 Gene_3 Gene_4
#Sample_1     87     98     33     78
#Sample_2     77     22     43     48
#Sample_3     88     34     33     81
is.matrix(apply(MyFile2,1,function(x) x[!is.na(x)]) )
#[1] TRUE

#Consider another case
MyFile3 <- read.csv( text=
"Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'", na.strings="" )

t(sapply(lapply(apply(MyFile3,1,function(x) x[!is.na(x)]),function(x) (x-mean(x))/sd(x)),function(x) x[colnames(MyFile3)] )) #works because the apply() output is a list
#        Sample_1   Sample_2   Sample_3
#Gene_1  0.4931970 -1.1507929  0.6575959
#Gene_2         NA -0.7071068  0.7071068
#Gene_3 -0.5773503  1.1547005 -0.5773503
#Gene_4 -0.7071068         NA  0.7071068


#Yet another case:
MyFile4 <- read.csv( text=
"Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77
Gene_2,,22,34
Gene_3,33,,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'", na.strings="" )
 apply(MyFile4,1,function(x) x[!is.na(x)]) #output is a matrix because each row has the same number of NAs
#     Gene_1 Gene_2 Gene_3 Gene_4
#[1,]     87     22     33     78
#[2,]     77     34     33     81
t(sapply(lapply(apply(MyFile4,1,function(x) x[!is.na(x)]),function(x) (x-mean(x))/sd(x)),function(x) x[colnames(MyFile4)] )) #doesn't work: apply() returned a matrix, so lapply() steps through single cells instead of rows
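
The MyFile4 case can be reproduced with the sketch below, which also shows one possible way around it. The values are the same as MyFile4 but built directly as a matrix, and row_list is a hypothetical helper written for this illustration, not something from base R or from the thread. It always returns the rows as a list, so the later lapply() steps see whole rows no matter how many NAs each row has.

```r
# Same values as MyFile4 above, built directly as a matrix.
m <- matrix(c(87, 77, NA,
              NA, 22, 34,
              33, NA, 33,
              78, NA, 81),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("Gene_", 1:4), paste0("Sample_", 1:3)))

# Every row keeps exactly two values, so apply() simplifies to a 2 x 4 matrix.
simplified <- apply(m, 1, function(x) x[!is.na(x)])
is.matrix(simplified)                  # TRUE
length(lapply(simplified, identity))   # 8: lapply() visits single cells

# Hypothetical helper (not from the thread): rows as a named list, never simplified.
row_list <- function(mat) {
  lapply(setNames(seq_len(nrow(mat)), rownames(mat)), function(i) mat[i, ])
}

kept <- lapply(row_list(m), function(x) x[!is.na(x)])      # drop NAs per row
z    <- lapply(kept, function(x) (x - mean(x)) / sd(x))    # z-score per row
res  <- t(sapply(z, function(x) unname(x[colnames(m)])))   # realign on sample names
colnames(res) <- colnames(m)
print(round(res, 4))
```

Gene_3 comes out as NaN in this sketch because its two remaining values are identical, which makes the standard deviation zero.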



#In your dataset, there were no NAs
dat1<-read.csv("Bcl2_With_expressions.csv",sep="\t",row.names=1)
MyFile<-dat1[,-1]

 str(apply(MyFile,1,function(x) x[!is.na(x)])) # a matrix
# num [1:29, 1:18] 10.48 10.96 9.28 11.1 10.95 ...
 #- attr(*, "dimnames")=List of 2
 # ..$ : chr [1:29] "ALL2" "MLL8" "ALL42" "MLL5" ...
 # ..$ : chr [1:18] "BAX" "BCL2L15" "BCL2" "BMF" ...

#In this case, either
 res2<-apply(MyFile,1,function(x) (x-mean(x))/sd(x))

#or

 res1<-apply(apply(MyFile,1,function(x) x[!is.na(x)]),2,function(x) (x-mean(x))/sd(x)) #works

 
 identical(res1,res2)
#[1] TRUE

 head(res1,2)
 #          BAX   BCL2L15     BCL2        BMF        BAD      MCL1     BCL2L1
#ALL2 0.1216373 -0.215256 1.040758 -0.4078606 -0.2427741 0.6967070 -0.1054749
#MLL8 0.6565878 -1.446252 1.052566 -0.1825442 -0.2312166 0.9882503 -0.9687260
  #          BOK     BCL2A1    BCL2L14       BAK1      BBC3    BCL2L11
#ALL2 -0.1465807  0.5353133 -0.1772439 -0.3751981 0.6341806 -1.2432273
#MLL8  0.2918296 -0.8466821  0.3088331 -1.4025846 0.7056799  0.9944288
  #          BID     NOXA1        BIK          HRK    BCL2L2
#ALL2 -2.2961643 0.2105960 -0.9195998 -0.001731806 1.6691590
#MLL8 -0.5103087 0.3433778  1.2352986 -0.568548518 0.3674839


Hope it helps
A.K.







________________________________
From: Vedant Sharma <[hidden email]>
To: arun <[hidden email]>
Sent: Wednesday, October 24, 2012 7:56 PM
Subject: Re: [R] Z score


Hello Arun,

Thank you. I could manage to get the answer.

However, this particular code doesn't seem to work when I try to read from a .csv file (as attached), and I am curious to find out the reason!

MyFile <- read.csv (file.choose(), header=T, row.names=1)
MyFile <- MyFile [,-1]
res2<-t(sapply(lapply(apply(MyFile,1,function(x) x[!is.na(x)]),function(x) (x-mean(x))/sd(x)),function(x) x[colnames(MyFile)] ))

Thanks again !!

Cheers,
Ved

=============================================


On Wed, Oct 24, 2012 at 9:53 PM, arun <[hidden email]> wrote:

> [...]

Re: Z score

Vedant Sharma
Hi Arun,

Thank you! Very much appreciated.
[I only added t() at the end to preserve the original orientation.]

Also, many thanks to Rui Barradas.

Cheers,
Ved
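
On the orientation point, here is a sketch of a variant that keeps genes in rows without a final t(), for a table with no NAs. It relies on R recycling a vector of length nrow down the columns of a matrix, so element i lines up with row i. The data are the MyFile2 values from earlier in the thread, built directly as a matrix; the names M, z_vec and z_apply are only for this illustration.

```r
# MyFile2 from the earlier post (no NAs), built directly as a matrix.
M <- matrix(c(87, 77, 88,
              98, 22, 34,
              33, 43, 33,
              78, 48, 81),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("Gene_", 1:4), paste0("Sample_", 1:3)))

# rowMeans(M) and the row-wise sd both have one entry per gene; arithmetic
# with a matrix recycles them down each column, i.e. row by row.
z_vec <- (M - rowMeans(M)) / apply(M, 1, sd)

# The apply() version returns samples x genes, so it needs t() to match.
z_apply <- t(apply(M, 1, function(x) (x - mean(x)) / sd(x)))

print(all.equal(z_vec, z_apply))
```

With NAs present this shortcut would need na.rm = TRUE in both rowMeans() and sd().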

================================================================


On Thu, Oct 25, 2012 at 12:06 PM, arun <[hidden email]> wrote:

> [...]