R Data


R Data

Spencer Brackett
Hello everyone,

The following is a portion of code that a colleague sent. Given my lack
of experience in R, I am not quite sure what the significance of the
following arguments is. Could anyone help me translate? For context, I
understand the downloading portion of the script (library(data.table), etc.),
but am not familiar with the portion pertaining to an1.

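library(data.table)

# Probe annotation (450K mapper file)
anno = as.data.frame(fread(file =
  "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/450K/mapper.txt",
  sep = "\t", header = TRUE))

# Methylation beta values (27K array): probes in rows, samples in columns
meth = read.table(file =
  "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/27K/GBM.txt",
  sep = "\t", header = TRUE, row.names = 1)
meth = as.matrix(meth)

# The loop just formats the methylation column names to match the expression
# column names: it keeps only the first two characters of the fourth
# dot-separated field of each sample name
colnames(meth) = sapply(colnames(meth), function(i){
  c1 = strsplit(i, split = ".", fixed = TRUE)[[1]]
  c1[4] = paste(strsplit(c1[4], split = "", fixed = TRUE)[[1]][1:2], collapse = "")
  paste(c1, collapse = ".")
})

# Expression (RNA-seq): genes in rows, samples in columns
exp = read.table(file =
  "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/RNAseq/GBM.txt",
  sep = "\t", header = TRUE, row.names = 1)
exp = as.matrix(exp)

# Keep only the samples present in both data sets, in the same order
c = intersect(colnames(exp), colnames(meth))
exp = exp[, c]
meth = meth[, c]

# Convert beta values to M-values, log2(beta / (1 - beta)).
# apply() over rows returns each row's result as a column, hence the t()
m = apply(meth, 1, function(i){
  log2(i/(1-i))
})
m = t(as.matrix(m))

# Keep annotation rows whose probe and gene were both measured and whose
# probe lies near the transcription start site (TSS200 or TSS1500)
an = anno[anno$probe %in% rownames(m), ]
an = an[an$gene %in% rownames(exp), ]
an = an[an$location %in% c("TSS200", "TSS1500"), ]

# One regression per probe/gene pair: expression ~ methylation.
# i[1] is used as the probe and i[2] as the gene, so the code assumes these are
# the first two columns of the mapper file. Column 4 of the coefficient table
# is the slope's p-value, column 3 its t value; NA if lm() fails.
p = apply(an, 1, function(i){
  tryCatch(summary(lm(exp[as.character(i[2]), ] ~
    m[as.character(i[1]), ]))$coefficients[2, 4], error = function(e) NA)
})
t = apply(an, 1, function(i){
  tryCatch(summary(lm(exp[as.character(i[2]), ] ~
    m[as.character(i[1]), ]))$coefficients[2, 3], error = function(e) NA)
})

# Attach p and t as new columns, then multiple-testing adjusted p-values
# (p.adjust's default method is "holm")
an1 = cbind(an, p)
an1 = cbind(an1, t)
an1$q = p.adjust(as.numeric(an1$p))

# A single pair: t value and p-value of the slope
summary(lm(exp["MAOB", ] ~ m["cg00121904", ]))$coefficients[2, 3:4]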
###############################################
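For readers following the transformation steps above, a minimal sketch on made-up values (the sample name and beta values are hypothetical, not taken from the TCGA files). It shows that the renaming loop keeps the first two characters of the fourth field, and that the apply()/t() step is just the element-wise beta-to-M-value conversion log2(beta / (1 - beta)), which can also be written without apply():

```r
# Hypothetical sample name in the dotted style (made up for illustration)
id <- "TCGA.02.0001.01C.01D.0182.01"

# Same effect as the renaming loop: keep the first two characters of field 4
c1 <- strsplit(id, split = ".", fixed = TRUE)[[1]]
c1[4] <- substr(c1[4], 1, 2)
new_id <- paste(c1, collapse = ".")
print(new_id)  # "TCGA.02.0001.01.01D.0182.01"

# Hypothetical beta values: 2 probes x 3 samples
meth <- matrix(c(0.2, 0.5, 0.8,
                 0.1, 0.9, 0.5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("probeA", "probeB"), c("s1", "s2", "s3")))

# Row-wise version as in the script: apply() returns each row as a column,
# so t() puts probes back in rows
m_apply <- t(as.matrix(apply(meth, 1, function(i) log2(i / (1 - i)))))

# Vectorised equivalent: the arithmetic is element-wise on the whole matrix
m_vec <- log2(meth / (1 - meth))

print(all.equal(m_apply, m_vec))  # TRUE
```

The vectorised form avoids the transpose and is usually faster on a matrix with tens of thousands of probes, but both give the same M-values.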

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: R Data

Fowler, Mark-2
Hi Spencer,

The an1 lines add regression results (or NAs where a regression could not be done) to the downloaded and processed annotation data, which is a data frame. For each probe/gene pair, p is the p-value and t the t value of the methylation slope (columns 4 and 3 of the coefficient table from summary(lm(...))). The cbind function binds the columns of its inputs in the order given, so p and t become new last columns. Simple example below. There is no actual need for the separate cbind commands; an1 = cbind(an, p, t) would do the same. The cbind function expects all the columns to be of the same length, hence the use of the tryCatch function to return NA for failed regression attempts, ensuring that p and t correspond row by row with the data.

x=seq(1,5)
y=seq(6,10)
z=seq(1,5)
xyz=cbind(x,y,z)
xyz
     x  y z
[1,] 1  6 1
[2,] 2  7 2
[3,] 3  8 3
[4,] 4  9 4
[5,] 5 10 5
dangs=rep(NA,5)
xyzd=cbind(xyz,dangs)
xyzd
     x  y z dangs
[1,] 1  6 1    NA
[2,] 2  7 2    NA
[3,] 3  8 3    NA
[4,] 4  9 4    NA
[5,] 5 10 5    NA
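To make p and t concrete, a minimal sketch on simulated data (the matrices exp and m, the names GENE1, cgPROBE1 and NOT_MEASURED, and the helper slope_stat are all made up for illustration). The coefficient table from summary(lm(...)) has columns Estimate, Std. Error, t value and Pr(>|t|); row 2 is the methylation slope, so [2, 3] is its t value and [2, 4] its p-value. A pair whose gene is missing from exp makes the row subscript fail, and tryCatch() turns that error into NA:

```r
set.seed(1)

# Simulated data: 1 probe x 20 samples of M-values, 1 gene x 20 samples of expression
m   <- matrix(rnorm(20), nrow = 1, dimnames = list("cgPROBE1", paste0("s", 1:20)))
exp <- matrix(2 - 0.5 * m[1, ] + rnorm(20, sd = 0.3), nrow = 1,
              dimnames = list("GENE1", paste0("s", 1:20)))

# Hypothetical annotation: probe in column 1, gene in column 2 (as the script assumes)
an <- data.frame(probe = c("cgPROBE1", "cgPROBE1"),
                 gene  = c("GENE1", "NOT_MEASURED"))

# Hypothetical helper wrapping the script's pattern: one regression per row,
# returning column `col` of the slope row, or NA if anything errors
slope_stat <- function(col) {
  apply(an, 1, function(i) {
    tryCatch(summary(lm(exp[as.character(i[2]), ] ~
                          m[as.character(i[1]), ]))$coefficients[2, col],
             error = function(e) NA)
  })
}
p <- slope_stat(4)  # Pr(>|t|) of the slope
t <- slope_stat(3)  # t value of the slope

# One cbind call does the same as two; an is a data frame, so an1 is too
an1 <- cbind(an, p, t)
print(an1)
```

The second row gets NA in both p and t because exp["NOT_MEASURED", ] raises a "subscript out of bounds" error, so p and t still line up row by row with an.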

-----Original Message-----
From: R-help <[hidden email]> On Behalf Of Spencer Brackett
Sent: February 14, 2019 12:32 AM
To: R-help <[hidden email]>; Sarah Goslee <[hidden email]>; Caitlin Gibbons <[hidden email]>; Jeff Newmiller <[hidden email]>
Subject: [R] R Data

Hello everyone,

The following is a portion of coding that a colleague sent. Given my lack of experience in R, I am not quite sure what the significance of the following arguments. Could anyone help me translate? For context, I am aware of the downloading portion of the script... library(data.table) etc., but am not familiar with the portion pertaining to an1 .

library(data.table)
anno = as.data.frame(fread(file =
"/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/450K/mapper.txt", sep ="\t", header = T)) meth = read.table(file = "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/27K/GBM.txt", sep  ="\t", header = T, row.names = 1) meth = as.matrix(meth) """ the loop just formats the methylation column names to match format"""
colnames(meth) = sapply(colnames(meth), function(i){
  c1 = strsplit(i,split = '.', fixed = T)[[1]]
  c1[4] = paste(strsplit(c1[4],split = "",fixed = T)[[1]][1:2],collapse =
"")
  paste(c1,collapse = ".")
})
exp = read.table(file =
"/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/RNAseq/GBM.txt", sep = "\t", header = T, row.names = 1) exp = as.matrix(exp) c = intersect(colnames(exp),colnames(meth))
exp = exp[,c]
meth = meth[,c]
m = apply(meth, 1, function(i){
  log2(i/(1-i))
})
m = t(as.matrix(m))
an = anno[anno$probe %in% rownames(m),]
an = an[an$gene %in% rownames(exp),]
an = an[an$location %in% c("TSS200","TSS1500"),]

p = apply(an,1,function(i){
  tryCatch(summary(lm(exp[as.character(i[2]),] ~ m[as.character(i[1]),]))$coefficient[2,4], error= function(e)NA)
})
t = apply(an,1,function(i){
  tryCatch(summary(lm(exp[as.character(i[2]),] ~ m[as.character(i[1]),]))$coefficient[2,3], error= function(e)NA)
})
an1 =cbind(an,p)
an1 = cbind(an1,t)
an1$q = p.adjust(as.numeric(an1$p))
summary(lm(exp["MAOB",] ~ m["cg00121904",]$coefficient[2,c(3:4)]
###############################################

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R Data

Spencer Brackett
Mr. Fowler,

Thank you! This information is most helpful. So, from my understanding, I
can use the regression results shown (via the code I originally sent) to
generate a continuous distribution with what is essentially a line of best
fit? The data added here had some 30,000 variables (it is genomic data from
TCGA). Does this mean that any non-NA data is being accounted for in said
distribution?

Best,

Spencer Brackett



On Thursday, February 14, 2019, Fowler, Mark <[hidden email]>
wrote:

> Hi Spencer,
>
> The an1 syntax is adding regression coefficients (or NAs where a
> regression could not be done) to the downloaded and processed data, which
> ends up a matrix. The cbind function adds the regression coefficients to
> the last column of the matrix (i.e. bind the columns of the inputs in the
> order given). Simple example below. Not actually any need for the separate
> cbind commands, could have just used an1=cbind(an,p,t). The cbind function
> expects all the columns to be of the same length, hence the use of the
> tryCatch function to capture NA's for failed regression attempts, ensuring
> that p and t correspond row by row with the matrix.
>
>  x=seq(1,5)
>  y=seq(6,10)
>  z=seq(1,5)
> xyz=cbind(x,y,z)
> xyz
>    x  y z
> [1,] 1  6 1
> [2,] 2  7 2
> [3,] 3  8 3
> [4,] 4  9 4
> [5,] 5 10 5
> dangs=rep(NA,5)
> xyzd=cbind(xyz,dangs)
> xyzd
>      x  y z dangs
> [1,] 1  6 1    NA
> [2,] 2  7 2    NA
> [3,] 3  8 3    NA
> [4,] 4  9 4    NA
> [5,] 5 10 5    NA
>
> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Spencer Brackett
> Sent: February 14, 2019 12:32 AM
> To: R-help <[hidden email]>; Sarah Goslee <[hidden email]>;
> Caitlin Gibbons <[hidden email]>; Jeff Newmiller <
> [hidden email]>
> Subject: [R] R Data
>
> Hello everyone,
>
> The following is a portion of coding that a colleague sent. Given my lack
> of experience in R, I am not quite sure what the significance of the
> following arguments. Could anyone help me translate? For context, I am
> aware of the downloading portion of the script... library(data.table) etc.,
> but am not familiar with the portion pertaining to an1 .
>
> library(data.table)
> anno = as.data.frame(fread(file =
> "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/450K/mapper.txt", sep ="\t",
> header = T)) meth = read.table(file = "/rsrch1/bcb/kchen_group/v_
> mohanty/data/TCGA/27K/GBM.txt", sep  ="\t", header = T, row.names = 1)
> meth = as.matrix(meth) """ the loop just formats the methylation column
> names to match format"""
> colnames(meth) = sapply(colnames(meth), function(i){
>   c1 = strsplit(i,split = '.', fixed = T)[[1]]
>   c1[4] = paste(strsplit(c1[4],split = "",fixed = T)[[1]][1:2],collapse =
> "")
>   paste(c1,collapse = ".")
> })
> exp = read.table(file =
> "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/RNAseq/GBM.txt", sep = "\t",
> header = T, row.names = 1) exp = as.matrix(exp) c = intersect(colnames(exp),
> colnames(meth))
> exp = exp[,c]
> meth = meth[,c]
> m = apply(meth, 1, function(i){
>   log2(i/(1-i))
> })
> m = t(as.matrix(m))
> an = anno[anno$probe %in% rownames(m),]
> an = an[an$gene %in% rownames(exp),]
> an = an[an$location %in% c("TSS200","TSS1500"),]
>
> p = apply(an,1,function(i){
>   tryCatch(summary(lm(exp[as.character(i[2]),] ~ m[as.character(i[1]),]))$coefficient[2,4],
> error= function(e)NA)
> })
> t = apply(an,1,function(i){
>   tryCatch(summary(lm(exp[as.character(i[2]),] ~ m[as.character(i[1]),]))$coefficient[2,3],
> error= function(e)NA)
> })
> an1 =cbind(an,p)
> an1 = cbind(an1,t)
> an1$q = p.adjust(as.numeric(an1$p))
> summary(lm(exp["MAOB",] ~ m["cg00121904",]$coefficient[2,c(3:4)]
> ###############################################
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R Data

Fowler, Mark-2
I am not sure I would use the word ‘accounted’, more like discounted (tossed out).
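In the script, a failed regression leaves NA in p, and those rows contribute nothing further. A small sketch with made-up p-values (the data frame below stands in for an1 and is hypothetical): p.adjust() leaves NA entries as NA and, with its default n, counts only the non-NA p-values as tests. Its default method is "holm"; method = "BH" would give false discovery rate adjusted values instead. Rows with NA can also be dropped explicitly:

```r
# Made-up p-values standing in for an1$p; the second regression failed (NA)
an1 <- data.frame(probe = c("cg1", "cg2", "cg3"),
                  p = c(0.01, NA, 0.04))

# Default method "holm": NAs stay NA and are not counted as tests,
# so the two non-NA values are adjusted as if there were 2 comparisons
an1$q <- p.adjust(as.numeric(an1$p))
print(an1$q)  # 0.02 NA 0.04

# Keep only the probe/gene pairs whose regression actually ran
tested <- an1[!is.na(an1$p), ]
print(nrow(tested))  # 2
```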

From: Spencer Brackett <[hidden email]>
Sent: February 14, 2019 9:21 AM
To: Fowler, Mark <[hidden email]>
Cc: R-help <[hidden email]>; Sarah Goslee <[hidden email]>; Caitlin Gibbons <[hidden email]>; Jeff Newmiller <[hidden email]>
Subject: Re: R Data

Mr. Fowler,

Thank you! This information is most helpful. So from my understanding, I can use the regression coefficients shown (via the coding I originally sent, to generate a continuous distribution with what is essentially a line of best fit? The data added here had some 30,000 variables (it is genomic data from TCGA), does this mean that any none NA data is being accounted for in said distribution?

Best,

Spencer Brackett



On Thursday, February 14, 2019, Fowler, Mark <[hidden email]<mailto:[hidden email]>> wrote:
Hi Spencer,

The an1 syntax is adding regression coefficients (or NAs where a regression could not be done) to the downloaded and processed data, which ends up a matrix. The cbind function adds the regression coefficients to the last column of the matrix (i.e. bind the columns of the inputs in the order given). Simple example below. Not actually any need for the separate cbind commands, could have just used an1=cbind(an,p,t). The cbind function expects all the columns to be of the same length, hence the use of the tryCatch function to capture NA's for failed regression attempts, ensuring that p and t correspond row by row with the matrix.

 x=seq(1,5)
 y=seq(6,10)
 z=seq(1,5)
xyz=cbind(x,y,z)
xyz
   x  y z
[1,] 1  6 1
[2,] 2  7 2
[3,] 3  8 3
[4,] 4  9 4
[5,] 5 10 5
dangs=rep(NA,5)
xyzd=cbind(xyz,dangs)
xyzd
     x  y z dangs
[1,] 1  6 1    NA
[2,] 2  7 2    NA
[3,] 3  8 3    NA
[4,] 4  9 4    NA
[5,] 5 10 5    NA

-----Original Message-----
From: R-help <[hidden email]<mailto:[hidden email]>> On Behalf Of Spencer Brackett
Sent: February 14, 2019 12:32 AM
To: R-help <[hidden email]<mailto:[hidden email]>>; Sarah Goslee <[hidden email]<mailto:[hidden email]>>; Caitlin Gibbons <[hidden email]<mailto:[hidden email]>>; Jeff Newmiller <[hidden email]<mailto:[hidden email]>>
Subject: [R] R Data

Hello everyone,

The following is a portion of coding that a colleague sent. Given my lack of experience in R, I am not quite sure what the significance of the following arguments. Could anyone help me translate? For context, I am aware of the downloading portion of the script... library(data.table) etc., but am not familiar with the portion pertaining to an1 .

library(data.table)
anno = as.data.frame(fread(file =
"/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/450K/mapper.txt", sep ="\t", header = T)) meth = read.table(file = "/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/27K/GBM.txt", sep  ="\t", header = T, row.names = 1) meth = as.matrix(meth) """ the loop just formats the methylation column names to match format"""
colnames(meth) = sapply(colnames(meth), function(i){
  c1 = strsplit(i,split = '.', fixed = T)[[1]]
  c1[4] = paste(strsplit(c1[4],split = "",fixed = T)[[1]][1:2],collapse =
"")
  paste(c1,collapse = ".")
})
exp = read.table(file =
"/rsrch1/bcb/kchen_group/v_mohanty/data/TCGA/RNAseq/GBM.txt", sep = "\t", header = T, row.names = 1) exp = as.matrix(exp) c = intersect(colnames(exp),colnames(meth))
exp = exp[,c]
meth = meth[,c]
m = apply(meth, 1, function(i){
  log2(i/(1-i))
})
m = t(as.matrix(m))
an = anno[anno$probe %in% rownames(m),]
an = an[an$gene %in% rownames(exp),]
an = an[an$location %in% c("TSS200","TSS1500"),]

p = apply(an,1,function(i){
  tryCatch(summary(lm(exp[as.character(i[2]),] ~ m[as.character(i[1]),]))$coefficient[2,4], error= function(e)NA)
})
t = apply(an,1,function(i){
  tryCatch(summary(lm(exp[as.character(i[2]),] ~ m[as.character(i[1]),]))$coefficient[2,3], error= function(e)NA)
})
an1 =cbind(an,p)
an1 = cbind(an1,t)
an1$q = p.adjust(as.numeric(an1$p))
summary(lm(exp["MAOB",] ~ m["cg00121904",]$coefficient[2,c(3:4)]
###############################################

        [[alternative HTML version deleted]]

______________________________________________
[hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.