Help with apply and new column?

Gneuro
Hello members,

Can I ask a question about apply and adding a new column to a data frame on this e-mail list?

Thanks!



        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Help with apply and new column?

Jeff Newmiller
Read the Posting Guide... (see message footer) ... some relevant things you can find there:

a) Yes, this appears to be about how to use an R base function so it is on topic
b) Post a reproducible example (include some sample data, preferably using the dput function)
c) Post using plain text so the mailing list doesn't convert it for you and mangle things in a way you did not intend.
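
For point (b), a minimal sketch of what sharing sample data with dput might look like. The data frame `snp` and its values are made up for illustration and are not the poster's real data.

```r
# Hypothetical two-row stand-in for the real data.
snp <- data.frame(rs = c("10:60523:T:G", "10:60684:A:C"),
                  p_wald = c(0.6228309, 0.5636926),
                  stringsAsFactors = FALSE)

# dput() prints R code that rebuilds `snp`; that printed text is what
# gets pasted into the plain-text email.
dput(snp)

# Round trip: evaluating the dput() text gives back an equivalent data frame.
dput_text <- paste(capture.output(dput(snp)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
```

Anyone reading the message can then paste the printed structure(...) call into their own session and get the same object, column types included.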
--
Sent from my phone. Please excuse my brevity.

Re: Help with apply and new column?

Gneuro
Thanks. I think nabble is good for programming questions. Bear with me if I'm incorrect.

Data: Genomics SNP information
Goal: I need to add chromosome and SNP position columns to the data frame I'm using, via apply.

I'd like to add new columns from text processed through the apply function.

For example:  10:60523:T:G  (Column 2)
CHR: 10
Position: 60523

Dataset:
chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
-9 10:60523:T:G -9 0 T G 0.977 -1.769354e-02 3.597196e-02 1.566731e-01 6.228309e-01
-9 10:60684:A:C -9 0 A C 0.973 1.698925e-02 2.942366e-02 1.561001e-01 5.636926e-01
-9 10:61331:A:G -9 0 A G 0.973 1.708586e-02 2.942424e-02 1.560944e-01 5.614851e-01
-9 10:62010:C:T -9 0 C T 0.980 -8.513143e-03 3.837054e-02 1.566875e-01 8.244260e-01

Code:

--------------------------------------------------------
data<-read.table("small.txt",header = T) # read data
data<-data[,c(2,11)] #delete other columns not needed

#--split data on : and get chromosome and position

split_rs<-function(rs){  
   
    chr<-vector(,length(rs)) # create new vector to store chr
    pos<-vector(,length(rs)) #create new vector to store position
   
    for(i in 1:length(rs)){ #iterate over RS column

        if(grepl(":",rs[i])){ #if : in column string
            temp <- strsplit(rs[i],":",fixed=T)         #split        
            chr[i] <-temp[[1]][1] #store CHR
            pos[i] <- temp[[1]][2]  #store position
        }    
    }
    return(list(chr=chr,pos=pos)) #return making a list
}

data$POS<-"NA" #add new column POS and make NA
data$CHR <- "NA" #add new column CHR and make NA

temp<-apply(data,2,split_rs) #send data frame to function

#--I assign value from list sent -- I would like to improve this part

data$CHR<-temp$rs$chr
data$POS<-temp$rs$pos

rm(temp)

colnames(data)<-c("SNP","P","CHR","BP")
--------------------------------------------------------
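
As a side note on the apply() call above, here is a small sketch of what apply(..., 2, ...) does when it is handed a data frame: it first converts the data frame to a matrix and then calls the function once per column. The two-row data frame `df` is made up for illustration.

```r
# Hypothetical mixed-type data frame: one text column, one numeric column.
df <- data.frame(rs = c("10:60523:T:G", "10:60684:A:C"),
                 p_wald = c(0.62, 0.56),
                 stringsAsFactors = FALSE)

# apply() works on matrices, so a data frame is converted first.
# With mixed column types the result is a character matrix.
m <- as.matrix(df)

# MARGIN = 2 means "once per column"; every column arrives as character,
# including the one that was numeric in the data frame.
seen <- apply(df, 2, function(col) class(col))
print(seen)
```

So a function passed this way is run on every column, not only on the one that holds the ":"-separated text.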



Re: Help with apply and new column?

Jeff Newmiller
Comments interspersed, and some code at the end.

On Mon, 5 Mar 2018, Sariya, Sanjeev wrote:

> Thanks. I think nabble is good for programming questions. Bear with me
> if I'm incorrect.

You may have found R-help archives at Nabble, but R-help has nothing to do
with Nabble.

>
> Data: Genomics SNP information

I know almost nothing about using R for genomics.

> Goal: I need to add chromosome and SNP position columns to the data frame I'm using, via apply.
>
> I'd like to add new columns from text processed through the apply function.
>
> For example:  10:60523:T:G  (Column 2)
> CHR: 10
> Position: 60523

Assuming Position is "P", what are your "SNP" and "BP" in the names you
assigned below as c("SNP","P","CHR","BP")?

> Dataset:
> chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
> -9 10:60523:T:G -9 0 T G 0.977 -1.769354e-02 3.597196e-02 1.566731e-01 6.228309e-01
> -9 10:60684:A:C -9 0 A C 0.973 1.698925e-02 2.942366e-02 1.561001e-01 5.636926e-01
> -9 10:61331:A:G -9 0 A G 0.973 1.708586e-02 2.942424e-02 1.560944e-01 5.614851e-01
> -9 10:62010:C:T -9 0 C T 0.980 -8.513143e-03 3.837054e-02 1.566875e-01 8.244260e-01
>
> Code:
>
> --------------------------------------------------------
> data<-read.table("small.txt",header = T) # read data
> data<-data[,c(2,11)] #delete other columns not needed
>
> #--split data on : and get chromosome and position
>
> split_rs<-function(rs){
>
>    chr<-vector(,length(rs)) # create new vector to store chr
>    pos<-vector(,length(rs)) #create new vector to store position
>
>    for(i in 1:length(rs)){ #iterate over RS column
>
>        if(grepl(":",rs[i])){ #if : in column string
>            temp <- strsplit(rs[i],":",fixed=T)         #split
>            chr[i] <-temp[[1]][1] #store CHR
>            pos[i] <- temp[[1]][2]  #store position
>        }
>    }
>    return(list(chr=chr,pos=pos)) #return making a list
> }
>
> data$POS<-"NA" #add new column POS and make NA
> data$CHR <- "NA" #add new column CHR and make NA
>
> temp<-apply(data,2,split_rs) #send data frame to function
>
> #--I assign value from list sent -- I would like to improve this part
>
> data$CHR<-temp$rs$chr
> data$POS<-temp$rs$pos
>
> rm(temp)
>
> colnames(data)<-c("SNP","P","CHR","BP")
> --------------------------------------------------------

######################################################
# Your code was pretty severely broken... it would not run,
# and I don't know what you expected to see as output.

# 1) data is the name of a function in base R... re-using
#    it can lead to puzzling errors
# 2) With all this character manipulation, you need to read
#    your character data in as character, not as factors
# 3) Don't use the T variable... use the constant TRUE,
#    since T can easily be overwritten to some non-TRUE value.
dta <- read.table( "small.txt", header = TRUE, as.is = TRUE ) # read data
dta <- dta[ , c( 2, 11 ) ] #delete other columns not needed

# 4) Not at all clear why you want to split all of the columns
#    using apply( ..., 2, ... ) when only one column has ":" characters
#temp <- apply( dta, 2, split_rs ) #send data frame to function
temp <- strsplit( dta$rs, ":" ) # gets the whole column splits at once

# wildly guessing here
rs_chrmatrix <- do.call( rbind, temp )
rs_DF <- as.data.frame( rs_chrmatrix, stringsAsFactors = FALSE )
names( rs_DF ) <- c( "CHR", "P", "X1", "X2" )
rs_DF$P <- as.integer( rs_DF$P )

str( rs_DF )
##################################################
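
To carry this through to the stated goal of new columns on the data frame, here is a minimal sketch. It assumes that the intended names c("SNP","P","CHR","BP") mean SNP identifier, p-value (the p_wald column), chromosome, and base-pair position; that reading is a guess and is not confirmed anywhere in the thread. The four sample rows are pasted inline with read.table(text = ...) so the example runs without small.txt.

```r
# Inline copy of the four sample rows from the thread, so no file is needed.
txt <- "chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
-9 10:60523:T:G -9 0 T G 0.977 -1.769354e-02 3.597196e-02 1.566731e-01 6.228309e-01
-9 10:60684:A:C -9 0 A C 0.973 1.698925e-02 2.942366e-02 1.561001e-01 5.636926e-01
-9 10:61331:A:G -9 0 A G 0.973 1.708586e-02 2.942424e-02 1.560944e-01 5.614851e-01
-9 10:62010:C:T -9 0 C T 0.980 -8.513143e-03 3.837054e-02 1.566875e-01 8.244260e-01"

dta <- read.table(text = txt, header = TRUE, as.is = TRUE)
dta <- dta[, c("rs", "p_wald")]   # the same two columns as positions 2 and 11

# Split the whole column at once: one character vector per row.
parts <- strsplit(dta$rs, ":", fixed = TRUE)

# Take the first and second piece of every split.
# CHR stays character, since chromosome labels are not always numeric.
dta$CHR <- vapply(parts, `[`, character(1), 1)
dta$BP  <- as.integer(vapply(parts, `[`, character(1), 2))

# Assumed target layout: SNP id, p-value, chromosome, base-pair position.
names(dta) <- c("SNP", "P", "CHR", "BP")
str(dta)
```

Because the new columns are added by name before renaming, the final names line up with the column contents regardless of the order in which CHR and BP were created.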




---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Re: Help with apply and new column?

Gneuro
Thank you, that helps.
