Paste every two columns together

Beejai
I have genetic data as follows (simple example, actual data is much larger):

comb =

ID1 A A T G C T G C G T C G T A

ID2 G C T G C C T G C T G T T T

And I wish to get an output like this:

ID1 AA TG CT GC GT CG TA

ID2 GC TG CC TG CT GT TT

That is, paste every two columns together.

I have this code, but I get the error:

Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1

conc <- function(x) {
  s <- seq(2, nchar(x), 2)
  paste0(x[s], x[s+1])
}

combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)

Thanks in advance!

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Paste every two columns together

Jim Lemon-4
Hi Kate,
Maybe you want:

seq(2,length(x),by=2)

Jim
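For readers following along, here is a minimal sketch of why the original call fails and what the change to length() does. It assumes one row of the data is held as a character vector named x (a stand-in for illustration, not the poster's actual object). nchar(x) returns one count per element, so as soon as x has more than one element the 'to' argument is no longer a single number, which is what the error message reports. The sketch also starts the index sequence at 1 rather than 2, on the assumption that elements 1-2, 3-4, ... should be paired as in the desired output; starting at 2 would pair elements 2-3, 4-5, and so on.

```r
# One row of the example data as a character vector (assumed layout).
x <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A")

# nchar() returns one value per element, so it cannot serve as a scalar 'to'.
n_chars <- nchar(x)    # a vector of fourteen 1s
n_elems <- length(x)   # a single number, 14

# Pair elements 1-2, 3-4, ... by starting the index sequence at 1.
conc <- function(x) {
  s <- seq(1, length(x), by = 2)
  paste0(x[s], x[s + 1])
}

pairs <- conc(x)
print(pairs)
```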


On Thu, Jan 29, 2015 at 10:55 AM, Kate Ignatius <[hidden email]> wrote:

> I have this code, but I get the error:
>
> Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1
>
> conc <- function(x) {
>   s <- seq(2, nchar(x), 2)
>   paste0(x[s], x[s+1])
> }
>
> combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)

Re: Paste every two columns together

Bert Gunter
eek!

Chel Hee, anything that complicated should engender fear and trembling.

Much simpler and more efficient (if I understand correctly):

i <- seq.int(1L,length(ID1),by = 2L)
paste0(ID1[i],ID1[i+1])

That gives a vector of paired letters. If you want a single character
string, just collapse with a " " (space):

paste0(ID1[i],ID1[i+1],collapse= " ")

Cheers,
Bert
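
The snippet above works on a single vector. A sketch of the same indexing idea applied to the whole table follows. It assumes comb is a data frame with an ID column followed by an even number of single-letter character columns; the column names (ID, X1, X2, ...) are made up for illustration, since the original post does not show how comb was read in.

```r
# Hypothetical reconstruction of the example data: an ID column plus 14 letter columns.
comb <- data.frame(
  ID = c("ID1", "ID2"),
  rbind(
    c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A"),
    c("G", "C", "T", "G", "C", "C", "T", "G", "C", "T", "G", "T", "T", "T")
  ),
  stringsAsFactors = FALSE
)

letters_only <- as.matrix(comb[-1])              # drop the ID column
i <- seq.int(1L, ncol(letters_only), by = 2L)    # first column of each pair

# paste0() is vectorised, so whole columns are combined at once;
# matrix() restores the row/column shape that paste0() drops.
paired <- matrix(paste0(letters_only[, i], letters_only[, i + 1L]),
                 nrow = nrow(letters_only))

result <- data.frame(ID = comb$ID, paired, stringsAsFactors = FALSE)
print(result)
```

Working column-wise avoids looping over rows, which may matter when the real data is much larger than the example.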

Bert Gunter
Genentech Nonclinical Biostatistics

On Wed, Jan 28, 2015 at 7:41 PM, Chel Hee Lee <[hidden email]> wrote:

> I am using just the first row of your data (i.e. ID1).
>
>> ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T",
>> "A")
>> do.call(c,lapply(tapply(ID1, gl(7,2), c), paste, collapse=""))
>    1    2    3    4    5    6    7
> "AA" "TG" "CT" "GC" "GT" "CG" "TA"
>>
>
> Is this what you are looking for?  I hope this helps.
>
> Chel Hee Lee
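
As a reading aid for the quoted one-liner, here is a sketch that unpacks the grouping step. The variable names groups and pairs are introduced for illustration only, and the group count is computed from the vector length on the assumption that the length is even.

```r
ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A")

# gl(7, 2) is a factor with levels 1..7, each repeated twice: 1 1 2 2 ... 7 7.
groups <- gl(length(ID1) / 2, 2)

# tapply() splits ID1 by group and pastes each pair into one string.
pairs <- tapply(ID1, groups, paste, collapse = "")
print(pairs)
```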

Re: Paste every two columns together

JohnJPosner
Kate, here's a solution that uses regular expressions, rather than vector manipulation:

> mystr = "ID1 A A T G C T G C G T C G T A"
> gsub(" ([ACGT]) ([ACGT])", " \\1\\2", mystr)
[1] "ID1 AA TG CT GC GT CG TA"

-John
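
Because gsub() is vectorised over its input, the same pattern can be applied to several rows at once. A small sketch, assuming each row is available as one space-separated string (the vector name rows is made up for illustration):

```r
rows <- c("ID1 A A T G C T G C G T C G T A",
          "ID2 G C T G C C T G C T G T T T")

# Each match consumes " X Y" and rewrites it as " XY"; matches do not overlap.
merged <- gsub(" ([ACGT]) ([ACGT])", " \\1\\2", rows)
print(merged)
```

The character class only covers A, C, G and T, so any other symbol in the real data (for example a missing-value code) would need to be added to it.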


> -----Original Message-----
> From: R-help [mailto:[hidden email]] On Behalf Of Chel Hee
> Lee
> Sent: Wednesday, January 28, 2015 11:07 PM
> To: Bert Gunter
> Cc: r-help
> Subject: Re: [R] Paste every two columns together
>
> Hi Bert! yes, you are VERY correct!!!  Why am I making this simple thing so
> complicated???  ;) Thank you so much for your nice lesson!
>
> Chel Hee Lee

Re: Paste every two columns together

djmuseR
In reply to this post by Beejai
Hi:

Don't know about performance, but this is fairly simple for operating
on atomic vectors:

x <- c("A", "A", "G", "T", "C", "G")
apply(embed(x, 2), 1, paste0, collapse = "")
[1] "AA" "GA" "TG" "CT" "GC"

Check the help page of embed() for details.

Dennis
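
Note that the output above is a sliding window: embed(x, 2) yields every adjacent pair, with the later element first, so the pairs overlap and are reversed relative to the output requested in the original post. One possible adaptation, sketched under the assumption that x has even length, keeps every other row and swaps the two columns:

```r
x <- c("A", "A", "G", "T", "C", "G")

m <- embed(x, 2)                   # row k holds (x[k + 1], x[k])
keep <- seq(1, nrow(m), by = 2)    # rows 1, 3, 5: the non-overlapping windows

# Column 2 is the earlier element, column 1 the later one.
pairs <- paste0(m[keep, 2], m[keep, 1])
print(pairs)
```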
