sample and rearrange


Laetitia Schmid-2
Dear Wu Gong and Peter Ehlers,
thank you very much for your help debugging my script.

Now I have a general follow-up question:
Is there a straightforward way to rearrange the following dataset so
that, for each individual, the first letters of the four sequences
(A1 to A4) are combined in one column, the second letters in a second
column, the third ones in a third column and so on, resulting in 7
columns? For example, for the first individual (GM920222):
GGGG AAAA TTTT TTAA GGGG CCAA CCCC

Thank you very much,
Laetitia

SampleID A1 A2 A3 A4
GM920222 GATTGCC GATTGCC GATAGAC GATAGAC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM960023 GATTGCC GTCATCA GATTGCC GATTGCC
GM920224 ACTAGAA GTCATCA GTCATCA ACTAGAA
GM920224 ACTAGAA GTCATCA GTCATCA ACTAGAA
GM920034 GATTGCC GTCATCA GATTGCA GATTGCA
GM920096 GATTGCC GATTGCC GATTGCA GATTGCC
GM930029 GTCATCA GATTGCC GTCATCA GATTGCC
GM940031 GATTGCC GAGTGCA GATTGCA ACTAGAA
GM960028 GATTGCC GAGTGCA GATTGCA ACTAGAA
GM980007 GTCATCA GATTGCC ACTTGAA GTCATCA
GM970009 ACTAGAA GTCAGAA GTCAGCA ACTAGCA
GM930026 ACTAGAA GAGTGCA GAGTGCA ACTAGAA
GM920031 GATTGCC GTCATCA GATTGCC GATTGCC
GM990105 GATTGCC GATTGCC GTCAGCA GTCAGCA
GM920202 GATTGCC GATTGCC GATTGCC GATTGCC
GM920089 GAGTGCA GTCAGAA ACTATCA GATTGCC
GM980051 ACTAGAA ACTAGAA GATAGCC GATAGCC
GM930109 GTCATCA GAGTGCA GTTTTAA ACTAGAA
GM940039 GTCATCA GAGTGCA GTTTGCC ACTTTCA
GM050099 GAGTGCA GTCAGAA GTTATCC ACTTTCA
GM050099 GAGTGCA GTCAGAA GTTATCC ACTTTCA
GM030005 ACTAGAA GAGTGCA ACTAGAA ACTAGAA
GM050009 ACTAGAA GATTGCC GATTGCC ACTAGAA
GM990027 GATTGCC GAGTGCA GATTGCA GATTGCC
GM990066 GATTGCC GTCATCA GTCATCA GATTGCC

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: sample and rearrange

jholtman
Try this:

> x <- read.table(textConnection("SampleID        A1      A2      A3      A4
+ GM920222        GATTGCC GATTGCC GATAGAC GATAGAC
+ GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+ GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+ GM960023        GATTGCC GTCATCA GATTGCC GATTGCC
+ GM920224        ACTAGAA GTCATCA GTCATCA ACTAGAA
+ GM920224        ACTAGAA GTCATCA GTCATCA ACTAGAA
+ GM920034        GATTGCC GTCATCA GATTGCA GATTGCA
+ GM920096        GATTGCC GATTGCC GATTGCA GATTGCC
+ GM930029        GTCATCA GATTGCC GTCATCA GATTGCC
+ GM940031        GATTGCC GAGTGCA GATTGCA ACTAGAA
+ GM960028        GATTGCC GAGTGCA GATTGCA ACTAGAA
+ GM980007        GTCATCA GATTGCC ACTTGAA GTCATCA
+ GM970009        ACTAGAA GTCAGAA GTCAGCA ACTAGCA
+ GM930026        ACTAGAA GAGTGCA GAGTGCA ACTAGAA
+ GM920031        GATTGCC GTCATCA GATTGCC GATTGCC
+ GM990105        GATTGCC GATTGCC GTCAGCA GTCAGCA
+ GM920202        GATTGCC GATTGCC GATTGCC GATTGCC
+ GM920089        GAGTGCA GTCAGAA ACTATCA GATTGCC
+ GM980051        ACTAGAA ACTAGAA GATAGCC GATAGCC
+ GM930109        GTCATCA GAGTGCA GTTTTAA ACTAGAA
+ GM940039        GTCATCA GAGTGCA GTTTGCC ACTTTCA
+ GM050099        GAGTGCA GTCAGAA GTTATCC ACTTTCA
+ GM050099        GAGTGCA GTCAGAA GTTATCC ACTTTCA
+ GM030005        ACTAGAA GAGTGCA ACTAGAA ACTAGAA
+ GM050009        ACTAGAA GATTGCC GATTGCC ACTAGAA
+ GM990027        GATTGCC GAGTGCA GATTGCA GATTGCC"), header=TRUE, as.is=TRUE)
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+     # separate characters
+     z <- do.call(rbind, strsplit(.row[-1], ''))
+     # combine each column
+     z.col <- t(apply(z, 2, paste, collapse=''))
+     # add the ID
+     cbind(.row[1], z.col)
+ }))
      [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
 [1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
 [2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
 [3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
 [4,] "GM960023" "GGGG" "ATAA" "TCTT" "TATT" "GTGG" "CCCC" "CACC"
 [5,] "GM920224" "AGGA" "CTTC" "TCCT" "AAAA" "GTTG" "ACCA" "AAAA"
 [6,] "GM920224" "AGGA" "CTTC" "TCCT" "AAAA" "GTTG" "ACCA" "AAAA"
 [7,] "GM920034" "GGGG" "ATAA" "TCTT" "TATT" "GTGG" "CCCC" "CAAA"
 [8,] "GM920096" "GGGG" "AAAA" "TTTT" "TTTT" "GGGG" "CCCC" "CCAC"
 [9,] "GM930029" "GGGG" "TATA" "CTCT" "ATAT" "TGTG" "CCCC" "ACAC"
[10,] "GM940031" "GGGA" "AAAC" "TGTT" "TTTA" "GGGG" "CCCA" "CAAA"
[11,] "GM960028" "GGGA" "AAAC" "TGTT" "TTTA" "GGGG" "CCCA" "CAAA"
[12,] "GM980007" "GGAG" "TACT" "CTTC" "ATTA" "TGGT" "CCAC" "ACAA"
[13,] "GM970009" "AGGA" "CTTC" "TCCT" "AAAA" "GGGG" "AACC" "AAAA"
[14,] "GM930026" "AGGA" "CAAC" "TGGT" "ATTA" "GGGG" "ACCA" "AAAA"
[15,] "GM920031" "GGGG" "ATAA" "TCTT" "TATT" "GTGG" "CCCC" "CACC"
[16,] "GM990105" "GGGG" "AATT" "TTCC" "TTAA" "GGGG" "CCCC" "CCAA"
[17,] "GM920202" "GGGG" "AAAA" "TTTT" "TTTT" "GGGG" "CCCC" "CCCC"
[18,] "GM920089" "GGAG" "ATCA" "GCTT" "TAAT" "GGTG" "CACC" "AAAC"
[19,] "GM980051" "AAGG" "CCAA" "TTTT" "AAAA" "GGGG" "AACC" "AACC"
[20,] "GM930109" "GGGA" "TATC" "CGTT" "ATTA" "TGTG" "CCAA" "AAAA"
[21,] "GM940039" "GGGA" "TATC" "CGTT" "ATTT" "TGGT" "CCCC" "AACA"
[22,] "GM050099" "GGGA" "ATTC" "GCTT" "TAAT" "GGTT" "CACC" "AACA"
[23,] "GM050099" "GGGA" "ATTC" "GCTT" "TAAT" "GGTT" "CACC" "AACA"
[24,] "GM030005" "AGAA" "CACC" "TGTT" "ATAA" "GGGG" "ACAA" "AAAA"
[25,] "GM050009" "AGGA" "CAAC" "TTTT" "ATTA" "GGGG" "ACCA" "ACCA"
[26,] "GM990027" "GGGG" "AAAA" "TGTT" "TTTT" "GGGG" "CCCC" "CAAC"
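For comparison, here is a sketch of a different approach that is not from the thread: loop over the seven letter positions instead of over the samples, using substring() to pull the i-th letter out of every sequence at once. It assumes all sequences have the same length, and for brevity it rebuilds only the first three samples from the question as a data frame.

```r
# First three samples from the question (a subset, for brevity)
x <- data.frame(
  SampleID = c("GM920222", "GM930040", "GM960023"),
  A1 = c("GATTGCC", "GTCATCA", "GATTGCC"),
  A2 = c("GATTGCC", "GAGTGCA", "GTCATCA"),
  A3 = c("GATAGAC", "ACTATAA", "GATTGCC"),
  A4 = c("GATAGAC", "GATTGCC", "GATTGCC"),
  stringsAsFactors = FALSE
)

seqs <- as.matrix(x[, -1])   # sequences only: one row per sample
n <- nchar(seqs[1, 1])       # assumes every sequence has this length (7)

# One output column per letter position
pos <- matrix(NA_character_, nrow = nrow(seqs), ncol = n)
for (i in seq_len(n)) {
  # i-th letter of every sequence, kept in the samples x sequences layout
  letters_i <- matrix(substring(seqs, i, i), nrow = nrow(seqs))
  # paste the letters of each sample together, e.g. "GGGG"
  pos[, i] <- apply(letters_i, 1, paste, collapse = "")
}

result <- cbind(x$SampleID, pos)
result
```

The first two rows should match rows 1 and 2 of the output above.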


On Wed, May 19, 2010 at 8:29 AM, Laetitia Schmid <[hidden email]> wrote:

> Is there a straightforward way to rearrange the following dataset so that
> all first letters of each column will be combined in one column, all the
> second letters in a second column, all the third ones in a third column and
> so on, resulting in 7 columns,
> i.e. for the first individual (GM920222) GGGG AAAA TTTT TTAA GGGG CCAA CCCC
> ?



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: sample and rearrange

Wu Gong
It took me a day to make sense of Jim's code :(

I hope my comments help.

## Transform data to matrix
x <- as.matrix(x)

## Apply a function to each row;
## the function rearranges the bases
result <- apply(x, 1, function(eachrow){

        ## Split each gene into bases,
        ## excluding the first column, which is the ID
        bases <- strsplit(eachrow[-1], '')

        ## Transform the list into a matrix
        ## (strsplit returns a list)
        bases <- do.call(rbind, bases)

        ## Recombine bases by pasting together all bases in each column
        recombine <- apply(bases, 2, paste, collapse="")

        ## Add the ID;
        ## transpose recombine into a one-row matrix first
        cbind(eachrow[1], t(recombine))
})

## Transpose the result matrix
result <- t(result)
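To see what each commented step produces, the same steps can be run by hand on a single row. This is a sketch that hard-codes the first sample from the question as a named character vector, which is the kind of value apply() hands the function for each row of the matrix.

```r
# One row as apply() would pass it: the ID followed by the four sequences
eachrow <- c(SampleID = "GM920222", A1 = "GATTGCC", A2 = "GATTGCC",
             A3 = "GATAGAC", A4 = "GATAGAC")

bases <- strsplit(eachrow[-1], "")  # list of 4 vectors, 7 letters each
bases <- do.call(rbind, bases)      # 4 x 7 matrix, one row per sequence
print(dim(bases))                   # 4 7

recombine <- apply(bases, 2, paste, collapse = "")  # paste down each column
print(recombine)  # "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"

out <- cbind(eachrow[1], t(recombine))  # 1 x 8: the ID, then 7 strings
print(dim(out))                         # 1 8
```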

Re: sample and rearrange

David Winsemius

On May 19, 2010, at 5:01 PM, Wu Gong wrote:

> It took me a day to make the sense of Jim's code :(

It will come more quickly as you learn more. I also looked at Jim's
solution by pulling it apart, although I did not spend a whole day on
it; maybe ten minutes. I thought a three-line version was more
informative, because it did not make everything scroll off the console:

 > x <- read.table(textConnection("SampleID        A1      A2      A3      A4
+  GM920222        GATTGCC GATTGCC GATAGAC GATAGAC
+  GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+  GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC"), header=TRUE, as.is=TRUE)
 > x <- as.matrix(x)
 > t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(rbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- t(apply(z, 2, paste, collapse=''))
+      # add the ID
+      cbind(.row[1], z.col)
+  }))
      [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

# I usually start by checking whether I can get the innermost function to work:

 > z <- do.call(rbind, strsplit(x[1,], ''))
Warning message:
In function (..., deparse.level = 1)  :
   number of columns of result is not a multiple of vector length (arg 2)
 > z
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
SampleID "G"  "M"  "9"  "2"  "0"  "2"  "2"  "2"
A1       "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A2       "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A3       "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"
A4       "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"

# So I didn't get an exact replica, since Jim had excluded the first
# element in the row (the 8-character ID; that is why the 7-letter
# sequences were recycled to 8 columns and R warned).
 > z <- do.call(rbind, strsplit(x[1,-1], ''))  # there ... cleaner
 > z
    [,1] [,2] [,3] [,4] [,5] [,6] [,7]
A1 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A2 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A3 "G"  "A"  "T"  "A"  "G"  "A"  "C"
A4 "G"  "A"  "T"  "A"  "G"  "A"  "C"

That helped me understand what was going on in the middle of the
functions. Next I wondered whether the transpose could be avoided, so I
tried cbind instead of rbind:

 > z <- do.call(cbind, strsplit(x[1,-1], ''))
 > z
      A1  A2  A3  A4
[1,] "G" "G" "G" "G"
[2,] "A" "A" "A" "A"
[3,] "T" "T" "T" "T"
[4,] "T" "T" "A" "A"
[5,] "G" "G" "G" "G"
[6,] "C" "C" "A" "A"
[7,] "C" "C" "C" "C"
 > z.col <- apply(z, 2, paste, collapse='')
 > z.col
        A1        A2        A3        A4
"GATTGCC" "GATTGCC" "GATAGAC" "GATAGAC"

## Nope, that does not work: it just pastes each sequence back together.
## So try apply over the rows instead ...
 > z.col <- apply(z, 1, paste, collapse='')
 > z.col
[1] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
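The two layouts are just transposes of each other, which is why switching from rbind to cbind also means switching apply's MARGIN from 2 to 1. A small check on the first sample's four sequences (hard-coded here so the snippet stands alone):

```r
seqs <- c(A1 = "GATTGCC", A2 = "GATTGCC", A3 = "GATAGAC", A4 = "GATAGAC")

by_row <- do.call(rbind, strsplit(seqs, ""))  # 4 x 7: one row per sequence
by_col <- do.call(cbind, strsplit(seqs, ""))  # 7 x 4: one column per sequence
all(by_col == t(by_row))                      # TRUE: same letters, transposed

# Pasting across sequences position by position:
# MARGIN = 2 for the rbind layout, MARGIN = 1 for the cbind layout
from_rows <- apply(by_row, 2, paste, collapse = "")
from_cols <- apply(by_col, 1, paste, collapse = "")
identical(from_rows, from_cols)               # TRUE
```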

## OK, that worked. Now see if it works inside the whole apply() call:

 > x <- as.matrix(x)
 > t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(cbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- apply(z, 1, paste, collapse='')
+      # add the ID
+      cbind(.row[1], z.col)
+  }))
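      [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
[1,] "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222"
[2,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
[3,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
      [,8]   [,9]   [,10]  [,11]  [,12]  [,13]  [,14]
[1,] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

Well, not exactly. Because z.col is now a plain vector of length 7,
cbind(.row[1], z.col) recycles the ID into a 7 x 2 matrix, and apply()
flattens that column by column into 14 values per row: seven copies of
the ID followed by the seven recombined strings. Transposing z.col
fixes it: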
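What went wrong comes down to how cbind() treats a plain length-7 vector versus a 1 x 7 matrix. A quick check with the first sample's values:

```r
id <- "GM920222"
z.col <- c("GGGG", "AAAA", "TTTT", "TTAA", "GGGG", "CCAA", "CCCC")

dim(cbind(id, z.col))     # 7 2: the ID is recycled down its own column
dim(cbind(id, t(z.col)))  # 1 8: t() makes z.col a one-row matrix first
```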
 > x <- as.matrix(x)
 > t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(cbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- apply(z, 1, paste, collapse='')
+      # add the ID, transposing z.col into a one-row matrix first
+      cbind(.row[1], t(z.col))
+  }))
      [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

So I got to the same place, but didn't really achieve any savings: the
transpose just moved from the apply() call into the cbind() call.
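One way to get an actual saving, which was not tried in the thread: have the row function return a plain vector with c() instead of a one-row matrix from cbind(). apply() then stacks the length-8 vectors as columns, so the outer t() is the only transpose left. A sketch on the first two samples, built directly as a character matrix:

```r
x <- matrix(c("GM920222", "GATTGCC", "GATTGCC", "GATAGAC", "GATAGAC",
              "GM930040", "GTCATCA", "GAGTGCA", "ACTATAA", "GATTGCC"),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("SampleID", "A1", "A2", "A3", "A4")))

result <- t(apply(x, 1, function(.row) {
  z <- do.call(cbind, strsplit(.row[-1], ""))    # 7 x 4, one column per sequence
  c(.row[1], apply(z, 1, paste, collapse = ""))   # the ID, then 7 strings
}))
result
```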



David "also still learning" Winsemius, MD
West Hartford, CT


Re: sample and rearrange

Wu Gong
I tried to use a separate function to make the code more understandable, but it failed, and I don't know what's wrong with the code.

x <- as.matrix(x)

rearrange <- function(.row){
        z <- do.call(rbind, strsplit(.row[-1], ''))
        z.col <- t(apply(z, 2, paste, collapse=''))
        cbind(.row[1], z.col)
        }
       
t(apply(x, 1, rearrange(.row)))

Error in strsplit(.row[-1], "") : object '.row' not found

I don't know how to pass the value to the function.

Re: sample and rearrange

David Winsemius

On May 19, 2010, at 7:47 PM, Wu Gong wrote:

>
> I tried to use a separate function to make the code more  
> understandable. But
> I failed. I don't know what's wrong with the code.
>
> x <- as.matrix(x)
>
> rearrange <- function(.row){
> z <- do.call(rbind, strsplit(.row[-1], ''))
> z.col <- t(apply(z, 2, paste, collapse=''))
> cbind(.row[1], z.col)
> }
>
> t(apply(x, 1, rearrange(.row)))
>
> Error in strsplit(.row[-1], "") : object '.row' not found

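The error occurs because rearrange(.row) is evaluated as an ordinary
function call before apply has passed it any rows, and at that point
there is no object named .row. apply itself hands each row to the
function (where it is then called .row). Your code _does_ work, but
only if you pass the function itself, thusly: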

t(apply(x, 1, rearrange))
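A small sketch of the two working forms, using two samples from the question: apply() wants the function object itself and supplies each row as its first argument. If you prefer to spell the argument out, an anonymous function does the same job.

```r
rearrange <- function(.row) {
  z <- do.call(rbind, strsplit(.row[-1], ""))
  z.col <- t(apply(z, 2, paste, collapse = ""))
  cbind(.row[1], z.col)
}

x <- matrix(c("GM920222", "GATTGCC", "GATTGCC", "GATAGAC", "GATAGAC",
              "GM930040", "GTCATCA", "GAGTGCA", "ACTATAA", "GATTGCC"),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("SampleID", "A1", "A2", "A3", "A4")))

# Pass the function object; apply() calls rearrange(row) once per row
r1 <- t(apply(x, 1, rearrange))

# Equivalent: an anonymous function whose argument apply() fills in
r2 <- t(apply(x, 1, function(r) rearrange(r)))

identical(r1, r2)   # TRUE
```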


>
> I don't know how to pass the value to the function.

You may not, ... but R knows how.

David Winsemius, MD
West Hartford, CT


Re: sample and rearrange

jholtman
In reply to this post by Wu Gong
You just need the function name; the parameter is supplied by apply:

 t(apply(x, 1, rearrange))
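The same mechanism also covers extra arguments: apply() forwards anything given after FUN, through its `...`, to every call. The sketch below uses a hypothetical variant, rearrange2(), with an added `sep` argument that is not part of the thread's code:

```r
# Hypothetical variant of rearrange() with an extra `sep` argument
rearrange2 <- function(.row, sep = "") {
  z <- do.call(rbind, strsplit(.row[-1], ""))
  c(.row[1], apply(z, 2, paste, collapse = sep))
}

x <- matrix(c("GM920222", "GATTGCC", "GATTGCC", "GATAGAC", "GATAGAC"),
            nrow = 1,
            dimnames = list(NULL, c("SampleID", "A1", "A2", "A3", "A4")))

# Each row becomes .row; sep = "-" is passed through to every call
res <- t(apply(x, 1, rearrange2, sep = "-"))
res
```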

On Wed, May 19, 2010 at 7:47 PM, Wu Gong <[hidden email]> wrote:

> t(apply(x, 1, rearrange(.row)))
>
> Error in strsplit(.row[-1], "") : object '.row' not found




Re: sample and rearrange

Wu Gong
In reply to this post by David Winsemius
Thanks, David and Jim.
I got it.