Compare data in two rows and replace objects in data frame

raz
Dear all,

I have a data frame of 144 x 20000 values.
I need to compare every value in the first row with the value in the second row,
and likewise for rows 3-4, 5-6, and so on.
The output should be one line for each pair of rows.
The comparison rules are:
if row1==1 and row2==1 <-'HT'
if row1==1 and row2==0 <-'A'
if row1==0 and row2==1 <-'B'
if row1==1 and row2=='-' <-'Aht'
if row1=='-' and row2==1 <-'Bht'

for example:
if the data is:
CloneID    genotype 2001    genotype 2002    genotype 2003
2471250    1    1    1
2471250    0    0    0
2433062    0    0    0
2433062    1    1    1
100021605    1    1    0
100021605    1    0    1
100005599    1    1    0
100005599    1    1    1
100002798    1    1    0
100002798    1    1    1

then the output should be:
CloneID    genotype 2001    genotype 2002    genotype 2003
2471250    A    A    A
2433062    B    B    B
100021605    HT    A    B
100005599    HT    HT    B
100002798    HT    HT    B

I tried this on the whole data set, but it is very slow:

AX <- data.frame(lapply(AX, as.character), stringsAsFactors=FALSE)


for (i in seq(1, nrow(AX), by = 2)) {
  for (j in 6:144) {
    if (AX[i, j] == 1 & AX[i + 1, j] == 0) {
      AX[i, j] <- 'A'
    }
    if (AX[i, j] == 0 & AX[i + 1, j] == 1) {
      AX[i, j] <- 'B'
    }
    if (AX[i, j] == 1 & AX[i + 1, j] == 1) {
      AX[i, j] <- 'HT'
    }
    if (AX[i, j] == 1 & AX[i + 1, j] == "-") {
      AX[i, j] <- 'Aht'
    }
    if (AX[i, j] == "-" & AX[i + 1, j] == 1) {
      AX[i, j] <- 'Bht'
    }
  }
}

AX1<-AX[!duplicated(AX[,3]),]
AX2<-AX[duplicated(AX[,3]),]

Thanks for any help,

Raz



--
\m/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Compare data in two rows and replace objects in data frame

Gerrit Eichner
Hello, Raz,

if X is the data frame that contains your data, then using sort of an
"indexing trick" to circumvent your numerous if-statements as in

aggregate( X[ c( "genotype 2001", "genotype 2002", "genotype 2003")],
            X[ "CloneID"],
            FUN = function( x)
                   c( "11" = "HT",
                      "10" = "A",
                      "01" = "B",
                      "1-" = "Aht",
                      "-1" = "Bht")[ paste( x, collapse = "")])

presumably does what you want (and can certainly be improved).
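
For reference, here is a minimal, self-contained sketch of the call above run on the example data from the question. The comma-separated input, the name `key`, and the use of `check.names = FALSE` (to keep the spaces in the column names) are assumptions made for this illustration, not part of the original reply. `unname()` is added only to drop the lookup names from the result.

```r
# Example data from the question, written as CSV so that the column
# names containing spaces survive (check.names = FALSE keeps them).
X <- read.table(text = "
CloneID,genotype 2001,genotype 2002,genotype 2003
2471250,1,1,1
2471250,0,0,0
2433062,0,0,0
2433062,1,1,1
100021605,1,1,0
100021605,1,0,1
100005599,1,1,0
100005599,1,1,1
100002798,1,1,0
100002798,1,1,1
", header = TRUE, sep = ",", check.names = FALSE)

# Lookup table: the two values of a pair pasted together -> genotype code.
key <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")

# For each CloneID, paste the two rows of every column and look up the code.
res <- aggregate(X[c("genotype 2001", "genotype 2002", "genotype 2003")],
                 X["CloneID"],
                 FUN = function(x) unname(key[paste(x, collapse = "")]))
print(res)
```

Note that aggregate() returns the groups sorted by CloneID, so the row order differs from the order in the input.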

Hth  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
[hidden email]   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
---------------------------------------------------------------------

Re: Compare data in two rows and replace objects in data frame

arun kirshna
In reply to this post by raz
You could try data.table:

# dat1 is the dataset (note that setDT() converts it to a data.table by reference)

library(data.table)
v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"), c("11", "10", "01", "1-", "-1"))
dat2 <- setDT(dat1)[, lapply(.SD, function(x) v1[paste(x, collapse="")]), by=CloneID]
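
One property of the named-vector lookup is worth checking on real data: a pair that is not among the five listed rules (for example 0/0) has no entry in `v1` and comes back as NA. The base R sketch below shows this; the vector `pairs` and its values are made up for illustration only.

```r
# Same lookup vector as above: pasted pair -> genotype code.
v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"),
               c("11", "10", "01", "1-", "-1"))

# Hypothetical pasted pairs, including two that no rule covers.
pairs <- c("11", "10", "00", "--", "-1")

# Indexing a named vector by an unknown name yields NA.
codes <- unname(v1[pairs])
print(codes)

# List the combinations that were not translated.
unmatched <- unique(pairs[is.na(codes)])
print(unmatched)
```

Whether such pairs should stay NA or get their own code depends on the data and is not specified in the question.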

A.K.




Re: Compare data in two rows and replace objects in data frame

John McKown
In reply to this post by raz

I don't know if you've received a solution yet. Below is my generic
solution. I don't know how fast it will be, but it does _NOT_ do any
looping; it uses logical indexing in place of if statements. The result
is in the variable new_data. The variables data_odd and data_even are
temporaries which can be removed. Or you can wrap the code up in a
function which returns new_data and they will simply "go away" when the
function ends.

#
# Read in the data
data <- read.csv(file="data.csv",header=TRUE,stringsAsFactors=FALSE);
#
# The criteria
#if row1==1 and row2==1 <-'HT'
#if row1==1 and row2==0 <-'A'
#if row1==0 and row2==1 <-'B'
#if row1==1 and row2=='-' <-'Aht'
#if row1=='-' and row2==1 <-'Bht'
#
# The following assumes that data is properly ordered!
data$rowNumber <- seq_len(nrow(data));
data_odd <-data[data$rowNumber %% 2 == 1,];
data_even <-data[data$rowNumber %% 2 == 0,];
#
# You really need to make sure that
# the CloneID values are correct in data_odd
# and data_even. Something like:
stopifnot(data_odd$CloneID == data_even$CloneID);
CloneIDs <- data_even[,1]; # Get the list of CloneIDs
#data_even[,1] <- NULL; # Remove CloneIDs from even data
#data_odd[,1] <- NULL;  # And also from odd data
#
# Initialize new_data - make everything NA so
# it will stick out later!
new_data <- data_even;
new_data[,colnames(data_even)] <- NA;
#
# Each rule compares the odd (first) row with the even (second) row
new_data[data_odd == 1 & data_even == 1] <- 'HT';
new_data[data_odd == 1 & data_even == 0] <- 'A';
new_data[data_odd == 0 & data_even == 1] <- 'B';
new_data[data_odd == 1 & data_even == '-'] <- 'Aht';
new_data[data_odd == '-' & data_even == 1] <- 'Bht';
new_data$CloneID <- CloneIDs;
new_data$rowNumber<-NULL;
#
#stopifnot( !is.na(new_data)); # Make sure no NAs left
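
A more compact variant of the same odd/even idea is sketched below: the odd and even rows are pasted together as character matrices and translated in a single lookup, instead of one assignment per rule. This is only an illustration under some assumptions: the data are read from inline text rather than from data.csv, CloneID is the first column, and the names `dat`, `geno`, `key`, and `result` are invented for the example.

```r
# Example data from the question, read as character so that "-" is preserved.
dat <- read.table(text = "
CloneID genotype2001 genotype2002 genotype2003
2471250 1 1 1
2471250 0 0 0
2433062 0 0 0
2433062 1 1 1
100021605 1 1 0
100021605 1 0 1
100005599 1 1 0
100005599 1 1 1
100002798 1 1 0
100002798 1 1 1
", header = TRUE, colClasses = "character")

# Row indices of the first and second member of each pair.
odd  <- seq(1, nrow(dat), by = 2)
even <- odd + 1

# The pairing only makes sense if the rows really come in matching pairs.
stopifnot(nrow(dat) %% 2 == 0, dat$CloneID[odd] == dat$CloneID[even])

# Genotype columns as a character matrix (assumes CloneID is column 1).
geno <- as.matrix(dat[, -1])

# paste0() works element-wise on the two matrices (column-major order).
pairs <- paste0(geno[odd, , drop = FALSE], geno[even, , drop = FALSE])

# Lookup table: pasted pair -> genotype code; unknown pairs become NA.
key <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")

# Rebuild the matrix shape and attach the CloneIDs.
out <- matrix(unname(key[pairs]), nrow = length(odd),
              dimnames = list(NULL, colnames(geno)))
result <- data.frame(CloneID = dat$CloneID[odd], out,
                     stringsAsFactors = FALSE)
print(result)
```

Because everything is done on whole matrices, the row order of the input is preserved and no grouping step is needed.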




--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown

Re: Compare data in two rows and replace objects in data frame

jholtman
here is another way of doing it using 'tidyr' and 'dplyr'


> x <- read.table(text = "CloneID    genotype2001    genotype2002    genotype2003
+ 2471250    1    1    1
+ 2471250    0    0    0
+ 2433062    0    0    0
+ 2433062    1    1    1
+ 100021605    1    1    0
+ 100021605    1    0    1
+ 100005599    1    1    0
+ 100005599    1    1    1
+ 100002798    1    1    0
+ 100002798    1    1    1", header = TRUE, as.is = TRUE)
> # translation key
> keyTrans <- c(`11` = 'HT'
+       , `10` = "A"
+       , `01` = "B"
+       , `1-` = "Aht"
+       , `-1` = "Bht"
+       )
> require(dplyr)
> require(tidyr)
> x %>%
+     gather(key, val, -CloneID) %>%  # 'melt' the data
+     group_by(CloneID, key) %>%  # group by CloneID and column
+     summarise(newKey = paste0(val, collapse = '')) %>%  # concatenate the two rows
+     mutate(newVal = keyTrans[newKey]) %>%  # add the new value
+     select(-newKey) %>%  # remove newKey for output
+     spread(key, newVal)
Source: local data frame [5 x 4]

    CloneID genotype2001 genotype2002 genotype2003
1   2433062            B            B            B
2   2471250            A            A            A
3 100002798           HT           HT            B
4 100005599           HT           HT            B
5 100021605           HT            A            B

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

