conditional subset and reorder dataframe rows


IAIN GALLAGHER
Hi List

I have a dataframe (~1,200,000 rows deep) and I'd like to conditionally reorder groups of rows in this dataframe.

I would like to reorder any rows where the Chr.Strand column contains a '-' but reorder within subsets delineated by the Probe.Set.Name column.

# toy example ####

library(plyr)

negStrandGene <- data.frame(Probe.Set.Name = rep('ENSMUSG00000022174_at', 6), Chr = rep(14,6), Chr.Strand = rep('-', 6), Chr.From = c(54873546, 54873539, 54873533, 54873529, 54873527, 54873416), Probe.X = c(388,1634,2141,2305,882,960), Probe.Y = c(2112, 1773, 1045, 862, 971, 2160))

posStrandGene <- data.frame(Probe.Set.Name = rep('ENSMUSG00000047459_at', 6), Chr = rep(2, 6), Chr.Strand = rep('+', 6), Chr.From = c(155062277, 155062304, 155062305, 155062309, 155062326, 155062531), Probe.X = c(428, 1681, 2058, 1570, 1293, 2125), Probe.Y = c(1484, 2090, 893, 1082, 1435, 1008))

mapping <- rbind (negStrandGene, posStrandGene)

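# define a function to do what we want:
# sort a probe set by Chr.From (ascending) if it lies on the '-' strand
revSort <- function(df){

  # all() gives a single TRUE/FALSE even if a group were to mix strands
  if (all(df$Chr.Strand == '-'))
    return (df[order(df$Chr.From), ])
  else return (df)

}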


# split the data with plyr, apply the function and recombine
test2 <- ddply(mapping, .(Probe.Set.Name), function(df) revSort(df)) # ok, cool works

So here the rows with '-' in Chr.Strand are reordered whilst those with '+' are not.

My initial attempt using plyr is very inefficient and I wondered if someone could suggest something better.

Best

Iain



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: conditional subset and reorder dataframe rows

arun kirshna
Hi,

I guess the '+' strand was already ordered in the "mapping" set while the '-' strand was not.
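One quick way to check that guess is to ask, per strand, whether Chr.From is already in ascending order. The sketch below is only an illustration: it rebuilds a reduced copy of the toy data (three columns, values taken from the original post) and the name unsortedByStrand is made up here.

```r
# Reduced copy of the toy data: only the columns needed for the check
mapping <- data.frame(
  Probe.Set.Name = rep(c('ENSMUSG00000022174_at', 'ENSMUSG00000047459_at'), each = 6),
  Chr.Strand = rep(c('-', '+'), each = 6),
  Chr.From = c(54873546, 54873539, 54873533, 54873529, 54873527, 54873416,
               155062277, 155062304, 155062305, 155062309, 155062326, 155062531),
  stringsAsFactors = FALSE
)

# is.unsorted() is TRUE when the values are NOT in ascending order
unsortedByStrand <- tapply(mapping$Chr.From, mapping$Chr.Strand, is.unsorted)
unsortedByStrand
```

For the toy data this reports TRUE for '-' (its Chr.From values run from high to low) and FALSE for '+'.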

Try this:
# sort the whole data frame by Chr.From, largest first
test2 <- mapping[with(mapping, rev(order(Chr.From))), ]
# renumber the rows after sorting
rownames(test2) <- 1:nrow(test2)
test2
          Probe.Set.Name Chr Chr.Strand  Chr.From Probe.X Probe.Y
1  ENSMUSG00000047459_at   2          + 155062531    2125    1008
2  ENSMUSG00000047459_at   2          + 155062326    1293    1435
3  ENSMUSG00000047459_at   2          + 155062309    1570    1082
4  ENSMUSG00000047459_at   2          + 155062305    2058     893
5  ENSMUSG00000047459_at   2          + 155062304    1681    2090
6  ENSMUSG00000047459_at   2          + 155062277     428    1484
7  ENSMUSG00000022174_at  14          -  54873546     388    2112
8  ENSMUSG00000022174_at  14          -  54873539    1634    1773
9  ENSMUSG00000022174_at  14          -  54873533    2141    1045
10 ENSMUSG00000022174_at  14          -  54873529    2305     862
11 ENSMUSG00000022174_at  14          -  54873527     882     971
12 ENSMUSG00000022174_at  14          -  54873416     960    2160


# Now I am rearranging rows

mapping1 <- mapping[c(2,3,5,1,4,6,9,10,7,8,11,12), ]
test3 <- mapping1[with(mapping1, rev(order(Chr.From))), ]
test3
          Probe.Set.Name Chr Chr.Strand  Chr.From Probe.X Probe.Y
12 ENSMUSG00000047459_at   2          + 155062531    2125    1008
11 ENSMUSG00000047459_at   2          + 155062326    1293    1435
10 ENSMUSG00000047459_at   2          + 155062309    1570    1082
9  ENSMUSG00000047459_at   2          + 155062305    2058     893
8  ENSMUSG00000047459_at   2          + 155062304    1681    2090
7  ENSMUSG00000047459_at   2          + 155062277     428    1484
1  ENSMUSG00000022174_at  14          -  54873546     388    2112
2  ENSMUSG00000022174_at  14          -  54873539    1634    1773
3  ENSMUSG00000022174_at  14          -  54873533    2141    1045
4  ENSMUSG00000022174_at  14          -  54873529    2305     862
5  ENSMUSG00000022174_at  14          -  54873527     882     971
6  ENSMUSG00000022174_at  14          -  54873416     960    2160

Not quite sure this is what you were looking for.
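If the aim is to sort only the '-' probe sets by Chr.From and leave the '+' ones in their existing order, a single call to order() on the whole data frame may do it without splitting into groups. This is a rough sketch, not a tested solution for the full 1,200,000 rows: it assumes every Probe.Set.Name sits on one strand only (as in the toy example), it uses a reduced copy of the toy data, and the names neg, key, posInFile and sorted are made up for illustration.

```r
# Reduced copy of the toy data: only the columns the reordering needs
mapping <- data.frame(
  Probe.Set.Name = rep(c('ENSMUSG00000022174_at', 'ENSMUSG00000047459_at'), each = 6),
  Chr.Strand = rep(c('-', '+'), each = 6),
  Chr.From = c(54873546, 54873539, 54873533, 54873529, 54873527, 54873416,
               155062277, 155062304, 155062305, 155062309, 155062326, 155062531),
  stringsAsFactors = FALSE
)

neg <- mapping$Chr.Strand == '-'
posInFile <- seq_len(nrow(mapping))

# Sort key: Chr.From for '-' rows, a constant for '+' rows, so that the
# final tie-breaker (original position) keeps '+' rows in their existing order
key <- ifelse(neg, mapping$Chr.From, 0)

# Group by probe set first (as ddply does), then apply the key within each group
ord <- order(mapping$Probe.Set.Name, key, posInFile)
sorted <- mapping[ord, ]
rownames(sorted) <- NULL
sorted
```

Like the ddply() version, the result comes back grouped by Probe.Set.Name. If a probe set ever mixed '+' and '-' rows, this sketch would place its '+' rows first, so that assumption is worth checking on the real data.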

A.K.


----- Original Message -----
From: Iain Gallagher <[hidden email]>
To: r-help <[hidden email]>
Cc:
Sent: Friday, July 20, 2012 7:40 AM
Subject: [R] conditional subset and reorder dataframe rows
