aggregate function....

Stéphane CRUVEILLER-2
Dear R users,

I have some trouble with the aggregate function. Here are my data:

> daf
      S_id AF_Class count... R_gc_percent S_length
5  8264497        1       30         0.48    35678
6  8264497        3        7         0.48    35678
8  8264554        1       31         0.51    38894
9  8264554        2       11         0.51    38894
10 8264554        3        1         0.51    38894

For a given S_id, I would like to select the row with the maximum
count. To do this, I used:
> aggregate(daf,list(daf$S_id),max)
  Group.1    S_id AF_Class count... R_gc_percent S_length
1 8264497 8264497        3       30         0.48    35678
2 8264554 8264554        3       31         0.51    38894

This is fine for the count, but I realized that max is also applied
to AF_Class (it should be 1 and 1 instead of 3 and 3), so aggregate does
not seem to be the right function for what I want to do. Is there any
other function I could use instead?
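A minimal sketch of why this happens, assuming the data frame is rebuilt from the values printed above (the `count...` column name is copied as shown): aggregate() calls max() on each column separately within each S_id group, so the AF_Class in the result is the largest AF_Class of the group, not the AF_Class of the row holding the largest count. Comparing every count with its group maximum, here via ave(), keeps whole rows instead. The names `group_max` and `best` are illustrative only.

```r
# Example data rebuilt from the values printed in the post
daf <- data.frame(
  S_id         = c(8264497, 8264497, 8264554, 8264554, 8264554),
  AF_Class     = c(1, 3, 1, 2, 3),
  count...     = c(30, 7, 31, 11, 1),
  R_gc_percent = c(0.48, 0.48, 0.51, 0.51, 0.51),
  S_length     = c(35678, 35678, 38894, 38894, 38894),
  row.names    = c("5", "6", "8", "9", "10")
)

# aggregate() applies max() to every column independently within each
# S_id group, so the values in one output row can come from different
# input rows (AF_Class ends up as 3 for both groups)
aggregate(daf, list(daf$S_id), max)

# Row-wise alternative: ave() returns, for every row, the maximum count
# of that row's S_id group; keep the rows whose count equals it
group_max <- ave(daf$count..., daf$S_id, FUN = max)
best <- daf[daf$count... == group_max, ]
best
```

Because this is a filter rather than a per-group pick, it returns every row that ties for the maximum count within an S_id.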

Best wishes,


Stéphane.
--
==========================================================
Stephane CRUVEILLER Ph. D.
Genoscope - Centre National de Sequencage
Atelier de Genomique Comparative
2, Rue Gaston Cremieux   CP 5706
91057 Evry Cedex - France
Phone: +33 (0)1 60 87 84 58
Fax: +33 (0)1 60 87 25 14
EMails: [hidden email] ,[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: aggregate function....

jholtman
Try 'by', picking the row with the largest count in each S_id group:

> x
      S_id AF_Class count... R_gc_percent S_length
5  8264497        1       30         0.48    35678
6  8264497        3        7         0.48    35678
8  8264554        1       31         0.51    38894
9  8264554        2       11         0.51    38894
10 8264554        3        1         0.51    38894
> do.call('rbind', by(x, x$S_id, function(y) y[which.max(y$count...),]))
           S_id AF_Class count... R_gc_percent S_length
8264497 8264497        1       30         0.48    35678
8264554 8264554        1       31         0.51    38894
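The same selection can also be sketched without by(), assuming the data frame is rebuilt from the values printed above: sort by S_id and by decreasing count, then keep the first row of each S_id with duplicated(). This is a swapped-in base R idiom, not the approach from the reply; `ord` and `best` are illustrative names.

```r
# Example data rebuilt from the values printed in the thread
x <- data.frame(
  S_id         = c(8264497, 8264497, 8264554, 8264554, 8264554),
  AF_Class     = c(1, 3, 1, 2, 3),
  count...     = c(30, 7, 31, 11, 1),
  R_gc_percent = c(0.48, 0.48, 0.51, 0.51, 0.51),
  S_length     = c(35678, 35678, 38894, 38894, 38894),
  row.names    = c("5", "6", "8", "9", "10")
)

# Sort by S_id, then by decreasing count, so the row with the largest
# count comes first within each S_id group
ord <- x[order(x$S_id, -x$count...), ]

# duplicated() is FALSE only for the first occurrence of each S_id,
# i.e. the row with the largest count in that group
best <- ord[!duplicated(ord$S_id), ]
best
```

Like which.max(), this keeps exactly one row per S_id. When two counts tie, the row that appears first in the data is kept, because order() leaves unresolved ties in their original order.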



On 3/29/06, Stephane CRUVEILLER <[hidden email]> wrote:
> for a given S_id, I would like to select the line corresponding to the
> max count.


--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?

Re: aggregate function....

Stéphane CRUVEILLER-2
Nice trick, thx...

Stéphane.

On Wed, 2006-03-29 at 11:17 -0500, jim holtman wrote:
> try 'by':