Subsetting a data frame


Nathalie Conte
Hi,
Here is my problem: I want to subset the data frame df (shown below) so that
each value of df$exon appears only once, printing a row a single time even if
its df$exon value occurs several times.

unique(df$exon) shows me the unique exons, but when I try to print only the
rows for the unique exons with df[unique(df$exon),], it doesn't print only
the unique ones :(

Could you help?
Thanks,
Nat




                                 exon size  chr     start       end
413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
414438 ChrX_133620692_133620696_HPRT1    4 ChrX 133620692 133620696
414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235



--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Subsetting a data frame

Rui Barradas
Hello,

> HI,
> this is my problem I want to subset this file df, using only  unique
> df$exon printing the line once even if  df$exon appear several times:
>
> unique(df$exon) will show me the unique exons
> If I try to print only the unique exon lines
> with df[unique(df$exon),] -this doesn't print only the unique ones :(
>

Try

inx <- match(unique(df$exon), df$exon)
df[inx, ]
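
To see why this works, here is a minimal sketch on a toy data frame. The
data are made up for illustration and stand in for the poster's df; only the
exon column name is taken from the original post. match() returns, for each
unique exon, the position of its first occurrence, and those positions are
valid row indices.

```r
# Toy data frame (hypothetical values, not the poster's data).
df <- data.frame(
  exon = c("e1", "e2", "e2", "e3", "e3", "e3"),
  size = c(10, 20, 20, 30, 30, 30),
  stringsAsFactors = FALSE
)

# For each unique exon, find the row where it first appears.
inx <- match(unique(df$exon), df$exon)

# Index by those row positions: one row per exon, the first occurrence.
first_rows <- df[inx, ]
print(first_rows)
```

Because inx holds row positions rather than exon values, df[inx, ] picks out
exactly one row per exon.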


Hope this helps,

Rui Barradas

Re: Subsetting a data frame

Michael Weylandt
In reply to this post by Nathalie Conte
I believe you want the duplicated() function.
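
A minimal sketch of how duplicated() might be used here, on a toy data frame
(the values are invented for illustration; only the exon column name comes
from the original post). duplicated() returns a logical vector that is TRUE
for every repeat of an earlier value, so negating it keeps the first row for
each exon.

```r
# Toy data frame (hypothetical values, not the poster's data).
df <- data.frame(
  exon = c("e1", "e2", "e2", "e3", "e3", "e3"),
  size = c(10, 20, 20, 30, 30, 30),
  stringsAsFactors = FALSE
)

# TRUE for rows whose exon has not been seen earlier in the column.
keep <- !duplicated(df$exon)

# A logical vector with one entry per row is a valid row index.
dedup <- df[keep, ]
print(dedup)
```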

Michael

On Mar 2, 2012, at 10:19 AM, nathalie <[hidden email]> wrote:

> HI,
> this is my problem I want to subset this file df, using only  unique df$exon printing the line once even if  df$exon appear several times:
>
> unique(df$exon) will show me the unique exons
> If I try to print only the unique exon lines
> with df[unique(df$exon),] -this doesn't print only the unique ones :(

Re: Subsetting a data frame

Michael Weylandt
Please always cc the list for archival/threading reasons.

The short answer is that unique() returns the unique elements themselves, which is not something you should subset by. To subset rows you need an index, such as a set of logical indices or row numbers.
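
As an illustration, here is a small sketch on a toy data frame (invented
values; exon is assumed to be a character column). A character row index is
looked up in the row names, not in the exon column, so indexing by the exon
values finds no matching rows. If exon were a factor instead, the index
would be treated as the factor's integer codes, which also does not select
the intended rows.

```r
# Toy data frame (hypothetical values, not the poster's data).
df <- data.frame(
  exon = c("e1", "e2", "e2", "e3"),
  size = c(10, 20, 20, 30),
  stringsAsFactors = FALSE
)

u <- unique(df$exon)   # the values "e1", "e2", "e3", not row positions

# Character index: matched against the row names "1".."4", so nothing matches
# and every returned row is filled with NA.
by_value <- df[u, ]

# Logical index with one entry per row: keeps the first row of each exon.
by_position <- df[!duplicated(df$exon), ]

print(by_value)
print(by_position)
```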

Note that, in general, unique(x) == x[!duplicated(x)]. I'd imagine there are cases where this breaks down, but I can't assemble one off the top of my head.
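
A quick check of that equivalence on two small made-up vectors, offered as a
sketch and not as a proof that it holds for every type of object:

```r
# Character vector with repeats (made-up values).
x <- c("b", "a", "b", "c", "a")
same_chr <- identical(unique(x), x[!duplicated(x)])

# Numeric vector with repeats (made-up values).
y <- c(3, 1, 3, 2, 1)
same_num <- identical(unique(y), y[!duplicated(y)])

print(c(same_chr, same_num))
```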

Michael

On Mar 2, 2012, at 12:13 PM, nathalie <[hidden email]> wrote:

> thanks
> why unique doesn't work here??
>> I believe you want the duplicated() function.
>>
>> Michael