how to find number of unique rows for combination of r columns

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

how to find number of unique rows for combination of r columns

anikaM
Hello,

I have a data frame like this:

> head(dt,20)
     chr    pos         gene_id pval_nominal  pval_ret       wl      wr
 1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
 2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
 3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
 4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
 5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
 6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
 7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
 8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
 9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838

it is very big,
> dim(dt)
[1] 73719122        8

To count number of unique rows for all 3 columns: chr, pos and gene_id
I could just join those 3 columns and than count. But how would I find
unique number of rows for these 4 columns without joining them?

Thanks
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Gerrit Eichner
Hi, Ana,

doesn't

udt <- unique(dt[c("chr", "pos", "gene_id")])
nrow(udt)

get close to what you want?

  Hth  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
[hidden email]   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
http://www.uni-giessen.de/eichner
---------------------------------------------------------------------

Am 08.11.2019 um 15:38 schrieb Ana Marija:

> Hello,
>
> I have a data frame like this:
>
>> head(dt,20)
>       chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>   1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>   2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>   3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>   4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>   5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>   6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>   7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>   8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>   9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>
> it is very big,
>> dim(dt)
> [1] 73719122        8
>
> To count number of unique rows for all 3 columns: chr, pos and gene_id
> I could just join those 3 columns and than count. But how would I find
> unique number of rows for these 4 columns without joining them?
>
> Thanks
> Ana
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

anikaM
I tried it but I got this error:
> udt <- unique(dt[c("chr", "pos", "gene_id")])
Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
  When i is a data.table (or character vector), the columns to join by
must be specified using 'on=' argument (see ?data.table), by keying x
(i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
column names between x and i (i.e., a natural join). Keyed joins might
have further speed benefits on very large data due to x being sorted
in RAM.

On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
<[hidden email]> wrote:

>
> Hi, Ana,
>
> doesn't
>
> udt <- unique(dt[c("chr", "pos", "gene_id")])
> nrow(udt)
>
> get close to what you want?
>
>   Hth  --  Gerrit
>
> ---------------------------------------------------------------------
> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> [hidden email]   Justus-Liebig-University Giessen
> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> http://www.uni-giessen.de/eichner
> ---------------------------------------------------------------------
>
> Am 08.11.2019 um 15:38 schrieb Ana Marija:
> > Hello,
> >
> > I have a data frame like this:
> >
> >> head(dt,20)
> >       chr    pos         gene_id pval_nominal  pval_ret       wl      wr
> >   1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
> >   2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
> >   3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
> >   4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
> >   5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
> >   6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
> >   7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
> >   8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
> >   9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
> > 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
> >
> > it is very big,
> >> dim(dt)
> > [1] 73719122        8
> >
> > To count number of unique rows for all 3 columns: chr, pos and gene_id
> > I could just join those 3 columns and than count. But how would I find
> > unique number of rows for these 4 columns without joining them?
> >
> > Thanks
> > Ana
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Gerrit Eichner
It seems as if dt is not a (base R) data frame but a
data table. I assume, you will have to transform dt
into a data frame (maybe with as.data.frame) to be
able to apply unique in the suggested way. However,
I am not familiar with data tables. Perhaps somebody
else can provide a more profound guess.

  Regards  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
[hidden email]   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
http://www.uni-giessen.de/eichner
---------------------------------------------------------------------

Am 08.11.2019 um 16:02 schrieb Ana Marija:

> I tried it but I got this error:
>> udt <- unique(dt[c("chr", "pos", "gene_id")])
> Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
>    When i is a data.table (or character vector), the columns to join by
> must be specified using 'on=' argument (see ?data.table), by keying x
> (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> column names between x and i (i.e., a natural join). Keyed joins might
> have further speed benefits on very large data due to x being sorted
> in RAM.
>
> On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> <[hidden email]> wrote:
>>
>> Hi, Ana,
>>
>> doesn't
>>
>> udt <- unique(dt[c("chr", "pos", "gene_id")])
>> nrow(udt)
>>
>> get close to what you want?
>>
>>    Hth  --  Gerrit
>>
>> ---------------------------------------------------------------------
>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>> [hidden email]   Justus-Liebig-University Giessen
>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>> http://www.uni-giessen.de/eichner
>> ---------------------------------------------------------------------
>>
>> Am 08.11.2019 um 15:38 schrieb Ana Marija:
>>> Hello,
>>>
>>> I have a data frame like this:
>>>
>>>> head(dt,20)
>>>        chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
>>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>>>
>>> it is very big,
>>>> dim(dt)
>>> [1] 73719122        8
>>>
>>> To count number of unique rows for all 3 columns: chr, pos and gene_id
>>> I could just join those 3 columns and than count. But how would I find
>>> unique number of rows for these 4 columns without joining them?
>>>
>>> Thanks
>>> Ana
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

anikaM
Thank you so much! Converting it to data frame resolved the issue!

On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
<[hidden email]> wrote:

>
> It seems as if dt is not a (base R) data frame but a
> data table. I assume, you will have to transform dt
> into a data frame (maybe with as.data.frame) to be
> able to apply unique in the suggested way. However,
> I am not familiar with data tables. Perhaps somebody
> else can provide a more profound guess.
>
>   Regards  --  Gerrit
>
> ---------------------------------------------------------------------
> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> [hidden email]   Justus-Liebig-University Giessen
> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> http://www.uni-giessen.de/eichner
> ---------------------------------------------------------------------
>
> Am 08.11.2019 um 16:02 schrieb Ana Marija:
> > I tried it but I got this error:
> >> udt <- unique(dt[c("chr", "pos", "gene_id")])
> > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
> >    When i is a data.table (or character vector), the columns to join by
> > must be specified using 'on=' argument (see ?data.table), by keying x
> > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> > column names between x and i (i.e., a natural join). Keyed joins might
> > have further speed benefits on very large data due to x being sorted
> > in RAM.
> >
> > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> > <[hidden email]> wrote:
> >>
> >> Hi, Ana,
> >>
> >> doesn't
> >>
> >> udt <- unique(dt[c("chr", "pos", "gene_id")])
> >> nrow(udt)
> >>
> >> get close to what you want?
> >>
> >>    Hth  --  Gerrit
> >>
> >> ---------------------------------------------------------------------
> >> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> >> [hidden email]   Justus-Liebig-University Giessen
> >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> >> http://www.uni-giessen.de/eichner
> >> ---------------------------------------------------------------------
> >>
> >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
> >>> Hello,
> >>>
> >>> I have a data frame like this:
> >>>
> >>>> head(dt,20)
> >>>        chr    pos         gene_id pval_nominal  pval_ret       wl      wr
> >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
> >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
> >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
> >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
> >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
> >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
> >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
> >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
> >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
> >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
> >>>
> >>> it is very big,
> >>>> dim(dt)
> >>> [1] 73719122        8
> >>>
> >>> To count number of unique rows for all 3 columns: chr, pos and gene_id
> >>> I could just join those 3 columns and than count. But how would I find
> >>> unique number of rows for these 4 columns without joining them?
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>> ______________________________________________
> >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

anikaM
would you know how would I extract from my original data frame, just
these unique rows?
because this gives me only those 3 columns, and I want all columns
from the original data frame

> head(udt)
   chr   pos         gene_id
1 chr1 54490 ENSG00000227232
2 chr1 58814 ENSG00000227232
3 chr1 60351 ENSG00000227232
4 chr1 61920 ENSG00000227232
5 chr1 63671 ENSG00000227232
6 chr1 64931 ENSG00000227232

> head(dt)
    chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800

On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]> wrote:

>
> Thank you so much! Converting it to data frame resolved the issue!
>
> On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
> <[hidden email]> wrote:
> >
> > It seems as if dt is not a (base R) data frame but a
> > data table. I assume, you will have to transform dt
> > into a data frame (maybe with as.data.frame) to be
> > able to apply unique in the suggested way. However,
> > I am not familiar with data tables. Perhaps somebody
> > else can provide a more profound guess.
> >
> >   Regards  --  Gerrit
> >
> > ---------------------------------------------------------------------
> > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> > [hidden email]   Justus-Liebig-University Giessen
> > Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> > http://www.uni-giessen.de/eichner
> > ---------------------------------------------------------------------
> >
> > Am 08.11.2019 um 16:02 schrieb Ana Marija:
> > > I tried it but I got this error:
> > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
> > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
> > >    When i is a data.table (or character vector), the columns to join by
> > > must be specified using 'on=' argument (see ?data.table), by keying x
> > > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> > > column names between x and i (i.e., a natural join). Keyed joins might
> > > have further speed benefits on very large data due to x being sorted
> > > in RAM.
> > >
> > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> > > <[hidden email]> wrote:
> > >>
> > >> Hi, Ana,
> > >>
> > >> doesn't
> > >>
> > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
> > >> nrow(udt)
> > >>
> > >> get close to what you want?
> > >>
> > >>    Hth  --  Gerrit
> > >>
> > >> ---------------------------------------------------------------------
> > >> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> > >> [hidden email]   Justus-Liebig-University Giessen
> > >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> > >> http://www.uni-giessen.de/eichner
> > >> ---------------------------------------------------------------------
> > >>
> > >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
> > >>> Hello,
> > >>>
> > >>> I have a data frame like this:
> > >>>
> > >>>> head(dt,20)
> > >>>        chr    pos         gene_id pval_nominal  pval_ret       wl      wr
> > >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
> > >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
> > >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
> > >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
> > >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
> > >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
> > >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
> > >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
> > >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
> > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
> > >>>
> > >>> it is very big,
> > >>>> dim(dt)
> > >>> [1] 73719122        8
> > >>>
> > >>> To count number of unique rows for all 3 columns: chr, pos and gene_id
> > >>> I could just join those 3 columns and than count. But how would I find
> > >>> unique number of rows for these 4 columns without joining them?
> > >>>
> > >>> Thanks
> > >>> Ana
> > >>>
> > >>> ______________________________________________
> > >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>
> > >> ______________________________________________
> > >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Boris Steipe
Are you trying to eliminate duplicated rows from your dataframe? Because that would be better achieved with duplicated().


B.




> On 2019-11-08, at 10:32, Ana Marija <[hidden email]> wrote:
>
> would you know how would I extract from my original data frame, just
> these unique rows?
> because this gives me only those 3 columns, and I want all columns
> from the original data frame
>
>> head(udt)
>   chr   pos         gene_id
> 1 chr1 54490 ENSG00000227232
> 2 chr1 58814 ENSG00000227232
> 3 chr1 60351 ENSG00000227232
> 4 chr1 61920 ENSG00000227232
> 5 chr1 63671 ENSG00000227232
> 6 chr1 64931 ENSG00000227232
>
>> head(dt)
>    chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800
>
> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]> wrote:
>>
>> Thank you so much! Converting it to data frame resolved the issue!
>>
>> On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
>> <[hidden email]> wrote:
>>>
>>> It seems as if dt is not a (base R) data frame but a
>>> data table. I assume, you will have to transform dt
>>> into a data frame (maybe with as.data.frame) to be
>>> able to apply unique in the suggested way. However,
>>> I am not familiar with data tables. Perhaps somebody
>>> else can provide a more profound guess.
>>>
>>>  Regards  --  Gerrit
>>>
>>> ---------------------------------------------------------------------
>>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>>> [hidden email]   Justus-Liebig-University Giessen
>>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>>> http://www.uni-giessen.de/eichner
>>> ---------------------------------------------------------------------
>>>
>>> Am 08.11.2019 um 16:02 schrieb Ana Marija:
>>>> I tried it but I got this error:
>>>>> udt <- unique(dt[c("chr", "pos", "gene_id")])
>>>> Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
>>>>   When i is a data.table (or character vector), the columns to join by
>>>> must be specified using 'on=' argument (see ?data.table), by keying x
>>>> (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
>>>> column names between x and i (i.e., a natural join). Keyed joins might
>>>> have further speed benefits on very large data due to x being sorted
>>>> in RAM.
>>>>
>>>> On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
>>>> <[hidden email]> wrote:
>>>>>
>>>>> Hi, Ana,
>>>>>
>>>>> doesn't
>>>>>
>>>>> udt <- unique(dt[c("chr", "pos", "gene_id")])
>>>>> nrow(udt)
>>>>>
>>>>> get close to what you want?
>>>>>
>>>>>   Hth  --  Gerrit
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>>>>> [hidden email]   Justus-Liebig-University Giessen
>>>>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>>>>> http://www.uni-giessen.de/eichner
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> Am 08.11.2019 um 15:38 schrieb Ana Marija:
>>>>>> Hello,
>>>>>>
>>>>>> I have a data frame like this:
>>>>>>
>>>>>>> head(dt,20)
>>>>>>       chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>>>>>>   1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>>>>>>   2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>>>>>>   3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>>>>>>   4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>>>>>>   5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>>>>>>   6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>>>>>>   7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>>>>>>   8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>>>>>>   9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
>>>>>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>>>>>>
>>>>>> it is very big,
>>>>>>> dim(dt)
>>>>>> [1] 73719122        8
>>>>>>
>>>>>> To count number of unique rows for all 3 columns: chr, pos and gene_id
>>>>>> I could just join those 3 columns and than count. But how would I find
>>>>>> unique number of rows for these 4 columns without joining them?
>>>>>>
>>>>>> Thanks
>>>>>> Ana
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Bert Gunter-2
In reply to this post by anikaM
Sorry, but you ask basic questions.You really need to spend some more time
with an R tutorial or two. This list is not meant to replace your own
learning efforts.

You also do not seem to be reading the docs carefully. Under ?unique, it
links ?duplicated and tells you that it gives indices of duplicated rows of
a data frame. These then can be used by subscripting to remove those rows
from the data frame. Here is a reproducible example:

df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
df[-duplicated(df[,2:3]), ]  ## Note the - sign

If you prefer, the "Tidyverse" world has what are purported to be more
user-friendly versions of such data handling functionality that you can use
instead.


Bert

On Fri, Nov 8, 2019 at 7:38 AM Ana Marija <[hidden email]>
wrote:

> would you know how would I extract from my original data frame, just
> these unique rows?
> because this gives me only those 3 columns, and I want all columns
> from the original data frame
>
> > head(udt)
>    chr   pos         gene_id
> 1 chr1 54490 ENSG00000227232
> 2 chr1 58814 ENSG00000227232
> 3 chr1 60351 ENSG00000227232
> 4 chr1 61920 ENSG00000227232
> 5 chr1 63671 ENSG00000227232
> 6 chr1 64931 ENSG00000227232
>
> > head(dt)
>     chr   pos         gene_id pval_nominal pval_ret       wl      wr
> META
> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838
> 0.7475480
> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838
> 0.6031214
> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838
> 0.6907182
> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838
> 0.4032200
> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838
> 0.7482519
> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838
> 0.5974800
>
> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]>
> wrote:
> >
> > Thank you so much! Converting it to data frame resolved the issue!
> >
> > On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
> > <[hidden email]> wrote:
> > >
> > > It seems as if dt is not a (base R) data frame but a
> > > data table. I assume, you will have to transform dt
> > > into a data frame (maybe with as.data.frame) to be
> > > able to apply unique in the suggested way. However,
> > > I am not familiar with data tables. Perhaps somebody
> > > else can provide a more profound guess.
> > >
> > >   Regards  --  Gerrit
> > >
> > > ---------------------------------------------------------------------
> > > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> > > [hidden email]   Justus-Liebig-University Giessen
> > > Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> > > http://www.uni-giessen.de/eichner
> > > ---------------------------------------------------------------------
> > >
> > > Am 08.11.2019 um 16:02 schrieb Ana Marija:
> > > > I tried it but I got this error:
> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
> > > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
> > > >    When i is a data.table (or character vector), the columns to join
> by
> > > > must be specified using 'on=' argument (see ?data.table), by keying x
> > > > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> > > > column names between x and i (i.e., a natural join). Keyed joins
> might
> > > > have further speed benefits on very large data due to x being sorted
> > > > in RAM.
> > > >
> > > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> > > > <[hidden email]> wrote:
> > > >>
> > > >> Hi, Ana,
> > > >>
> > > >> doesn't
> > > >>
> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
> > > >> nrow(udt)
> > > >>
> > > >> get close to what you want?
> > > >>
> > > >>    Hth  --  Gerrit
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> Dr. Gerrit Eichner                   Mathematical Institute, Room
> 212
> > > >> [hidden email]   Justus-Liebig-University
> Giessen
> > > >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen,
> Germany
> > > >> http://www.uni-giessen.de/eichner
> > > >>
> ---------------------------------------------------------------------
> > > >>
> > > >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
> > > >>> Hello,
> > > >>>
> > > >>> I have a data frame like this:
> > > >>>
> > > >>>> head(dt,20)
> > > >>>        chr    pos         gene_id pval_nominal  pval_ret       wl
>     wr
> > > >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278
> 21.2838
> > > >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278
> 21.2838
> > > >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278
> 21.2838
> > > >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278
> 21.2838
> > > >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278
> 21.2838
> > > >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278
> 21.2838
> > > >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278
> 21.2838
> > > >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278
> 21.2838
> > > >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278
> 21.2838
> > > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278
> 21.2838
> > > >>>
> > > >>> it is very big,
> > > >>>> dim(dt)
> > > >>> [1] 73719122        8
> > > >>>
> > > >>> To count number of unique rows for all 3 columns: chr, pos and
> gene_id
> > > >>> I could just join those 3 columns and than count. But how would I
> find
> > > >>> unique number of rows for these 4 columns without joining them?
> > > >>>
> > > >>> Thanks
> > > >>> Ana
> > > >>>
> > > >>> ______________________________________________
> > > >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > > >>> and provide commented, minimal, self-contained, reproducible code.
> > > >>>
> > > >>
> > > >> ______________________________________________
> > > >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > > >> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

anikaM
In reply to this post by Boris Steipe
I am trying to first identify how many duplicate rows are there determined
by the unique values in the first 3 columns. Now I know that is about 20000
rows which are non unique. But I would like to extract all 8 columns for
those non unique rows and see what is going on with META value I have in
them.

About duplicated() function I know as well as about unique

On Fri, 8 Nov 2019 at 10:08, Boris Steipe <[hidden email]> wrote:

> Are you trying to eliminate duplicated rows from your dataframe? Because
> that would be better achieved with duplicated().
>
>
> B.
>
>
>
>
> > On 2019-11-08, at 10:32, Ana Marija <[hidden email]> wrote:
> >
> > would you know how would I extract from my original data frame, just
> > these unique rows?
> > because this gives me only those 3 columns, and I want all columns
> > from the original data frame
> >
> >> head(udt)
> >   chr   pos         gene_id
> > 1 chr1 54490 ENSG00000227232
> > 2 chr1 58814 ENSG00000227232
> > 3 chr1 60351 ENSG00000227232
> > 4 chr1 61920 ENSG00000227232
> > 5 chr1 63671 ENSG00000227232
> > 6 chr1 64931 ENSG00000227232
> >
> >> head(dt)
> >    chr   pos         gene_id pval_nominal pval_ret       wl      wr
> META
> > 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838
> 0.7475480
> > 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838
> 0.6031214
> > 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838
> 0.6907182
> > 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838
> 0.4032200
> > 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838
> 0.7482519
> > 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838
> 0.5974800
> >
> > On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]>
> wrote:
> >>
> >> Thank you so much! Converting it to data frame resolved the issue!
> >>
> >> On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
> >> <[hidden email]> wrote:
> >>>
> >>> It seems as if dt is not a (base R) data frame but a
> >>> data table. I assume, you will have to transform dt
> >>> into a data frame (maybe with as.data.frame) to be
> >>> able to apply unique in the suggested way. However,
> >>> I am not familiar with data tables. Perhaps somebody
> >>> else can provide a more profound guess.
> >>>
> >>>  Regards  --  Gerrit
> >>>
> >>> ---------------------------------------------------------------------
> >>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> >>> [hidden email]   Justus-Liebig-University Giessen
> >>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> >>> http://www.uni-giessen.de/eichner
> >>> ---------------------------------------------------------------------
> >>>
> >>> Am 08.11.2019 um 16:02 schrieb Ana Marija:
> >>>> I tried it but I got this error:
> >>>>> udt <- unique(dt[c("chr", "pos", "gene_id")])
> >>>> Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
> >>>>   When i is a data.table (or character vector), the columns to join by
> >>>> must be specified using 'on=' argument (see ?data.table), by keying x
> >>>> (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> >>>> column names between x and i (i.e., a natural join). Keyed joins might
> >>>> have further speed benefits on very large data due to x being sorted
> >>>> in RAM.
> >>>>
> >>>> On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> >>>> <[hidden email]> wrote:
> >>>>>
> >>>>> Hi, Ana,
> >>>>>
> >>>>> doesn't
> >>>>>
> >>>>> udt <- unique(dt[c("chr", "pos", "gene_id")])
> >>>>> nrow(udt)
> >>>>>
> >>>>> get close to what you want?
> >>>>>
> >>>>>   Hth  --  Gerrit
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> >>>>> [hidden email]   Justus-Liebig-University
> Giessen
> >>>>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> >>>>> http://www.uni-giessen.de/eichner
> >>>>> ---------------------------------------------------------------------
> >>>>>
> >>>>> Am 08.11.2019 um 15:38 schrieb Ana Marija:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I have a data frame like this:
> >>>>>>
> >>>>>>> head(dt,20)
> >>>>>>       chr    pos         gene_id pval_nominal  pval_ret       wl
>   wr
> >>>>>>   1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278
> 21.2838
> >>>>>>   2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278
> 21.2838
> >>>>>>   3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278
> 21.2838
> >>>>>>   4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278
> 21.2838
> >>>>>>   5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278
> 21.2838
> >>>>>>   6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278
> 21.2838
> >>>>>>   7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278
> 21.2838
> >>>>>>   8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278
> 21.2838
> >>>>>>   9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278
> 21.2838
> >>>>>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278
> 21.2838
> >>>>>>
> >>>>>> it is very big,
> >>>>>>> dim(dt)
> >>>>>> [1] 73719122        8
> >>>>>>
> >>>>>> To count number of unique rows for all 3 columns: chr, pos and
> gene_id
> >>>>>> I could just join those 3 columns and than count. But how would I
> find
> >>>>>> unique number of rows for these 4 columns without joining them?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Ana
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>
> >>>>>
> >>>>> ______________________________________________
> >>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Boris Steipe
Good. Duplicated returns a boolean index vector that you can use to extract the non-unique rows.

B.



> On 2019-11-08, at 11:30, Ana Marija <[hidden email]> wrote:
>
> I am trying to first identify how many duplicate rows are there determined by the unique values in the first 3 columns. Now I know that is about 20000 rows which are non unique. But I would like to extract all 8 columns for those non unique rows and see what is going on with META value I have in them.
>
> About duplicated() function I know as well as about unique
>
> On Fri, 8 Nov 2019 at 10:08, Boris Steipe <[hidden email]> wrote:
> Are you trying to eliminate duplicated rows from your dataframe? Because that would be better achieved with duplicated().
>
>
> B.
>
>
>
>
> > On 2019-11-08, at 10:32, Ana Marija <[hidden email]> wrote:
> >
> > would you know how would I extract from my original data frame, just
> > these unique rows?
> > because this gives me only those 3 columns, and I want all columns
> > from the original data frame
> >
> >> head(udt)
> >   chr   pos         gene_id
> > 1 chr1 54490 ENSG00000227232
> > 2 chr1 58814 ENSG00000227232
> > 3 chr1 60351 ENSG00000227232
> > 4 chr1 61920 ENSG00000227232
> > 5 chr1 63671 ENSG00000227232
> > 6 chr1 64931 ENSG00000227232
> >
> >> head(dt)
> >    chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
> > 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
> > 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
> > 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
> > 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
> > 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
> > 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800
> >
> > On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]> wrote:
> >>
> >> Thank you so much! Converting it to data frame resolved the issue!
> >>
> >> On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
> >> <[hidden email]> wrote:
> >>>
> >>> It seems as if dt is not a (base R) data frame but a
> >>> data table. I assume, you will have to transform dt
> >>> into a data frame (maybe with as.data.frame) to be
> >>> able to apply unique in the suggested way. However,
> >>> I am not familiar with data tables. Perhaps somebody
> >>> else can provide a more profound guess.
> >>>
> >>>  Regards  --  Gerrit
> >>>
> >>> ---------------------------------------------------------------------
> >>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> >>> [hidden email]   Justus-Liebig-University Giessen
> >>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> >>> http://www.uni-giessen.de/eichner
> >>> ---------------------------------------------------------------------
> >>>
> >>> Am 08.11.2019 um 16:02 schrieb Ana Marija:
> >>>> I tried it but I got this error:
> >>>>> udt <- unique(dt[c("chr", "pos", "gene_id")])
> >>>> Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
> >>>>   When i is a data.table (or character vector), the columns to join by
> >>>> must be specified using 'on=' argument (see ?data.table), by keying x
> >>>> (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> >>>> column names between x and i (i.e., a natural join). Keyed joins might
> >>>> have further speed benefits on very large data due to x being sorted
> >>>> in RAM.
> >>>>
> >>>> On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> >>>> <[hidden email]> wrote:
> >>>>>
> >>>>> Hi, Ana,
> >>>>>
> >>>>> doesn't
> >>>>>
> >>>>> udt <- unique(dt[c("chr", "pos", "gene_id")])
> >>>>> nrow(udt)
> >>>>>
> >>>>> get close to what you want?
> >>>>>
> >>>>>   Hth  --  Gerrit
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> >>>>> [hidden email]   Justus-Liebig-University Giessen
> >>>>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> >>>>> http://www.uni-giessen.de/eichner
> >>>>> ---------------------------------------------------------------------
> >>>>>
> >>>>> Am 08.11.2019 um 15:38 schrieb Ana Marija:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I have a data frame like this:
> >>>>>>
> >>>>>>> head(dt,20)
> >>>>>>       chr    pos         gene_id pval_nominal  pval_ret       wl      wr
> >>>>>>   1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
> >>>>>>   2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
> >>>>>>   3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
> >>>>>>   4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
> >>>>>>   5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
> >>>>>>   6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
> >>>>>>   7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
> >>>>>>   8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
> >>>>>>   9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
> >>>>>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
> >>>>>>
> >>>>>> it is very big,
> >>>>>>> dim(dt)
> >>>>>> [1] 73719122        8
> >>>>>>
> >>>>>> To count number of unique rows for all 3 columns: chr, pos and gene_id
> >>>>>> I could just join those 3 columns and than count. But how would I find
> >>>>>> unique number of rows for these 4 columns without joining them?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Ana
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>
> >>>>>
> >>>>> ______________________________________________
> >>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Bert Gunter-2
In reply to this post by Bert Gunter-2
Correction:
df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
df[!duplicated(df[,2:3]), ]  ## Note the ! sign

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Nov 8, 2019 at 7:59 AM Bert Gunter <[hidden email]> wrote:

> Sorry, but you ask basic questions.You really need to spend some more time
> with an R tutorial or two. This list is not meant to replace your own
> learning efforts.
>
> You also do not seem to be reading the docs carefully. Under ?unique, it
> links ?duplicated and tells you that it gives indices of duplicated rows of
> a data frame. These then can be used by subscripting to remove those rows
> from the data frame. Here is a reproducible example:
>
> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
> df[-duplicated(df[,2:3]), ]  ## Note the - sign
>
> If you prefer, the "Tidyverse" world has what are purported to be more
> user-friendly versions of such data handling functionality that you can use
> instead.
>
>
> Bert
>
> On Fri, Nov 8, 2019 at 7:38 AM Ana Marija <[hidden email]>
> wrote:
>
>> would you know how would I extract from my original data frame, just
>> these unique rows?
>> because this gives me only those 3 columns, and I want all columns
>> from the original data frame
>>
>> > head(udt)
>>    chr   pos         gene_id
>> 1 chr1 54490 ENSG00000227232
>> 2 chr1 58814 ENSG00000227232
>> 3 chr1 60351 ENSG00000227232
>> 4 chr1 61920 ENSG00000227232
>> 5 chr1 63671 ENSG00000227232
>> 6 chr1 64931 ENSG00000227232
>>
>> > head(dt)
>>     chr   pos         gene_id pval_nominal pval_ret       wl      wr
>> META
>> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838
>> 0.7475480
>> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838
>> 0.6031214
>> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838
>> 0.6907182
>> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838
>> 0.4032200
>> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838
>> 0.7482519
>> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838
>> 0.5974800
>>
>> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]>
>> wrote:
>> >
>> > Thank you so much! Converting it to data frame resolved the issue!
>> >
>> > On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
>> > <[hidden email]> wrote:
>> > >
>> > > It seems as if dt is not a (base R) data frame but a
>> > > data table. I assume, you will have to transform dt
>> > > into a data frame (maybe with as.data.frame) to be
>> > > able to apply unique in the suggested way. However,
>> > > I am not familiar with data tables. Perhaps somebody
>> > > else can provide a more profound guess.
>> > >
>> > >   Regards  --  Gerrit
>> > >
>> > > ---------------------------------------------------------------------
>> > > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>> > > [hidden email]   Justus-Liebig-University Giessen
>> > > Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>> > > http://www.uni-giessen.de/eichner
>> > > ---------------------------------------------------------------------
>> > >
>> > > Am 08.11.2019 um 16:02 schrieb Ana Marija:
>> > > > I tried it but I got this error:
>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>> > > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
>> > > >    When i is a data.table (or character vector), the columns to
>> join by
>> > > > must be specified using 'on=' argument (see ?data.table), by keying
>> x
>> > > > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
>> > > > column names between x and i (i.e., a natural join). Keyed joins
>> might
>> > > > have further speed benefits on very large data due to x being sorted
>> > > > in RAM.
>> > > >
>> > > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
>> > > > <[hidden email]> wrote:
>> > > >>
>> > > >> Hi, Ana,
>> > > >>
>> > > >> doesn't
>> > > >>
>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>> > > >> nrow(udt)
>> > > >>
>> > > >> get close to what you want?
>> > > >>
>> > > >>    Hth  --  Gerrit
>> > > >>
>> > > >>
>> ---------------------------------------------------------------------
>> > > >> Dr. Gerrit Eichner                   Mathematical Institute, Room
>> 212
>> > > >> [hidden email]   Justus-Liebig-University
>> Giessen
>> > > >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen,
>> Germany
>> > > >> http://www.uni-giessen.de/eichner
>> > > >>
>> ---------------------------------------------------------------------
>> > > >>
>> > > >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
>> > > >>> Hello,
>> > > >>>
>> > > >>> I have a data frame like this:
>> > > >>>
>> > > >>>> head(dt,20)
>> > > >>>        chr    pos         gene_id pval_nominal  pval_ret
>>  wl      wr
>> > > >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278
>> 21.2838
>> > > >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278
>> 21.2838
>> > > >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278
>> 21.2838
>> > > >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278
>> 21.2838
>> > > >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278
>> 21.2838
>> > > >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278
>> 21.2838
>> > > >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278
>> 21.2838
>> > > >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278
>> 21.2838
>> > > >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278
>> 21.2838
>> > > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278
>> 21.2838
>> > > >>>
>> > > >>> it is very big,
>> > > >>>> dim(dt)
>> > > >>> [1] 73719122        8
>> > > >>>
>> > > >>> To count number of unique rows for all 3 columns: chr, pos and
>> gene_id
>> > > >>> I could just join those 3 columns and than count. But how would I
>> find
>> > > >>> unique number of rows for these 4 columns without joining them?
>> > > >>>
>> > > >>> Thanks
>> > > >>> Ana
>> > > >>>
>> > > >>> ______________________________________________
>> > > >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > > >>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > > >>> and provide commented, minimal, self-contained, reproducible code.
>> > > >>>
>> > > >>
>> > > >> ______________________________________________
>> > > >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > > >> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > > >> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

anikaM
Thank you so much!!!

On Fri, Nov 8, 2019 at 11:40 AM Bert Gunter <[hidden email]> wrote:

>
> Correction:
> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
> df[!duplicated(df[,2:3]), ]  ## Note the ! sign
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Nov 8, 2019 at 7:59 AM Bert Gunter <[hidden email]> wrote:
>>
>> Sorry, but you ask basic questions.You really need to spend some more time with an R tutorial or two. This list is not meant to replace your own learning efforts.
>>
>> You also do not seem to be reading the docs carefully. Under ?unique, it links ?duplicated and tells you that it gives indices of duplicated rows of a data frame. These then can be used by subscripting to remove those rows from the data frame. Here is a reproducible example:
>>
>> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
>> df[-duplicated(df[,2:3]), ]  ## Note the - sign
>>
>> If you prefer, the "Tidyverse" world has what are purported to be more user-friendly versions of such data handling functionality that you can use instead.
>>
>>
>> Bert
>>
>> On Fri, Nov 8, 2019 at 7:38 AM Ana Marija <[hidden email]> wrote:
>>>
>>> would you know how would I extract from my original data frame, just
>>> these unique rows?
>>> because this gives me only those 3 columns, and I want all columns
>>> from the original data frame
>>>
>>> > head(udt)
>>>    chr   pos         gene_id
>>> 1 chr1 54490 ENSG00000227232
>>> 2 chr1 58814 ENSG00000227232
>>> 3 chr1 60351 ENSG00000227232
>>> 4 chr1 61920 ENSG00000227232
>>> 5 chr1 63671 ENSG00000227232
>>> 6 chr1 64931 ENSG00000227232
>>>
>>> > head(dt)
>>>     chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
>>> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
>>> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
>>> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
>>> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
>>> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
>>> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800
>>>
>>> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]> wrote:
>>> >
>>> > Thank you so much! Converting it to data frame resolved the issue!
>>> >
>>> > On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
>>> > <[hidden email]> wrote:
>>> > >
>>> > > It seems as if dt is not a (base R) data frame but a
>>> > > data table. I assume, you will have to transform dt
>>> > > into a data frame (maybe with as.data.frame) to be
>>> > > able to apply unique in the suggested way. However,
>>> > > I am not familiar with data tables. Perhaps somebody
>>> > > else can provide a more profound guess.
>>> > >
>>> > >   Regards  --  Gerrit
>>> > >
>>> > > ---------------------------------------------------------------------
>>> > > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>>> > > [hidden email]   Justus-Liebig-University Giessen
>>> > > Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>>> > > http://www.uni-giessen.de/eichner
>>> > > ---------------------------------------------------------------------
>>> > >
>>> > > Am 08.11.2019 um 16:02 schrieb Ana Marija:
>>> > > > I tried it but I got this error:
>>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>>> > > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
>>> > > >    When i is a data.table (or character vector), the columns to join by
>>> > > > must be specified using 'on=' argument (see ?data.table), by keying x
>>> > > > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
>>> > > > column names between x and i (i.e., a natural join). Keyed joins might
>>> > > > have further speed benefits on very large data due to x being sorted
>>> > > > in RAM.
>>> > > >
>>> > > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
>>> > > > <[hidden email]> wrote:
>>> > > >>
>>> > > >> Hi, Ana,
>>> > > >>
>>> > > >> doesn't
>>> > > >>
>>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>>> > > >> nrow(udt)
>>> > > >>
>>> > > >> get close to what you want?
>>> > > >>
>>> > > >>    Hth  --  Gerrit
>>> > > >>
>>> > > >> ---------------------------------------------------------------------
>>> > > >> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>>> > > >> [hidden email]   Justus-Liebig-University Giessen
>>> > > >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>>> > > >> http://www.uni-giessen.de/eichner
>>> > > >> ---------------------------------------------------------------------
>>> > > >>
>>> > > >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
>>> > > >>> Hello,
>>> > > >>>
>>> > > >>> I have a data frame like this:
>>> > > >>>
>>> > > >>>> head(dt,20)
>>> > > >>>        chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>>> > > >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>>> > > >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>>> > > >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>>> > > >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>>> > > >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>>> > > >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>>> > > >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>>> > > >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>>> > > >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
>>> > > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>>> > > >>>
>>> > > >>> it is very big,
>>> > > >>>> dim(dt)
>>> > > >>> [1] 73719122        8
>>> > > >>>
>>> > > >>> To count number of unique rows for all 3 columns: chr, pos and gene_id
>>> > > >>> I could just join those 3 columns and than count. But how would I find
>>> > > >>> unique number of rows for these 4 columns without joining them?
>>> > > >>>
>>> > > >>> Thanks
>>> > > >>> Ana
>>> > > >>>
>>> > > >>> ______________________________________________
>>> > > >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > > >>> and provide commented, minimal, self-contained, reproducible code.
>>> > > >>>
>>> > > >>
>>> > > >> ______________________________________________
>>> > > >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > > >> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Martin Morgan-4
With this example

> df = data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
> df
  a b value
1 1 1     1
2 1 1     2
3 2 2     3
4 2 3     4

The approach to drop duplicates in the first and second columns has as a consequence the arbitrary choice of 'value' for the duplicate entries -- why chose a value of '1' rather than '2' (or the average of 1 and 2, or a list containing all possible values, or...) for the rows duplicated in columns a and b?

> df[!duplicated(df[,1:2]),]
  a b value
1 1 1     1
3 2 2     3
4 2 3     4

In base R one might

> aggregate(value ~ a + b, df, mean)
  a b value
1 1 1   1.5
2 2 2   3.0
3 2 3   4.0
> aggregate(value ~ a + b, df, list)
  a b value
1 1 1  1, 2
2 2 2     3
3 2 3     4

but handling several value-like columns would be hard(?)

Using library(dplyr), I have

> group_by(df, a, b) %>% summarize(mean_value = mean(value))
# A tibble: 3 x 3
# Groups:   a [2]
      a     b mean_value
  <dbl> <dbl>      <dbl>
1     1     1        1.5
2     2     2        3
3     2     3        4

or

> group_by(df, a, b) %>% summarize(values = list(value))
# A tibble: 3 x 3
# Groups:   a [2]
      a     b values
  <dbl> <dbl> <list>
1     1     1 <int [2]>
2     2     2 <int [1]>
3     2     3 <int [1]>

summarizing multiple columns with dplyr

> df$v1 = 1:4
> df$v2 = 4:1                                                                  
>  group_by(df, a, b) %>% summarize(v1_mean = mean(v1), v2_median = median(v2))
# A tibble: 3 x 4
# Groups:   a [2]
      a     b v1_mean v2_median
  <dbl> <dbl>   <dbl>     <dbl>
1     1     1     1.5       3.5
2     2     2     3         2
3     2     3     4         1

I do not know how performant this would be with data of your size.

Martin Morgan

On 11/8/19, 1:39 PM, "R-help on behalf of Ana Marija" <[hidden email] on behalf of [hidden email]> wrote:

    Thank you so much!!!
   
    On Fri, Nov 8, 2019 at 11:40 AM Bert Gunter <[hidden email]> wrote:
    >
    > Correction:
    > df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
    > df[!duplicated(df[,2:3]), ]  ## Note the ! sign
    >
    > Bert Gunter
    >
    > "The trouble with having an open mind is that people keep coming along and sticking things into it."
    > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
    >
    >
    > On Fri, Nov 8, 2019 at 7:59 AM Bert Gunter <[hidden email]> wrote:
    >>
    >> Sorry, but you ask basic questions.You really need to spend some more time with an R tutorial or two. This list is not meant to replace your own learning efforts.
    >>
    >> You also do not seem to be reading the docs carefully. Under ?unique, it links ?duplicated and tells you that it gives indices of duplicated rows of a data frame. These then can be used by subscripting to remove those rows from the data frame. Here is a reproducible example:
    >>
    >> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
    >> df[-duplicated(df[,2:3]), ]  ## Note the - sign
    >>
    >> If you prefer, the "Tidyverse" world has what are purported to be more user-friendly versions of such data handling functionality that you can use instead.
    >>
    >>
    >> Bert
    >>
    >> On Fri, Nov 8, 2019 at 7:38 AM Ana Marija <[hidden email]> wrote:
    >>>
    >>> would you know how would I extract from my original data frame, just
    >>> these unique rows?
    >>> because this gives me only those 3 columns, and I want all columns
    >>> from the original data frame
    >>>
    >>> > head(udt)
    >>>    chr   pos         gene_id
    >>> 1 chr1 54490 ENSG00000227232
    >>> 2 chr1 58814 ENSG00000227232
    >>> 3 chr1 60351 ENSG00000227232
    >>> 4 chr1 61920 ENSG00000227232
    >>> 5 chr1 63671 ENSG00000227232
    >>> 6 chr1 64931 ENSG00000227232
    >>>
    >>> > head(dt)
    >>>     chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
    >>> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
    >>> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
    >>> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
    >>> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
    >>> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
    >>> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800
    >>>
    >>> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]> wrote:
    >>> >
    >>> > Thank you so much! Converting it to data frame resolved the issue!
    >>> >
    >>> > On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
    >>> > <[hidden email]> wrote:
    >>> > >
    >>> > > It seems as if dt is not a (base R) data frame but a
    >>> > > data table. I assume, you will have to transform dt
    >>> > > into a data frame (maybe with as.data.frame) to be
    >>> > > able to apply unique in the suggested way. However,
    >>> > > I am not familiar with data tables. Perhaps somebody
    >>> > > else can provide a more profound guess.
    >>> > >
    >>> > >   Regards  --  Gerrit
    >>> > >
    >>> > > ---------------------------------------------------------------------
    >>> > > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
    >>> > > [hidden email]   Justus-Liebig-University Giessen
    >>> > > Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
    >>> > > http://www.uni-giessen.de/eichner
    >>> > > ---------------------------------------------------------------------
    >>> > >
    >>> > > Am 08.11.2019 um 16:02 schrieb Ana Marija:
    >>> > > > I tried it but I got this error:
    >>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
    >>> > > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
    >>> > > >    When i is a data.table (or character vector), the columns to join by
    >>> > > > must be specified using 'on=' argument (see ?data.table), by keying x
    >>> > > > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
    >>> > > > column names between x and i (i.e., a natural join). Keyed joins might
    >>> > > > have further speed benefits on very large data due to x being sorted
    >>> > > > in RAM.
    >>> > > >
    >>> > > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
    >>> > > > <[hidden email]> wrote:
    >>> > > >>
    >>> > > >> Hi, Ana,
    >>> > > >>
    >>> > > >> doesn't
    >>> > > >>
    >>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
    >>> > > >> nrow(udt)
    >>> > > >>
    >>> > > >> get close to what you want?
    >>> > > >>
    >>> > > >>    Hth  --  Gerrit
    >>> > > >>
    >>> > > >> ---------------------------------------------------------------------
    >>> > > >> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
    >>> > > >> [hidden email]   Justus-Liebig-University Giessen
    >>> > > >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
    >>> > > >> http://www.uni-giessen.de/eichner
    >>> > > >> ---------------------------------------------------------------------
    >>> > > >>
    >>> > > >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
    >>> > > >>> Hello,
    >>> > > >>>
    >>> > > >>> I have a data frame like this:
    >>> > > >>>
    >>> > > >>>> head(dt,20)
    >>> > > >>>        chr    pos         gene_id pval_nominal  pval_ret       wl      wr
    >>> > > >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
    >>> > > >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
    >>> > > >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
    >>> > > >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
    >>> > > >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
    >>> > > >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
    >>> > > >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
    >>> > > >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
    >>> > > >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
    >>> > > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
    >>> > > >>>
    >>> > > >>> it is very big,
    >>> > > >>>> dim(dt)
    >>> > > >>> [1] 73719122        8
    >>> > > >>>
    >>> > > >>> To count number of unique rows for all 3 columns: chr, pos and gene_id
    >>> > > >>> I could just join those 3 columns and than count. But how would I find
    >>> > > >>> unique number of rows for these 4 columns without joining them?
    >>> > > >>>
    >>> > > >>> Thanks
    >>> > > >>> Ana
    >>> > > >>>
    >>> > > >>> ______________________________________________
    >>> > > >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    >>> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>> > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >>> > > >>> and provide commented, minimal, self-contained, reproducible code.
    >>> > > >>>
    >>> > > >>
    >>> > > >> ______________________________________________
    >>> > > >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    >>> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
    >>> > > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >>> > > >> and provide commented, minimal, self-contained, reproducible code.
    >>>
    >>> ______________________________________________
    >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    >>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >>> and provide commented, minimal, self-contained, reproducible code.
   
    ______________________________________________
    [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
   
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to find number of unique rows for combination of r columns

Rui Barradas
Hello,

If performance is important, and with 73M rows it probably is, take a
look at this StackOverflow post.

[1] https://stackoverflow.com/a/36058634/8245406


Hope this helps,

Rui Barradas

Às 21:33 de 08/11/19, Martin Morgan escreveu:

> With this example
>
>> df = data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
>> df
>    a b value
> 1 1 1     1
> 2 1 1     2
> 3 2 2     3
> 4 2 3     4
>
> The approach to drop duplicates in the first and second columns has as a consequence the arbitrary choice of 'value' for the duplicate entries -- why chose a value of '1' rather than '2' (or the average of 1 and 2, or a list containing all possible values, or...) for the rows duplicated in columns a and b?
>
>> df[!duplicated(df[,1:2]),]
>    a b value
> 1 1 1     1
> 3 2 2     3
> 4 2 3     4
>
> In base R one might
>
>> aggregate(value ~ a + b, df, mean)
>    a b value
> 1 1 1   1.5
> 2 2 2   3.0
> 3 2 3   4.0
>> aggregate(value ~ a + b, df, list)
>    a b value
> 1 1 1  1, 2
> 2 2 2     3
> 3 2 3     4
>
> but handling several value-like columns would be hard(?)
>
> Using library(dplyr), I have
>
>> group_by(df, a, b) %>% summarize(mean_value = mean(value))
> # A tibble: 3 x 3
> # Groups:   a [2]
>        a     b mean_value
>    <dbl> <dbl>      <dbl>
> 1     1     1        1.5
> 2     2     2        3
> 3     2     3        4
>
> or
>
>> group_by(df, a, b) %>% summarize(values = list(value))
> # A tibble: 3 x 3
> # Groups:   a [2]
>        a     b values
>    <dbl> <dbl> <list>
> 1     1     1 <int [2]>
> 2     2     2 <int [1]>
> 3     2     3 <int [1]>
>
> summarizing multiple columns with dplyr
>
>> df$v1 = 1:4
>> df$v2 = 4:1
>>   group_by(df, a, b) %>% summarize(v1_mean = mean(v1), v2_median = median(v2))
> # A tibble: 3 x 4
> # Groups:   a [2]
>        a     b v1_mean v2_median
>    <dbl> <dbl>   <dbl>     <dbl>
> 1     1     1     1.5       3.5
> 2     2     2     3         2
> 3     2     3     4         1
>
> I do not know how performant this would be with data of your size.
>
> Martin Morgan
>
> On 11/8/19, 1:39 PM, "R-help on behalf of Ana Marija" <[hidden email] on behalf of [hidden email]> wrote:
>
>      Thank you so much!!!
>      
>      On Fri, Nov 8, 2019 at 11:40 AM Bert Gunter <[hidden email]> wrote:
>      >
>      > Correction:
>      > df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
>      > df[!duplicated(df[,2:3]), ]  ## Note the ! sign
>      >
>      > Bert Gunter
>      >
>      > "The trouble with having an open mind is that people keep coming along and sticking things into it."
>      > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>      >
>      >
>      > On Fri, Nov 8, 2019 at 7:59 AM Bert Gunter <[hidden email]> wrote:
>      >>
>      >> Sorry, but you ask basic questions.You really need to spend some more time with an R tutorial or two. This list is not meant to replace your own learning efforts.
>      >>
>      >> You also do not seem to be reading the docs carefully. Under ?unique, it links ?duplicated and tells you that it gives indices of duplicated rows of a data frame. These then can be used by subscripting to remove those rows from the data frame. Here is a reproducible example:
>      >>
>      >> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
>      >> df[-duplicated(df[,2:3]), ]  ## Note the - sign
>      >>
>      >> If you prefer, the "Tidyverse" world has what are purported to be more user-friendly versions of such data handling functionality that you can use instead.
>      >>
>      >>
>      >> Bert
>      >>
>      >> On Fri, Nov 8, 2019 at 7:38 AM Ana Marija <[hidden email]> wrote:
>      >>>
>      >>> would you know how would I extract from my original data frame, just
>      >>> these unique rows?
>      >>> because this gives me only those 3 columns, and I want all columns
>      >>> from the original data frame
>      >>>
>      >>> > head(udt)
>      >>>    chr   pos         gene_id
>      >>> 1 chr1 54490 ENSG00000227232
>      >>> 2 chr1 58814 ENSG00000227232
>      >>> 3 chr1 60351 ENSG00000227232
>      >>> 4 chr1 61920 ENSG00000227232
>      >>> 5 chr1 63671 ENSG00000227232
>      >>> 6 chr1 64931 ENSG00000227232
>      >>>
>      >>> > head(dt)
>      >>>     chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
>      >>> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
>      >>> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
>      >>> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
>      >>> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
>      >>> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
>      >>> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800
>      >>>
>      >>> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <[hidden email]> wrote:
>      >>> >
>      >>> > Thank you so much! Converting it to data frame resolved the issue!
>      >>> >
>      >>> > On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
>      >>> > <[hidden email]> wrote:
>      >>> > >
>      >>> > > It seems as if dt is not a (base R) data frame but a
>      >>> > > data table. I assume, you will have to transform dt
>      >>> > > into a data frame (maybe with as.data.frame) to be
>      >>> > > able to apply unique in the suggested way. However,
>      >>> > > I am not familiar with data tables. Perhaps somebody
>      >>> > > else can provide a more profound guess.
>      >>> > >
>      >>> > >   Regards  --  Gerrit
>      >>> > >
>      >>> > > ---------------------------------------------------------------------
>      >>> > > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>      >>> > > [hidden email]   Justus-Liebig-University Giessen
>      >>> > > Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>      >>> > > http://www.uni-giessen.de/eichner
>      >>> > > ---------------------------------------------------------------------
>      >>> > >
>      >>> > > Am 08.11.2019 um 16:02 schrieb Ana Marija:
>      >>> > > > I tried it but I got this error:
>      >>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>      >>> > > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
>      >>> > > >    When i is a data.table (or character vector), the columns to join by
>      >>> > > > must be specified using 'on=' argument (see ?data.table), by keying x
>      >>> > > > (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
>      >>> > > > column names between x and i (i.e., a natural join). Keyed joins might
>      >>> > > > have further speed benefits on very large data due to x being sorted
>      >>> > > > in RAM.
>      >>> > > >
>      >>> > > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
>      >>> > > > <[hidden email]> wrote:
>      >>> > > >>
>      >>> > > >> Hi, Ana,
>      >>> > > >>
>      >>> > > >> doesn't
>      >>> > > >>
>      >>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>      >>> > > >> nrow(udt)
>      >>> > > >>
>      >>> > > >> get close to what you want?
>      >>> > > >>
>      >>> > > >>    Hth  --  Gerrit
>      >>> > > >>
>      >>> > > >> ---------------------------------------------------------------------
>      >>> > > >> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>      >>> > > >> [hidden email]   Justus-Liebig-University Giessen
>      >>> > > >> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>      >>> > > >> http://www.uni-giessen.de/eichner
>      >>> > > >> ---------------------------------------------------------------------
>      >>> > > >>
>      >>> > > >> Am 08.11.2019 um 15:38 schrieb Ana Marija:
>      >>> > > >>> Hello,
>      >>> > > >>>
>      >>> > > >>> I have a data frame like this:
>      >>> > > >>>
>      >>> > > >>>> head(dt,20)
>      >>> > > >>>        chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>      >>> > > >>>    1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>      >>> > > >>>    2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>      >>> > > >>>    3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>      >>> > > >>>    4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>      >>> > > >>>    5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>      >>> > > >>>    6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>      >>> > > >>>    7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>      >>> > > >>>    8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>      >>> > > >>>    9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
>      >>> > > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>      >>> > > >>>
>      >>> > > >>> it is very big,
>      >>> > > >>>> dim(dt)
>      >>> > > >>> [1] 73719122        8
>      >>> > > >>>
>      >>> > > >>> To count number of unique rows for all 3 columns: chr, pos and gene_id
>      >>> > > >>> I could just join those 3 columns and than count. But how would I find
>      >>> > > >>> unique number of rows for these 4 columns without joining them?
>      >>> > > >>>
>      >>> > > >>> Thanks
>      >>> > > >>> Ana
>      >>> > > >>>
>      >>> > > >>> ______________________________________________
>      >>> > > >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>      >>> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>      >>> > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      >>> > > >>> and provide commented, minimal, self-contained, reproducible code.
>      >>> > > >>>
>      >>> > > >>
>      >>> > > >> ______________________________________________
>      >>> > > >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>      >>> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
>      >>> > > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      >>> > > >> and provide commented, minimal, self-contained, reproducible code.
>      >>>
>      >>> ______________________________________________
>      >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>      >>> https://stat.ethz.ch/mailman/listinfo/r-help
>      >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      >>> and provide commented, minimal, self-contained, reproducible code.
>      
>      ______________________________________________
>      [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>      https://stat.ethz.ch/mailman/listinfo/r-help
>      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      and provide commented, minimal, self-contained, reproducible code.
>      
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.