unique mismatch in R and Excel

Koushik Saha
I have a weird problem. I want to count the unique entries in a certain
column. I have attached my CSV file.

I am doing this to get the unique entries in the column:

dat<-read.csv("C:/Project/Gawk-scripts/Book1.csv")
names(dat)<-c("user_name")
unique(dat$user_name)

The result says I have 170 unique values.

But when I use "remove duplicate entries" in Excel, I get 147 unique
entries in the column.

Can anyone explain why the results do not match, or what I am doing
wrong?

Regards
Koushik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: unique mismatch in R and Excel

Duncan Murdoch-2
On 13-12-24 4:08 AM, Koushik Saha wrote:

> I have a weird problem. I want to count the unique entries in a certain
> column. I have attached my CSV file.
>
> I am doing this to get the unique entries in the column:
>
> dat<-read.csv("C:/Project/Gawk-scripts/Book1.csv")
> names(dat)<-c("user_name")
> unique(dat$user_name)
>
> The result says I have 170 unique values.
>
> But when I use "remove duplicate entries" in Excel, I get 147 unique
> entries in the column.
>
> Can anyone explain why the results do not match, or what I am doing
> wrong?
>

Surely you can just compare the lists.  147 is not that many entries,
and if they are sorted, it will be easy.
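
One way to do that comparison in R, as a rough sketch: sort R's unique values, then use setdiff() against the list Excel kept. The vectors below are made-up stand-ins, not the poster's data (the attachment never reached the list), and the name excel_unique is hypothetical. In practice it would be read from a file saved after Excel's duplicate removal.

```r
# Hypothetical stand-in for dat$user_name (the real file was not available)
user_name <- c("alice", "Alice", "bob", "carol", "BOB", "carol")

# What R considers unique, sorted for easy side-by-side comparison
r_unique <- sort(unique(user_name))

# Hypothetical stand-in for the column left after Excel removed duplicates
excel_unique <- c("alice", "bob", "carol")

# Entries R kept as distinct that are absent from Excel's list
only_in_r <- setdiff(r_unique, excel_unique)
print(only_in_r)
```

Whatever shows up in only_in_r is exactly what the two tools disagree about, which usually makes the cause obvious on inspection.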

Duncan Murdoch


Re: unique mismatch in R and Excel

David Winsemius
On Dec 24, 2013, at 1:08 AM, Koushik Saha wrote:

> I have a weird problem. I want to count the unique entries in a certain
> column. I have attached my CSV file.

Files named with extension .csv do not typically make it through the R-help mail server.

>
> I am doing this to get the unique entries in the column:
>
> dat<-read.csv("C:/Project/Gawk-scripts/Book1.csv")
> names(dat)<-c("user_name")
> unique(dat$user_name)
>
> The result says I have 170 unique values.
>
> But when I use "remove duplicate entries" in Excel, I get 147 unique
> entries in the column.
>
> Can anyone explain why the results do not match, or what I am doing
> wrong?
>

Rename the file to have an extension of .txt. Then your mail client will probably label it correctly as a MIME-TEXT file.

--
David Winsemius
Alameda, CA, USA

Re: unique mismatch in R and Excel

barry rowlingson
We answered this on StackOverflow already. Excel was doing
case-insensitive duplicate matching.

http://stackoverflow.com/questions/20759346/counting-unique-values-in-r-and-excel/20759523#20759523
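
A minimal sketch of how such a gap can arise, using made-up names rather than the poster's file: unique() compares strings exactly, so folding case with tolower() first gives a count closer to what a case-insensitive tool would report.

```r
# Made-up example values; not the data from the original post
user_name <- c("JSmith", "jsmith", "mlee", "MLee", "kpatel")

# unique() is case-sensitive: "JSmith" and "jsmith" count separately
n_exact <- length(unique(user_name))

# Fold case first to mimic case-insensitive duplicate matching
lower <- tolower(user_name)
n_folded <- length(unique(lower))

# Entries that collide with another entry once case is ignored
collisions <- user_name[duplicated(lower) | duplicated(lower, fromLast = TRUE)]

cat("exact:", n_exact, "case-folded:", n_folded, "\n")
print(collisions)
```

Listing the collisions, rather than only counting, shows which entries differ by case alone, so you can decide whether they really are the same user.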

Barry

On Tue, Dec 24, 2013 at 5:43 PM, David Winsemius <[hidden email]> wrote:

>
> On Dec 24, 2013, at 1:08 AM, Koushik Saha wrote:
>
>> I have a weird problem. I want to count the unique entries in a certain
>> column. I have attached my CSV file.
>
> Files named with extension .csv do not typically make it through the R-help mail server.
>
>>
>> I am doing this to get the unique entries in the column:
>>
>> dat<-read.csv("C:/Project/Gawk-scripts/Book1.csv")
>> names(dat)<-c("user_name")
>> unique(dat$user_name)
>>
>> The result says I have 170 unique values.
>>
>> But when I use "remove duplicate entries" in Excel, I get 147 unique
>> entries in the column.
>>
>> Can anyone explain why the results do not match, or what I am doing
>> wrong?
>>
>
> Rename the file to have an extension of .txt. Then your mail client will probably label it correctly as a MIME-TEXT file.
>
> --
> David Winsemius
> Alameda, CA, USA