Mathematical working procedure of duplicated() function in r

K Purna Prakash
Dear Sir(s),
I request that you explain the detailed internal working mechanism of the
following expression, for better understanding:
x[duplicated(x) | duplicated(x, fromLast=TRUE), ]
I am confused about how duplicates are identified when there are
thousands of records.
I look forward to a positive response.
Thank you,
K.Purna Prakash.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Mathematical working procedure of duplicated() function in r

Rui Barradas
Hello,

R is open source, so you can see exactly how any function works
internally. You can view a function's code by typing its name, without
parentheses, at the R prompt.

 > duplicated
function (x, incomparables = FALSE, ...)
UseMethod("duplicated")
<bytecode: 0x55e5ef683040>
<environment: namespace:base>

This tells us that duplicated is a generic function, and that there are
methods written to handle the different S3 classes of the object x.
Generics like this usually have a default method, used when no more
specific method applies; here it is duplicated.default

 > duplicated.default
function (x, incomparables = FALSE, fromLast = FALSE, nmax = NA,
     ...)
.Internal(duplicated(x, incomparables, fromLast,
     if (is.factor(x)) min(length(x), nlevels(x) + 1L) else nmax))
<bytecode: 0x55e5ef6826a0>
<environment: namespace:base>
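
As an aside to the listings above, the methods registered for the generic can also be inspected from the prompt. This is only a minimal sketch: the exact set of methods depends on the R version and on which packages are loaded, and the choice of the data.frame method is just an example.

```r
# List the S3 methods currently registered for the duplicated() generic.
m <- methods("duplicated")
print(m)

# Fetch one specific method by class name, here the one used for data frames.
f <- getS3method("duplicated", "data.frame")
print(f)
```

Printing f shows the R-level code that runs before the default method is reached.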


The default method calls .Internal(duplicated(...)), which is implemented
in C. So you'll have to download the R sources, if you haven't done so yet,
and search for the file where that function is defined (the C function is
named do_duplicated). The file is

src/main/unique.c


Happy reading.
Also, as the posting guide asks R-Help users to do, please post in
plain text, not in HTML.

Hope this helps,

Rui Barradas

At 12:54 on 04/08/20, K Purna Prakash wrote:

> Dear Sir(s),
> I request that you explain the detailed internal working mechanism of the
> following expression, for better understanding:
> x[duplicated(x) | duplicated(x, fromLast=TRUE), ]
> I am confused about how duplicates are identified when there are
> thousands of records.
> I look forward to a positive response.
> Thank you,
> K.Purna Prakash.

Re: Mathematical working procedure of duplicated() function in r

glsnow
In reply to this post by K Purna Prakash
Rui pointed out that you can examine the source yourself.  FAQ 7.40
has a link to an article with detail on finding and examining the
source code.

A general algorithm for checking for duplicates follows (I have not
examined the R source code to see whether it uses something more clever).

Create an empty object (I will call it seen).  This could be a simple
vector, but for efficiency it is better to use an object type that has
fast lookup, e.g. binary tree, associative array/hash/dictionary, etc.

Create a logical vector the same length as x (I will call it result).

Loop from 1 to the length of x (or from the length down to 1 if
fromLast=TRUE). On each iteration,
 check whether the value of x[i] is in seen:
   If it is: set result[i] to TRUE.
   If it is not: add the current value to seen and set result[i] to FALSE.

After the loop finishes, throw away seen and reclaim the memory, then
return result.
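
The steps above can be sketched in R roughly as follows. This is an illustration of the general algorithm only, not R's internal implementation: dup_sketch is a hypothetical helper name, and using a hashed environment keyed by the character form of each value is a simplification (for example, NA and the string "NA" would collide, and numbers are compared through their printed form).

```r
# Hypothetical helper illustrating the "seen" set approach; not R's own code.
dup_sketch <- function(x, fromLast = FALSE) {
  seen <- new.env(hash = TRUE)     # hashed environment used as a fast-lookup set
  result <- logical(length(x))     # logical vector, all FALSE to start
  idx <- if (fromLast) rev(seq_along(x)) else seq_along(x)
  for (i in idx) {
    # Prefix the key so that an empty string is still a valid variable name.
    key <- paste0("k", x[i])
    if (exists(key, envir = seen, inherits = FALSE)) {
      result[i] <- TRUE            # value seen earlier in the scan: duplicate
    } else {
      assign(key, TRUE, envir = seen)  # first sighting: remember it
    }
  }
  result
}

x <- c("a", "b", "a", "c", "b", "a")
fwd <- dup_sketch(x)                   # scan from the first element
bwd <- dup_sketch(x, fromLast = TRUE)  # scan from the last element

# Combining both directions flags every member of a duplicated group,
# which is what the expression in the original question relies on.
both <- fwd | bwd
print(both)
```

On this small vector the sketch agrees with duplicated(x) and duplicated(x, fromLast = TRUE), which is an easy way to check your understanding on your own data.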

Since it looks like you are using this on a matrix or data frame,
there is probably a preprocessing step that combines all the values on
each row into a single character string.
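
To make that idea concrete, here is a small sketch of such a preprocessing step. It assumes a row key built with paste() and a "\r" separator; whether R itself builds keys this way depends on the version, so treat it as an illustration rather than a description of the actual source. The data frame df is made-up example data.

```r
# Small data frame with repeated rows (illustrative data only).
df <- data.frame(id = c(1, 2, 1, 3, 2),
                 grp = c("a", "b", "a", "c", "b"))

# Assumed preprocessing: collapse each row into one character key.
# The "\r" separator is an arbitrary choice that is unlikely to occur
# in the data; unname() keeps column names from clashing with paste()'s
# own argument names.
keys <- do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Flag rows whose key occurs more than once, scanning in both directions.
flag_keys <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# The same flags computed directly on the data frame, as in the question.
flag_df <- duplicated(df) | duplicated(df, fromLast = TRUE)

print(identical(flag_keys, flag_df))
print(df[flag_df, ])   # all rows that belong to a duplicated group
```

With thousands of records the logic is the same; only the number of keys grows.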

On Tue, Aug 4, 2020 at 6:45 AM K Purna Prakash <[hidden email]> wrote:

>
> Dear Sir(s),
> I request that you explain the detailed internal working mechanism of the
> following expression, for better understanding:
> x[duplicated(x) | duplicated(x, fromLast=TRUE), ]
> I am confused about how duplicates are identified when there are
> thousands of records.
> I look forward to a positive response.
> Thank you,
> K.Purna Prakash.



--
Gregory (Greg) L. Snow Ph.D.
[hidden email]
