Retrieving the 2 row of "dist" computations


Jeff08
Dear R Gurus,

As you probably know, dist calculates the distance between every two rows of data. What I am interested in is the actual two rows that have the least distance between them, rather than the numerical value of the distance itself.

For example, in the following sample run the minimum distance is d[14], which is 0.3826119, and the corresponding rows are 3 & 6. I need a generic way to retrieve these rows for a matrix with any number of rows (in this example NRows=7).

NCols=5
NRows=7
myMat<-matrix(runif(NCols*NRows), ncol=NCols)

d<-dist(myMat)

          1         2         3         4         5         6
2 0.7202138                                                  
3 0.7866527 0.9052319                                        
4 0.6105235 1.0754259 0.8897555                              
5 0.5032729 1.0789359 0.9756421 0.4167131                    
6 0.6007685 0.6949224 0.3826119 0.7590029 0.7994574          
7 0.9751200 1.2218754 1.0547197 0.5681905 0.7795579 0.8291303

e<-sort.list(d)
e<-e[1:5]  ##Retrieve minimum 5 distances

[1] 14 16  4 18  5

Re: Retrieving the 2 row of "dist" computations

Jorge I Velez
Hi there,

I am sure there is a better way to do it, but here is a suggestion:

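k <- 5                  # number of closest pairs to report
dm <- as.matrix(d)      # full symmetric distance matrix, built once
ds <- sort(d)           # distances in increasing order
res <- matrix(NA, ncol = 2, nrow = k)
# first match of each distance in the full matrix, as (row, col)
for(i in 1:k) res[i, ] <- which(dm == ds[i], arr.ind = TRUE)[1,]
res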

HTH,
Jorge


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Retrieving the 2 row of "dist" computations

Bill.Venables
In reply to this post by Jeff08
This is a lazy way, and a slightly extravagant way if your memory is limited and you are dealing with large numbers of rows.

NCols <- 5
NRows <- 7
myMat <- matrix(runif(NCols*NRows), ncol=NCols)

d <- dist(myMat)

dm <- as.matrix(d)
diag(dm) <- Inf
ij <- which(dm == min(dm), arr.ind = TRUE)[1,]
ij
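
The same full-matrix idea can probably be extended from the single closest pair to the k closest pairs without a loop. What follows is only a sketch under assumptions: the seed, the value of k and the name res are chosen here for illustration, and the memory caveat above still applies because the whole matrix is built.

```r
set.seed(1)                               # assumption: fixed seed so the run is repeatable
myMat <- matrix(runif(5 * 7), ncol = 5)   # 7 rows, 5 columns, as in the question

dm <- as.matrix(dist(myMat))
dm[upper.tri(dm, diag = TRUE)] <- Inf     # keep each pair once (lower triangle only)

k <- 5                                    # assumption: number of pairs wanted
idx <- order(dm)[1:k]                     # linear positions of the k smallest entries
res <- arrayInd(idx, dim(dm))             # convert positions to (row, col)
colnames(res) <- c("row", "col")
res
```

Masking the upper triangle and the diagonal means every pair appears once, so tied or duplicated distances do not return the same pair twice.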

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Jeff08
Sent: Thursday, 10 June 2010 1:30 PM
To: [hidden email]
Subject: [R] Retrieving the 2 row of "dist" computations


--
View this message in context: http://r.789695.n4.nabble.com/Retrieving-the-2-row-of-dist-computations-tp2249844p2249844.html
Sent from the R help mailing list archive at Nabble.com.


Re: Retrieving the 2 row of "dist" computations

Jeff08
In reply to this post by Jorge I Velez
Hey,

The code definitely works, but I may need a more elegant way to do it. Rather than 7 rows, the full data contains 829 rows, so instead of having length 21, d will have length 343206.
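
One way to avoid scanning the full matrix once per distance might be a lookup table built with combn, whose columns come out in the same order in which dist stores its values. This is a sketch rather than a tested solution for the real data: the seed and the names pairs, idx and closest are assumptions made for illustration, and the 7-row example stands in for the 829-row case.

```r
set.seed(1)                               # assumption: fixed seed so the run is repeatable
myMat <- matrix(runif(5 * 7), ncol = 5)
d <- dist(myMat)

k <- 5
dv <- as.vector(d)                        # plain numeric vector of the distances
pairs <- combn(attr(d, "Size"), 2)        # column m holds the two rows behind dv[m]
idx <- order(dv)[1:k]                     # positions of the k smallest distances

closest <- data.frame(row1 = pairs[1, idx],
                      row2 = pairs[2, idx],
                      distance = dv[idx])
closest
```

The table has one column per pair, so for 829 rows it would hold 343206 columns, which is still far smaller than repeatedly comparing against the full 829 x 829 matrix.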


Re: Retrieving the 2 row of "dist" computations

Jeff08
Edit:

There is something funky about the code. It definitely returns the right column of the "distance" data, but returns an incorrect row.

Code:

NCols=250
NRows=829
myMat<-matrix(runif(NCols*NRows), ncol=NCols)

d<-dist(myMat)
e<-sort.list(d)
e<-e[1:5]  ##Retrieve minimum 5 distances

k <- 5
res <- matrix(NA, ncol = 2, nrow = k)
ds <- sort(d)
for(i in 1:k) res[i, ] <- which(as.matrix(d) == ds[i], arr.ind = TRUE)[1,]
colnames(res) <- c('row','col')
rownames(res) <- 1:k
res

I have derived a formula for 829 rows to check whether the returned row and column match the index given by e.

Column # = x, Row # = y. n = 828-(x-2)
index = y+(n+828)(828-n+1)/2


The formula in R code:

##Just checking for row 1
i<-1
y<-res[i,1]
x<-res[i,2]
n<-(828-(x-2))
index1<-(y+(n+828)*(828-n+1)/2)
index2<-e[i]
##index1 should equal index2, but this is not the case
##you can tell that the column is right because index1 and index2 are close
##(a change of 1 in the row shifts the index by 1, but a change of 1 in the
## column shifts the index by ~400 on average)

You can then compare this index to the one given by e[i]


Re: Retrieving the 2 row of "dist" computations

Jeff08
Edit:

I'm stupid and visualized the "dist" matrix incorrectly in my head.

Should be
Column # = x, Row # = y. n = 827-(x-2)
index = y-1+(n+827)(827-n+1)/2

Everything works just fine. Thanks!
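
The corrected formula goes from (row, column) to a position in d for 829 rows. For the opposite direction and any number of rows, something like the following might serve. It is a sketch only: dist_index_to_pair is a hypothetical helper name, not a function in base R, and it relies on dist storing the lower triangle column by column.

```r
# Hypothetical helper (not part of base R): map positions in a "dist" vector
# back to the (row, col) pair of the lower triangle they came from.
dist_index_to_pair <- function(index, n) {
  # offset[j] = number of entries stored before column j
  offset <- c(0, cumsum(rev(seq_len(n - 1))))[seq_len(n - 1)]
  col <- findInterval(index - 1, offset)  # column whose block contains the index
  row <- index - offset[col] + col        # position within that column's block
  cbind(row = row, col = col)
}

# The five positions reported in the original post (7 rows)
dist_index_to_pair(c(14, 16, 4, 18, 5), n = 7)
# rows 6, 5, 5, 7, 6 paired with columns 3, 4, 1, 4, 1
```

For the 829-row data the call would presumably be dist_index_to_pair(e, n = 829), which avoids building the full matrix altogether.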
