# Re: Transport and Earth Mover's Distance

4 messages
Open this post in threaded view
|

## Re: Transport and Earth Mover's Distance

 Dear Lorenzo, The code you have attached (if you replace 00 by 50) generates two sets of 50 points each that are uniformly distributed on the unit square [0,1]^2 and then computes the Wasserstein-1 distance between them assuming that each point has mass 1/50. Do the following to get a plot of the optimal point assignment (the average of the line lengths is the Wasserstein-1 distance). x <- pp(matrix(runif(100),50,2)) y <- pp(matrix(runif(100),50,2)) match <- transport(x,y,p=1) plot(x,y,match) wasserstein(x,y,p=1,match) Currently the transport package only matches "objects" with the same total mass, i.e. point patterns (class pp) with equal numbers of points, weighted point patterns (class wpp) where e.g. the mass at each point is given by 1/(no. of points), and pixel grids (class pgrid) which are supposed to have the same total mass summed over all pixel values. If you have no particular need for binning, check out the function pppdist in the R-package spatstat, which offers a more flexible way to deal with point patterns of different size. If you need to bin, check out transport.pgrid, but be aware that it does not do "incomplete transport", but will normalize both pixel grids to unit total mass if these masses are different. Best regards, Dominic > Am 07.03.2017 um 13:29 schrieb Lorenzo Isella <[hidden email]>: > > Dear All, > From time to time I need to resort to the calculation of the earth > mover' distance (see > > https://en.wikipedia.org/wiki/Earth_mover's_distance and > https://en.wikipedia.org/wiki/Wasserstein_metric . > > In the past I used the package > > https://r-forge.r-project.org/projects/earthmovdist/> > which apparently is no longer available, but there is plenty of choice > in R. > > From the transport package, I found this example > > set.seed(27) > x <- pp(matrix(runif(100),50,2)) > y <- pp(matrix(runif(100),00,2)) > wasserstein(x,y,p=1) > > but it is not 100% clear to me how to interpret it. > Are x and y meant as histograms where the the center of each bin is > provided and the total mass in the bins is automatically normalized to > 1? > > Essentially, my situation is that I have two univariate samples of unequal > size. I would like to bin them and calculate the earth mover's > distance between them. > > I am not sure if this is what the example above does. > Cheers > > Lorenzo > > ------------------------------------ Prof. Dr. Dominic Schuhmacher Institute for Mathematical Stochastics University of Goettingen Goldschmidtstrasse 7 D-37077 Goettingen Germany Phone: +49 (0)551 39172107 E-mail: [hidden email] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Transport and Earth Mover's Distance

 Dear Dominic, Thanks a lot for the quick reply. Just a few questions to make sure I got it all right (I now understand that transport and spatstat in particular can do much more than I need right now). Essentially I am after the Wasserstein distance between univariate distributions (and it would be great if I can extend it to the case of two distributions with a different bin structure). 1) two distributions with the same bins (I identify each bin by the central point in the bin). n_bin <- 11 # number of bins bin_structure <- seq(10, by=1, len=n_bin) set.seed(1234) x_counts <- rpois(n_bin, 10) y_counts <- rpois(n_bin, 10) x <- pp(as.matrix(cbind(bin_structure, x_counts))) y <- pp(as.matrix(cbind(bin_structure, y_counts))) match <- transport(x,y,p=1) plot(x,y,match) wasserstein_dist <- wasserstein(x,y,p=1,match) 2) Now I do not have the same bin structure y2 <- pp(as.matrix(cbind(bin_structure+2, y_counts))) match <- transport(x,y2,p=1) plot(x,y2,match) wasserstein_dist2 <- wasserstein(x,y2,p=1,match) Do 1) and 2) make sense? > >If you have no particular need for binning, check out the function >pppdist in the R-package spatstat, which offers a more flexible way >to deal with point patterns of different size. Well, this is not clear, but possibly very important for me. My raw data consists of 2 univariate samples of unequal length. suppose that x<-rnorm(100) and y<-rnorm(90) Is there a way to define the Wasserstein distance between them without going through the binning procedure? Many thanks! Lorenzo ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.