On Mon, 30 Jan 2006, Ole Edsberg wrote:

> Hello,
>
> I have a data set on which I run the sammon algorithm as follows:
>
> library(MASS)
> data = read.table('problemforr.dat')

Hmm. This is a data frame of 387 rows and 387 columns and Euclidean
distance is used. Squeezing 387 dims (and PCA shows these points as well
spread in almost all those dimensions) to 2 is not a well-posed problem,
and you should welcome the plurality of answers found.
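The claim that the points are spread in almost all dimensions can be checked by looking at the share of total variance captured by the first two principal components. The sketch below is a minimal illustration that assumes simulated isotropic Gaussian data of the same 387 x 387 shape, since the poster's file is not reproduced here. The name `prop_two` is illustrative and not part of any package.

```r
# Sketch: how much variance can a 2-D view of 387-column data retain?
# Simulated isotropic Gaussian data stands in for the poster's file
# (an assumption made purely for illustration).
set.seed(42)
x <- matrix(rnorm(387 * 387), nrow = 387, ncol = 387)

pc <- prcomp(x)                          # principal components analysis
var_share <- pc$sdev^2 / sum(pc$sdev^2)  # variance share of each component
prop_two <- sum(var_share[1:2])          # share kept by the first two PCs

# When points are spread in almost all dimensions this is a small
# fraction, so many quite different 2-D maps fit about equally badly.
print(round(prop_two, 3))
```

With real data, a small value of `prop_two` signals that any 2-D configuration discards most of the structure. Different local optima of the mapping may then be equally defensible.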

> y = cmdscale(data, add=TRUE)
> s = sammon(data, y$points)
>
> (In case it should be relevant, I make the data available at
> http://idi.ntnu.no/~edsberg/problemforr.dat)
>
> With R 2.2.1 on Debian Sid I always get one of two solutions (stress
> 1.74288 after 10 iterations or stress 1.33629 after 9 iterations). I
> always get the same result within the same R session, even if I read
> the data again. With R 2.2.0 on SunOS 5.9 I always get the same result
> (stress 0.13186 after 74 iterations).

Note that your subject line attributes this to sammon, but it could also
be due to cmdscale.

On AMD64 Linux I get

> s = sammon(data, y$points)
Initial stress : 2.21024
stress after 10 iters: 1.22268, magic = 0.092
stress after 20 iters: 0.48801, magic = 0.009
stress after 30 iters: 0.35007, magic = 0.020
stress after 40 iters: 0.24377, magic = 0.045
stress after 50 iters: 0.17343, magic = 0.021
stress after 60 iters: 0.14944, magic = 0.048
stress after 70 iters: 0.12810, magic = 0.022
stress after 80 iters: 0.12423, magic = 0.010
stress after 90 iters: 0.12191, magic = 0.118
stress after 100 iters: 0.11986, magic = 0.500

That large reduction in `magic' (the step-size constant) indicates the
algorithm is having problems. With a build compiled without optimization
(as used for valgrind) I got the solution you quoted for Solaris 9.

However, on all four systems I tried (AMD64 FC3 Linux, i686 FC3 Linux,
Solaris and Windows), the results differed between systems but were
repeatable on each system. I even ran under valgrind (on FC3) to be sure
that no uninitialized memory areas were used.

> I understand that the sammon algorithm is very sensitive to even tiny
> variations in the starting point, but the observed behaviour seems
> strange to me. Differences between machines could perhaps be explained
> by floating point portability issues, but not differences on the same
> machine, and not the fact that I get the same result within the same R
> session.

No, but then that is not reproducible, and has never been reported before.
If, for example, different BLAS libraries get selected on different runs,
this would explain it. Or it could be a Debian-Sid-specific bug in a
shared library or compiler.

> I read in the documentation
> (http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/sammon.html)
> that "Further, since the configuration is only determined up to
> rotations and reflections (by convention the centroid is at the origin),
> the result can vary considerably from machine to machine." This doesn't
> make sense to me.

Note that this is addressing a separate issue. For a given minimized
stress there are multiple solutions which can be transformed into each
other by rotations and reflections, and the help file is warning you of
that. There are also (in general) multiple local minima.
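Sammon's stress depends on the configuration only through its inter-point distances. Any rotation or reflection of a solution is therefore an equally good solution. The sketch below illustrates this with a hypothetical helper, `sammon_stress`, which is not a MASS function. It implements the usual definition of Sammon's stress on small simulated data, both of which are assumptions made for illustration.

```r
# Hypothetical helper (not part of MASS): Sammon's stress,
#   E = [ sum_{i<j} (d_ij - e_ij)^2 / d_ij ] / sum_{i<j} d_ij,
# where d_ij are the target dissimilarities and e_ij the distances in y.
sammon_stress <- function(d, y) {
  dd <- as.vector(d)          # target distances, lower triangle of d
  ee <- as.vector(dist(y))    # configuration distances, same ordering
  sum((dd - ee)^2 / dd) / sum(dd)
}

set.seed(1)
x <- matrix(rnorm(30 * 5), ncol = 5)   # illustrative data: 30 points in 5-D
d <- dist(x)
y <- cmdscale(d, k = 2)                # 2-D configuration centred at the origin

theta <- pi / 3
rot  <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
refl <- diag(c(-1, 1))                 # reflect the first coordinate

s_orig <- sammon_stress(d, y)
s_rot  <- sammon_stress(d, y %*% rot)
s_refl <- sammon_stress(d, y %*% refl)

# Rotations and reflections preserve every inter-point distance, so the
# stress is unchanged: the minimizing configuration is not unique.
print(c(s_orig, s_rot, s_refl))
```

Two machines can thus report the same stress with configurations that look quite different when plotted. This is separate from landing in different local minima, which changes the stress itself.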

> If the data and the algorithm are the same, the result should be the
> same.

Depending on what you mean by 'algorithm', this is what the subject of
numerical analysis is about. I take it you are familiar with J. H.
Wilkinson's classic work, The Algebraic Eigenvalue Problem?

> What differences between machines do they refer to here? Floating
> point issues?

Any difference in the CPU/FPU or compiler or run-time environment
(including all the dynamically linked support libraries). Just changing
the optimization level of the compiler changes the assembler-level
algorithm used, and can often affect the answer of, e.g., an eigenvalue
calculation. Rounding errors depend on whether (and when)
extended-precision registers are used and on the exact order of the
calculations, since computer arithmetic is neither associative nor
distributive.
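The dependence of rounding on the order of operations is easy to demonstrate at the R prompt. The sketch below is minimal and assumes IEEE 754 double precision, with each sum rounded to a double and no extended-precision intermediates. The variable names `left` and `right` are illustrative.

```r
# Floating-point addition is not associative: grouping the same three
# terms differently changes the last bits of the result.
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)

print(left == right)                      # FALSE when each sum is rounded to double
print(sprintf("%.17g", c(left, right)))   # shows the differing final digits
```

A compiler that reorders a long sum, or keeps intermediates in wider registers, can produce discrepancies of this kind. An iterative optimizer such as sammon can amplify them into visibly different solutions.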

--
Brian D. Ripley
Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html