# ecdf() to nls() - how to transform data? Classic List Threaded 9 messages Open this post in threaded view
|

## ecdf() to nls() - how to transform data?

 Hi, I am using ecdf-function and want to use the ecdf()-data-points for nls() as data-parameter. nls() expects 'list' or 'environment' as a data-type, knots(ecdf(mydata)) gives me 'numeric'. What should I do now? Thanks in advance - Jochen Here is the code: ################################################# # --- Fit --- # Gumbel-Dist-Function, cumulative, http://en.wikipedia.org/wiki/Gumbel_distribution#               ( - ( x - mue ) / beta ) #          ( -e )^ # F(x) = e^ # formula for fitting-function # formula: y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) ) # data: ecdf( hgd\$V1 ) # start: list( mue=0.1, beta=0.025 ) gpfunction <- ecdf( hgd\$V1 ) gplistrange <- seq( 0.0, 1, by=0.001 ) gplist <- gpfunction( gplistrange ) print( gplist ) print( class( gplist ) ) print("---") #res <- nls( y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) ), df, list( mue=0.1, beta=0.025 ), trace=TRUE ) #print( summary( res ) )
Open this post in threaded view
|

## Re: ecdf() to nls() - how to transform data?

 On Jul 16, 2011, at 8:17 AM, Jochen1980 wrote: > Hi, > > I am using ecdf-function and want to use the ecdf()-data-points for   > nls() as > data-parameter. > nls() expects 'list' or 'environment' as a data-type,   > knots(ecdf(mydata)) > gives me 'numeric'. If you put them into 'df' with appropriate name and add an appropriate   'y' value you should get success. See below. And perhaps you should   plot your ecdf so you can figure out what you y-values should be. > What should I do now? > > Thanks in advance - Jochen > > Here is the code: > ################################################# > # --- Fit --- > # Gumbel-Dist-Function, cumulative, > http://en.wikipedia.org/wiki/Gumbel_distribution> #               ( - ( x - mue ) / beta ) > #          ( -e )^ > # F(x) = e^ > # formula for fitting-function > # formula: y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) ) > # data: ecdf( hgd\$V1 ) > # start: list( mue=0.1, beta=0.025 ) > gpfunction <- ecdf( hgd\$V1 ) > gplistrange <- seq( 0.0, 1, by=0.001 ) > gplist <- gpfunction( gplistrange ) > print( gplist ) > print( class( gplist ) ) > print("---") > #res <- nls( y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) /   > beta ) ), df, > list( mue=0.1, beta=0.025 ), trace=TRUE ) This may or may not work. I didn't try it because I was pretty sure   that the I() was unnecessary and I knew that exp(1)^(.) was just a   convoluted way of doing exp(.), so it can be considerably simplified: The 'df' object needs to be constructed so that the variables inside   the function match column names in df, so you need df to have columns   'y' and 'x'. Try it out with a function like rgamma. Generate a random sample,   rgamma(100, shape=2), apply ecdf, construct a 'df' argument and use   nls to solve for y ~ pgamma(knots, shape). Then substitute in your   (hopefully) simplified Gumbel function. > #print( summary( res ) ) David Winsemius, MD West Hartford, CT ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: ecdf() to nls() - how to transform data?

 In reply to this post by Jochen1980 On Jul 16, 2011, at 14:17 , Jochen1980 wrote: > Hi, > > I am using ecdf-function and want to use the ecdf()-data-points for nls() as > data-parameter. > nls() expects 'list' or 'environment' as a data-type, knots(ecdf(mydata)) > gives me 'numeric'. > What should I do now? Consider using fitdistr() from the MASS package. What you're trying to do is just wrong! > > Thanks in advance - Jochen > > Here is the code: > ################################################# > # --- Fit --- > # Gumbel-Dist-Function, cumulative, > http://en.wikipedia.org/wiki/Gumbel_distribution> #               ( - ( x - mue ) / beta ) > #          ( -e )^ > # F(x) = e^ > # formula for fitting-function > # formula: y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) ) Ouch!!!! Do teach yourself what exp() means. I believe that wants to be exp( - exp( - ( x - mue ) / beta ) ) notice that fitdistr() needs the density, though. I.e. the derivative of the above with respect to x. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: [hidden email]  Priv: [hidden email] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: ecdf() to nls() - how to transform data?

 Thanks David and Peter! I tried to improve my R-script to get closer to my goal. I guess I have to use nls(), because later I want to work with Levenberg-Marquardt-Algorithm and when I got it right, LM-Algorithm uses least squares as well, fitdistr() instead uses Maximum Likelihoods. Anyway, I am wondering, how to create the appropriate data.frame for nls()? In my opinion all data must be there: I got the absolute numbers in my dataset from the internet. I got the histogram/density function right, as you can see in the graph number 3 after running the following example. I got the cumulative density right as well - so how to find x and y for the data-frame in line 116? Is it necessary/easier to get this example worked with absolute numbers? The two easy tutorials for nls() worked out as well, so that can't be that hard I guess, but I am sitting here for two days and got no idea what to do now!? Thanks in advance for a little more input guys. ################################################# ### R-Skript Example Fit                     ### ################################################# cat( " ************************************************************* \n" ) cat( "  Script curvefit - from histogram to function \n" ) cat( " \n" ) # Preparation setwd( "/home/joba/jobascripts/rproject/" ) rm( list=ls( all=TRUE ) ) # Organise X11, x Rows, y Graphs par( mfrow=c( 2, 2 ) ) ############### tutorial 1 ############################ # Build data.frame() x <- seq( 0.0, 0.30, length=10 )   # seq( from, to, by ) y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 ) # vals of target function df <- data.frame( x=x, y=y ) #print( df ) plot( df, main="Tutorial 1" ) # Formula: y = x^a , look for a res <- nls( y ~ I(x^exponent), data = df, start = list( exponent = 1 ), trace = T ) print( round( summary(res)\$coefficients, 3 )) lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" ) # Formula: y = x^a + b, look for a and b res2 <- nls( y ~ I( x^exponent + gamma ), data = df, start = list( exponent=1, gamma=0 ), trace = TRUE ) print( round( summary(res2)\$coefficients, 3 )) lines( x, predict( res2, list( x = x ) ), lty = 1, col = "red" ) ############### tutorial 2 ############################ len <- 24 x <- runif( len ) # 24 random values between 0 and 1 y <- x^3 + rnorm( len, 0, 0.06 ) # compute y-values and add some noise ds <- data.frame( x = x, y = y ) # build data.frame #str( ds ) # Compute prediction s <- seq( 0, 1, length = 100 ) # compute real-data, to get y-values m <- nls( y ~ I(x^power), data = ds, start = list( power = 1 ), trace = T ) #print( class(m) ) print( summary(m) ) # plot fitted curve and known curve power <- round( summary(m)\$coefficients, 3 ) # get power from summary power.se <- round( summary(m)\$coefficients, 3 ) # get failure from summary plot( y ~ x, main = "Tutorial 2", sub="Blue: fit; green: known" ) lines( s, s^3, lty = 2, col = "green" ) # known real data lines( s, predict( m, list( x = s )), lty = 1, col = "blue" ) # predicted vals text( 0, 0.5, paste( "y =x^ (", power, " +/- ", power.se, ")", sep=""), pos=4 ) ############### real data ############################# # --- Create Points --- # Real data and density hgd <- read.csv( file="http://www.jochen-bauer.net/downloads/curve-fitting-tutorial/data01.txt", sep=",", head=FALSE) #print( hgd\$V1 ) print( summary( hgd\$V1 ) ) denspoints <- density( hgd\$V1 ) ##### define variables for later use msa <- "MSA-Name" c1 <- "col1" c2 <- "col2" histtitle <- paste( msa, "Spalten:", c1, ",", c2 ) ################################################# # --- Plot --- # Organise X11, x Rows, y Graphs #par( mfrow=c( 1, 2 ) ) # Histogram histo <- hist( hgd\$V1, freq=FALSE, main=histtitle, xlab="U-Value", ylab="Density") # Density-Line lines( denspoints, col="GREEN", lwd=1 ) # Plot cumulatives plot( ecdf( hgd\$V1 ), col="GREEN", lwd=1, main="", xlab="U-Value", ylab="Probability" ) # real data ################################################# # --- Fit --- # Matching Gumbel distribution as tip from expert # Gumbel-Dist-Function, cumulative #               ( - ( x - mue ) / beta ) #          ( -e )^ # F(x) = e^ # => exp( - exp( - ( x - mue ) / beta ) )  # Thanks to Peter Dalgaard :-) # Starting guesses: mue = medianOfData, beta = sdevOfData mueStart <- median( hgd\$V1 ) betaStart <- sd( hgd\$V1 ) print( paste( "mueStart: ", mueStart )) print( paste( "betaStart: ", betaStart )) # Build data.frame() # x = min of uvals to max of uvals # y = probabilities, which add to 1, normalized histogramms could help ? ### XXXXXXXXXXX ######### XXXXXXXXXXXXX ############################# x <- seq( 0.0, 0.30, length=10 )   # seq( from, to, by ) y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 ) # vals of target function df <- data.frame( x=x, y=y ) print( df ) # Formula: y = x^a , look for a res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), data = df, start = list( mue = mueStart, beta = betaStart ), trace = T ) print( round( summary(res)\$coefficients, 3 )) lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" ) cat( "\n Skript curvefit - Ende\n" ) cat( " --- \n" )
Open this post in threaded view
|

## Re: ecdf() to nls() - how to transform data?

Open this post in threaded view
|

## Re: ecdf() to nls() - how to transform data?

Open this post in threaded view
|

## Re: ecdf() to nls() - how to transform data?

 Hi David, my first attempt to work through your code was successful, my predicted line is pretty close to the ecdf-function. I have no idea why you inverted the gumbel-function and what the advantage of this strategy is? I interpret your (1:100/100)-trick like this: you build a sequence of 100 Tics. In addition to this 100 is identical to the number of created points. In my real world situation, ecdf() will produce ties, same values, and this results that I do not know what number knot() will return, then this will result in a non-symmetric-data.frame()-error. R-Code: ############### Tutorial 3, Gamma-Distribution-Points to Gumbel ################ # --- create Points --- # http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-td3671754.htmlprint( "Tutorial 3" ) p <- rgamma( 100, 2) print( p ) ep <- ecdf( p ) x <- knots(ep) hist( p, main="Tutorial 3" ) ypre <- seq( from=1, to=100, by=1 ) y <- ypre / 100 df <- data.frame( x=x, y=y ) plot( df ) res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), start=list(mue=2, beta=2), data=df, trace=TRUE )   print( summary(res) ) #print( str(res)) lines( x, predict( res, list( x=x )), lty = 1, col = "blue" ) # predicted vals Console-Output:  "Tutorial 3"    4.75748951 5.36642043 2.03025702 1.80216440 1.40745277 0.91393251    1.30575505 2.07451301 1.18500815 0.24972503 1.36604865 2.36786796   1.37386798 1.72390843 1.93700139 0.78722468 2.92828385 1.51165415   4.17000960 2.98356424 1.20195996 1.85899533 0.94863757 1.50438053   0.50854274 3.32760075 2.21334316 0.58463817 2.26985676 2.88033389   2.00718903 1.19043470 1.62945752 0.32010478 0.54837755 1.32965131   3.32024573 0.53825226 3.30077557 0.46426513 1.00681720 0.59276030   1.31431609 2.27822419 2.56404713 0.52459218 1.05228996 2.23799673   1.14438962 2.41311612 1.40254244 1.20379073 0.44195457 0.95880408   1.45254027 4.49645001 2.18826796 1.07161515 1.62617544 3.15506003   2.15611491 2.43261017 1.40293461 2.79977886 3.44503201 3.25282551   0.91570531 0.70243512 5.67971774 3.55532192 1.71515046 2.97718949   0.47145131 0.32879089 0.60735231 0.92388638 4.29277857 1.73681839   0.09387383 2.81281295 1.84419058 2.63070353 1.52298124 2.89761263   1.05251987 1.24258703 3.09130229 2.66738574 3.17035060 2.15287327   2.63042751 1.69736779 3.21126033 1.56920999 2.68985477 0.82952068   3.62230865 1.65286592 2.76798783 2.08091935 Formula: y ~ exp(-exp(-(x - mue)/beta)) Parameters:      Estimate Std. Error t value Pr(>|t|)     mue  1.382744   0.005926   233.3   <2e-16 *** beta 0.984836   0.008816   111.7   <2e-16 ***