ecdf() to nls() - how to transform data?


Jochen1980
Hi,

I am using the ecdf() function and want to use the ecdf() data points as the data argument for nls().
nls() expects a 'list' or 'environment' for its data argument, but knots(ecdf(mydata)) gives me a 'numeric' vector.
What should I do now?

Thanks in advance - Jochen

Here is the code:
#################################################
# --- Fit ---
# Gumbel-Dist-Function, cumulative, http://en.wikipedia.org/wiki/Gumbel_distribution
#               ( - ( x - mue ) / beta )
#          ( -e )^
# F(x) = e^
# formula for fitting-function
# formula: y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) )
# data: ecdf( hgd$V1 )
# start: list( mue=0.1, beta=0.025 )
gpfunction <- ecdf( hgd$V1 )
gplistrange <- seq( 0.0, 1, by=0.001 )
gplist <- gpfunction( gplistrange )
print( gplist )
print( class( gplist ) )
print("---")
#res <- nls( y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) ), df, list( mue=0.1, beta=0.025 ), trace=TRUE )
#print( summary( res ) )

Re: ecdf() to nls() - how to transform data?

David Winsemius

On Jul 16, 2011, at 8:17 AM, Jochen1980 wrote:

> Hi,
>
> I am using the ecdf() function and want to use the ecdf() data points
> as the data argument for nls().
> nls() expects a 'list' or 'environment' for its data argument, but
> knots(ecdf(mydata)) gives me a 'numeric' vector.

If you put them into 'df' with an appropriate name and add an appropriate
'y' value you should get success. See below. And perhaps you should
plot your ecdf so you can figure out what your y-values should be.

> What should I do now?
>
> Thanks in advance - Jochen
>
> Here is the code:
> #################################################
> # --- Fit ---
> # Gumbel-Dist-Function, cumulative, http://en.wikipedia.org/wiki/Gumbel_distribution
> #               ( - ( x - mue ) / beta )
> #          ( -e )^
> # F(x) = e^
> # formula for fitting-function
> # formula: y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) )
> # data: ecdf( hgd$V1 )
> # start: list( mue=0.1, beta=0.025 )
> gpfunction <- ecdf( hgd$V1 )
> gplistrange <- seq( 0.0, 1, by=0.001 )
> gplist <- gpfunction( gplistrange )
> print( gplist )
> print( class( gplist ) )
> print("---")
> #res <- nls( y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) ), df, list( mue=0.1, beta=0.025 ), trace=TRUE )

This may or may not work. I didn't try it because I was pretty sure  
that the I() was unnecessary and I knew that exp(1)^(.) was just a  
convoluted way of doing exp(.), so it can be considerably simplified.

The 'df' object needs to be constructed so that the variables inside  
the function match column names in df, so you need df to have columns  
'y' and 'x'.

Try it out with a function like rgamma. Generate a random sample,  
rgamma(100, shape=2), apply ecdf, construct a 'df' argument and use  
nls to solve for y ~ pgamma(knots, shape). Then substitute in your  
(hopefully) simplified Gumbel function.
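
For instance, a minimal sketch of that rgamma warm-up could look like this (object names here are only illustrative, not from your code):

set.seed( 42 )                      # only so the sketch is repeatable
gdta <- rgamma( 100, shape=2 )      # random sample
Fn   <- ecdf( gdta )                # empirical CDF
dfg  <- data.frame( x=knots( Fn ), y=Fn( knots( Fn ) ) )
fitg <- nls( y ~ pgamma( x, shape ), data=dfg, start=list( shape=1 ) )
summary( fitg )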


> #print( summary( res ) )



David Winsemius, MD
West Hartford, CT


Re: ecdf() to nls() - how to transform data?

Peter Dalgaard

On Jul 16, 2011, at 14:17 , Jochen1980 wrote:

> Hi,
>
> I am using the ecdf() function and want to use the ecdf() data points as
> the data argument for nls().
> nls() expects a 'list' or 'environment' for its data argument, but
> knots(ecdf(mydata)) gives me a 'numeric' vector.
> What should I do now?

Consider using fitdistr() from the MASS package. What you're trying to do is just wrong!

>
> Thanks in advance - Jochen
>
> Here is the code:
> #################################################
> # --- Fit ---
> # Gumbel-Dist-Function, cumulative, http://en.wikipedia.org/wiki/Gumbel_distribution
> #               ( - ( x - mue ) / beta )
> #          ( -e )^
> # F(x) = e^
> # formula for fitting-function
> # formula: y ~ I( exp(1) ^ ( - exp(1) ) ^ ( - ( x - mue ) / beta ) )

Ouch!!!!

Do teach yourself what exp() means. I believe that wants to be

exp( - exp( - ( x - mue ) / beta ) )


Notice that fitdistr() needs the density, though, i.e. the derivative of the above with respect to x.
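
A possible sketch of that route (untested here, and assuming the data are in hgd$V1 as in your script; dgumbel below is written out by hand, it is not a function shipped with MASS):

library( MASS )
# Gumbel density: (1/beta) * exp(-z) * exp(-exp(-z)) with z = (x - mue)/beta
dgumbel <- function( x, mue, beta )
    ( 1/beta ) * exp( - ( x - mue ) / beta ) * exp( - exp( - ( x - mue ) / beta ) )
fitdistr( hgd$V1, dgumbel, start=list( mue=median( hgd$V1 ), beta=sd( hgd$V1 ) ) )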


--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]


Re: ecdf() to nls() - how to transform data?

Jochen1980
Thanks David and Peter!

I tried to improve my R script to get closer to my goal.
I guess I have to use nls(), because later I want to work with the Levenberg-Marquardt algorithm, and if I understand correctly the LM algorithm uses least squares as well, whereas fitdistr() uses maximum likelihood. Anyway, I am wondering how to create the appropriate data.frame for nls(). In my opinion all the data are there:
I got the absolute numbers in my dataset from the internet.
I got the histogram/density function right, as you can see in graph number 3 after running the following example.
I got the cumulative distribution right as well - so how do I find x and y for the data frame in line 116?
Is it necessary/easier to make this example work with absolute numbers?
The two easy tutorials for nls() worked out as well, so it can't be that hard I guess, but I have been sitting here for two days with no idea what to do next. Thanks in advance for a little more input.
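
A minimal, self-contained sketch of what that later Levenberg-Marquardt step might look like, assuming the minpack.lm package is an acceptable route (its nlsLM() mirrors the nls() interface; the data below are stand-ins, not my real U-values):

library( minpack.lm )                           # assumption: package installed
xs  <- sort( rgamma( 100, shape=2 ) )           # stand-in data
dfs <- data.frame( x=xs, y=ecdf( xs )( xs ) )   # empirical CDF values at the data
nlsLM( y ~ exp( - exp( - ( x - mue ) / beta ) ), data=dfs,
       start=list( mue=median( xs ), beta=sd( xs ) ) )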


#################################################
### R-Script Example Fit                     ###
#################################################

cat( " ************************************************************* \n" )
cat( "  Script curvefit - from histogram to function \n" )
cat( " \n" )

# Preparation
setwd( "/home/joba/jobascripts/rproject/" )
rm( list=ls( all=TRUE ) )

# Organise X11, x Rows, y Graphs
par( mfrow=c( 2, 2 ) )

############### tutorial 1 ############################

# Build data.frame()
x <- seq( 0.0, 0.30, length=10 )   # 10 evenly spaced x values
y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 ) # vals of target function
df <- data.frame( x=x, y=y )

#print( df )
plot( df, main="Tutorial 1" )

# Formula: y = x^a , look for a
res <- nls( y ~ I(x^exponent), data = df, start = list( exponent = 1 ), trace = T )
print( round( summary(res)$coefficients[1], 3 ))
lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )

# Formula: y = x^a + b, look for a and b
res2 <- nls( y ~ I( x^exponent + gamma ), data = df, start = list( exponent=1, gamma=0 ), trace = TRUE )
print( round( summary(res2)$coefficients[1], 3 ))
lines( x, predict( res2, list( x = x ) ), lty = 1, col = "red" )


############### tutorial 2 ############################

len <- 24
x <- runif( len ) # 24 random values between 0 and 1
y <- x^3 + rnorm( len, 0, 0.06 ) # compute y-values and add some noise
ds <- data.frame( x = x, y = y ) # build data.frame
#str( ds )

# Compute prediction
s <- seq( 0, 1, length = 100 ) # grid of x values for plotting the curves
m <- nls( y ~ I(x^power), data = ds, start = list( power = 1 ), trace = T )
#print( class(m) )
print( summary(m) )

# plot fitted curve and known curve
power <- round( summary(m)$coefficients[1], 3 ) # get power from summary
power.se <- round( summary(m)$coefficients[2], 3 ) # get standard error from summary
plot( y ~ x, main = "Tutorial 2", sub="Blue: fit; green: known" )
lines( s, s^3, lty = 2, col = "green" ) # known real data
lines( s, predict( m, list( x = s )), lty = 1, col = "blue" ) # predicted vals
text( 0, 0.5, paste( "y = x^(", power, " +/- ", power.se, ")", sep=""), pos=4 )

############### real data #############################
# --- Create Points ---

# Real data and density
hgd <- read.csv( file="http://www.jochen-bauer.net/downloads/curve-fitting-tutorial/data01.txt", sep=",", head=FALSE)
#print( hgd$V1 )
print( summary( hgd$V1 ) )
denspoints <- density( hgd$V1 )

##### define variables for later use
msa <- "MSA-Name"
c1 <- "col1"
c2 <- "col2"
histtitle <- paste( msa, "Columns:", c1, ",", c2 )

#################################################
# --- Plot ---

# Organise X11, x Rows, y Graphs
#par( mfrow=c( 1, 2 ) )

# Histogram
histo <- hist( hgd$V1, freq=FALSE, main=histtitle, xlab="U-Value", ylab="Density")

# Density-Line
lines( denspoints, col="GREEN", lwd=1 )

# Plot cumulatives
plot( ecdf( hgd$V1 ), col="GREEN", lwd=1, main="", xlab="U-Value", ylab="Probability" ) # real data

#################################################
# --- Fit ---

# Matching Gumbel distribution as tip from expert

# Gumbel-Dist-Function, cumulative
#               ( - ( x - mue ) / beta )
#          ( -e )^
# F(x) = e^
# => exp( - exp( - ( x - mue ) / beta ) )  # Thanks to Peter Dalgaard :-)

# Starting guesses: mue = medianOfData, beta = sdevOfData
mueStart <- median( hgd$V1 )
betaStart <- sd( hgd$V1 )
print( paste( "mueStart: ", mueStart ))
print( paste( "betaStart: ", betaStart ))

# Build data.frame()
# x = min of uvals to max of uvals
# y = probabilities, which add up to 1; normalized histograms could help?

### XXXXXXXXXXX ######### XXXXXXXXXXXXX #############################
x <- seq( 0.0, 0.30, length=10 )   # seq( from, to, by )
y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 ) # vals of target function
df <- data.frame( x=x, y=y )
print( df )

# Formula: y = exp( - exp( - ( x - mue ) / beta ) ), look for mue and beta
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), data = df, start = list( mue = mueStart, beta = betaStart ), trace = T )
print( round( summary(res)$coefficients[1], 3 ))
lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )

cat( "\n Skript curvefit - Ende\n" )
cat( " --- \n" )

Re: ecdf() to nls() - how to transform data?

David Winsemius

On Jul 17, 2011, at 4:12 AM, Jochen1980 wrote:

> Thanks David and Peter!
>
> I tried to improve my R script to get closer to my goal.
> I guess I have to use nls(), because later I want to work with the
> Levenberg-Marquardt algorithm, and if I understand correctly the LM
> algorithm uses least squares as well, whereas fitdistr() uses maximum
> likelihood. Anyway, I am wondering how to create the appropriate
> data.frame for nls(). In my opinion all the data are there:
> I got the absolute numbers in my dataset from the internet.
> I got the histogram/density function right, as you can see in graph
> number 3 after running the following example.
> I got the cumulative distribution right as well - so how do I find x
> and y for the data frame in line 116?

The trick I used (which has been criticised by my statistical betters)  
was to recognize that the y values for the ecdf function were simply  
(1:100)/100

df <- data.frame(y=(1:100)/100, x=knots(ecdf(mydata)))  # worked for me
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ), start=list(mue=2, beta=3), data=df)

The statistical concern was that the nls function is minimizing error
around y, which has no error. Whether one can invert the function and
thereby avoid the concern about minimizing error around a regular
series ... I don't know.

You could certainly try both, since the inverted function is so easy  
to construct:

res <- nls( x ~ beta*(- log(- log(y))) - mue, start=list(mue=2, beta=3), data=df)


(None of this code was tested, although the first part is very similar  
if not identical to code I tested yesterday. I was concerned this was  
a homework problem and was less specific than I might have been. I now  
see that you have at least made efforts, although I am in a hurry to  
get to a sailing and have not gone through it.)
--
David.



David Winsemius, MD
West Hartford, CT


Re: ecdf() to nls() - how to transform data?

David Winsemius

On Jul 17, 2011, at 11:07 AM, David Winsemius wrote:

> The trick I used (which has been criticised by my statistical  
> betters) was to recognize that the y values for the ecdf function  
> were simply (1:100)/100
>
> df <- data.frame(y=(1:100)/100, x=knots(ecdf(mydata)))  # worked for me
> res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ),  
> start=list(mue=2, beta=3), data=df)

The expression should have been:

y ~ exp( - exp( - ( x - mue ) / beta ))

Using:

  gdta <- rgamma(100, 2)
  ecdfgr <- ecdf(gdta)
dfecdf <- data.frame(knots=knots(ecdfgr), Fn=(1:100)/100)

 > res <- nls( Fn ~ exp( - exp( - ( knots - mue ) / beta )), start=list(mue=2, beta=2), data=dfecdf)
 > res
Nonlinear regression model
   model:  Fn ~ exp(-exp(-(knots - mue)/beta))
    data:  dfecdf
   mue  beta
1.395 1.068
  residual sum-of-squares: 0.04979

>
> The concern statistically was that the nls function is minimizing  
> error around y which has no error.  Whether one can invert the  
> function and thereby avoid the concern about minimizing error around  
> a regular series ... I don't know.
>
> You could certainly try both, since the inverted function is so easy  
> to construct:
>
> res <- nls( x ~ beta*(- log(- log(y))) - mue, start=list(mue=2, beta=3), data=df)

I think I got the sign of mue wrong, and I get an Inf error from the
100th value of the data frame (where Fn = 1, so -log(-log(Fn)) is
infinite), but leaving it off gave me a comparable answer:

 > res <- nls( knots ~ - beta1*(- log(- log(Fn))) - mue, start=list(mue=2, beta1=2), data=dfecdf[1:99, ])

 > res <- nls( knots ~ beta1*(- log(- log(Fn))) + mue, start=list(mue=2, beta1=2), data=dfecdf[1:99, ])
 > res
Nonlinear regression model
   model:  knots ~ beta1 * (-log(-log(Fn))) + mue
    data:  dfecdf[1:99, ]
   mue beta1
1.455 1.004
  residual sum-of-squares: 3.041

Number of iterations to convergence: 1
Achieved convergence tolerance: 3.295e-09


David Winsemius, MD
West Hartford, CT


Re: ecdf() to nls() - how to transform data?

Jochen1980
Hi David,

my first attempt to work through your code was successful; the predicted line is pretty close to the ecdf function. I still don't understand why you inverted the Gumbel function and what the advantage of that strategy is.

I interpret your (1:100)/100 trick like this: you build a sequence of 100 tick values, and 100 is also the number of generated points.

In my real-world situation, ecdf() will encounter ties (repeated values), so I do not know in advance how many knots knots() will return, and that leads to an error because the x and y columns of the data.frame() end up with different lengths.

R-Code:
############### Tutorial 3, Gamma-Distribution-Points to Gumbel ################
# --- create Points --- # http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-td3671754.html
print( "Tutorial 3" )
p <- rgamma( 100, 2)
print( p )
ep <- ecdf( p )
x <- knots(ep)
hist( p, main="Tutorial 3" )
ypre <- seq( from=1, to=100, by=1 )
y <- ypre / 100
df <- data.frame( x=x, y=y )
plot( df )
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), start=list(mue=2, beta=2), data=df, trace=TRUE )  
print( summary(res) )
#print( str(res))
lines( x, predict( res, list( x=x )), lty = 1, col = "blue" ) # predicted vals

Console-Output:
[1] "Tutorial 3"
  [1] 4.75748951 5.36642043 2.03025702 1.80216440 1.40745277 0.91393251
  [7] 1.30575505 2.07451301 1.18500815 0.24972503 1.36604865 2.36786796
 [13] 1.37386798 1.72390843 1.93700139 0.78722468 2.92828385 1.51165415
 [19] 4.17000960 2.98356424 1.20195996 1.85899533 0.94863757 1.50438053
 [25] 0.50854274 3.32760075 2.21334316 0.58463817 2.26985676 2.88033389
 [31] 2.00718903 1.19043470 1.62945752 0.32010478 0.54837755 1.32965131
 [37] 3.32024573 0.53825226 3.30077557 0.46426513 1.00681720 0.59276030
 [43] 1.31431609 2.27822419 2.56404713 0.52459218 1.05228996 2.23799673
 [49] 1.14438962 2.41311612 1.40254244 1.20379073 0.44195457 0.95880408
 [55] 1.45254027 4.49645001 2.18826796 1.07161515 1.62617544 3.15506003
 [61] 2.15611491 2.43261017 1.40293461 2.79977886 3.44503201 3.25282551
 [67] 0.91570531 0.70243512 5.67971774 3.55532192 1.71515046 2.97718949
 [73] 0.47145131 0.32879089 0.60735231 0.92388638 4.29277857 1.73681839
 [79] 0.09387383 2.81281295 1.84419058 2.63070353 1.52298124 2.89761263
 [85] 1.05251987 1.24258703 3.09130229 2.66738574 3.17035060 2.15287327
 [91] 2.63042751 1.69736779 3.21126033 1.56920999 2.68985477 0.82952068
 [97] 3.62230865 1.65286592 2.76798783 2.08091935

Formula: y ~ exp(-exp(-(x - mue)/beta))

Parameters:
     Estimate Std. Error t value Pr(>|t|)    
mue  1.382744   0.005926   233.3   <2e-16 ***
beta 0.984836   0.008816   111.7   <2e-16 ***



Re: ecdf() to nls() - how to transform data?

David Winsemius

On Jul 17, 2011, at 8:13 PM, Jochen1980 wrote:

> Hi David,
>
> my first attempt to work through your code was successful; the
> predicted line is pretty close to the ecdf function. I still don't
> understand why you inverted the Gumbel function and what the
> advantage of that strategy is.

I was advised by a person who knows way more statistics than I do that
the strategy of minimizing the error around a regular step function
was statistically suspect (presumably even if the x values were a
random process). So I inverted the solution and put the random values
on the y-axis, leaving the stair-step values on the x-axis. I think
claiming an "advantage" might be premature (especially since both
methods yield very similar values) and would probably require some
testing before acceptance. There is of course the fitdistr function in
MASS, and there is a nice vignette on fitting distributions in R
somewhere around (yep, first Google hit):

http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf


>
> I interpret your (1:100)/100 trick like this: you build a sequence of
> 100 tick values, and 100 is also the number of generated points.

The ecdf function object (at least as I understood from its plotted  
result) sets up its y-values (in the function's environment) to be a  
regular sequence determined by the number of original points and  
allows you to recover the x-values with the knots function.
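
For example, a tiny illustration with a tied value (with ties the step heights are no longer the strictly regular (1:n)/n sequence, which is why evaluating the function at its own knots is the safer way to recover them):

 Fn <- ecdf(c(1, 2, 2, 3))   # one tied value
 knots(Fn)                   # 1 2 3  -- ties collapse to unique knots
 Fn(knots(Fn))               # 0.25 0.75 1.00 -- the matching step heights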

--
David


David Winsemius, MD
West Hartford, CT


Re: ecdf() to nls() - how to transform data?

Jochen1980
Ok, I think I have an idea that solves my problem and need someone to second me on it or tell me what a fool I am :-) .

My problem: I went for curve fitting with nls() and used knots() on the ecdf() to collect the data for the nls data frame. I ran into trouble because the x and y columns of the data.frame() did not have the same length: I computed 1000 values and there were always some ties, and tied values are not returned as separate knots by knots().

So I thought about it, and now I am using 100 cumulative percentile values for the data frame. nls() works fine with them, and 100 percentiles are always available, so no exceptions occur. Is it wise to use 100 cumulative percentiles as the basis for the curve fit instead of the ecdf() data?

Thanks in advance.
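
(An alternative I have not tested against the real data, assuming hgd has been read in as in the earlier script: evaluate the ecdf at its own unique knots, so x and y always have the same length no matter how many ties there are.)

Fn  <- ecdf( hgd$V1 )
df  <- data.frame( x=knots( Fn ), y=Fn( knots( Fn ) ) )
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ) ), data=df,
            start=list( mue=median( hgd$V1 ), beta=sd( hgd$V1 ) ) )
summary( res )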