# problem with nlsLM.....

8 messages

## problem with nlsLM.....

dear members,

Yesterday, Duncan identified a silly mistake in an nls call: in the list of starting values, one of the variable names was repeated. But I am experiencing the same problem again, with a different formula:

```
> formulaDH3
HM1 ~ (a + (b * ((HM2 + 0.3)^(1/3)))) * (c * log(HM3 + 27))
```

Here HM1 is the response variable, and HM2 and HM3 are predictors:

```
> nonlin_modDH3 <- nls(formulaDH3, start = list(a = 0.43143, b = 0.68173, c = 0.02954))
Error in nlsModel(formula, mf, start, wts) :
  singular gradient matrix at initial parameter estimates
> nonlin_modDH3 <- nlsLM(formulaDH3, start = list(a = 0.43143, b = 0.68173, c = 0.02954))
Error in nlsModel(formula, mf, start, wts) :
  singular gradient matrix at initial parameter estimates
```

I am using the nlsLM function from the minpack.lm package, which is said to converge in cases where nls fails with a singular gradient matrix error. Is there again a silly mistake (pardon me again if there is one!), or is the problem serious? If it is serious, any pointers towards a solution?

very many thanks for your time and effort....

yours sincerely,
AKSHAY M KULKARNI

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
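[Editor's note: a diagnostic sketch on simulated data, since the real HM1/HM2/HM3 values are not shown in the post. The "singular gradient" error means the Jacobian of the model with respect to (a, b, c) is rank-deficient at the starting values, which can be checked directly with a numeric Jacobian; the data-generating values below are arbitrary.]

```r
set.seed(1)
HM2 <- runif(50, 0, 10)
HM3 <- runif(50, 0, 10)

# model function from formulaDH3, as a function of the parameter vector
f <- function(p) (p[1] + p[2] * (HM2 + 0.3)^(1/3)) * (p[3] * log(HM3 + 27))
p0 <- c(a = 0.43143, b = 0.68173, c = 0.02954)

# forward-difference Jacobian: one column per parameter
J <- sapply(seq_along(p0), function(i) {
  h <- 1e-6
  pp <- p0
  pp[i] <- pp[i] + h
  (f(pp) - f(p0)) / h
})

# the smallest singular value is numerically zero: the column for c is a
# linear combination of the columns for a and b, so the matrix has rank 2
svd(J)$d
```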

## Re: problem with nlsLM.....

In reply to this post by akshay kulkarni

On Wed, 20 Mar 2019 08:02:45 +0000 akshay kulkarni <[hidden email]> wrote:

> formulaDH5 <- as.formula(HM1 ~ (a + (b * ((HM2 + 0.3)^(1/2)))) +
> (A*sin(w*HM3 + c) + C))

The problem with this formula is simple: the partial derivative with respect to a is the same as the partial derivative with respect to C. This makes the regression problem have an infinite number of solutions, all of them satisfying the equation \lambda_1 * a + \lambda_2 * C + \lambda_3 = 0 for some values of \lambda_i. Gradient-based optimizers (which both nls and nlsLM are) don't like problems with non-unique solutions, especially when the model function has the same partial derivative with respect to different variables, making them indistinguishable. Solution: remove one of the variables.

> > formulaDH3
> HM1 ~ (a + (b * ((HM2 + 0.3)^(1/3)))) * (c * log(HM3 + 27))

The problem with this formula is similar, albeit slightly different. Suppose that (a, b, c) is a solution. Then (\lambda * a, \lambda * b, c / \lambda) is also a solution for any nonzero real \lambda. Once again, removing c should get rid of the ambiguity.

> I've checked the Internet for a method of getting the starting
> values, but they are not comprehensive....any resources for how to
> find the starting values?

Starting values depend on the particular function you are trying to fit. The usual approach is to transform the formula, getting rid of parts you can safely assume to be small, until it looks like a linear regression, or to apply domain-specific knowledge (e.g. when trying to fit a peak function, look for the biggest local maximum in the dataset). If you cannot do that, there are also global optimization algorithms (see nloptr), though they still perform better on some problems and worse on others. It certainly helps to have upper and lower bounds on all parameter values.

I've heard about a scientific group creating a pool of many initial Levenberg-Marquardt parameter estimates, then improving them using a genetic algorithm. The whole thing "converged overnight" on a powerful desktop computer.

-- 
Best regards,
Ivan
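[Editor's note: a minimal sketch of the suggested fix on simulated data, since the original HM1/HM2/HM3 are not available. Absorbing c into a and b uses the identity (a + b*u) * (c*v) == (a*c + (b*c)*u) * v, so the identifiable form of formulaDH3 has only two parameters; the "true" values below are arbitrary.]

```r
set.seed(1)
HM2 <- runif(100, 0, 10)
HM3 <- runif(100, 0, 10)

# generate a response from the model with illustrative "true" coefficients,
# plus a little normal noise
HM1 <- (0.4 + 0.7 * (HM2 + 0.3)^(1/3)) * log(HM3 + 27) + rnorm(100, 0, 0.05)

# the reduced, identifiable two-parameter form fits cleanly where the
# three-parameter original fails with a singular gradient
fit <- nls(HM1 ~ (a + b * (HM2 + 0.3)^(1/3)) * log(HM3 + 27),
           start = list(a = 0.4, b = 0.7))
coef(fit)
```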

## Re: problem with nlsLM.....

dear Ivan,

Thanks for the reply. But doesn't removing some of the parameters reduce the precision of the relationship between the response variable and the predictors (inefficient estimates of the coefficients)?

very many thanks for your time and effort....

yours sincerely,
AKSHAY M KULKARNI

________________________________
From: Ivan Krylov <[hidden email]>
Sent: Wednesday, March 20, 2019 3:06 PM
To: akshay kulkarni
Cc: R help Mailing list
Subject: Re: [R] problem with nlsLM.....

[quoted text of the previous message trimmed]

## Re: problem with nlsLM.....

On Wed, 20 Mar 2019 09:43:11 +0000 akshay kulkarni <[hidden email]> wrote:

> But doesn't removing some of the parameters reduce the precision of
> the relationship between the response variable and the
> predictors (inefficient estimates of the coefficients)?

No, it doesn't, since there are already more variables in the formula than there are relationships between the response and the predictors. Let me offer you an example. Suppose you have a function y(x) = a*b*x + c. Let's try to simulate some data and then fit it:

```r
# choose according to your taste
a <- ...
b <- ...
c <- ...
# simulate model data
abc <- data.frame(x = runif(100))
abc$y <- a*b*abc$x + c
# add some normally distributed noise
abc$y <- abc$y + rnorm(100, 0, 0.01)
```

Now try to fit the formula y ~ a*b*x + c using the data in the data frame abc. Do you get any results? Do they match the values you originally set?[*] Then try a formula with the ambiguity removed: y ~ d*x + c. Do you get a result? Does the obtained d match the a*b you originally set?

Note that for the d you obtained there is an infinite number of (a, b) pairs equally satisfying the equation d = a*b and the original regression problem, unless you constrain a or b.

-- 
Best regards,
Ivan

[*] Using R, I couldn't, but the nonlinear solver in gnuplot is sometimes able to give *a* result for such a degenerate problem when the data is sufficiently noisy. Of course, such a result usually doesn't match the originally set variable values and should not be trusted.
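[Editor's note: Ivan's experiment, filled in with arbitrary illustrative values so it runs end to end. With a = 2, b = 3 the product a*b is 6, which is what the reduced fit should recover; the ambiguous fit is wrapped in try() because nls normally stops with a singular gradient error.]

```r
set.seed(42)
a <- 2; b <- 3; c <- 1               # arbitrary illustrative values
abc <- data.frame(x = runif(100))
abc$y <- a * b * abc$x + c + rnorm(100, 0, 0.01)

# the ambiguous formula: the Jacobian columns for a and b are proportional,
# so nls typically fails with a singular gradient at the starting values
try(nls(y ~ a * b * x + c, data = abc,
        start = list(a = 1, b = 1, c = 0)))

# the reduced formula fits, and d estimates the product a*b
fit <- nls(y ~ d * x + c, data = abc, start = list(d = 1, c = 0))
coef(fit)
```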

## Re: problem with nlsLM.....

dear Ivan,

Thank you very much...You have been very helpful....

very many thanks for your time and effort....

yours sincerely,
AKSHAY M KULKARNI

________________________________
From: Ivan Krylov <[hidden email]>
Sent: Wednesday, March 20, 2019 4:08 PM
To: akshay kulkarni
Cc: R help Mailing list
Subject: Re: [R] problem with nlsLM.....

[quoted text of the previous message trimmed]