smooth.spline error while fitting bacterial growth curves with grofit

smooth.spline error while fitting bacterial growth curves with grofit

Jeffrey David Johnson
I'm trying to use the grofit package to compare growth rates between
bacterial cultures, but I've come across a couple of glitches/things I
don't understand. I'm not sure if they're related to the package or to a
problem with my growth data, which is messy. Some strains don't follow
a proper logarithmic growth curve because they died or didn't grow over
the course of the experiment. I could remove those, but that will get more
time-consuming once I have more cultures going.

I've attached the 'time' matrix and 'data' data frame. This code should
fit the growth curves, but when I run it I get an error related to
`smooth.spline`:

require(grofit)
mytime <- as.matrix(read.table('time.txt'))
mydata <- read.csv('data.csv')
dimnames(mytime) <- NULL
fits <- gcFit(mytime, mydata, grofit.control(
  interactive=FALSE, # don't ask if the graphs look OK
  nboot.gc=1000,     # number of bootstraps
  fit.opt="s"        # just do splines, no models
))

= 1. growth curve =================================
----------------------------------------------------
= 2. growth curve =================================
----------------------------------------------------
= 3. growth curve =================================
----------------------------------------------------
Error in smooth.spline(time, data, spar = control$smooth.gc) :
  'tol' must be strictly positive and finite
Error in gcFitSpline(time.cur, data.cur, gcID, control.change) :
  object 'y.spl' not found

That error usually occurs at some point during the run, though a couple
of times I've gotten through all 17 curves successfully. The documentation says:

> smooth.gc: Parameter describing the smoothness of the spline fit;
> usually (not necessary) in (0;1]. Set ‘smooth.gc=NULL’ causes the
> program to query an optimal value via cross validation techniques.
> Note: This is partly experimental. In future improved implementations
> of the ‘smooth.spline’ function may lead to different results. See
> documentation of the R function ‘smooth.spline’ for further details.
> Especially for datasets with few data points the option ‘NULL’ might
> result in a too small smoothing parameter, which produces an error in
> ‘smooth.spline’. In that case the usage of a fixed value is
> recommended. Default: ‘NULL’.

I tried setting smooth.gc to several fixed values (0.1, 0.5, 0.9, 1, 10)
and they all cause the same error. If instead I call the `gcBootSpline`
function directly, it gives a different error saying the number of
bootstraps is 0, when it clearly isn't:

fits <- gcBootSpline(mytime, mydata, grofit.control(nboot.gc=1000))

Error in gcBootSpline(mytime, mydata, grofit.control(nboot.gc =
1000)) : Number of bootstrap samples is zero! See grofit.control()

Am I using these right? Is there something about the data that would
make it un-fittable?
Jeff
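
A note on the second error: going by the grofit documentation, the
signature appears to be gcBootSpline(time, data, gcID = "undefined",
control = grofit.control()). If so, a control object passed as the third
positional argument would be matched to gcID, and the default control
(where nboot.gc appears to be 0) would be used. The sketch below
reproduces that matching behaviour with two hypothetical stand-in
functions (toy_control and toy_boot are not part of grofit); it only
illustrates R's positional argument matching, not the package internals.

```r
# Hypothetical stand-ins mirroring the assumed argument order
# (time, data, gcID, control); these are NOT grofit functions.
toy_control <- function(nboot.gc = 0) list(nboot.gc = nboot.gc)

toy_boot <- function(time, data, gcID = "undefined", control = toy_control()) {
  if (control$nboot.gc == 0) stop("Number of bootstrap samples is zero!")
  control$nboot.gc
}

time_h <- 0:5
od     <- c(0.05, 0.06, 0.10, 0.30, 0.70, 0.90)

# The third positional argument lands in gcID, so the default control is used.
positional <- tryCatch(
  toy_boot(time_h, od, toy_control(nboot.gc = 1000)),
  error = function(e) conditionMessage(e)
)

# Naming the argument routes it to control as intended.
named <- toy_boot(time_h, od, control = toy_control(nboot.gc = 1000))
```

If the same holds for the real function, the call would presumably need
control = grofit.control(nboot.gc = 1000), with time and data given as
the vectors for a single curve rather than the whole matrix and data
frame. This has not been checked against the package source here.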
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: smooth.spline error while fitting bacterial growth curves with grofit

Bert Gunter
1. Very likely, you have insufficient data in some of your growth
curves to do the fits using GCV (generalized cross-validation). If you
remove the curves where the bacteria didn't grow, things should work.
Alternatively, there may well be ways of expressing the model that would
allow pooling across cultures that didn't grow. (Sounds like a mixtures
problem, actually: you are mixing cultures that grow with those that
don't and need to determine the mixing proportion and the growth
parameters of those that grew.)

2. HOWEVER, IF you remove the curves, you may very well be getting the
wrong (biased) results -- i.e. your results will be irreproducible
garbage, as you will only be taking data from cultures that grew well.
I would **strongly** suggest you work with a local statistical expert
to help you deal with these issues. I do not think you should trust
remote advice from the internet on such complex data (including mine!)
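
On the 'tol' message itself: base R's smooth.spline has an argument tol
whose default is 1e-6 * IQR(x), and it stops with this message when that
value is not positive and finite. One plausible trigger (an assumption;
the attached data were not inspected) is a bootstrap resample in which so
many time points coincide that the interquartile range of time is zero.
The sketch below uses made-up numbers and only base R to show the error
and how an explicit tol avoids it in a direct smooth.spline call.

```r
# Made-up resample: 14 points, ten of them at time 2, so IQR(x) is 0.
x <- c(0, 1, rep(2, 10), 3, 4)
y <- c(0.05, 0.08, rep(0.20, 10), 0.60, 0.85)

iqr_x <- IQR(x)

# The default tol = 1e-6 * IQR(x) is 0 here, which smooth.spline rejects.
msg <- tryCatch(
  { smooth.spline(x, y, spar = 0.5); "no error" },
  error = function(e) conditionMessage(e)
)

# Supplying a positive tol based on the range of x lets the fit proceed.
fit <- smooth.spline(x, y, spar = 0.5, tol = 1e-6 * diff(range(x)))
```

The smooth.spline call shown in the error message passes only spar, so
grofit does not seem to expose tol. If this is the cause, a degenerate
resample also becomes more likely to turn up as the number of bootstrap
resamples grows.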

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sun, May 17, 2015 at 10:42 AM, Jeffrey David Johnson
<[hidden email]> wrote:

> [...]

Re: smooth.spline error while fitting bacterial growth curves with grofit

Jeffrey David Johnson
Thanks, I think you're right. I removed the strains whose final OD was
below 0.2 since all the ones that clearly grew are above that, and
grofit produces fewer errors on the remaining 6. The error still happens
occasionally, but if I stick to 1000 bootstraps instead of 10000 it's
not often. Of course I won't rely on these numbers! I'll try again once
my current timecourse is done with 6 replicates per strain, and if
everything is still messy rethink the experimental design.
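
The filtering step described above could be scripted along these lines so
that it does not have to be done by hand as cultures are added. This is a
sketch under assumptions: the data frame layout, the column names, and the
values are invented, and only the 0.2 threshold comes from the message.
It also records which cultures were dropped, since excluding them affects
how the remaining results should be read.

```r
# Hypothetical layout: one row per culture, an ID column first, then OD
# readings in time order. Values are invented for illustration.
od <- data.frame(
  strain = c("A", "B", "C"),
  t0 = c(0.05, 0.05, 0.05),
  t1 = c(0.20, 0.06, 0.15),
  t2 = c(0.80, 0.07, 0.45),
  stringsAsFactors = FALSE
)

threshold <- 0.2
final_od  <- od[[ncol(od)]]      # last column = last time point
grew      <- final_od >= threshold

kept         <- od[grew, ]       # curves that would be passed on to the fit
dropped      <- od$strain[!grew] # record of what was excluded
frac_dropped <- mean(!grew)      # proportion of cultures that did not grow
```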

... Which brings up another question. Would it be better to estimate
growth parameters (mu, lambda, etc.) for each replicate and then take
the mean and standard deviation of those, or to average the growth data
first and calculate one set of parameters per strain? (Sorry if that's
very basic statistics)
Jeff

On Sun, 17 May 2015 11:42:27 -0700
Bert Gunter <[hidden email]> wrote:

> [...]

Re: smooth.spline error while fitting bacterial growth curves with grofit

Bert Gunter
Your question is OFFTOPIC for this list. Post on a statistics list
like stats.stackexchange.com.

But both your proposals are wrong, though depending on your data and
purpose, they may be adequate. I suggest you consult with a local
statistician on the use of mixed effects models for repeated
measures/growth curves, or post a question on those topics there.

Cheers,
Bert
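
To see why the two proposals in the question can disagree, here is a
minimal sketch on simulated data, using only base R's smooth.spline
rather than grofit. The curve shapes, the spar value, and the helper name
max_slope are all assumptions made for illustration, and this is not the
mixed-effects approach suggested above.

```r
# Simulated, noise-free logistic curves for three hypothetical replicates
# that differ only in when growth takes off (midpoints at 4, 6 and 8 h).
time_h <- seq(0, 12, by = 0.25)
mids   <- c(4, 6, 8)
reps   <- sapply(mids, function(m) 1 / (1 + exp(-(time_h - m))))

# Maximum first derivative of a smoothing spline, a rough stand-in for mu.
max_slope <- function(time, od, spar = 0.3) {
  fit <- smooth.spline(time, od, spar = spar)
  max(predict(fit, time, deriv = 1)$y)
}

# Option 1: estimate per replicate, then summarise.
mu_each <- sapply(seq_len(ncol(reps)), function(j) max_slope(time_h, reps[, j]))
mu_mean <- mean(mu_each)
mu_sd   <- sd(mu_each)

# Option 2: average the curves first, then estimate once.
mu_avg_curve <- max_slope(time_h, rowMeans(reps))
```

When replicates take off at different times, the averaged curve is
flatter than any single replicate, so the second option gives a smaller
maximum slope here and carries no replicate-to-replicate spread.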





On Sun, May 17, 2015 at 7:08 PM, Jeffrey David Johnson
<[hidden email]> wrote:

> [...]
