predicted values


predicted values

Felipe Carrillo
Consider this dummy dataset.
My real dataset, with over 1000 records, has
a scatter of large and small values.
I want to predict the values that are NA, but I
get negative predictions. Is this normal
behaviour, or am I missing a gam argument
to force the model to predict positive values?
library(mgcv)
test <- data.frame(iddate=seq(as.Date("2014-01-01"),
        as.Date("2014-01-12"), by="days"),
        value=c(300,29,22,NA,128,24,15,1,3,30,NA,2))
test
str(test)
mod <- gam(value ~ s(as.numeric(iddate)),data=test)
# Predict for values with NA's
test$pred <- with(test,ifelse(is.na(value),predict(mod,test),value))
test


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: predicted values

Joshua Wiley-2
Dear Felipe,

That is normal behavior: the prediction from that simple model
decreases over time and ends up negative. If the outcome cannot take
on negative values, treating it as a continuous Gaussian may not be
optimal. Perhaps some transformation, like using a log link so that
the exponentiated values are always positive, would be better?
Alternatively, the predictions may be going negative not because the
data do overall, but because there is, say, a quick decrease in values
in the first part of the time span that later slows; with an overly
simplistic time model, the fit may just keep decreasing. Using a
smoother with a higher basis dimension may help model the function
more accurately over the span of time in your dataset, and then not
give negative predicted values.
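
A minimal sketch of the log-link suggestion above, using the dummy data
from the original post. The Gamma family, the basis dimension k = 5, and
the names day, mod_log and pred are illustrative assumptions, not
something stated in this thread; gaussian(link = "log") would be the
smallest change from the original model.

```r
library(mgcv)

# Dummy data from the original post
test <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-01-12"), by = "days"),
  value  = c(300, 29, 22, NA, 128, 24, 15, 1, 3, 30, NA, 2)
)
test$day <- as.numeric(test$iddate)  # numeric time index (assumed name)

# Log link: the smooth is fitted on the log scale, so predictions
# back-transformed to the response scale are exp(...) and hence positive.
# Gamma and k = 5 are assumptions chosen for this small example.
mod_log <- gam(value ~ s(day, k = 5), family = Gamma(link = "log"),
               data = test)

# type = "response" returns predictions on the original scale
pred <- as.numeric(predict(mod_log, newdata = test, type = "response"))

# Fill in only the missing days; observed values are kept as they are
test$pred <- ifelse(is.na(test$value), pred, test$value)
test
```

Rows with NA in value are dropped when the model is fitted, but they are
still present in newdata, so predictions are returned for them.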

I do not think there is any straightforward way to 'force' the
model to be positive only.

Best,

Joshua


On Sat, Feb 1, 2014 at 5:05 PM, Felipe Carrillo
<[hidden email]> wrote:

> [...]



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com


Re: predicted values

Felipe Carrillo
Hi Joshua,
Thanks for the suggestion, I will look into the log link. I basically just want to fill in
missing values for days where data are not available. Negative values definitely won't work
for the kind of data that I am collecting.





On Saturday, February 1, 2014 7:51 PM, Joshua Wiley <[hidden email]> wrote:
 
> [...]

Re: predicted values

Bert Gunter
... but do note that doing what you describe (using predicted values
for missings) can mess up inference: it obviously results in
underestimating error variability. If you're not doing inference, then
probably no harm, no foul. If you are, then here's to
irreproducibility! If you want to handle missings and still get
meaningful inference (an oxymoron?), then find someone expert in such
matters to consult. R has several packages devoted to this (but I'm
not the person to advise about them).
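
One rough way to see the uncertainty that a single filled-in value hides
is to ask predict() for standard errors. The sketch below assumes the
dummy data from the original post and a log-link Gamma model (the family,
k = 5, and the names day, mod_log and filled are illustrative choices,
not from this thread). The +/- 2 standard error band is only a heuristic
for the fitted mean, not a substitute for a proper missing-data analysis.

```r
library(mgcv)

# Dummy data from the original post
test <- data.frame(
  iddate = seq(as.Date("2014-01-01"), as.Date("2014-01-12"), by = "days"),
  value  = c(300, 29, 22, NA, 128, 24, 15, 1, 3, 30, NA, 2)
)
test$day <- as.numeric(test$iddate)  # numeric time index (assumed name)

# Assumed model: Gamma family with log link, small basis dimension
mod_log <- gam(value ~ s(day, k = 5), family = Gamma(link = "log"),
               data = test)

# Rows whose value is missing
missing_days <- test[is.na(test$value), ]

# Predictions and standard errors on the link (log) scale
p <- predict(mod_log, newdata = missing_days, se.fit = TRUE)

# Back-transform the fit and an approximate band for the mean;
# this band ignores the observation-level scatter around the mean
filled <- data.frame(
  iddate = missing_days$iddate,
  fit    = exp(as.numeric(p$fit)),
  lower  = exp(as.numeric(p$fit - 2 * p$se.fit)),
  upper  = exp(as.numeric(p$fit + 2 * p$se.fit))
)
filled
```

Treating fit as if it were an observed value discards the spread between
lower and upper, which is one way the error variability ends up
underestimated.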

Also note that often scientists treat censoring as missing. That's
another booboo. And my humble apology if this is not you.

Finally note that graphics often handles missings sensibly, gracefully
ignoring them. So if graphs are what you seek, maybe you don't need to
worry about it.

And, it should go without saying that given my complete ignorance of
what you're up to, all the above should be taken with the appropriate
dose of salt.

Cheers,
Bert





Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Mon, Feb 3, 2014 at 2:23 PM, Felipe Carrillo
<[hidden email]> wrote:

> [...]