about interpolating data in r

about interpolating data in r

lily li
I have a question about interpolating missing values in a dataframe. The
dataframe is in the following, Column C has no data before 2009-01-05 and
after 2009-12-31, how to interpolate data for the blanks? That is to say,
interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.


df
time                A      B     C
2009-01-01    3      4.5
2009-01-02    4      5
2009-01-03    3.3   6
2009-01-04    4.1   7
2009-01-05    4.4   6.2   5.4
...

2009-11-20    5.1   5.5   6.1
2009-11-21    5.4   4
...
2009-12-31    4.5   6

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: about interpolating data in r

R help mailing list-2
Try approx(), as in:

df <-
data.frame(A=c(10,11,12),B=c(5,5,4),C=c(3.3,4,3),time=as.Date(c("1990-01-01","1990-02-07","1990-02-14")))
with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))

Do you notice how one can copy and paste that example out of the
mail and into R to see how it works?  It would help if your questions
had that same property - show how the example data could be created.


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: about interpolating data in r

lily li
Thanks, I meant if there are missing data at the beginning and end of a
dataframe, how to interpolate according to available data?

For example, the A column has missing values at the beginning and end, how
to interpolate linearly between 10 and 12 for the missing values?

df <- data.frame(A=c(NA, NA, 10, 11, 12, NA),
                 B=c(5, 5, 4, 3, 4, 5),
                 C=c(3.3, 4, 3, 1.5, 2.2, 4),
                 time=as.Date(c("1990-01-01", "1990-02-07", "1990-02-14",
                                "1990-02-28", "1990-03-01", "1990-03-20")))



Re: about interpolating data in r

Ismail SEZEN
In reply to this post by lily li

> On 22 Jul 2016, at 01:34, lily li <[hidden email]> wrote:
>
> I have a question about interpolating missing values in a dataframe.

First of all, filling in missing values must be approached very carefully. You need to know the nature of the data you want to fill, and most of the time leaving the values as NA is the most appropriate action.

> The
> dataframe is in the following, Column C has no data before 2009-01-05 and
> after 2009-12-31, how to interpolate data for the blanks?

Why a dataframe? Is there any relationship between columns A, B and C? If there is, you might want to consider filling the missing values with a linear model instead of interpolation. You said that there is no data before 2009-01-05 and after 2009-12-31, but according to the dataframe there is no data after 2009-11-20?

> That is to say,
> interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.

Also, you mention interpolating blanks, but you want interpolation between two gaps? Do you want to fill the missing values before 2009-01-05 and after 2009-11-20, or do you want to find intermediate values between 2009-01-05 and 2009-11-20? This is a bit unclear.

>
>
> df
> time                A      B     C
> 2009-01-01    3      4.5
> 2009-01-02    4      5
> 2009-01-03    3.3   6
> 2009-01-04    4.1   7
> 2009-01-05    4.4   6.2   5.4
> ...
>
> 2009-11-20    5.1   5.5   6.1
> 2009-11-21    5.4   4
> ...
> 2009-12-31    4.5   6


If you want to fill the missing values at the end-points of column C (before 2009-01-05 and after 2009-11-20), and all the data you have lies between 2009-01-05 and 2009-11-20, then what you want is extrapolation (guessing unknown values that lie outside the range of the known values). In that case you can use only the values in column C to guess the missing end-point values. You can use the splinefun (or spline) functions for this purpose. But note that this kind of approach might help only for a few missing values close to the end-points. Otherwise, you might end up making a huge mistake.
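
A minimal sketch of the splinefun idea, assuming a small made-up daily series (the dates and values below are invented for illustration and are not the poster's data). It uses method = "natural", which extrapolates linearly beyond the first and last known points; other methods can behave much more wildly outside the data.

```r
# Hypothetical toy series: known values only in the middle of the range
time  <- as.Date("2009-01-01") + 0:9
C     <- c(NA, NA, 5.4, 5.6, 5.9, 6.0, 6.1, 6.1, NA, NA)
known <- !is.na(C)

# splinefun() returns a function. With method = "natural" the curve is
# continued as a straight line beyond the first and last known points.
f <- splinefun(x = as.numeric(time[known]), y = C[known], method = "natural")

# Fill only the missing positions; known values are left untouched
filled <- C
filled[!known] <- f(as.numeric(time[!known]))

data.frame(time, C, filled)
```

Only two values are guessed at each end here, in line with the warning that this is reasonable only close to the end-points.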

As I mentioned above, if there is a relationship between the columns, or if you have data for column C for other years (for instance, assume that you have data for column C for 2007, 2008, and 2010 but not 2009), you may want to try a statistical approach to fill the missing values.
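
For the case where the columns are related, a linear model fill could look roughly like the sketch below. The data are simulated purely for illustration (the relationship between A, B and C is an assumption built into the simulation), so this shows the mechanics only and says nothing about whether a model is appropriate for real data.

```r
# Hypothetical data in which C depends on A and B (invented for illustration)
set.seed(1)
df <- data.frame(A = runif(20, 3, 6), B = runif(20, 4, 7))
df$C <- 1 + 0.5 * df$A + 0.3 * df$B + rnorm(20, sd = 0.1)
df$C[c(1:4, 18:20)] <- NA   # blanks at the beginning and end

# lm() drops the rows with NA in C by default when fitting
fit <- lm(C ~ A + B, data = df)

# Predict C only for the rows where it is missing
missing <- is.na(df$C)
df$C_filled <- df$C
df$C_filled[missing] <- predict(fit, newdata = df[missing, ])

df
```

Keeping the filled values in a separate column (C_filled) makes it easy to see later which values were observed and which were predicted.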


Re: about interpolating data in r

Ismail SEZEN
In reply to this post by lily li

> On 22 Jul 2016, at 01:54, lily li <[hidden email]> wrote:
>
> Thanks, I meant if there are missing data at the beginning and end of a
> dataframe, how to interpolate according to available data?
>
> For example, the A column has missing values at the beginning and end, how
> to interpolate linearly between 10 and 12 for the missing values?
>
> df <- data.frame(A=c(NA, NA, 10, 11, 12, NA),
>                  B=c(5, 5, 4, 3, 4, 5),
>                  C=c(3.3, 4, 3, 1.5, 2.2, 4),
>                  time=as.Date(c("1990-01-01", "1990-02-07", "1990-02-14",
>                                 "1990-02-28", "1990-03-01", "1990-03-20")))
>

As William answered,

with(df, approx(x=time, y=A, xout=seq(min(time, na.rm = TRUE), max(time, na.rm = TRUE), by="days")))

will interpolate linearly between the known values even if the column has NAs.
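
To see what that call does with the example data, here is a small sketch that pairs the result with a daily date grid. The names days, res and daily are just illustrative. Note that with the default rule = 1, approx() returns NA for dates outside the range of the known values, so the leading and trailing gaps are not filled.

```r
# Example data from the thread (column A has NAs at both ends)
df <- data.frame(
  A = c(NA, NA, 10, 11, 12, NA),
  B = c(5, 5, 4, 3, 4, 5),
  C = c(3.3, 4, 3, 1.5, 2.2, 4),
  time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14",
                   "1990-02-28", "1990-03-01", "1990-03-20"))
)

# Daily grid covering the whole time range
days <- seq(min(df$time), max(df$time), by = "days")

# approx() skips the (time, A) pairs where A is NA and interpolates
# linearly between the remaining points; rule = 1 is the default
res <- with(df, approx(x = time, y = A, xout = days))

# Pair the interpolated values with the Date grid explicitly
daily <- data.frame(time = days, A = res$y)

# Inside the observed range the values are interpolated:
# 1990-02-21 is halfway between the known values 10 and 11
daily[daily$time == as.Date("1990-02-21"), ]

# Outside the observed range the values stay NA
range(daily$time[!is.na(daily$A)])
```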



Re: about interpolating data in r

Jim Lemon-4
Hi Lily,
The problem may lie in the fact that I think you are using
"interpolate" when you mean "extrapolate". In that case, the best you
can do is spread values beyond the points that you have. Find the
slope of the line, put a point at each end of your time data
(2009-01-01 and 2009-12-31) and use "approx" on all three gaps. Note
that this slope is a slippery one indeed and few will accept that the
values so generated mean anything.

Jim
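
One possible reading of that recipe, sketched with the example data posted earlier in the thread. Taking "the slope of the line" to be the slope between the first and last known points is an assumption made here; a fitted slope from lm() would be another option. The variable names are illustrative.

```r
# Example data from the thread (column A has NAs at both ends)
df <- data.frame(
  A = c(NA, NA, 10, 11, 12, NA),
  time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14",
                   "1990-02-28", "1990-03-01", "1990-03-20"))
)

known <- !is.na(df$A)
x <- as.numeric(df$time[known])   # days since 1970-01-01
y <- df$A[known]
n <- length(x)

# Slope of the straight line through the first and last known points
slope <- (y[n] - y[1]) / (x[n] - x[1])

# Put a point at each end of the time data by extending that line
x_ends <- as.numeric(range(df$time))
y_ends <- c(y[1] + slope * (x_ends[1] - x[1]),
            y[n] + slope * (x_ends[2] - x[n]))

# With anchors at both ends, approx() fills all three gaps
days <- seq(min(df$time), max(df$time), by = "days")
res <- approx(x = c(x_ends[1], x, x_ends[2]),
              y = c(y_ends[1], y, y_ends[2]),
              xout = as.numeric(days))
daily <- data.frame(time = days, A = res$y)

head(daily)
tail(daily)
```

As noted above, the extrapolated values depend entirely on the chosen slope and should be treated with caution.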


Re: about interpolating data in r

lily li
In reply to this post by Ismail SEZEN
Thanks, Ismail.
For the gaps before 2009-01-05 and after 2009-11-20, I use the year 2010 data
to fill in the missing values for column C. There is no relationship between
columns A, B, and C.
For the missing values between 2009-01-05 and 2009-11-20, if there are any,
I found this approach very helpful:
with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))




Re: about interpolating data in r

R help mailing list-2
approx() has a 'rule' argument that controls how it deals with
extrapolation.  Run help(approx) and read about the details.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
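
As a quick illustration of the 'rule' argument, the sketch below compares rule = 1 and rule = 2 on the example data from earlier in the thread. Note that rule = 2 does not extend the line; it repeats the value at the closest known point. See help(approx) for the full details.

```r
# Example data from the thread (column A has NAs at both ends)
df <- data.frame(
  A = c(NA, NA, 10, 11, 12, NA),
  time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14",
                   "1990-02-28", "1990-03-01", "1990-03-20"))
)
days <- seq(min(df$time), max(df$time), by = "days")

# rule = 1 (the default): NA outside the range of the known points
r1 <- with(df, approx(x = time, y = A, xout = days, rule = 1))

# rule = 2: the value at the closest data extreme is used instead
r2 <- with(df, approx(x = time, y = A, xout = days, rule = 2))

# Compare the two fills at the start and end of the daily grid
ends <- c(1, length(days))
data.frame(time = days[ends], rule1 = r1$y[ends], rule2 = r2$y[ends])
```

A length-two rule such as rule = 2:1 can be used if the left and right sides should be treated differently.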
