Imputing Missing Values

Imputing Missing Values

Rmetrics mailing list
This might be a very basic query for this erudite group, but I am hopeful that some help will be forthcoming. I have a monthly time series of annualized T-bill rates for the Indian market. The values for some months are missing at random. I need to convert the annualized yields into daily as well as monthly yields. I have three questions:

1. I am using the zoo package. Which of its NA imputation methods is advisable for this series: na.aggregate, na.locf, na.spline, na.approx, or another?
2. Should the imputation be done on the monthly series of annualized yields before converting to daily and monthly yields, or after the conversion?
3. Are there better methods than the above for this task?

I will be extremely grateful for comments. Thanks a ton.

Regards,
Pankaj
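
For readers comparing the four zoo functions named in the question, here is a minimal sketch of how each one fills the same gaps. The series `y` and its values are made up for illustration and are not the poster's data; the sketch only shows what each function does, not which one is appropriate.

```r
library(zoo)

# Hypothetical monthly annualized yields (percent) with gaps; values are made up.
idx <- as.yearmon("2015-01") + 0:7 / 12
y <- zoo(c(7.0, NA, 7.4, 7.6, NA, NA, 8.2, 8.0), order.by = idx)

filled <- merge(
  raw       = y,
  locf      = na.locf(y),       # carry the last observed value forward
  approx    = na.approx(y),     # linear interpolation between neighbours
  spline    = na.spline(y),     # cubic spline interpolation
  aggregate = na.aggregate(y)   # replace each NA with the mean of the observed values
)
print(filled)
```

Printing `filled` puts the raw series next to the four filled versions, which makes it easy to see how far the methods diverge over the two-month gap.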

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

Re: Imputing Missing Values

Frank-2
1)

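I use na.locf to replace NAs in FRED data. As far as I know, na.rm = TRUE
only drops leading NAs that have no earlier value to carry forward; carrying
the next observation backward requires fromLast = TRUE: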

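library(quantmod)  # also attaches xts and zoo, which provide na.locf() and write.zoo()

##
## Get DGS3MO Treasury yield from FRED
##

getSymbols("DGS3MO", src = "FRED")  # creates the xts object DGS3MO in the workspace

# Convert percent to decimal, then carry the last observation forward;
# na.rm = TRUE drops any leading NAs
DGS3MO <- na.locf(DGS3MO / 100.0, na.rm = TRUE)
tail(DGS3MO)

# Write the filled series to a CSV file
file_name <- "DGS3MO.csv"
write.zoo(DGS3MO, file = file_name, append = FALSE, quote = TRUE, sep = ",")
quit()  # end the R session when run as a batch script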

2)

Would you be so kind as to show what you are currently using to imply
yields? And what is your intended use for the implied daily/monthly data?

Thanks,

Frank
Chicago
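
On the na.locf arguments discussed in the reply above, a small sketch may help separate what `na.rm` and `fromLast` each do in zoo. The toy series `x` is made up for illustration.

```r
library(zoo)

# Toy daily series with a leading NA and an interior NA; values are made up.
x <- zoo(c(NA, 1.0, NA, 3.0), order.by = as.Date("2016-06-20") + 0:3)

fwd_drop <- na.locf(x)                    # na.rm = TRUE is the default: the leading NA is dropped
fwd_keep <- na.locf(x, na.rm = FALSE)     # the leading NA is kept; the interior NA becomes 1.0
bwd      <- na.locf(x, fromLast = TRUE)   # the next observation is carried backward

print(fwd_drop)
print(fwd_keep)
print(bwd)
```

Only `fromLast = TRUE` fills a gap from the observation that follows it; `na.rm` controls whether NAs that cannot be filled stay in the result.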

Re: Imputing Missing Values

braverock
In reply to this post by Rmetrics mailing list
On Sun, 2016-06-26 at 12:53 +0000, Pankaj K Agarwal via R-SIG-Finance
wrote:

> This might be a very basic query for this erudite group. However, I am
> hopeful some help will be forthcoming nevertheless. I have a monthly
> time series of annualized t-bill rates on Indian markets. For some
> months, the values are missing randomly. I need to convert the
> annualized yields into daily as well as monthly yields. I have three
> questions: 1. I am using package zoo. Which of the methods of NA
> imputations will be advisable for this series, viz., na.aggregate,
> na.locf, na.spline or na.approx etc.? 2. Should the imputation be done
> on monthly annual yields and then the conversion to daily and monthly
> yields be performed or imputation be done afterwards? 3. Are there
> better methods than above for this task? I will be extremely grateful
> for comments. Thanks a ton. Regards, Pankaj

The short answer is: Don't do it.  This is a bad idea.

You need to find a better source of data.  Daily data on Indian 3-mo
bill yields is widely available from free sources.  See, e.g.

http://www.investing.com/rates-bonds/india-3-month-bond-yield

There are many other sources of this data as well. I have no opinion on
the data quality of one over another, but *any* of them would likely
be orders of magnitude better than what you're asking to do.

You can't impute from lower-frequency data to higher-frequency data with
any confidence.

These NA imputation methods are designed to fill some occasionally
missing data, or to do something like Last Observation Carried Forward
on Bid/Ask spreads (which is not imputation at all, since that is the
prevailing market).

Basically, it always makes sense to start with the highest frequency
data available, and aggregate to lower frequencies.  In the case of your
query, start with the spot yield and do whatever adjustments you need.

Regards,

Brian
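
The advice above, to start from the highest-frequency data and aggregate down, could be sketched with zoo as follows. The daily series here is simulated purely for illustration and is not real market data; `monthly_mean` and `monthly_last` are hypothetical names.

```r
library(zoo)

# Hypothetical daily annualized yields (percent); simulated, not real market data.
set.seed(1)
days  <- seq(as.Date("2016-04-01"), as.Date("2016-06-30"), by = "day")
daily <- zoo(7 + cumsum(rnorm(length(days), sd = 0.02)), order.by = days)

# Aggregate down to monthly frequency: the mean yield and the month-end yield.
monthly_mean <- aggregate(daily, as.yearmon, mean)
monthly_last <- aggregate(daily, as.yearmon, function(v) tail(v, 1))

print(monthly_mean)
print(monthly_last)
```

Whether the monthly mean or the month-end value is the right summary depends on how the monthly fund returns are measured, so that choice is left open here.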


Re: Imputing Missing Values

Rmetrics mailing list
In reply to this post by Frank-2
Dear Frank,

So kind of you to have replied. Thanks. I am conducting academic research on mutual funds. I am modelling fund returns at daily as well as monthly frequencies. The model uses zero-investment strategy returns (fund/portfolio returns minus the risk-free rate) as variables. For the risk-free rate I have good-quality 91-day T-bill rates available. However, using a 91-day risk-free rate with daily fund returns did not seem to be a very good idea, so I needed to look for T-bill rates of shorter maturity. In India the next shorter maturity available is the 14-day T-bill, but the monthly series of annualized rates has missing values for some months. For example, in some years the data for three consecutive months is missing. I hope I have conveyed the problem.

Will na.locf be a good idea in this case? Also, if I use 91-day rates with daily fund returns, will it matter much? Kindly advise.

Regards,
Pankaj
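
For the conversion step described above (an annualized risk-free rate turned into daily and monthly rates, then subtracted from fund returns), one possible sketch follows. The conventions are assumptions and do not come from the thread: the rate is treated as an effective annual rate in decimal form, with 12 months and 252 trading days per year. If the published T-bill rate is a simple or discount yield, the formula would differ. The helper `annual_to_period` and all numbers are hypothetical.

```r
# Assumed conventions (not from the thread): y is an effective annual rate in
# decimal form, with 12 months and 252 trading days per year.
annual_to_period <- function(y, periods_per_year) {
  (1 + y)^(1 / periods_per_year) - 1
}

y_annual  <- 0.07                             # hypothetical 7% annualized yield
r_monthly <- annual_to_period(y_annual, 12)   # per-month risk-free rate
r_daily   <- annual_to_period(y_annual, 252)  # per-trading-day risk-free rate

# Excess return of a hypothetical fund with a 0.1% daily return.
fund_daily   <- 0.001
excess_daily <- fund_daily - r_daily

print(c(monthly = r_monthly, daily = r_daily, excess = excess_daily))
```

Compounding the per-period rate back over a full year recovers the annual figure, which is a quick sanity check on whichever convention is chosen.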

 



Re: Imputing Missing Values

Rmetrics mailing list
In reply to this post by braverock
Thanks so much, Brian sir. Your reply is a great help.

Regards,
Pankaj

