Michael, Mikhail

Many thanks for your helpful comments. My faith in community support continues to grow.

Michael: I'm looking to use some sort of flexible spline-like fit (smooth.spline, lowess, etc.).

Many thanks for sharing your expertise. I also cross-posted this to the "manipulatr" Google group; here is the response from Peter Meilstrup:

"For (1), you might want to take a look at rollapply() and related functions in the zoo package.

For (2), don't put the different samples of your curve fit into different columns. Instead, imagine generating a data frame with three columns:

base.date (the date each fit is based around)

prediction.date (the date you are extrapolating to)

prediction (the fitted value)

So if you have 100 dates and generate a 7-point curve from each date, you end up with 700 rows."
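A minimal sketch of that long layout, under some loud assumptions: the toy data frame dfQual (already filtered to the qualifying bonds) and its values are made up, smooth.spline() stands in for whatever flexible fit is finally chosen, and Term2Maturity plays the role of Peter's prediction.date column, since the curve in my case runs along maturity rather than along calendar dates.

```r
# Toy data (hypothetical): 3 dates, 40 qualifying bonds on each date.
set.seed(2)
dates  <- as.Date("2012-07-09") + 0:2
dfQual <- data.frame(
  Date          = rep(dates, each = 40),
  Term2Maturity = runif(120, 0.5, 10)
)
dfQual$Var1 <- 2 + 0.3 * dfQual$Term2Maturity + rnorm(120, sd = 0.1)

# Curve points every 0.5 years of maturity.
grid <- seq(1, 9, by = 0.5)

# Fit one spline per date and return the curve as rows, not columns.
fit_curve <- function(d) {
  fit <- smooth.spline(d$Term2Maturity, d$Var1)
  data.frame(
    base.date     = d$Date[1],
    Term2Maturity = grid,
    prediction    = predict(fit, grid)$y
  )
}

# One block of rows per date, stacked into a single long data frame.
dfCurves <- do.call(rbind, lapply(split(dfQual, dfQual$Date), fit_curve))
```

With 3 dates and 17 grid points this gives 51 rows; the same pattern with 100 dates and a 7-point grid would give the 700 rows Peter describes. The per-date work still happens in lapply(), which is a loop of a sort, but the results never have to be spread across columns.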

As ever, time pressures dictate that I start from what I know. I have only fairly basic database skills at the moment, so I will try zoo/TTR first and move to PostgreSQL if that isn't satisfactory.
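A rough sketch of the zoo route for (1), assuming one row per bond per day with no gaps, so that "one year" can be approximated by a fixed window of 250 observations. The window length, the toy data and the helper name roll_stats are all assumptions for illustration; a true calendar-based look-back would need a different approach.

```r
library(zoo)

# Toy data (hypothetical): two bonds, 300 daily observations each.
set.seed(1)
dfBonds <- data.frame(
  BondID = rep(c("A", "B"), each = 300),
  Date   = rep(seq(as.Date("2011-01-03"), by = "day", length.out = 300), 2),
  Var1   = rnorm(600),
  Var2   = rnorm(600)
)

win <- 250  # assumed number of observations in one year

# Append trailing-window statistics to the rows of a single bond.
roll_stats <- function(d) {
  d <- d[order(d$Date), ]
  m <- as.matrix(d[, c("Var1", "Var2")])
  # Right-aligned window: each row only looks back, never forward.
  d$sd1  <- rollapplyr(m[, "Var1"], win, sd, fill = NA)
  d$corr <- rollapplyr(m, win, function(w) cor(w[, 1], w[, 2]),
                       by.column = FALSE, fill = NA)
  d
}

# Apply per bond and stack the results back together.
dfBondsWithCorr <- do.call(rbind, lapply(split(dfBonds, dfBonds$BondID), roll_stats))
```

The first 249 rows of each bond come back as NA because the window is not yet full. TTR's runSD() and runCor() cover the same two statistics and may well be faster on the full data set.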

-----Original Message-----

From: Mikhail Titov [mailto:[hidden email]]

Sent: 12 July 2012 00:22

To: R. Michael Weylandt

Cc: Russell Bowdrey; [hidden email]

Subject: Re: do I need plyr, apply or something else?

"R. Michael Weylandt" <[hidden email]> writes:

> On Wed, Jul 11, 2012 at 10:05 AM, Russell Bowdrey <[hidden email]> wrote:

>>

>> Dear all,

>>

>> This is what I'd like to do (I have an implementation using for

>> loops, which I designed before I realised just how slow R is at

>> executing them - this process currently takes days to run).

>>

>> I have a large dataframe containing corporate bond data, columns are:

>> BondID

>> Date (goes back 5 years)

>> Var1

>> Var2

>> Term2Maturity

>>

>> What I want to do is this:

>>

>> 1) For each bond, at each given date, look back over 1 year and append some statistics to each row ( sd(Var1), cor(Var1,Var2) over that year etc)

>>

>

> Look at the TTR package and the various run** functions. Much faster.

>

>> a. It seems I might be able to use ddply for this, but I can't work

>> out how to code the stats function to only look back over one year,

>> rather than the full data range

>>

>> b. For example: dfBondsWithCorr<-ddply(dfBonds, .(BondID), transform,corr=cor(Var1,Var2),.progress="text")

>> returns a dataframe where for each bond it has same corr for each

>> date

>>

>> 2) On each date, subset dfBondsWithCorr by certain qualification

>> criteria, then to the qualifiers fit a regression through a Var1 and

>> Term2Maturity, output the regression as a df of curves (say for each

>> date, a curve represented by points every 0.5 years)

>>

>> a. I can do this pretty efficiently for a single date (and I suppose

>> I could wrap that in a function), but can't quite see how to do the

>> filtering and spitting out of curves over multiple dates without

>> using for loops

>>

>

> This one's harder. For simple linear regressions, you can solve the

> regression analytically (e.g., slope = runCov / runVar and mean

> similarly) but doing it for more complicated regressions will pretty

> much require a for loop of one sort or another. Can you say what sort

> of model you are looking to use?

>

>> Would appreciate any thoughts, many thanks in advance

I feel like PostgreSQL will do the work better. It has support for basic statistics [1], and you can use window functions [2] to limit the scope to the last year only. Then you can pull the results into R with RODBC or something similar.

I suspect you have your data in some sort of DB in the first place. Perhaps it has similar features.

[1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE

[2] http://www.postgresql.org/docs/9.1/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

--

Mikhail

