

Hi,

I have a data.frame that is date ordered by row number, earliest
date first and most current last. I want to create a couple of new
columns that show the max and min values from other columns *so far*,
not for the whole data.frame.

It seems this sort of question really comes from my lack of
understanding of how R intends me to limit myself to portions of a
data.frame. I get the impression from the help files that the generic
way is that if I'm on the 500th row of a 1000-row data.frame and want
to limit the search that max does to rows 1:500, I should use something
like [1:row], but it's not working inside my function. The idea works
outside the function, in the sense that I can create temp1 <- F1$p[1:7] and
max() returns what I expect. How do I do this with row?

Simple example attached. hp should be 'highest p', ll should be
'lowest l'. I get an error message "Error in 1:row : NA/NaN argument"

Thanks,
Mark
AddCols = function (MyFrame) {
  MyFrame$p <- 0
  MyFrame$l <- 0
  MyFrame$pc <- 0
  MyFrame$lc <- 0
  MyFrame$pwin <- 0
  MyFrame$hp <- 0
  MyFrame$ll <- 0
  return(MyFrame)
}
BinPosNeg = function (MyFrame) {
  ## Positive y in p column, negative y in l column
  pos <- MyFrame$y > 0
  MyFrame$p[pos] <- MyFrame$y[pos]
  MyFrame$l[!pos] <- MyFrame$y[!pos]
  return(MyFrame)
}
RunningCount = function (MyFrame) {
  ## Running count of p & l events
  pos <- (MyFrame$p > 0)
  MyFrame$pc <- cumsum(pos)
  pos <- (MyFrame$l < 0)
  MyFrame$lc <- cumsum(pos)
  return(MyFrame)
}
PercentWins = function (MyFrame) {
  MyFrame$pwin <- round((MyFrame$pc / (MyFrame$pc + MyFrame$lc)), 2)
  return(MyFrame)
}
HighLow = function (MyFrame) {
  temp1 <- MyFrame$p[1:row]
  MyFrame$hp <- max(temp1) ## Highest p
  temp1 <- MyFrame$l[1:row]
  MyFrame$ll <- min(temp1) ## Lowest l
  return(MyFrame)
}
F1 <- data.frame(x=1:10, y=2*(-4:5))
F1 <- AddCols(F1)
F1 <- BinPosNeg(F1)
F1 <- RunningCount(F1)
F1 <- PercentWins(F1)
F1
F1 <- HighLow(F1)
F1
temp1 <- F1$p[1:5]
max(temp1)
temp1 <- F1$p[1:7]
max(temp1)
temp1 <- F1$p[1:10]
max(temp1)
markknecht@gmail.com


On 01/07/2009 11:49 AM, Mark Knecht wrote:
> HighLow = function (MyFrame) {
>   temp1 <- MyFrame$p[1:row]
>   MyFrame$hp <- max(temp1) ## Highest p
>   temp1 <- MyFrame$l[1:row]
>   MyFrame$ll <- min(temp1) ## Lowest l
>
>   return(MyFrame)
> }
You get an error in this function because you didn't define row, so R
assumes you mean the function in the base package, and 1:row doesn't
make sense.
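A minimal sketch of that point, assuming a data frame with numeric p and l columns as in Mark's example (HighLowLoop and demo are hypothetical names). It first confirms that a bare `row` resolves to base R's row() function, then gives `row` an actual value by looping over the row numbers, so that 1:row becomes an ordinary integer sequence:

```r
# `row` is never assigned inside HighLow(), so R finds base::row() instead
is.function(row)   # TRUE

# Hypothetical fix: make `row` the loop variable, so 1:row is a real sequence
HighLowLoop <- function(MyFrame) {
  MyFrame$hp <- NA_real_
  MyFrame$ll <- NA_real_
  for (row in seq_len(nrow(MyFrame))) {
    MyFrame$hp[row] <- max(MyFrame$p[1:row])  # highest p in rows 1..row
    MyFrame$ll[row] <- min(MyFrame$l[1:row])  # lowest l in rows 1..row
  }
  MyFrame
}

demo <- data.frame(p = c(0, 3, 1, 5, 2), l = c(-1, 0, -4, 0, -2))
res <- HighLowLoop(demo)
res$hp  # 0 3 3 5 5
res$ll  # -1 -1 -4 -4 -4
```

This works, but it redoes max() over a growing prefix on every row, which is the cost the cumulative functions avoid.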
What you want for the "highest so far" is the cummax (for "cumulative
maximum") function. See ?cummax.
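A sketch of HighLow() rewritten along those lines with base R's cummax() and its counterpart cummin(), assuming MyFrame already holds numeric p and l columns (the demo data frame is made up for illustration):

```r
# HighLow() using cumulative max/min: no row index needed at all
HighLow <- function(MyFrame) {
  MyFrame$hp <- cummax(MyFrame$p)  # highest p in rows 1..i, for every row i
  MyFrame$ll <- cummin(MyFrame$l)  # lowest l in rows 1..i, for every row i
  MyFrame
}

demo <- data.frame(p = c(0, 3, 1, 5, 2), l = c(-1, 0, -4, 0, -2))
demo <- HighLow(demo)
demo$hp  # 0 3 3 5 5
demo$ll  # -1 -1 -4 -4 -4
```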
Duncan Murdoch
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Wed, Jul 1, 2009 at 9:39 AM, Duncan Murdoch <[hidden email]> wrote:
> You get an error in this function because you didn't define row, so R
> assumes you mean the function in the base package, and 1:row doesn't make
> sense.
>
> What you want for the "highest so far" is the cummax (for "cumulative
> maximum") function. See ?cummax.
>
> Duncan Murdoch
>
Duncan,
OK, thanks. That makes sense, as long as I want the cummax from the
beginning of the data.frame. (Which is exactly what I asked for!)

How would I do this in the more general case if I was looking for
the cummax of only the most recent 50 rows in my data.frame? What I'm
trying to get down to is that, as I fill in my data.frame, I need to be
able to get a max or min or standard deviation of the previous so many
rows of data (not the whole column), and I'm just not grasping how to
do this. It seems like I should be able to create a data set that's
only a portion of a column while I'm in the function and then take the
cummax of that, or use it as an input to a standard deviation, etc.?

Thanks,
Mark
On 01/07/2009 1:26 PM, Mark Knecht wrote:
> Duncan,
> OK, thanks. That makes sense, as long as I want the cummax from the
> beginning of the data.frame. (Which is exactly what I asked for!)
>
> How would I do this in the more general case if I was looking for
> the cummax of only the most recent 50 rows in my data.frame? What I'm
> trying to get down to is that, as I fill in my data.frame, I need to be
> able to get a max or min or standard deviation of the previous so many
> rows of data (not the whole column), and I'm just not grasping how to
> do this. It seems like I should be able to create a data set that's
> only a portion of a column while I'm in the function and then take the
> cummax of that, or use it as an input to a standard deviation, etc.?
What you describe might be called a "running max". The caTools package
has a runmax function that probably does what you want.
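If you would rather stay in base R, stats::embed() can build the windows for you. A sketch, with an arbitrary example vector and window width k: embed(x, k) returns one row per complete window of k consecutive values, and apply() then summarises each row. The windows start at row k, so the result is padded with NA to line up with the original rows:

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
k <- 3

# Row j of embed(x, k) holds x[j + k - 1], ..., x[j]: one full window per row
win_max <- apply(embed(x, k), 1, max)
win_max   # 4 4 5 9 9 9  (windows ending at positions k..length(x))

# Pad the first k - 1 positions so the result lines up with x (or a data.frame)
aligned <- c(rep(NA, k - 1), win_max)
aligned   # NA NA 4 4 5 9 9 9

# Any summary function works on the same windows, e.g. a trailing sd
win_sd <- apply(embed(x, k), 1, sd)
```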
More generally, you can always write a loop. They aren't necessarily
fast or elegant, but they're pretty general. For example, to calculate
the max of the previous 50 observations (or fewer near the start of a
vector), you could do
x <- ... some vector ...
result <- numeric(length(x))
for (i in seq_along(x)) {
  result[i] <- max(x[max(1, i - 49):i])
}
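A runnable version of that loop, with the window width turned into a parameter. The names running_max and k are illustrative, not from any package; with k = 50 the window start max(1, i - k + 1) is the same as max(1, i - 49) above:

```r
running_max <- function(x, k = 50) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    # trailing window: positions max(1, i - k + 1) .. i, so at most k values
    result[i] <- max(x[max(1, i - k + 1):i])
  }
  result
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
running_max(x, k = 3)  # 3 3 4 4 5 9 9 9
```

Swapping max for min gives a trailing minimum in the same way; for sd, keep in mind that sd() of a single value is NA, so the first position comes out NA.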
Duncan Murdoch
Thanks for the pointer. I'll check it out.
Today I've managed to get pretty much all of my Excel spreadsheet
built in R except for some of the charts. It took me a week and a half
in Excel. This is my 3rd full day with R. Charts are next.
I appreciate your help and the help I've gotten from others. Thanks so much.
cheers,
Mark
For another generic approach, you might be interested in the Reduce
function,
rolling <- function(x, window = seq_along(x), f = max) {
  Reduce(f, x[window])
}
x= c(1:10, 2:10, 15, 1)
rolling(x)
#15
rolling(x, 1:10)
#10
rolling(x, 1:12)
#10
Of course this is only part of the solution to the initial problem (where
the window needs to move along).
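Two ways the Reduce idea might be extended, sketched with a made-up vector: Reduce(..., accumulate = TRUE) keeps every intermediate result of the fold, which for max is the running maximum from the start (the same as cummax()), and calling rolling() on each trailing window of width k moves the window along. The width k = 3 is an arbitrary choice for illustration:

```r
rolling <- function(x, window = seq_along(x), f = max) {
  Reduce(f, x[window])
}

x <- c(5, 1, 2, 8, 3, 1, 0)

# Running max from the start: every intermediate result of the fold
Reduce(max, x, accumulate = TRUE)   # 5 5 5 8 8 8 8

# Moving window: apply rolling() to positions max(1, i - k + 1) .. i
k <- 3
sapply(seq_along(x), function(i) rolling(x, max(1, i - k + 1):i))   # 5 5 5 8 8 8 3
```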
HTH,
baptiste
2009/7/1 Duncan Murdoch <[hidden email]>
> What you describe might be called a "running max". The caTools package has
> a runmax function that probably does what you want.

Hi
What about taking a subset of your whole data frame inside some function?
fff <- function(data, rows) {
  data.1 <- data[1:rows, ]
  # get all the necessary stuff from data.1
  # and return what you want
}
You can add a dimension check if you want the function to be more robust.
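A sketch of what a filled-in fff() with such a check could look like. The p and l column names are assumed from Mark's example, and returning a list of hp and ll is only one possible choice of "what you want":

```r
fff <- function(data, rows) {
  # dimension check: a single row count between 1 and nrow(data)
  stopifnot(length(rows) == 1, rows >= 1, rows <= nrow(data))
  data.1 <- data[1:rows, , drop = FALSE]
  # summaries over rows 1..rows only (assumes numeric p and l columns)
  list(hp = max(data.1$p), ll = min(data.1$l))
}

df <- data.frame(p = c(0, 3, 1, 5, 2), l = c(-1, 0, -4, 0, -2))
fff(df, 3)   # hp = 3, ll = -4
```

Asking for more rows than the data frame has then stops with an error instead of silently returning NA rows.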
Regards
Petr
Belated answer:
A few remarks regarding your questions:
Your running max problem could be solved in the following way
(a solution based on Duncan Murdoch's suggestion,
but a little more general):
foldOrbit <- function(x, fun) {
  res <- numeric(length(x))
  res[1] <- x[1]
  # seq_along(x)[-1] is 2..n, and empty (not c(2, 1)) when x has one element
  for (i in seq_along(x)[-1]) res[i] <- fun(res[i - 1], x[i])
  res
}
or more generally
applySliding <- function(x, fun, winlength = length(x)) {
  res <- numeric(length(x))
  for (i in seq_along(x)) {
    # apply fun to at most the last winlength elements up to position i
    res[i] <- fun(x[max(1, i - winlength + 1):i])
  }
  res
}
foldOrbit(x,max)
will give you the running maxes of vector x.
This works because, for max, the max of the whole sequence equals
the max of two values: the max of the sequence without its last
element, and the last element itself.
The same holds for min, sum and prod (all of these are associative).
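A quick check of that claim, assuming the corrected foldOrbit() above (repeated so the block runs on its own): for max, min, + and *, the fold reproduces base R's cummax(), cummin(), cumsum() and cumprod() on a small made-up vector:

```r
foldOrbit <- function(x, fun) {
  res <- numeric(length(x))
  res[1] <- x[1]
  for (i in seq_along(x)[-1]) res[i] <- fun(res[i - 1], x[i])
  res
}

x <- c(5, 1, 2, 8, 3)
foldOrbit(x, max)   # 5 5 5 8 8,       same as cummax(x)
foldOrbit(x, min)   # 5 1 1 1 1,       same as cummin(x)
foldOrbit(x, `+`)   # 5 6 8 16 19,     same as cumsum(x)
foldOrbit(x, `*`)   # 5 5 10 80 240,   same as cumprod(x)
```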
applySliding is more general. The second argument is the function you
want to apply "in running mode".
If you do not give winlength, it applies the function to all elements
up to each position, so it also gives correct results for
non-associative functions.
If you give winlength, it only uses the last winlength
elements up to each position.
Examples:
foldOrbit(1:10,max)
applySliding(1:10,max)
applySliding(1:10,max,3)
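Since the original question also asked about a standard deviation over the previous rows, here is a sketch of applySliding() with sd as the function, assuming the corrected definition above (repeated so the block is self-contained). The vector and the sd3 column name are made up for illustration; note that the first window holds a single value, and sd() of one value is NA:

```r
applySliding <- function(x, fun, winlength = length(x)) {
  res <- numeric(length(x))
  for (i in seq_along(x)) res[i] <- fun(x[max(1, i - winlength + 1):i])
  res
}

x <- c(1, 3, 5, 7, 9)
applySliding(x, sd, 3)   # NA, sqrt(2), 2, 2, 2

# The same call can fill a new data.frame column
df <- data.frame(y = x)
df$sd3 <- applySliding(df$y, sd, 3)
```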
And now, for something completely different:
You seem to want to combine Excel and R in your work.
Possibly you can make your work easier if you use RExcel,
which is an add-in that lets you use R from within Excel.
Information is available at rcom.univie.ac.at,
and there is a (half-hour long) video demonstrating how to use
R from within Excel.
