

Hi fellows,
I am facing a problem that I cannot solve with my limited knowledge of R
unless I write the function myself, which I would like to avoid
(reusing is better than reinventing the wheel). The relevant
information follows.
Input scenario:
An xts time series object with duplicates, the object contains bid,
bid volume, ask, ask volume.
Example:
01-01-2010 09:00:01 100 1 101 1
01-01-2010 09:00:02 100 1 101 1
01-01-2010 09:00:03 100 1 101 1
01-01-2010 09:00:04 101 1 102 1
01-01-2010 09:00:05 102 1 102 1
01-01-2010 09:00:06 100 1 101 1
...
Goal:
A time series with only non-repeating values, i.e. with rows that merely
repeat the immediately preceding row removed.
I tried "unique" already, but it returns the unique values across the
whole time series rather than on a running basis.
Example code:
The following code shows, with a non-xts series, what I
want to achieve ...
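> y = c(1,1,2,2,1,1,1,2,3,4,3,3,3,3,3,1)
> removeDuplicates <- function(input)
{
  # keep the first element, then every element that differs from its predecessor
  index = 2
  ret = c(input[1])
  for(i in seq_along(input)[-1])  # unlike 2:length(input), safe for length-1 input
  {
    if(input[i] != input[i-1])
    {
      ret[index] = input[i]
      index = index + 1
    }
  }
  ret
}
>
> removeDuplicates(y)
[1] 1 2 1 2 3 4 3 1
>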
How can I do this with an xts series? Is there an existing function for it?
Thanks in advance,
with kind regards,
Ulrich

Ulrich Staudinger
activequant.org
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.


Ulrich,
try duplicated(xts.object, ...) or possibly duplicated(as.data.frame(xts.object), ...) if all columns should be considered.
Regards, david
-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Ulrich Staudinger
Sent: Wednesday, September 15, 2010 8:28 AM
To: r-sig-finance
Subject: [R-SIG-Finance] removing repeating values from xts series


Hi David,
as far as I understand, duplicated works internally very
much like unique: it flags every value that has occurred anywhere earlier.
With a vector y (not a time series in this case), duplicated yields:
> y
[1] 1 1 2 3 2 2 2 2 1
> duplicated(y)
[1] FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
But what I would like to have is:
FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
or ...
1 2 3 2 1
I am not sure that duplicated is what I want, unless I missed
something ... is there some other approach?
Regards,
Ulrich
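For a plain vector, the flags requested above can be obtained by comparing each element with its immediate predecessor. This is only a minimal base-R sketch on the example vector from this message; the name is_repeat is made up for illustration and nothing here is specific to xts.

```r
y <- c(1, 1, 2, 3, 2, 2, 2, 2, 1)

# TRUE where an element equals the one directly before it.
# The first element has no predecessor, so it is never flagged.
is_repeat <- c(FALSE, y[-1] == y[-length(y)])

is_repeat       # the running "duplicate" flags
y[!is_repeat]   # the series with consecutive repeats dropped
```

Unlike duplicated(), this looks back only one position, so a value that returns after a different value is kept.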
On Wed, Sep 15, 2010 at 9:08 AM, Lüthi David (XICD 1)
< [hidden email]> wrote:
> Ulrich,
> try duplicated(xts.object, ...) or possibly duplicated(as.data.frame(xts.object), ...) if all columns should be considered.
> Regards, david

Ulrich Staudinger
activequant.org


Hi Ulrich
I see. Off the cuff, I'd use rle (run-length encoding) and some function of cumsum(rle(y)$lengths) to get the indexes of the non-duplicates.
Regards, david
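One way the rle idea might be spelled out on a plain vector is sketched below. It assumes the first element of each run is the one to keep; the names runs and first_idx are illustrative only.

```r
y <- c(1, 1, 2, 3, 2, 2, 2, 2, 1)

runs <- rle(y)   # run values and run lengths

# Start position of each run: position 1, then 1 plus the cumulative
# length of all preceding runs.
first_idx <- cumsum(c(1, head(runs$lengths, -1)))

first_idx      # positions of the non-duplicates
y[first_idx]   # same values as runs$values
```

For a bare vector runs$values already gives the answer; the positions matter once they are used to pick the matching rows (and timestamps) of a larger object.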
-----Original Message-----
From: Ulrich Staudinger [mailto:[hidden email]]
Sent: Wednesday, September 15, 2010 9:25 AM
To: Lüthi David (XICD 1)
Cc: r-sig-finance
Subject: Re: [R-SIG-Finance] removing repeating values from xts series


So you want to compare
y[-1,]
with
y[-nrow(y),]
I think. And save the rows
that aren't all equal. Yes?
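On a plain numeric matrix (not an xts object), that row-against-previous-row comparison could look roughly like the sketch below. The column names and the variable names m, same_as_prev and keep are assumptions made for the example, using the data from the original question.

```r
# Example data from the question as an ordinary matrix (no time index).
m <- cbind(bid     = c(100, 100, 100, 101, 102, 100),
           bid_vol = 1,
           ask     = c(101, 101, 101, 102, 102, 101),
           ask_vol = 1)

# Rows 2..n compared with rows 1..(n-1); a row is a repeat when no column differs.
same_as_prev <- rowSums(m[-1, , drop = FALSE] != m[-nrow(m), , drop = FALSE]) == 0

# The first row is always kept.
keep <- c(TRUE, !same_as_prev)
m[keep, , drop = FALSE]
```

This keeps rows 1, 4, 5 and 6 of the example.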
Patrick Burns
[hidden email]
http://www.burns-stat.com
http://www.portfolioprobe.com/blog


I want to compare
y(t) with y(t-1)
where
t = 2 ... length(y)
and y is an xts time series.
On Wed, Sep 15, 2010 at 9:33 AM, Patrick Burns < [hidden email]> wrote:
> So you want to compare
>
> y[-1,]
>
> with
>
> y[-nrow(y),]
>
> I think. And save the rows
> that aren't all equal. Yes?
>

Ulrich Staudinger
activequant.org


I think diff plus a logical operation across all four columns would help.
I had hoped to find a ready-made function for this ...
Thanks ...
On Wed, Sep 15, 2010 at 9:46 AM, Ulrich Staudinger
< [hidden email]> wrote:
> I want to compare
>
> y(t) with y(t-1)
> where
> t = 2... length(y)
> y is an xts timeseries

Ulrich Staudinger
[hidden email]
http://www.activequant.org


Ulrich,
Patrick's suggestion is a vectorized solution to your problem. But it
won't work for xts objects because they are merged by time index
before the comparison, so rows with the same timestamp get compared with each other.
You need to use lag:
> x <- xts(cbind(c(100,100,100,101,102,100),
+ 1,c(101,101,101,102,102,101),1),
+ as.POSIXct("2010-01-01 09:00:01")+0:5)
> x[!c(FALSE,apply(lag(x)==x,1,all)[-1]),]
                    [,1] [,2] [,3] [,4]
2010-01-01 09:00:01  100    1  101    1
2010-01-01 09:00:04  101    1  102    1
2010-01-01 09:00:05  102    1  102    1
2010-01-01 09:00:06  100    1  101    1
Or you could use diff (as you suggest):
> x[!c(FALSE,apply(diff(x)==0,1,all)[-1]),]
                    [,1] [,2] [,3] [,4]
2010-01-01 09:00:01  100    1  101    1
2010-01-01 09:00:04  101    1  102    1
2010-01-01 09:00:05  102    1  102    1
2010-01-01 09:00:06  100    1  101    1
Best,

Joshua Ulrich
FOSS Trading: www.fosstrading.com
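For reuse, the same idea could be wrapped in a small helper. The sketch below is not the code from this message: it takes a different route, stripping the time index with coredata() so that the shifted row comparison is not re-aligned by timestamp. The name drop_repeats is made up, the tz argument is added only to make the example reproducible, and the data are assumed to contain no NA values.

```r
library(xts)

# Hypothetical helper: keep the first row and every row that differs
# from the row immediately before it in at least one column.
drop_repeats <- function(x) {
  v <- coredata(x)                  # plain matrix, no time index
  if (nrow(v) < 2) return(x)        # nothing to compare
  changed <- rowSums(v[-1, , drop = FALSE] != v[-nrow(v), , drop = FALSE]) > 0
  x[c(TRUE, changed), ]
}

x <- xts(cbind(c(100, 100, 100, 101, 102, 100), 1,
               c(101, 101, 101, 102, 102, 101), 1),
         as.POSIXct("2010-01-01 09:00:01", tz = "UTC") + 0:5)

drop_repeats(x)
```

On the example data this keeps the rows stamped 09:00:01, 09:00:04, 09:00:05 and 09:00:06. Whether it is faster than the apply-based versions on large data sets would have to be measured.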
On Wed, Sep 15, 2010 at 4:33 AM, Ulrich Staudinger
< [hidden email]> wrote:
> I think diff and a logical operation on all four columns would help.
> I hoped I would find a ready function for ...
> Thanks ...


Thanks, that works very well. I'll look into the performance of these
two solutions later, especially with large data sets.
Have a nice day!
On Wed, Sep 15, 2010 at 4:06 PM, Joshua Ulrich < [hidden email]> wrote:
> Ulrich,
>
> Patrick's suggestion is a vectorized solution to your problem. But it
> won't work for xts objects because they are merged by time index
> before the comparison.

Ulrich Staudinger
[hidden email]
http://www.activequant.org

