

Hi fellows,
I am facing a case that I cannot solve with my limited knowledge of R,
unless I write the function myself, which I would like to avoid
(reusing is better than reinventing the wheel). The relevant
information follows.
Input scenario:
An xts time series object with duplicates; the object contains bid,
bid volume, ask, and ask volume.
Example:
01012010 09:00:01 100 1 101 1
01012010 09:00:02 100 1 101 1
01012010 09:00:03 100 1 101 1
01012010 09:00:04 101 1 102 1
01012010 09:00:05 102 1 102 1
01012010 09:00:06 100 1 101 1
...
Goal:
A time series with only non-repeating values, i.e. with consecutive
duplicates removed.
I tried "unique" already, but it returns the unique values of the
whole time series rather than working on a running basis.
Example code:
The following example shows, with a non-xts series, what I
want to achieve ...
> y = c(1,1,2,2,1,1,1,2,3,4,3,3,3,3,3,1)
> removeDuplicates <- function(input)
{
  index = 2
  ret = c(input[1])
  for(i in 2:length(input))
  {
    # keep a value only if it differs from its predecessor
    if(input[i] != input[i-1])
    {
      ret[index] = input[i]
      index = index + 1
    }
  }
  ret
}
>
> removeDuplicates(y)
[1] 1 2 1 2 3 4 3 1
>
How can I make this with an xts series? Is there a function for this?
Thanks in advance,
with kind regards,
Ulrich

Ulrich Staudinger
activequant.org
Ulrich,
try duplicated(xts.object, ...) or possibly duplicated(as.data.frame(xts.object), ...) if all columns should be considered.
Regards, david
Hi David,
as far as I understand, duplicated works internally very
much like unique.
With a vector y (in this case no timeseries), duplicated yields:
> y
[1] 1 1 2 3 2 2 2 2 1
> duplicated(y)
[1] FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
But what I would like to have is:
FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
or ...
1 2 3 2 1
I am not so sure that duplicated is what I want, unless I didn't spot
something ... some other approach maybe?
Regards,
Ulrich
Ulrich Staudinger
activequant.org
Hi Ulrich
I see. Ad hoc, I'd use rle (run-length encoding) and some function of cumsum(rle(y)$lengths) to get the indices of the non-duplicates.
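A minimal sketch of the rle idea on a plain vector (not an xts object), using the y from the message above. The names runs, first_idx and keep are only illustrative:

```r
# Drop consecutive repeats from a plain vector via run-length encoding.
y <- c(1, 1, 2, 3, 2, 2, 2, 2, 1)

runs <- rle(y)

# One value per run, i.e. the series without consecutive repeats.
runs$values

# cumsum(lengths) gives the LAST index of each run; shift it back
# by the run length to get the FIRST index of each run.
first_idx <- cumsum(runs$lengths) - runs$lengths + 1

# Logical mask over y: TRUE where a new run starts.
keep <- seq_along(y) %in% first_idx
y[keep]
```

Here !keep marks only the consecutive repeats, which is the running notion of "duplicated" asked for, as opposed to what duplicated(y) returns.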
Regards, david
So you want to compare
y[-1,]
with
y[-nrow(y),]
I think. And save the rows
that aren't all equal. Yes?
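On a plain vector, the comparison of each element with its predecessor might look like the sketch below. It assumes an ordinary numeric vector with no NA values; the name changed is only illustrative:

```r
# Compare every element (from the 2nd on) with the one before it.
y <- c(1, 1, 2, 2, 1, 1, 1, 2, 3, 4, 3, 3, 3, 3, 3, 1)

# y[-1] drops the first element, y[-length(y)] drops the last,
# so the two vectors line up as (current, previous) pairs.
# The first element has no predecessor and is always kept.
changed <- c(TRUE, y[-1] != y[-length(y)])

y[changed]
```

For this y the result is the same as that of the removeDuplicates loop posted earlier in the thread.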
Patrick Burns
[hidden email]
I want to compare
y(t) with y(t-1)
where
t = 2 ... length(y)
y is an xts time series
Ulrich Staudinger
activequant.org
I think diff and a logical operation on all four columns would help.
I hoped I would find a ready function for ...
Thanks ...
Ulrich Staudinger
[hidden email]
Ulrich,
Patrick's suggestion is a vectorized solution to your problem. But it
won't work for xts objects because they are merged by time index
before the comparison.
You need to use lag:
> x <- xts(cbind(c(100,100,100,101,102,100),
+ 1,c(101,101,101,102,102,101),1),
+ as.POSIXct("2010-01-01 09:00:01")+0:5)
> x[!c(FALSE,apply(lag(x)==x,1,all)[-1]),]
                    [,1] [,2] [,3] [,4]
2010-01-01 09:00:01  100    1  101    1
2010-01-01 09:00:04  101    1  102    1
2010-01-01 09:00:05  102    1  102    1
2010-01-01 09:00:06  100    1  101    1
Or you could use diff (as you suggest):
> x[!c(FALSE,apply(diff(x)==0,1,all)[-1]),]
                    [,1] [,2] [,3] [,4]
2010-01-01 09:00:01  100    1  101    1
2010-01-01 09:00:04  101    1  102    1
2010-01-01 09:00:05  102    1  102    1
2010-01-01 09:00:06  100    1  101    1
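For larger data sets, the row-wise apply could possibly be replaced by a comparison on the underlying matrix. The sketch below uses base R only and a hypothetical helper, changed_rows, which is not part of xts. It assumes numeric data without NA values. For an xts object one would presumably pass coredata(x) and then subset with x[keep, ], which is untested here:

```r
# Hypothetical helper (not part of xts): returns a logical vector that is
# TRUE for every row differing from the row directly above it.
# Assumes a numeric matrix without NA values.
changed_rows <- function(m) {
  m <- as.matrix(m)
  n <- nrow(m)
  if (n < 2) return(rep(TRUE, n))
  # Count, per row, how many columns differ from the previous row.
  n_diff <- rowSums(m[-1, , drop = FALSE] != m[-n, , drop = FALSE])
  # The first row has no predecessor and is always kept.
  c(TRUE, n_diff > 0)
}

# Same values as in the xts example, as a plain matrix.
m <- cbind(c(100, 100, 100, 101, 102, 100), 1,
           c(101, 101, 101, 102, 102, 101), 1)

keep <- changed_rows(m)
m[keep, ]
```

Whether this is actually faster than the lag or diff variants would have to be measured on real data.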
Best,

Joshua Ulrich
FOSS Trading: www.fosstrading.com
Thanks, that works very well. I'll check later about the performance
aspects of these two solutions, especially with large data sets.
Have a nice day!
Ulrich Staudinger
[hidden email]
