removing repeating values from xts series


removing repeating values from xts series

Ulrich Staudinger-2
Hi fellows,

I am facing a problem that I cannot solve with my limited knowledge of R
unless I write the function myself, which I would like to avoid
(reusing is better than reinventing the wheel). The relevant
information follows.

Input scenario:
An xts time series object with duplicates, the object contains bid,
bid volume, ask, ask volume.
Example:
01-01-2010 09:00:01 100 1 101 1
01-01-2010 09:00:02 100 1 101 1
01-01-2010 09:00:03 100 1 101 1
01-01-2010 09:00:04 101 1 102 1
01-01-2010 09:00:05 102 1 102 1
01-01-2010 09:00:06 100 1 101 1
...

Goal:
A time series with only non-repeating values, i.e. with consecutive
duplicates removed.

I tried "unique" already, but that returns only the values that are
unique within the whole time series, not on a running basis.


Example code:
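The following example code shows, with a non-xts series, what I
want to achieve ...
> y <- c(1,1,2,2,1,1,1,2,3,4,3,3,3,3,3,1)
> removeDuplicates <- function(input)
+ {
+         # keep the first element, then every element that differs
+         # from its immediate predecessor
+         index <- 2
+         ret <- input[1]
+         # seq_along(input)[-1] is empty for a one-element input,
+         # whereas 2:length(input) would count down from 2 to 1
+         for(i in seq_along(input)[-1])
+         {
+                 if(input[i] != input[i-1])
+                 {
+                         ret[index] <- input[i]
+                         index <- index + 1
+                 }
+         }
+         ret
+ }
> removeDuplicates(y)
[1] 1 2 1 2 3 4 3 1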



How can I do this with an xts series? Is there a function for this?

Thanks in advance,
with kind regards,
Ulrich

--
Ulrich Staudinger
activequant.org

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

Re: removing repeating values from xts series

Lüthi David (XICD 1)
Ulrich,
try duplicated(xts.object, ...) or possibly duplicated(as.data.frame(xts.object), ...) if all columns should be considered.
Regards, david

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Ulrich Staudinger
Sent: Wednesday, September 15, 2010 8:28 AM
To: r-sig-finance
Subject: [R-SIG-Finance] removing repeating values from xts series


Re: removing repeating values from xts series

Ulrich Staudinger-2
Hi David,

As far as I understand, duplicated works internally very
much like unique.

With a vector y (not a time series in this case), duplicated yields:
> y
[1] 1 1 2 3 2 2 2 2 1
> duplicated(y)
[1] FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE


But what I would like to have is:
FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
or ...
1 2 3 2 1


I am not so sure that duplicated is what I want, unless I have missed
something ... some other approach maybe?


Regards,
Ulrich
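
A minimal sketch of the flag asked for above, assuming a plain atomic vector without NA values. The helper name repeats_previous is made up for illustration and is not part of base R or xts. It compares each element only with its immediate predecessor, whereas duplicated compares it with all earlier elements.

```r
# Flag elements that repeat their immediate predecessor.
# Assumption: v is a plain atomic vector with no NA values.
repeats_previous <- function(v) {
  n <- length(v)
  if (n < 2L) return(rep(FALSE, n))
  # the first element has no predecessor, so it is never a repeat
  c(FALSE, v[-1] == v[-n])
}

y <- c(1, 1, 2, 3, 2, 2, 2, 2, 1)
repeats_previous(y)
# [1] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
y[!repeats_previous(y)]
# [1] 1 2 3 2 1
```

Negating the flag and using it as an index keeps the first element of every run, which is the second output wanted above.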




On Wed, Sep 15, 2010 at 9:08 AM, Lüthi David (XICD 1)
<[hidden email]> wrote:

> Ulrich,
> try duplicated(xts.object, ...) or possibly duplicated(as.data.frame(xts.object), ...) if all columns should be considered.
> Regards, david

Re: removing repeating values from xts series

Lüthi David (XICD 1)
Hi Ulrich
I see. Ad hoc I'd use rle (run length encoding) and some function of cumsum(rle(y)$lengths) to get indexes of non-duplicates.
Regards, david
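
One way the rle idea might look, sketched for a plain vector (rle only accepts atomic vectors, so it would not apply directly to a multi-column object). The variable names are illustrative only.

```r
y <- c(1, 1, 2, 3, 2, 2, 2, 2, 1)
runs <- rle(y)

# cumsum of the run lengths gives the position of the LAST element of each run
last_of_run <- cumsum(runs$lengths)
# stepping back by (run length - 1) gives the position of the FIRST element
first_of_run <- last_of_run - runs$lengths + 1

first_of_run
# [1] 1 3 4 5 9
y[first_of_run]
# [1] 1 2 3 2 1
```

For a plain vector, runs$values already holds the de-duplicated values; the positions matter when they are used to pick rows out of another object of the same length.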

-----Original Message-----
From: Ulrich Staudinger [mailto:[hidden email]]
Sent: Wednesday, September 15, 2010 9:25 AM
To: Lüthi David (XICD 1)
Cc: r-sig-finance
Subject: Re: [R-SIG-Finance] removing repeating values from xts series


Re: removing repeating values from xts series

Patrick Burns-2
In reply to this post by Ulrich Staudinger-2
So you want to compare

y[-1,]

with

y[-nrow(y),]

I think.  And save the rows
that aren't all equal.  Yes?
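
The comparison described above can be sketched on a plain numeric matrix as follows. The column names and the use of rowSums are assumptions made for illustration, the data are the six example rows from the original post, and no NA values are assumed. The sketch makes no claim about how the same indexing behaves on an xts object.

```r
# Plain numeric matrix standing in for bid, bid volume, ask, ask volume
m <- cbind(bid    = c(100, 100, 100, 101, 102, 100),
           bidvol = 1,
           ask    = c(101, 101, 101, 102, 102, 101),
           askvol = 1)

n <- nrow(m)
# TRUE where at least one column of row t differs from row t - 1 (t = 2, ..., n)
changed <- rowSums(m[-1, , drop = FALSE] != m[-n, , drop = FALSE]) > 0
# always keep the first row, which has nothing to be compared with
keep <- c(TRUE, changed)
m[keep, , drop = FALSE]
```

With the example data this keeps rows 1, 4, 5 and 6, i.e. the first row of every run of identical rows.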


--
Patrick Burns
[hidden email]
http://www.burns-stat.com
http://www.portfolioprobe.com/blog


Re: removing repeating values from xts series

Ulrich Staudinger-2
I want to compare

y(t) with y(t-1)
where
t = 2... length(y)
y is an xts timeseries



On Wed, Sep 15, 2010 at 9:33 AM, Patrick Burns <[hidden email]> wrote:

> So you want to compare
>
> y[-1,]
>
> with
>
> y[-nrow(y),]
>
> I think.  And save the rows
> that aren't all equal.  Yes?

Re: removing repeating values from xts series

Ulrich Staudinger-2
I think diff and a logical operation on all four columns would help.
I had hoped to find a ready-made function for this ...
Thanks ...

On Wed, Sep 15, 2010 at 9:46 AM, Ulrich Staudinger
<[hidden email]> wrote:

> I want to compare
>
> y(t) with y(t-1)
> where
> t = 2... length(y)
> y is an xts timeseries

Re: removing repeating values from xts series

Joshua Ulrich
Ulrich,

Patrick's suggestion is a vectorized solution to your problem.  But it
won't work for xts objects because they are merged by time index
before the comparison.

You need to use lag:

> x <- xts(cbind(c(100,100,100,101,102,100),
+   1,c(101,101,101,102,102,101),1),
+   as.POSIXct("2010-01-01 09:00:01")+0:5)
> x[!c(FALSE,apply(lag(x)==x,1,all)[-1]),]
                    [,1] [,2] [,3] [,4]
2010-01-01 09:00:01  100    1  101    1
2010-01-01 09:00:04  101    1  102    1
2010-01-01 09:00:05  102    1  102    1
2010-01-01 09:00:06  100    1  101    1

Or you could use diff (as you suggest):

> x[!c(FALSE,apply(diff(x)==0,1,all)[-1]),]
                    [,1] [,2] [,3] [,4]
2010-01-01 09:00:01  100    1  101    1
2010-01-01 09:00:04  101    1  102    1
2010-01-01 09:00:05  102    1  102    1
2010-01-01 09:00:06  100    1  101    1

Best,
--
Joshua Ulrich
FOSS Trading: www.fosstrading.com



On Wed, Sep 15, 2010 at 4:33 AM, Ulrich Staudinger
<[hidden email]> wrote:

> I think diff and a logical operation on all four colums would help.
> I hoped I would find a ready function for ...
> Thanks ...

Re: removing repeating values from xts series

Ulrich Staudinger-2
Thanks, that works very well. I'll look into the performance of
these two solutions later, especially with large data sets.
Have a nice day!


On Wed, Sep 15, 2010 at 4:06 PM, Joshua Ulrich <[hidden email]> wrote:

> Ulrich,
>
> Patrick's suggestion is a vectorized solution to your problem.  But it
> won't work for xts objects because they are merged by time index
> before the comparison.