How can I map "by" results to original list of indices or first difference of column of data.frame with two factors?

Mikhail Titov-2
Hello!

I have stacked data in a data.frame with two factors, an ordered POSIXct column, and the actual value as a numeric column (laid out as for lattice::xyplot).

I would like to calculate the first difference using the “diff” function within each corresponding subset/partition. Since the data.frame is organized by factors and has sorted dates, "by" seems like a good candidate for the job. However, it returns just a dumb list of vectors.

It seems that I can either use expand.grid to remap the results of "by" and hope that I don't mess up the order, or use "unique(subset(x,select=c(foo,bar)))".

Overall, this looks like quite a few steps for such a task, not counting assigning those differences back to the original data.frame starting from the 2nd position in each partition (as diff returns a shorter vector).
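
The "by" route described above can be sketched roughly as follows. This is only an illustration on hypothetical toy data: x, foo and bar are borrowed from the snippet above, while the columns day, val, row and dval are assumptions made for the example. Carrying an explicit row id through "by" avoids having to guess the order of the returned list.

#+begin_src R
# Hypothetical toy data: two factors, sorted dates, one numeric value.
x <- expand.grid(day = as.Date("2012-01-01") + 0:3,
                 bar = factor(1:2),
                 foo = factor(c("A", "B")))
x$val <- seq_len(nrow(x))^2
x$row <- seq_len(nrow(x))   # explicit row id (assumed helper column)

# For each (foo, bar) partition, return the differences together with
# the row ids they belong to (diff() is one element shorter, so the
# first row of every partition is skipped).
parts <- by(x, x[c("foo", "bar")], function(d)
    data.frame(row = d$row[-1], dval = diff(d$val)))

# Flatten the per-partition pieces and write them back by row id.
flat <- do.call(rbind, parts)
x$dval <- NA_real_
x$dval[flat$row] <- flat$dval
#+end_src

The first row of each partition keeps NA in dval, which matches the "starting from the 2nd position" assignment described above.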

Am I on the right track or is there an easier way to do that?

Mikhail

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: How can I map "by" results to original list of indices or first difference of column of data.frame with two factors?

jholtman
If you would post a subset of your data so that we can see what you
are talking about, we could probably help you come up with a solution.

On Sat, Mar 3, 2012 at 7:50 PM, Mikhail Titov <[hidden email]> wrote:

> Hello!
>
> I have stacked data in a data.frame with two factors, an ordered POSIXct column, and the actual value as a numeric column (laid out as for lattice::xyplot).
>
> I would like to calculate the first difference using the “diff” function within each corresponding subset/partition. Since the data.frame is organized by factors and has sorted dates, "by" seems like a good candidate for the job. However, it returns just a dumb list of vectors.
>
> It seems that I can either use expand.grid to remap the results of "by" and hope that I don't mess up the order, or use "unique(subset(x,select=c(foo,bar)))".
>
> Overall, this looks like quite a few steps for such a task, not counting assigning those differences back to the original data.frame starting from the 2nd position in each partition (as diff returns a shorter vector).
>
> Am I on the right track or is there an easier way to do that?
>
> Mikhail

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Re: How can I map "by" results to original list of indices or first difference of column of data.frame with two factors?

Michael Weylandt
It'd be doubly helpful if you could post desired output as well.

If you haven't seen it before, the easiest way to post R data is to
use the dput() function to get a plain-text (mailing list friendly)
representation. If your data is large, dput(head(DATA, 30)) should
suffice.
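
A minimal sketch of what that looks like, using a made-up DATA frame (its columns foo, bar and val are assumptions for illustration only). The text printed by dput() is itself R code, so a reader of the list can paste it back into a session to rebuild the object.

#+begin_src R
# Hypothetical stand-in for the poster's data.
DATA <- data.frame(foo = factor(c("A", "A", "B", "B")),
                   bar = c(1L, 2L, 1L, 2L),
                   val = c(0.5, 1.5, 2.5, 3.5))

# Print an R expression that recreates (at most) the first 30 rows.
dput(head(DATA, 30))

# Round trip: capture the printed text and evaluate it again,
# as a reader of the list would after pasting it.
txt <- paste(capture.output(dput(head(DATA, 30))), collapse = "\n")
rebuilt <- eval(parse(text = txt))
#+end_src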

(We wouldn't want to clog those internet tubes...)

Michael

On Sat, Mar 3, 2012 at 8:55 PM, jim holtman <[hidden email]> wrote:

> If you would post a subset of your data so that we can see what you
> are talking about, we could probably help you come up with a solution.

Re: How can I map "by" results to original list of indices or first difference of column of data.frame with two factors?

Mikhail Titov-2
"R. Michael Weylandt" <[hidden email]> writes:

> It'd be doubly helpful if you could post desired output as well.

I beg everyone's pardon; I suddenly realized that in my case the solution is
trivial. Here is an example with mock-up data.

Let's generate some data:

#+begin_src R
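# Mock-up data: one row per day of 2011 for every (bar, foo) combination.
# expand.grid varies the first argument fastest, so each partition is a
# contiguous run of rows already sorted by day.
qq <-
    expand.grid(
                day=seq(ISOdate(2011,1,1),ISOdate(2011,12,31),by='day'),
                bar=1:4,
                foo=factor(c('A','B','G','I'))
                )
# One sine period over the year: amplitude set by bar, phase shift by foo.
ww <-
    within(qq,
           val <- bar * sin(as.double(day-day[1],units="days")
                            / as.double(diff(range(day)),units="days")
                            * 2*pi
                            + as.numeric(foo)/2
                            )
           )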
#+end_src

We can take a look at it with:

#+begin_src R :results graphics :exports both :file z.png
library(lattice)
# One panel per foo, one line per bar.
xyplot(val~day|foo, data=ww, groups=bar, type='l')
#+end_src

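Since we discard the first element of each partition anyway,
we can apply diff to the entire data set at once
and then drop the very first element of each partition.
This works because each partition is a contiguous block of rows sorted by day,
so only the first row of a partition gets a difference that crosses a partition boundary.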

#+begin_src R
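# Row i gets val[i] - val[i-1]; the very first row stays NA.
ww[-1,"diff"] <- diff(ww$val)
# Drop the first day of each partition (its difference crosses a boundary).
ee <- subset(ww, day>ISOdate(2011,1,1))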
#+end_src
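
As a cross-check, the whole-column shortcut can be compared with a per-partition difference computed by ave(). This is a sketch under assumptions, not part of the original solution: it rebuilds a smaller version of the mock-up data (10 days instead of a year), and the column names d_all and d_grp are made up for the comparison.

#+begin_src R
# Smaller version of the mock-up data: 10 days, same two factors.
qq <- expand.grid(day = seq(ISOdate(2011, 1, 1), ISOdate(2011, 1, 10), by = "day"),
                  bar = 1:4,
                  foo = factor(c("A", "B", "G", "I")))
ww <- within(qq,
             val <- bar * sin(as.numeric(day - day[1], units = "days") / 9 * 2 * pi
                              + as.numeric(foo) / 2))

# Whole-column difference; same values as ww[-1, "diff"] <- diff(ww$val).
ww$d_all <- c(NA, diff(ww$val))

# Per-partition difference: ave() applies FUN within each (foo, bar)
# group and returns a vector aligned with the original rows.
ww$d_grp <- ave(ww$val, ww$foo, ww$bar,
                FUN = function(v) c(NA, diff(v)))

# Away from the first day of each partition the two columns agree.
keep <- ww$day > ISOdate(2011, 1, 1)
stopifnot(isTRUE(all.equal(ww$d_all[keep], ww$d_grp[keep])))
#+end_src

The ave() variant does not need the partitions to be contiguous, only sorted by day within each partition, so it may be the safer choice when the row order is not guaranteed.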

And the final result:

#+begin_src R :results graphics :exports both :file x.png
xyplot(diff~day|foo, data=ee, groups=bar, type='l')
#+end_src

--
Mikhail