Yearly aggregates and matrices


Yearly aggregates and matrices

mdvaan
Hi,

I need to perform calculations on subsets of a data frame:

DF = data.frame(read.table(textConnection("    A  B  C  D  E  F
1 a  1995  0  4  1
2 a  1997  1  1  3
3 b  1995  3  7  0
4 b  1996  1  2  3
5 b  1997  1  2  3
6 b  1998  6  0  0
7 b  1999  3  7  0
8 c  1997  1  2  3
9 c  1998  1  2  3
10 c  1999  6  0  0
11 d  1999  3  7  0
12 e  1995  1  2  3
13 e  1998  1  2  3
14 e  1999  6  0  0"),head=TRUE,stringsAsFactors=FALSE))

I'd like to create a new data frame for each unique year in which, for each value of B, the values of D, E and F are summed over the last 3 years (e.g. 1998 = 1998 + 1997 + 1996):
Question 1: How do I go from DF to newDFyear?

Examples:

newDF1995
B  D  E  F
a  0  4  1
b  3  7  0
e  1  2  3

newDF1998
B  D  E  F
a  1  1  3
b  8  4  6
c  2  4  6
e  1  2  3

Then, for each new DF I need to generate a square matrix after doing the following:

newDF1998$G<-newDF1998$D + newDF1998$E + newDF1998$F
newDF1998$D<-newDF1998$D/newDF1998$G
newDF1998$E<-newDF1998$E/newDF1998$G
newDF1998$F<-newDF1998$F/newDF1998$G
newDF1998<-newDF1998[,c(-5)]

newDF1998
B  D  E  F
a  0.2  0.2  0.6
b  0.4  0.2  0.3
c  0.2  0.3  0.5
e  0.2  0.3  0.5

Question 2: How do I go from newDF1998 to a matrix

  a  b  c  e
a
b
c
e

in which cell ab = (0.2*0.4 + 0.2*0.2 + 0.6*0.3) / ((0.2*0.2 + 0.2*0.2 + 0.6*0.6)^0.5 * (0.4*0.4 + 0.2*0.2 + 0.3*0.3)^0.5) = 0.84

Thanks a lot for your help!

Re: Yearly aggregates and matrices

Gabor Grothendieck
On Sat, Apr 9, 2011 at 5:14 AM, mathijsdevaan <[hidden email]> wrote:

First we use read.zoo to reshape DF into a multivariate time series (one
column per combination of B and variable) and apply rollapply with a
width of 3 to get the rolling sums.  We use the devel version of zoo
since it supports the partial= argument of rollapply.  We then reshape
each resulting row into a matrix and convert each row of that matrix to
proportions.  Finally we form the desired scaled cross product.

# devel version of zoo
install.packages("zoo", repos = "http://r-forge.r-project.org")
library(zoo)

z <- read.zoo(DF, split = 2, index = 3, FUN = identity)

sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)

newDF <- lapply(1:nrow(r), function(i)
        prop.table(na.omit(matrix(r[i,], nc = 4, byrow = TRUE,
                dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
names(newDF) <- time(z)

lapply(mats, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))
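
For comparison, the aggregation and proportion steps for a single year can also be sketched in base R, without zoo. This is only an illustration of what the question asks for, not the method used above: window_props is a hypothetical helper name, and the 3-year window ending in the given year is taken from the original post.

```r
# Example data from the original post (column A is just a row counter)
DF <- read.table(text = "A B C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: sum D, E, F per value of B over the `width` years
# ending in `year`, then convert each row to proportions.
window_props <- function(df, year, width = 3) {
  win  <- df[df$C > year - width & df$C <= year, ]
  sums <- aggregate(cbind(D, E, F) ~ B, data = win, FUN = sum)
  m <- as.matrix(sums[, c("D", "E", "F")])
  rownames(m) <- sums$B
  list(sums = m, props = prop.table(m, 1))
}

res1998 <- window_props(DF, 1998)
res1998$sums             # rows a, b, c, e, as in newDF1998 of the question
round(res1998$props, 1)  # row proportions, rounded as in the question
```

Groups with no observation in the window simply do not appear in the result, which plays the role of na.omit in the zoo version. A row whose sums are all zero would produce NaN proportions and would need separate handling.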


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Yearly aggregates and matrices

Gabor Grothendieck
On Sat, Apr 9, 2011 at 11:45 PM, Gabor Grothendieck
<[hidden email]> wrote:

> lapply(mats, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))

mats in the last line should be newDF:

lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))
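
As a rough check against the cell formula in the original question: dividing each row by its Euclidean norm and then taking the cross product gives the cosine similarity between rows. A minimal base-R sketch, using the 1998 sums for a and b from the question (the object names sums, sim and ab are only illustrative):

```r
# 1998 window sums for a and b, taken from newDF1998 in the question
sums <- rbind(a = c(D = 1, E = 1, F = 3),
              b = c(D = 8, E = 4, F = 6))
mat <- prop.table(sums, 1)   # row proportions

# Scale each row to unit length, then take all pairwise dot products
sim <- tcrossprod(mat / sqrt(rowSums(mat^2)))

# The same cell written out explicitly; note the parentheses around the
# whole denominator
ab <- sum(mat["a", ] * mat["b", ]) /
  (sqrt(sum(mat["a", ]^2)) * sqrt(sum(mat["b", ]^2)))

round(sim["a", "b"], 2)       # 0.84
all.equal(sim["a", "b"], ab)  # TRUE
```

The division mat / sqrt(rowSums(mat^2)) relies on R recycling the vector of row norms down the columns, so that row i is divided by norm i.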




Re: Yearly aggregates and matrices

mdvaan
Thanks Gabor, but I get the following error:

library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/rollapply.R?root=zoo")

DF = data.frame(read.table(textConnection("    A  B  C  D  E  F
1 a  1995  0  4  1
2 a  1997  1  1  3
3 b  1995  3  7  0
4 b  1996  1  2  3
5 b  1997  1  2  3
6 b  1998  6  0  0
7 b  1999  3  7  0
8 c  1997  1  2  3
9 c  1998  1  2  3
10 c  1999  6  0  0
11 d  1999  3  7  0
12 e  1995  1  2  3
13 e  1998  1  2  3
14 e  1999  6  0  0"),head=TRUE,stringsAsFactors=FALSE))

z <- read.zoo(DF, split = 2, index = 3, FUN = identity)
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)

Error in rollapply.zoo(data[, j], width = width, FUN = FUN, ..., by = by,  :
  could not find function "na.fill"

What's going wrong here? Thanks!

Re: Yearly aggregates and matrices

mdvaan
Solved the problem: I guess I was still using the main version of zoo. Thanks again!
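
A quick way to see which zoo build is actually in use is to look at the package version and check for the two pieces this thread relies on: the na.fill function named in the error message and the partial= argument of rollapply. Sourcing only rollapply.R from the development tree on top of an older installed package would be consistent with the missing na.fill. The sketch below assumes nothing beyond base R and does nothing if zoo is not installed; the variable names has_na_fill and has_partial are made up for the illustration.

```r
has_na_fill <- NA
has_partial <- NA

if (requireNamespace("zoo", quietly = TRUE)) {
  # Version of the zoo installation that library(zoo) would load
  print(packageVersion("zoo"))

  # Does this build define na.fill, which rollapply called in the error?
  has_na_fill <- exists("na.fill", envir = asNamespace("zoo"),
                        inherits = FALSE)

  # Does the zoo method for rollapply accept a partial= argument?
  has_partial <- "partial" %in% names(formals(zoo:::rollapply.zoo))
}

c(na.fill = has_na_fill, partial = has_partial)
```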

Re: Yearly aggregates and matrices

mdvaan
As a follow-up to this post, I am trying to slightly adjust the solution kindly provided by Gabor. However, I am getting some results that I do not understand. Example:

# devel version of zoo
install.packages("zoo", repos = "http://r-forge.r-project.org")
library(zoo)

DF1 = data.frame(read.table(textConnection("    B  C  D  E  F  G
8025  1995  0  4  1  2
8025  1997  1  1  3  4
8026  1995  0  7  0  0
8026  1996  1  2  3  0
8026  1997  1  2  3  1
8026  1998  6  0  0  4
8026  1999  3  7  0  3
8027  1997  1  2  3  9
8027  1998  1  2  3  1
8027  1999  6  0  0  2
8028  1999  3  7  0  0
8029  1995  0  2  3  3
8029  1998  1  2  3  2
8029  1999  6  0  0  1"),head=TRUE,stringsAsFactors=FALSE))

a <- read.zoo(DF1, split = 1, index = 2, FUN = identity)
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
b <- rollapply(a, 3,  sum.na, align = "right", partial = TRUE)
newDF <- lapply(1:nrow(b), function(i)
       prop.table(na.omit(matrix(b[i,], nc = 4, byrow = TRUE,
               dimnames = list(unique(DF1$B), names(DF1)[-1:-2]))), 1))
names(newDF) <- time(a)
c<-lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))

Now I would like the elements e in c to be equal to 1-e. However,

c<-lapply(newDF, function(mat) 1 - tcrossprod(mat / sqrt(rowSums(mat^2))))

gives a value of 2.220446e-16 for as.data.frame(c['1999'])[2,2] instead of 0

What am I doing wrong here? Thanks a lot!

Re: Yearly aggregates and matrices

Gabor Grothendieck
On Wed, Apr 20, 2011 at 5:49 AM, mathijsdevaan <[hidden email]> wrote:

See FAQ 7.31 at:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
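
The value 2.220446e-16 is the double-precision machine epsilon (.Machine$double.eps), so the result is zero up to rounding error rather than a mistake in the code. A small base-R sketch of the usual ways to deal with this follows; the tolerance 1e-12 and the helper name clean_zero are arbitrary choices for illustration.

```r
.Machine$double.eps              # 2.220446e-16 for IEEE 754 doubles

# Exact comparison fails for values that differ only by rounding error
sqrt(2)^2 == 2                   # FALSE
isTRUE(all.equal(sqrt(2)^2, 2))  # TRUE: comparison with a tolerance

# Hypothetical helper: set entries within tol of zero to exactly zero
clean_zero <- function(m, tol = 1e-12) {
  m[abs(m) < tol] <- 0
  m
}

d <- matrix(c(0, 2.220446e-16, 0.16, 0), 2)
clean_zero(d)
```

A helper like this could be applied to every matrix in a list with lapply. Whether a fixed tolerance is appropriate depends on the scale of the data.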



Re: Yearly aggregates and matrices

mdvaan
Thanks for clarifying that.

Best


Re: Yearly aggregates and matrices

mdvaan
Hi,

Is there an alternative to "z <- read.zoo(DF, split = 2, index = 3, FUN = identity)" and "r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)"? I am trying to use the following script in which the split data (B) contains about 300000 unique cases and obviously I am getting an allocation error. Thanks!

# devel version of zoo
install.packages("zoo", repos = "http://r-forge.r-project.org")

DF = data.frame(read.table(textConnection("  A  B  C  D  E  F
1 a  1995  0  4  1
2 a  1997  1  1  3
3 b  1995  3  7  0
4 b  1996  1  2  3
5 b  1997  1  2  3
6 b  1998  6  0  0
7 b  1999  3  7  0
8 c  1997  1  2  3
9 c  1998  1  2  3
10 c  1999  6  0  0
11 d  1999  3  7  0
12 e  1995  1  2  3
13 e  1998  1  2  3
14 e  1999  6  0  0"),head=TRUE,stringsAsFactors=FALSE))


library(zoo)

z <- read.zoo(DF, split = 2, index = 3, FUN = identity)

sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)

newDF <- lapply(1:nrow(r), function(i)
        prop.table(na.omit(matrix(r[i,], nc = 4, byrow = TRUE,
                dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
names(newDF) <- time(z)

lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))



Re: Yearly aggregates and matrices

Gabor Grothendieck
On Wed, Apr 27, 2011 at 2:03 PM, mathijsdevaan <[hidden email]> wrote:

You could test the speed of this to see if it's faster:

library(reshape2)
library(zoo)
mm <- melt(DF, id = c("B", "C"))
aa <- acast(mm, C ~ B + variable, FUN = sum)
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(aa, 3,  sum.na, align = "right", partial = TRUE)
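
If the wide layout (years in rows, one column per group and variable) is what exhausts memory, one alternative is to stay in long format and never build that matrix. The base-R sketch below is an assumption about what might help and has not been tried on 300000 groups. Each observation is copied to the window-end years it contributes to (at most three), and the copies are then summed per group and year. The column name win and the variable width are made up for the illustration.

```r
# Example data from the original post (column A is just a row counter)
DF <- read.table(text = "A B C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0", header = TRUE, stringsAsFactors = FALSE)

width <- 3

# An observation from year C falls into the windows ending in C, C+1, C+2
long <- do.call(rbind, lapply(seq_len(width) - 1, function(k) {
  cbind(DF, win = DF$C + k)
}))
long <- long[long$win <= max(DF$C), ]  # drop windows beyond the last year

# One row per (group, window-end year) with the 3-year sums
roll <- aggregate(cbind(D, E, F) ~ B + win, data = long, FUN = sum)
roll[roll$win == 1998, ]
```

This triples the number of rows at most, so memory grows with the number of observations and not with the number of groups times the number of years. Unlike a rolling window over rows, the windows here are defined by calendar year, so a year with no observations at all is handled as the question describes.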



Re: Yearly aggregates and matrices

mdvaan
Thanks, but it did not really improve the speed. Why is it that when I change the layout of the matrix (which does not give the required results), the speed increases tremendously? So:

library(reshape2)
library(zoo)
z <- read.zoo(DF, split = 3, index = 2, FUN = identity) # Split on 3 and index on 2 instead of vice versa
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)

or

mm <- melt(DF, id = c("B", "C"))
aa <- acast(mm, B ~ C + variable, FUN = sum) # B ~ C instead of C ~ B
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
r <- rollapply(aa, 3,  sum.na, align = "right", partial = TRUE)

Thanks!





Re: Yearly aggregates and matrices

Gabor Grothendieck
On Thu, Apr 28, 2011 at 4:49 PM, mathijsdevaan <[hidden email]> wrote:


For me it makes little difference:

> system.time(for(i in 1:100) read.zoo(DF, split = 3, index = 2, FUN = identity))
   user  system elapsed
   1.72    0.00    1.71
> system.time(for(i in 1:100) read.zoo(DF, split = 2, index = 3, FUN = identity))
   user  system elapsed
   1.75    0.00    1.74



Re: Yearly aggregates and matrices

Gabor Grothendieck
On Thu, Apr 28, 2011 at 10:13 PM, Gabor Grothendieck
<[hidden email]> wrote:

and here it is with rollapply included:

> system.time(for(i in 1:100) { read.zoo(DF, split = 3, index = 2, FUN = identity)
+ r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)})
   user  system elapsed
  14.74    0.00   14.76
> system.time(for(i in 1:100) {read.zoo(DF, split = 2, index = 3, FUN = identity)
+ r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)})
   user  system elapsed
  14.72    0.00   14.74
