apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

Massimo Bressan
hi

I need to apply a user-defined formula to some selected columns of a dataframe, working on groups of rows (blocks), and get back a new dataframe

I’ve managed to get the calculations right but I’m not at all satisfied with the form of the results

please refer to my reproducible example

##########
# my user function (an example)
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}

# my dataframe to apply the formula by blocks
mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))


# my attempts (not satisfied with the final output)

tapply(mydf$v1, mydf$blocks, mynorm)

byf<-factor(mydf$blocks)
aggregate(mydf[2:3], list(byf), mynorm)
aggregate(mydf[2:3], list(mydf$blocks), mynorm, simplify = FALSE)

###########

please can anyone give me some hints on how to properly proceed?

I need a dataframe with all variables as the final result
sorry, but I’m definitely stuck with this…

thanks

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

David Carlson
You can do this with split/unsplit:

> mydf.split <- split(mydf, mydf$blocks)
> str(mydf.split)
List of 3
 $ a:'data.frame':      5 obs. of  3 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 1 1 1 1 1
  ..$ v1    : num [1:5] 19 15 17 22 16
  ..$ v2    : num [1:5] 35 31 35 31 39
 $ b:'data.frame':      5 obs. of  3 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 2 2 2 2 2
  ..$ v1    : num [1:5] 12 24 25 22 18
  ..$ v2    : num [1:5] 31 19 35 32 38
 $ c:'data.frame':      5 obs. of  3 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 3 3 3 3 3
  ..$ v1    : num [1:5] 17 14 21 21 22
  ..$ v2    : num [1:5] 27 25 23 23 27
> mydf.split2 <- lapply(mydf.split, function(x) data.frame(x,
+      v1mod=mynorm(x$v1)))
> str(mydf.split2)
List of 3
 $ a:'data.frame':      5 obs. of  4 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 1 1 1 1 1
  ..$ v1    : num [1:5] 19 15 17 22 16
  ..$ v2    : num [1:5] 35 31 35 31 39
  ..$ v1mod : num [1:5] 0.571 0 0.286 1 0.143
 $ b:'data.frame':      5 obs. of  4 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 2 2 2 2 2
  ..$ v1    : num [1:5] 12 24 25 22 18
  ..$ v2    : num [1:5] 31 19 35 32 38
  ..$ v1mod : num [1:5] 0 0.923 1 0.769 0.462
 $ c:'data.frame':      5 obs. of  4 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 3 3 3 3 3
  ..$ v1    : num [1:5] 17 14 21 21 22
  ..$ v2    : num [1:5] 27 25 23 23 27
  ..$ v1mod : num [1:5] 0.375 0 0.875 0.875 1
> mydf2 <- unsplit(mydf.split2, mydf$blocks)
> str(mydf2)
'data.frame':   15 obs. of  4 variables:
 $ blocks: Factor w/ 3 levels "a","b","c": 1 1 1 1 1 2 2 2 2 2 ...
 $ v1    : num  19 15 17 22 16 12 24 25 22 18 ...
 $ v2    : num  35 31 35 31 39 31 19 35 32 38 ...
 $ v1mod : num  0.571 0 0.286 1 0.143 ...
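The original question asked about several selected columns, so here is a minimal sketch of the same split/lapply/unsplit idea applied to both v1 and v2. The data values are copied from the str() output above so the result is repeatable; the `cols` vector and the "mod" suffix are only illustrative choices, not something from the thread.

```r
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# fixed values taken from the str() output above (instead of runif/rnorm)
mydf <- data.frame(
  blocks = rep(c("a", "b", "c"), each = 5),
  v1 = c(19, 15, 17, 22, 16, 12, 24, 25, 22, 18, 17, 14, 21, 21, 22),
  v2 = c(35, 31, 35, 31, 39, 31, 19, 35, 32, 38, 27, 25, 23, 23, 27)
)

# hypothetical selection of columns to rescale within each block
cols <- c("v1", "v2")

mydf.split <- split(mydf, mydf$blocks)
mydf.split2 <- lapply(mydf.split, function(x) {
  # add one rescaled column per selected column, computed inside this block
  x[paste0(cols, "mod")] <- lapply(x[cols], mynorm)
  x
})

# reassemble the blocks in the original row order
mydf2 <- unsplit(mydf.split2, mydf$blocks)
str(mydf2)
```

The only change from the version above is that the anonymous function loops over `cols` rather than naming v1 directly.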

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Re: apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

Massimo Bressan
yes, thanks

you pointed me in the right direction: split/unsplit was the trick

I had completely overlooked that possibility!

here is the final version

############

mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}

mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))

g <- mydf$blocks
l <- split(mydf, g)
l <- lapply(l, transform, v1.mod = mynorm(v1))
mydf_new <- unsplit(l, g)

############
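A quick way to check a result like this is to look at the range of the new column within each block: it should run from 0 to 1. The sketch below assumes fixed data (the values shown in the earlier str() output) and wraps transform() in an anonymous function, which is a small variation on the code above; the names `rng` and `const` are only illustrative. It also shows one caveat of mynorm that is easy to miss: a block whose values are all equal gives 0/0, i.e. NaN.

```r
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# fixed values (from the str() output earlier in the thread)
mydf <- data.frame(
  blocks = rep(c("a", "b", "c"), each = 5),
  v1 = c(19, 15, 17, 22, 16, 12, 24, 25, 22, 18, 17, 14, 21, 21, 22),
  v2 = c(35, 31, 35, 31, 39, 31, 19, 35, 32, 38, 27, 25, 23, 23, 27)
)

g <- mydf$blocks
l <- split(mydf, g)
l <- lapply(l, function(d) transform(d, v1.mod = mynorm(v1)))
mydf_new <- unsplit(l, g)

# per-block minimum (row 1) and maximum (row 2) of the rescaled column
rng <- sapply(split(mydf_new$v1.mod, mydf_new$blocks), range)
print(rng)

# caveat: a constant block divides 0 by 0, so every value becomes NaN
const <- mynorm(c(5, 5, 5))
print(const)
```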

thanks again

massimo


Re: apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

R help mailing list-2
ave() encapsulates the split/lapply/unsplit stuff so
   transform(mydf, v1.mod = ave(v1, blocks, FUN=mynorm))
also gives what you got above.
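If more than one column needs the same treatment, ave() can be mapped over the selected columns. This is only a sketch under assumptions: the data are fixed values taken from the str() output earlier in the thread, and `cols` and the ".mod" suffix are illustrative names.

```r
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# fixed values (from the str() output earlier in the thread)
mydf <- data.frame(
  blocks = rep(c("a", "b", "c"), each = 5),
  v1 = c(19, 15, 17, 22, 16, 12, 24, 25, 22, 18, 17, 14, 21, 21, 22),
  v2 = c(35, 31, 35, 31, 39, 31, 19, 35, 32, 38, 27, 25, 23, 23, 27)
)

# hypothetical selection of columns to rescale within each block
cols <- c("v1", "v2")

mydf_new <- mydf
# ave() applies FUN within each level of blocks and keeps the original order
mydf_new[paste0(cols, ".mod")] <- lapply(
  mydf[cols],
  function(col) ave(col, mydf$blocks, FUN = mynorm)
)
str(mydf_new)
```

Note that FUN has to be passed by name, because ave() treats unnamed extra arguments as grouping variables.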

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

Massimo Bressan
thank you, what a nice compact solution with ave()

I learned something new about the subtleties of R

let me summarize the alternative solutions here, just in case someone else might be interested...

thanks, bye

#

# my user function (an example)
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}

# my dataframe to apply the formula by blocks
mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))

# blocks (factors) to be used for splitting
b <- mydf$blocks

# 1 - split-lapply-unsplit with anonymous function to return a new df
s <- split(mydf, b)
l <- lapply(s, function(x) data.frame(x, v1mod=mynorm(x$v1)))
mydf_new <- unsplit(l, b)

# 2 - split-lapply-unsplit with function transform to return a new df
l <- split(mydf, b)
l <- lapply(l, transform, v1.mod = mynorm(v1))
mydf_new <- unsplit(l, b)

# 3 - ave() encapsulating split-lapply-unsplit approach
mydf_new<-transform(mydf, v1.mod = ave(v1, blocks, FUN=mynorm))

#
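To confirm that the three approaches really give the same numbers, the new columns can be compared with all.equal(). A minimal sketch, assuming fixed data (the values from the str() output earlier in the thread) so the comparison is repeatable; `res1`, `res2`, `res3` and `same` are illustrative names, and approach 2 is written with an anonymous function around transform() rather than passing transform directly to lapply().

```r
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# fixed values (from the str() output earlier in the thread)
mydf <- data.frame(
  blocks = rep(c("a", "b", "c"), each = 5),
  v1 = c(19, 15, 17, 22, 16, 12, 24, 25, 22, 18, 17, 14, 21, 21, 22),
  v2 = c(35, 31, 35, 31, 39, 31, 19, 35, 32, 38, 27, 25, 23, 23, 27)
)

b <- mydf$blocks
s <- split(mydf, b)

# 1 - anonymous function building a new data frame per block
res1 <- unsplit(lapply(s, function(x) data.frame(x, v1mod = mynorm(x$v1))), b)

# 2 - transform() applied per block
res2 <- unsplit(lapply(s, function(x) transform(x, v1.mod = mynorm(v1))), b)

# 3 - ave() doing the split/apply/unsplit internally
res3 <- transform(mydf, v1.mod = ave(v1, blocks, FUN = mynorm))

# compare only the new columns; the column name differs in approach 1
same <- isTRUE(all.equal(res1$v1mod, res2$v1.mod)) &&
  isTRUE(all.equal(res2$v1.mod, res3$v1.mod))
print(same)
```

The comparison is done on the numeric columns rather than on whole data frames, since the results may differ in attributes such as row names even when the values agree.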





--

------------------------------------------------------------
Massimo Bressan

ARPAV
Agenzia Regionale per la Prevenzione e
Protezione Ambientale del Veneto

Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy

tel: +39 0422 558545
fax: +39 0422 558516
e-mail: [hidden email]
------------------------------------------------------------
