# How to create a new data.frame based on calculation of subsets of an existing data.frame


## How to create a new data.frame based on calculation of subsets of an existing data.frame

Hello everyone,

I have the following problem: I have a data.frame with multiple fields. For a given combination of IM.type and Taxonomy, my calculation is the following:

    D <- read.csv('Test_v2.csv')
    names(D)
    VC <- 0.01 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS1' & Taxonomy == 'ER+ETR_H1')[10:13] -
                  subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13]) +
          0.02 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13] -
                  subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13]) +
          0.43 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13] -
                  subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13]) +
          1.0 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13])

So the question is: how can I do that in an automated way for all possible combinations and store the results in a new data.frame, which would look like this:

| Ref.No. | Region | IM.type | Taxonomy | IM_1 | IM_2 | IM_3 | IM_4 | VC_1 | VC_2 | VC_3 | VC_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1622 | South America | PGA | ER+ETR_H1 | 1.00E-06 | 0.08 | 0.16 | 0.24 | 3.49e-294 | 3.449819e-05 | 0.002748889 | 0.01122911 |

Best,
ioanna

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Hi Ioanna,

I looked at the problem this morning and tried to work out what you wanted. With a problem like this, it is often easy when you have someone point to the data and say "I want this added to that and this multiplied by that". I have probably made the wrong guesses, but I hope that you can correct my guesses and I can get the calculations correct for you. For example, I have assumed that you want the sum of the IM_* values for each set of damage states as the values for VC_1, VC_2 etc.

    D<-data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
     Region = rep(c('South America'), times = 8),
     IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
     Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
     Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H2',
      'ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
     IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
     IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
     IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
     IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
     Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
     Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
     Prob.of.exceedance_3 =
      c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
     Prob.of.exceedance_4 =
      c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
     stringsAsFactors=FALSE)
    # assume the above has been read in
    # add the four columns to the data frame filled with NAs
    D$VC_1<-D$VC_2<-D$VC_3<-D$VC_4<-NA
    # names of the variables used in the calculations
    calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
    # get the rows for the four damage states
    DS1_rows<-D$Damage.state == "DS1"
    DS2_rows<-D$Damage.state == "DS2"
    DS3_rows<-D$Damage.state == "DS3"
    DS4_rows<-D$Damage.state == "DS4"
    # step through all possible values of IM.type and Taxonomy
    for(IM in unique(D$IM.type)) {
     for(Tax in unique(D$Taxonomy)) {
      # get a logical vector of the rows to be used in this calculation
      calc_rows<-D$IM.type == IM & D$Taxonomy == Tax
      cat(IM,Tax,calc_rows,"\n")
      # check that there are any such rows in the data frame
      if(sum(calc_rows)) {
       # if so, fill in the four values for these rows
       D$VC_1[calc_rows]<-sum(0.01 * (D[calc_rows & DS1_rows,calc_vars] -
        D[calc_rows & DS2_rows,calc_vars]))
       D$VC_2[calc_rows]<-sum(0.02 * (D[calc_rows & DS2_rows,calc_vars] -
        D[calc_rows & DS3_rows,calc_vars]))
       D$VC_3[calc_rows]<-sum(0.43 * (D[calc_rows & DS3_rows,calc_vars] -
        D[calc_rows & DS4_rows,calc_vars]))
       D$VC_4[calc_rows]<-sum(D[calc_rows & DS4_rows,calc_vars])
      }
     }
    }

Jim
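The nested loops in the reply above visit every pairing of IM.type and Taxonomy, including pairings that never occur together in the data, which is why the `if(sum(calc_rows))` check is there. A minimal sketch of that behaviour, using only the two key columns of the example data (the names `visited` and `present` are illustrative and do not come from the thread):

```r
# Key columns as in the example data: PGA only occurs with ER+ETR_H1,
# Sa only occurs with ER+ETR_H2.
IM.type  <- rep(c("PGA", "Sa"), each = 4)
Taxonomy <- rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4)

# Every pairing that two nested loops over unique() values would visit
visited <- expand.grid(IM.type = unique(IM.type),
                       Taxonomy = unique(Taxonomy),
                       stringsAsFactors = FALSE)

# Number of data rows matching each visited pairing (0 means "skip")
visited$n_rows <- mapply(
  function(im, tax) sum(IM.type == im & Taxonomy == tax),
  visited$IM.type, visited$Taxonomy,
  USE.NAMES = FALSE
)

# Pairings that actually occur in the data
present <- unique(data.frame(IM.type, Taxonomy, stringsAsFactors = FALSE))

print(visited)
print(present)
```

Looping over the rows of `present` instead of over both sets of unique values would avoid the empty pairings altogether; whether that is preferable depends on how the real data are laid out.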

## Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Hello Jim,

Thank you ever so much, it was very helpful. In fact, what I want to calculate is the following. My very last question: if I want to save the outcome VC, IM.type and Taxonomy in a new data.frame, how can I do it?

    # names of the variables used in the calculations
    calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
    # get the rows for the four damage states
    DS1_rows <-D$Damage.state == "DS1"
    DS2_rows <-D$Damage.state == "DS2"
    DS3_rows <-D$Damage.state == "DS3"
    DS4_rows <-D$Damage.state == "DS4"
    # step through all possible values of IM.type and Taxonomy
    for(IM in unique(D$IM.type)) {
     for(Tax in unique(D$Taxonomy)) {
      # get a logical vector of the rows to be used in this calculation
      calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
      cat(IM,Tax,calc_rows,"\n")
      # check that there are any such rows in the data frame
      if(sum(calc_rows)) {
       # if so, fill in the four values for these rows
       VC <- 0.0 * (1- D[calc_rows & DS1_rows,calc_vars]) +
        0.02* (D[calc_rows & DS1_rows,calc_vars] -
               D[calc_rows & DS2_rows,calc_vars]) +
        0.10* (D[calc_rows & DS2_rows,calc_vars] -
               D[calc_rows & DS3_rows,calc_vars]) +
        0.43 * (D[calc_rows & DS3_rows,calc_vars] -
                D[calc_rows & DS4_rows,calc_vars]) +
        1.0*   D[calc_rows & DS4_rows,calc_vars]
      }
     }
    }
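The revised formula in the message above is a weighted sum of differences between successive exceedance probabilities. As a rough sketch, it can be written as a single matrix product for one IM.type and Taxonomy combination. The values below are the PGA / ER+ETR_H1 rows of the example data in this thread; the names `P`, `w` and `state_prob` are illustrative assumptions, not part of the original code:

```r
# Exceedance probabilities for one combination (PGA, ER+ETR_H1):
# one row per damage state DS1..DS4, one column per Prob.of.exceedance_1..4.
P <- rbind(
  DS1 = c(0, 0, 0.26,        0.72),
  DS2 = c(0, 0, 0.001,       0.03),
  DS3 = c(0, 0, 0.00019,     0.008),
  DS4 = c(0, 0, 0.000000573, 0.000061)
)

# Weights from the revised formula for DS1..DS4.
# The 0.0 * (1 - DS1) term contributes nothing and is left out.
w <- c(0.02, 0.10, 0.43, 1.0)

# Differences between successive damage states:
# DS1 - DS2, DS2 - DS3, DS3 - DS4, and DS4 itself for the last row.
state_prob <- P - rbind(P[-1, ], 0)

# Weighted sum over damage states: one VC value per column.
VC <- drop(w %*% state_prob)
print(VC)
```

Writing the weights as a vector makes it easier to change them in one place, which may help given that the coefficients differ between the first and the revised version of the formula.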

## Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Hi Ioanna,

For simplicity assume that the new data frame will be named E:

    E<-D[,c("Taxonomy","IM.type",paste("VC",1:4,sep="_"))]

While I haven't tested this, I'm pretty sure I have it correct. Just extract the columns you want from D and assign that to E.

Jim
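A small sketch of the column extraction suggested above, assuming that D already holds columns named VC_1 to VC_4 as in the first reply. The data frame here is a cut-down stand-in and its VC values are placeholders for illustration only, not results from the thread; `vc_cols` and `E_unique` are hypothetical names:

```r
# Stand-in for D after the VC columns have been filled in.
# The VC values are placeholders, not computed results.
D <- data.frame(
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  VC_1 = rep(c(0.1, 0.5), each = 4),
  VC_2 = rep(c(0.2, 0.6), each = 4),
  VC_3 = rep(c(0.3, 0.7), each = 4),
  VC_4 = rep(c(0.4, 0.8), each = 4),
  stringsAsFactors = FALSE
)

# Build the column names "VC_1" ... "VC_4" and extract the wanted columns
vc_cols <- paste("VC", 1:4, sep = "_")
E <- D[, c("Taxonomy", "IM.type", vc_cols)]

# Each combination is repeated once per damage state, so keep one row each
E_unique <- unique(E)
rownames(E_unique) <- NULL
print(E_unique)
```

The `unique()` step is an addition to the suggestion in the reply: it is only appropriate when the VC values are identical across the damage-state rows of a combination, as they are in the first version of the loop.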

## Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Hello Jim,

I made some changes to the code; essentially I substituted each of the 4 lines DS1-4 with one. I estimate VC, which in an ideal world should be a matrix with 4 columns, one for every exceedance_probability_1-4, and 2 rows for each unique combination of Taxonomy and IM.type. Could you please check the code I sent last and, based on that, give your solution?

Many thanks.

## Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Hi Ioanna,

We're getting somewhere, but there are four unique combinations of Taxonomy and IM.type:

    ER+ETR_H1,PGA
    ER+ETR_H2,PGA
    ER+ETR_H1,Sa
    ER+ETR_H2,Sa

Perhaps you mean that ER+ETR_H1 only occurs with PGA and ER+ETR_H2 only occurs with Sa. I handled that by checking that there were any rows that corresponded to the condition requested. Also you want a matrix for each row containing Taxonomy and IM.type in the output. When I run what I think you are asking, I only get a two element list, each a vector of values. Maybe this is what you want, and it could be coerced into matrix format:

    D<- data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
     Region = rep(c('South America'), times = 8),
     IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
     Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
     Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1',
      'ER+ETR_H2','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
     Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
     Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
     Prob.of.exceedance_3 =
      c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
     Prob.of.exceedance_4 =
      c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
     stringsAsFactors=FALSE)
    # names of the variables used in the calculations
    calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
    # get the rows for the four damage states
    DS1_rows <-D$Damage.state == "DS1"
    DS2_rows <-D$Damage.state == "DS2"
    DS3_rows <-D$Damage.state == "DS3"
    DS4_rows <-D$Damage.state == "DS4"
    # create an empty list
    VC<-list()
    # set an index variable for VC
    VCindex<-1
    # step through all possible values of IM.type and Taxonomy
    for(IM in unique(D$IM.type)) {
     for(Tax in unique(D$Taxonomy)) {
      # get a logical vector of the rows to be used in this calculation
      calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
      cat(IM,Tax,calc_rows,"\n")
      # check that there are any such rows in the data frame
      if(sum(calc_rows)) {
       # if so, fill in the four values for these rows
       VC[[VCindex]] <- 0.0 * (1- D[calc_rows & DS1_rows,calc_vars]) +
        0.02* (D[calc_rows & DS1_rows,calc_vars] -
               D[calc_rows & DS2_rows,calc_vars]) +
        0.10* (D[calc_rows & DS2_rows,calc_vars] -
               D[calc_rows & DS3_rows,calc_vars]) +
        0.43 * (D[calc_rows & DS3_rows,calc_vars] -
                D[calc_rows & DS4_rows,calc_vars]) +
        1.0*   D[calc_rows & DS4_rows,calc_vars]
       # increment the index
       VCindex<-VCindex+1
      }
     }
    }

I think we'll get there.

Jim
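One possible way to get from the list in the reply above to the data frame asked for earlier in the thread (IM.type, Taxonomy and the four VC values in one row per combination) is sketched below. It is not the approach posted in the thread: it swaps the nested loops for `split()` and `lapply()`, and the helper `vc_for_group` is a hypothetical name. It assumes exactly one row per damage state DS1 to DS4 within each combination, as in the example data:

```r
# Example data from the thread, reduced to the columns used here
D <- data.frame(
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  Prob.of.exceedance_1 = rep(0, 8),
  Prob.of.exceedance_2 = rep(0, 8),
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE
)
calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")

# Hypothetical helper: VC for the rows of one IM.type/Taxonomy group.
# Assumes the group holds exactly one row for each of DS1..DS4.
vc_for_group <- function(g) {
  # exceedance probabilities of one damage state as a plain numeric vector
  p <- function(ds) unlist(g[g$Damage.state == ds, calc_vars])
  vc <- 0.02 * (p("DS1") - p("DS2")) +
        0.10 * (p("DS2") - p("DS3")) +
        0.43 * (p("DS3") - p("DS4")) +
        1.0  *  p("DS4")
  names(vc) <- paste("VC", 1:4, sep = "_")
  data.frame(IM.type = g$IM.type[1], Taxonomy = g$Taxonomy[1],
             as.list(vc), stringsAsFactors = FALSE)
}

# Split on the combinations that actually occur, then stack the results
groups <- split(D, list(D$IM.type, D$Taxonomy), drop = TRUE)
result <- do.call(rbind, lapply(groups, vc_for_group))
rownames(result) <- NULL
print(result)
```

With `drop = TRUE`, combinations that do not occur in the data are never created, so no separate check for empty groups is needed.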