How to create a new data.frame based on calculation of subsets of an existing data.frame


How to create a new data.frame based on calculation of subsets of an existing data.frame

Ioannou, Ioanna
Hello everyone,

I have the following problem: I have a data.frame with multiple columns.

For a single combination of IM.type and Taxonomy, my calculation is the following:
D <- read.csv('Test_v2.csv')
names(D)

VC <- 0.01*( subset(D, IM.type == 'PGA' & Damage.state == 'DS1' & Taxonomy == 'ER+ETR_H1')[10:13] -
              subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13])  +
  0.02*(     subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13] -
              subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13])  +
  0.43*( subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13] -
           subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13])  +
  1.0*( subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13])
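The four-term expression above is a weighted sum of differences between consecutive damage states, VC = w1*(P1 - P2) + w2*(P2 - P3) + w3*(P3 - P4) + w4*P4, where Pi is the row for damage state DSi. A minimal sketch of that arithmetic as a reusable function, assuming the rows are supplied in DS1 to DS4 order; the name `vc_weighted` is hypothetical, and the example values are the PGA / ER+ETR_H1 probabilities from the sample data posted later in this thread, not from Test_v2.csv.

```r
# Hypothetical helper: weighted sum of differences between consecutive
# damage states, VC = w1*(P1 - P2) + w2*(P2 - P3) + w3*(P3 - P4) + w4*P4.
# P: numeric matrix, one row per damage state (DS1..DS4, in order),
#    one column per Prob.of.exceedance_* variable.
# w: one weight per damage state.
vc_weighted <- function(P, w) {
  stopifnot(nrow(P) == length(w))
  # Append a row of zeros so that the last state contributes w4 * P4
  P_ext <- rbind(P, 0)
  # Row i of `increments` is P_i - P_(i+1)
  increments <- P_ext[-nrow(P_ext), , drop = FALSE] - P_ext[-1, , drop = FALSE]
  # Scale row i by w[i], then add the rows up column by column
  colSums(w * increments)
}

# Example: PGA / ER+ETR_H1 rows of the sample data (exceedance columns 3 and 4)
P <- rbind(DS1 = c(0.26,        0.72),
           DS2 = c(0.001,       0.03),
           DS3 = c(0.00019,     0.008),
           DS4 = c(0.000000573, 0.000061))
colnames(P) <- c("Prob.of.exceedance_3", "Prob.of.exceedance_4")
w <- c(0.01, 0.02, 0.43, 1.0)  # weights from the expression above

vc <- vc_weighted(P, w)
# The same numbers written out term by term, as in the original expression
vc_check <- 0.01 * (P["DS1", ] - P["DS2", ]) + 0.02 * (P["DS2", ] - P["DS3", ]) +
            0.43 * (P["DS3", ] - P["DS4", ]) + 1.0 * P["DS4", ]
print(vc)
```

Because the weights enter linearly, the same function works for any number of damage states, provided `w` has one entry per row of `P`.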

So the question is: how can I do this automatically for all possible combinations and store the results in a new data.frame, which would look like this:

Ref.No.  Region         IM.type  Taxonomy   IM_1      IM_2  IM_3  IM_4  VC_1       VC_2          VC_3         VC_4
1622     South America  PGA      ER+ETR_H1  1.00E-06  0.08  0.16  0.24  3.49e-294  3.449819e-05  0.002748889  0.01122911

Best,
ioanna
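One way to automate this over every IM.type / Taxonomy pair is to split the data frame into one block per pair that actually occurs and apply the same arithmetic to each block. A minimal sketch, assuming the column names of the sample data posted later in the thread, exactly four rows (DS1 to DS4) per block, and the weights of the expression above; the helper name `one_row` is hypothetical, and taking Ref.No and IM_1 to IM_4 from the DS1 row of each block is also an assumption.

```r
D <- data.frame(
  Ref.No       = 1622:1629,
  Region       = "South America",
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  IM_1 = 0, IM_2 = 0.08, IM_3 = 0.16, IM_4 = 0.24,
  Prob.of.exceedance_1 = 0,
  Prob.of.exceedance_2 = 0,
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE)

w         <- c(0.01, 0.02, 0.43, 1.0)        # one weight per damage state
prob_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
im_vars   <- paste("IM", 1:4, sep = "_")
vc_vars   <- paste("VC", 1:4, sep = "_")

# Hypothetical helper: turn one IM.type/Taxonomy block into one output row
one_row <- function(block) {
  block  <- block[order(block$Damage.state), ]  # DS1, DS2, DS3, DS4
  P      <- as.matrix(block[, prob_vars])
  P_next <- rbind(P[-1, , drop = FALSE], 0)     # P_(i+1), with zeros after DS4
  vc     <- colSums(w * (P - P_next))           # one value per exceedance column
  out <- block[1, c("Ref.No", "Region", "IM.type", "Taxonomy", im_vars)]
  out[vc_vars] <- as.list(vc)
  out
}

# drop = TRUE keeps only the IM.type/Taxonomy pairs present in the data
blocks <- split(D, list(D$IM.type, D$Taxonomy), drop = TRUE)
result <- do.call(rbind, lapply(blocks, one_row))
rownames(result) <- NULL
print(result)
```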

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Jim Lemon-4
In reply to this post by Ioannou, Ioanna
Okay, I'm away for most of the day and might not be able to look at it
until tomorrow.

Jim
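The exchange quoted below turns on the positional indices `[10:13]`: if a column such as Region is missing, every later column shifts and `D[10:13]` points past the end of the data frame. A minimal sketch of selecting the probability columns by name instead, assuming they all start with `Prob.of.exceedance_`; the data frame here deliberately omits Region to reproduce the problem.

```r
# Sample data without the Region column (12 columns instead of 13)
D <- data.frame(
  Ref.No       = 1622:1629,
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  IM_1 = 0, IM_2 = 0.08, IM_3 = 0.16, IM_4 = 0.24,
  Prob.of.exceedance_1 = 0,
  Prob.of.exceedance_2 = 0,
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE)

# Positional selection fails here: there is no 13th column
positional <- tryCatch(D[10:13], error = function(e) conditionMessage(e))
print(positional)   # the error message, not a data frame

# Name-based selection does not depend on column order
prob_vars <- grep("^Prob\\.of\\.exceedance_", names(D), value = TRUE)
P <- D[D$IM.type == "PGA" & D$Taxonomy == "ER+ETR_H1", prob_vars]
print(P)
```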

On Wed, Dec 18, 2019 at 9:27 AM Ioannou, Ioanna
<[hidden email]> wrote:

>
> Hello Jim,
>
> I am very sorry. Here is the corrected sample data to play with:
>
> Test.v2 <- data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
>                       Region = rep(c('South America'), times = 8),
>                       IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
>                       Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
>                       Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
>                       IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
>                       IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
>                       IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
>                       IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
>                       Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
>                       Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
>                       Prob.of.exceedance_3 = c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
>                       Prob.of.exceedance_4 = c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405)
>                       )
>
> Basically, I am using the total probability theorem to calculate a best estimate. I am stuck on how to do it for many cases. Many thanks for your patience.
>
> -----Original Message-----
> From: Jim Lemon [mailto:[hidden email]]
> Sent: Tuesday, December 17, 2019 10:22 PM
> To: Ioannou, Ioanna <[hidden email]>
> Subject: Re: [R] How to create a new data.frame based on calculation of subsets of an existing data.frame
>
> Hi Ioanna,
> After looking at your post for a while, I think that you are combining columns IM_1 to IM_4 to generate VC_1 to VC_4. First, you seem to have omitted the "Region" column from Test_v2, which means that your indices (10:13) run out of range. It seems to me that you would find it easier to write down what arithmetic operations you want and translate these into logical expressions to extract the rows.
>
> Jim
>
> On Wed, Dec 18, 2019 at 7:47 AM Ioannou, Ioanna <[hidden email]> wrote:


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Jim Lemon-4
Hi Ioanna,
I looked at the problem this morning and tried to work out what you
wanted. With a problem like this, it is often easier when someone can
point to the data and say "I want this added to that and this
multiplied by that". I have probably guessed wrong in places, but I
hope that you can correct my guesses so that I can get the
calculations right for you. For example, I have assumed that you want
the sum of the Prob.of.exceedance_* values for each set of damage
states as the values for VC_1, VC_2 etc.

D<-data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
 Region = rep(c('South America'), times = 8),
 IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
 Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
 Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H2',
 'ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
 IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
 IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
 IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
 IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
 Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_3 =
 c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
 Prob.of.exceedance_4 =
 c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
 stringsAsFactors=FALSE)
# assume the above has been read in
# add the four columns to the data frame filled with NAs
D$VC_1<-D$VC_2<-D$VC_3<-D$VC_4<-NA
# names of the variables used in the calculations
calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
# get the rows for the four damage states
DS1_rows<-D$Damage.state == "DS1"
DS2_rows<-D$Damage.state == "DS2"
DS3_rows<-D$Damage.state == "DS3"
DS4_rows<-D$Damage.state == "DS4"
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows<-D$IM.type == IM & D$Taxonomy == Tax
  cat(IM,Tax,calc_rows,"\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, fill in the four values for these rows
   D$VC_1[calc_rows]<-sum(0.01 * (D[calc_rows & DS1_rows,calc_vars] -
    D[calc_rows & DS2_rows,calc_vars]))
   D$VC_2[calc_rows]<-sum(0.02 * (D[calc_rows & DS2_rows,calc_vars] -
    D[calc_rows & DS3_rows,calc_vars]))
   D$VC_3[calc_rows]<-sum(0.43 * (D[calc_rows & DS3_rows,calc_vars] -
    D[calc_rows & DS4_rows,calc_vars]))
   D$VC_4[calc_rows]<-sum(D[calc_rows & DS4_rows,calc_vars])
  }
 }
}
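In the loop above, `sum()` applied to a data frame adds up every cell, so each VC_k becomes a single number that pools all four Prob.of.exceedance_* columns. If one value per exceedance column is wanted instead, `colSums()` keeps the columns apart. A minimal sketch comparing the two for the 0.43 * (DS3 - DS4) term of the PGA / ER+ETR_H1 block; only the columns needed are rebuilt here.

```r
D <- data.frame(
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  Prob.of.exceedance_1 = 0,
  Prob.of.exceedance_2 = 0,
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE)
calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
calc_rows <- D$IM.type == "PGA" & D$Taxonomy == "ER+ETR_H1"

# One-row data frame: 0.43 * (DS3 - DS4) for each exceedance column
term <- 0.43 * (D[calc_rows & D$Damage.state == "DS3", calc_vars] -
                D[calc_rows & D$Damage.state == "DS4", calc_vars])

pooled     <- sum(term)      # a single number, as the loop stores in VC_3
per_column <- colSums(term)  # four numbers, one per Prob.of.exceedance_* column
print(pooled)
print(per_column)
```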

Jim


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Ioannou, Ioanna
Hello Jim,

Thank you ever so much, it was very helpful. In fact, what I want to calculate is the following. My very last question: if I want to save the outcome VC, together with IM.type and Taxonomy, in a new data.frame, how can I do it?

# names of the variables used in the calculations
calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
# get the rows for the four damage states
DS1_rows <- D$Damage.state == "DS1"
DS2_rows <- D$Damage.state == "DS2"
DS3_rows <- D$Damage.state == "DS3"
DS4_rows <- D$Damage.state == "DS4"
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
  cat(IM, Tax, calc_rows, "\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, compute VC for this combination
   # (note: VC is overwritten on every pass through the loop)
   VC <- 0.0 * (1 - D[calc_rows & DS1_rows, calc_vars]) +
    0.02 * (D[calc_rows & DS1_rows, calc_vars] -
            D[calc_rows & DS2_rows, calc_vars]) +
    0.10 * (D[calc_rows & DS2_rows, calc_vars] -
            D[calc_rows & DS3_rows, calc_vars]) +
    0.43 * (D[calc_rows & DS3_rows, calc_vars] -
            D[calc_rows & DS4_rows, calc_vars]) +
    1.0 * D[calc_rows & DS4_rows, calc_vars]
  }
 }
}
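The zero-weighted `(1 - DS1)` term in the loop above reflects the total probability theorem: with exceedance probabilities P1 to P4 (and P5 = 0), the probability of being exactly in state i is Pi - P(i+1), the probability of no damage (DS0) is 1 - P1, these five probabilities sum to 1, and VC is their weighted average. A minimal sketch of that formulation for the PGA / ER+ETR_H1 block, assuming the weights 0, 0.02, 0.10, 0.43 and 1.0 from the loop above; it reproduces the values Jim reports later in the thread (0.005343027 and 0.01947477).

```r
# Exceedance probabilities P(DS >= DSi) for the PGA / ER+ETR_H1 block:
# one row per damage state, one column per Prob.of.exceedance_* variable
P <- rbind(DS1 = c(0, 0, 0.26,        0.72),
           DS2 = c(0, 0, 0.001,       0.03),
           DS3 = c(0, 0, 0.00019,     0.008),
           DS4 = c(0, 0, 0.000000573, 0.000061))
colnames(P) <- paste("Prob.of.exceedance", 1:4, sep = "_")

# Probability of being exactly in each state, DS0 (no damage) to DS4
occ <- rbind(DS0 = 1 - P["DS1", ],
             P[c("DS1", "DS2", "DS3"), ] - P[c("DS2", "DS3", "DS4"), ],
             DS4 = P["DS4", ])

w  <- c(0.0, 0.02, 0.10, 0.43, 1.0)  # one weight per state, DS0..DS4
VC <- colSums(w * occ)               # weighted average over the five states

print(colSums(occ))  # each column sums to 1
print(VC)
```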

-----Original Message-----
From: Jim Lemon [mailto:[hidden email]]
Sent: Thursday, December 19, 2019 2:05 AM
To: Ioannou, Ioanna <[hidden email]>; r-help mailing list <[hidden email]>
Subject: Re: [R] How to create a new data.frame based on calculation of subsets of an existing data.frame


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Jim Lemon-4
Hi Ioanna,
For simplicity, assume that the new data frame will be named E:

E<-D[,c("Taxonomy","IM.type",paste("VC",1:4,sep="_"))]

While I haven't tested this, I'm pretty sure I have it correct. Just
extract the columns you want from D and assign that to E.
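Because D holds one row per damage state, an E built this way still has four identical rows for each Taxonomy / IM.type pair; `unique()` collapses them to one row per pair. A minimal sketch, assuming the VC_1 to VC_4 columns have been filled in on every row of a block (as the earlier loop does) and, as an assumption about the intended meaning, that VC_k is the weighted result for Prob.of.exceedance_k with the weights from Ioanna's latest code.

```r
D <- data.frame(
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  Prob.of.exceedance_1 = 0,
  Prob.of.exceedance_2 = 0,
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE)
prob_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
vc_vars   <- paste("VC", 1:4, sep = "_")
w <- c(0.02, 0.10, 0.43, 1.0)  # weights for DS1..DS4 (the DS0 term has weight 0)

# Fill VC_1..VC_4 on every row of each block, as the loop does
D[vc_vars] <- NA_real_
for (rows in split(seq_len(nrow(D)), list(D$IM.type, D$Taxonomy), drop = TRUE)) {
  P  <- as.matrix(D[rows, prob_vars])  # rows are already in DS1..DS4 order here
  vc <- colSums(w * (P - rbind(P[-1, , drop = FALSE], 0)))
  D[rows, vc_vars] <- matrix(vc, nrow = length(rows), ncol = 4, byrow = TRUE)
}

E <- D[, c("Taxonomy", "IM.type", vc_vars)]  # 8 rows, repeated within each block
E <- unique(E)                                # one row per Taxonomy / IM.type pair
rownames(E) <- NULL
print(E)
```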

Jim

On Fri, Dec 20, 2019 at 9:02 PM Ioannou, Ioanna
<[hidden email]> wrote:
>
> Hello Jim,
>
> Thank you ever so much, it was very helpful. In fact, what I want to calculate is the following. My very last question: if I want to save the outcome VC, together with IM.type and Taxonomy, in a new data.frame, how can I do it?
>


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Ioannou, Ioanna
Hello Jim,

I made some changes to the code: essentially, I replaced the four separate lines for DS1-DS4 with a single one. I estimate VC, which ideally should be a matrix with 4 columns, one for each of Prob.of.exceedance_1 to Prob.of.exceedance_4, and 2 rows, one for each unique combination of Taxonomy and IM.type. Could you please check the code I sent last and base your solution on it?

Many thanks.


________________________________
From: Jim Lemon <[hidden email]>
Sent: Friday, December 20, 2019 11:40:28 AM
To: Ioannou, Ioanna <[hidden email]>
Cc: r-help mailing list <[hidden email]>
Subject: Re: [R] How to create a new data.frame based on calculation of subsets of an existing data.frame


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Jim Lemon-4
Hi Ioanna,
We're getting somewhere, but there are four unique combinations of
Taxonomy and IM.type:

ER+ETR_H1,PGA
ER+ETR_H2,PGA
ER+ETR_H1,Sa
ER+ETR_H2,Sa

Perhaps you mean that ER+ETR_H1 only occurs with PGA and ER+ETR_H2
only occurs with Sa. I handled that by checking that there were any
rows that corresponded to the condition requested.
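Instead of looping over every IM.type / Taxonomy pairing and skipping the empty ones, one could loop only over the pairs that actually occur. A minimal sketch using `unique()` on the two key columns; only the key columns of the sample data are rebuilt here, and the VC calculation itself is left as a placeholder comment.

```r
D <- data.frame(
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  stringsAsFactors = FALSE)

# One row per IM.type / Taxonomy pair present in the data
combos <- unique(D[, c("IM.type", "Taxonomy")])
rownames(combos) <- NULL
print(combos)

for (k in seq_len(nrow(combos))) {
  calc_rows <- D$IM.type == combos$IM.type[k] & D$Taxonomy == combos$Taxonomy[k]
  # ... the VC calculation for this pair would go here ...
  cat(combos$IM.type[k], combos$Taxonomy[k], sum(calc_rows), "rows\n")
}
```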

Also, you want a matrix for each row containing Taxonomy and IM.type in
the output. When I run what I think you are asking, I only get a
two-element list, each element a vector of values. Maybe this is what
you want, and it could be coerced into matrix format:

D<- data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
 Region = rep(c('South America'), times = 8),
 IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
 Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
 Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
 Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_3 =
  c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
 Prob.of.exceedance_4 =
  c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
 stringsAsFactors=FALSE)

# names of the variables used in the calculations
calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
# get the rows for the four damage states
DS1_rows <-D$Damage.state == "DS1"
DS2_rows <-D$Damage.state == "DS2"
DS3_rows <-D$Damage.state == "DS3"
DS4_rows <-D$Damage.state == "DS4"
# create an empty list
VC<-list()
# set an index variable for VC
VCindex<-1
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
  cat(IM,Tax,calc_rows,"\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, fill in the four values for these rows
   VC[[VCindex]] <- 0.0 * (1- D[calc_rows & DS1_rows,calc_vars]) +
    0.02* (D[calc_rows & DS1_rows,calc_vars] -
               D[calc_rows & DS2_rows,calc_vars]) +
    0.10* (D[calc_rows & DS2_rows,calc_vars] -
                                   D[calc_rows & DS3_rows,calc_vars]) +
    0.43 * (D[calc_rows & DS3_rows,calc_vars] -
                                   D[calc_rows & DS4_rows,calc_vars]) +
    1.0*   D[calc_rows & DS4_rows,calc_vars]
   # increment the index
   VCindex<-VCindex+1
  }
 }
}
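To keep track of which element of VC belongs to which pair, the list can be indexed by a label built from IM.type and Taxonomy rather than by a running counter; `do.call(rbind, ...)` then stacks the one-row results into the two-row, four-column matrix Ioanna describes. A minimal sketch, assuming the "/" separator never occurs in the labels; the zero-weighted `(1 - DS1)` term is left out because it contributes nothing.

```r
D <- data.frame(
  IM.type      = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy     = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  Prob.of.exceedance_1 = 0,
  Prob.of.exceedance_2 = 0,
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE)
calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
DS1_rows <- D$Damage.state == "DS1"
DS2_rows <- D$Damage.state == "DS2"
DS3_rows <- D$Damage.state == "DS3"
DS4_rows <- D$Damage.state == "DS4"

VC <- list()
for (IM in unique(D$IM.type)) {
  for (Tax in unique(D$Taxonomy)) {
    calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
    if (any(calc_rows)) {
      # name the element after its pair instead of using a counter
      VC[[paste(IM, Tax, sep = "/")]] <-
        0.02 * (D[calc_rows & DS1_rows, calc_vars] - D[calc_rows & DS2_rows, calc_vars]) +
        0.10 * (D[calc_rows & DS2_rows, calc_vars] - D[calc_rows & DS3_rows, calc_vars]) +
        0.43 * (D[calc_rows & DS3_rows, calc_vars] - D[calc_rows & DS4_rows, calc_vars]) +
        1.0  *  D[calc_rows & DS4_rows, calc_vars]
    }
  }
}

# Stack the one-row data frames into a 2 x 4 numeric matrix
VC_mat <- as.matrix(do.call(rbind, VC))
rownames(VC_mat) <- names(VC)
colnames(VC_mat) <- paste("VC", 1:4, sep = "_")

# Split the labels back into IM.type and Taxonomy columns
keys  <- do.call(rbind, strsplit(names(VC), "/", fixed = TRUE))
VC_df <- data.frame(IM.type = keys[, 1], Taxonomy = keys[, 2], VC_mat,
                    stringsAsFactors = FALSE)
rownames(VC_df) <- NULL
print(VC_df)
```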

I think we'll get there.

Jim


On Sat, Dec 21, 2019 at 12:45 AM Ioannou, Ioanna
<[hidden email]> wrote:
>
> Hello Jim,
>
> I made some changes to the code: essentially, I replaced the four separate lines for DS1-DS4 with a single one. I estimate VC, which ideally should be a matrix with 4 columns, one for each of Prob.of.exceedance_1 to Prob.of.exceedance_4, and 2 rows, one for each unique combination of Taxonomy and IM.type. Could you please check the code I sent last and base your solution on it?


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Ioannou, Ioanna
Hello Jim,

Thank you ever so much for your help. I was truly stuck!

This looks much better, and yes, I can turn the results into a matrix without any problem. Indeed, I need only the results for ER+ETR_H1,PGA and ER+ETR_H2,Sa. One minor point: as it stands, VC has 4 values for three cases instead of the aforementioned two. In fact, the third is identical to the first. Could you please optimize?

Thank you very much again,
Best,
ioanna

-----Original Message-----
From: Jim Lemon [mailto:[hidden email]]
Sent: Friday, December 20, 2019 9:04 PM
To: Ioannou, Ioanna <[hidden email]>
Cc: r-help mailing list <[hidden email]>
Subject: Re: [R] How to create a new data.frame based on calculation of subsets of an existing data.frame


Re: How to create a new data.frame based on calculation of subsets of an existing data.frame

Jim Lemon-4
I'm probably misunderstanding what you want. I get this from the code I sent:

 VC
[[1]]
 Prob.of.exceedance_1 Prob.of.exceedance_2 Prob.of.exceedance_3
1                    0                    0          0.005343027
 Prob.of.exceedance_4
1           0.01947477

[[2]]
 Prob.of.exceedance_1 Prob.of.exceedance_2 Prob.of.exceedance_3
5                    0                    0           0.00115359
 Prob.of.exceedance_4
5           0.01122235

Two list elements, each with four values. Perhaps you want a matrix for each
block of Taxonomy and IM.type that has a row for each element of the
block? This often happens with a remotely specified problem.
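If a matrix per block with a row for each damage state is what is wanted, one possible reading (an assumption, not something confirmed in the thread) is a matrix of the individual weighted terms w_i * (P_i - P_(i+1)), whose column sums give the VC values printed above. A minimal sketch for the Sa / ER+ETR_H2 block (rows 5 to 8 of the sample data), using the weights from the latest version of the loop.

```r
# Exceedance probabilities for the Sa / ER+ETR_H2 block, DS1..DS4
P <- rbind(DS1 = c(0, 0, 0.04,     0.475),
           DS2 = c(0, 0, 0.00017,  0.0007),
           DS3 = c(0, 0, 0.000215, 0.00435),
           DS4 = c(0, 0, 0.000472, 0.000405))
colnames(P) <- paste("Prob.of.exceedance", 1:4, sep = "_")
w <- c(0.02, 0.10, 0.43, 1.0)  # weights for DS1..DS4

# Row i holds w_i * (P_i - P_(i+1)), with P_5 = 0
contrib <- w * (P - rbind(P[-1, , drop = FALSE], 0))
print(contrib)

# Adding the rows reproduces the block's VC values
print(colSums(contrib))
```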

Jim

On Sat, Dec 21, 2019 at 8:33 AM Ioannou, Ioanna
<[hidden email]> wrote:
>
> Hello Jim,
>
> Thank you ever so much for your help. I was truly stuck!
>
> This looks much better, and yes, I can turn the results into a matrix without any problem. Indeed, I need only the results for ER+ETR_H1,PGA and ER+ETR_H2,Sa. One minor point: as it stands, VC has 4 values for three cases instead of the aforementioned two. In fact, the third is identical to the first. Could you please optimize?
>
