adding variable into dataframe by indice

adding variable into dataframe by indice

Adrian Katschke
  R-Helpers,
   
  I am trying to insert a value into a dataframe. The value is a proportion: the number of individuals sharing a given value divided by the total number of individuals. I want to append that proportion as a new column at the end of the dataframe, filled in only for the individuals with that value. The problem I am running into is that the proportions are not being attached only to the individuals with the corresponding value.
   
  Below is an example of the code I am using. The data in the example is made up; it should give you an idea, but the original data has 'NA' in many rows. The output reported below comes from the original data.
   
  # Read in data
  age.int <- data.frame(IND_ID = seq(1, 140, 10),
                        rs1042364 = sample(c("(1,1)", "(1,2)", "(2,2)"), 14, replace = T),
                        first_drink = sample(5:17, 14, replace = T))

  asubs112 <- subset(age.int, rs1042364 != "(2,2)")

  ages112 <- sort(unique(na.omit(asubs112$first_drink)))

  for (i in ages112) {
    indce <- which(na.omit(asubs112$first_drink == i))
    prop <- length(indce)/nrow(asubs112)
    asubs112[indce,4] <- prop
    asubs112[indce,]
  }
   
  Below is the output I get from the script above. Notice that the first NA row gets a proportion but none of the other NA rows do. I am not sure what I am doing wrong; any suggestions would be a big help.
   
  TIA,
  Adrian
   
   asubs112[1:50,]
      IND_ID rs1042364 first_drink age_int          V5
4   10008007     (1,2)          NA      16 0.003891051
6   10013012     (1,2)          13      14 0.116731518
7   10015006     (1,2)          12      17 0.105058366
8   10015007     (1,1)          12      16 0.105058366
10  10021009     (1,2)          NA      15          NA
14  10039036     (1,2)          NA      15          NA
15  10039037     (1,2)          NA      13          NA
17  10045005     (1,2)          13      17 0.116731518
18  10045014     (1,2)          13      14 0.116731518
21  10055022     (1,2)          NA      15          NA

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: adding variable into dataframe by indice

PIKAL Petr
Hi,

I am not sure I understand correctly, but table() can be used:

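# Count how often each first_drink value occurs (table() drops NA by default)
ttt <- table(asubs112$first_drink)
# Proportion of all rows in the subset that have each value
prop <- ttt/nrow(asubs112)
# Look up each row's proportion by its first_drink value; NA rows stay NA
asubs112$prop <- prop[match(asubs112$first_drink, as.numeric(names(prop)))]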

> asubs112
   IND_ID rs1042364 first_drink      prop
2      11     (1,2)           7 0.3333333
5      41     (1,2)          11 0.1111111
6      51     (1,2)           7 0.3333333
7      61     (1,1)           7 0.3333333
8      71     (1,1)          12 0.2222222
10     91     (1,1)           6 0.1111111
11    101     (1,2)           5 0.1111111
12    111     (1,2)          13 0.1111111
14    131     (1,2)          12 0.2222222
>
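
As a side note on why the loop in the original post misplaces the proportions: it appears to be because na.omit() drops the NA entries before which() sees them, so the positions that which() returns refer to the shortened vector rather than to the rows of asubs112. which() already ignores NA, so the na.omit() call is not needed. The sketch below uses a small made-up data frame (the names d, shifted, correct and the column id are illustrative assumptions, not taken from the original data) to show the shift, and then a loop-free alternative with ave() that should give the same kind of result as the table()/match() approach.

```r
# Made-up data for illustration; the NA values mimic the original data set.
d <- data.frame(id = 1:6,
                first_drink = c(NA, 13, 12, NA, 13, 12))

# na.omit() removes the NA entries first, so which() counts positions in the
# shortened vector and the result no longer lines up with the rows of d.
shifted <- which(na.omit(d$first_drink == 13))
print(shifted)   # 1 3

# which() skips NA on its own, so these are the true row positions.
correct <- which(d$first_drink == 13)
print(correct)   # 2 5

# Loop-free alternative: for each row, count the rows that share its
# first_drink value and divide by the total number of rows (NA rows are
# included in the total, as in the original loop). Rows with NA stay NA.
d$prop <- ave(d$first_drink, d$first_drink, FUN = length) / nrow(d)
print(d)
```

If the proportions should instead be relative to the non-missing rows only, the denominator could be changed to sum(!is.na(d$first_drink)); which of the two is wanted depends on the analysis.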

HTH
Petr


On 8 Feb 2006 at 9:40, Adrian Katschke wrote:

Date sent:      Wed, 8 Feb 2006 09:40:43 -0800 (PST)
From:           Adrian Katschke <[hidden email]>
To:             RHelp <[hidden email]>
Subject:        [R] adding variable into dataframe by indice

Petr Pikal
[hidden email]

Re: adding variable into dataframe by indice

Adrian Katschke
Thank you for your assistance. I find that by using the list I learn something new every time.
   
  --Adrian
   
   