How to replace missing values by mean of subgroup of a group

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

How to replace missing values by mean of subgroup of a group

R help mailing list-2
 Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated.

| hhid | city | food | food price |
| 101 | Paxton | rice | 10 |
| 101 | Paxton | beans | 30 |
| 101 | Paxton | flour | NA |
| 101 | Paxton | eggs | 20 |
| 102 | Paxton | rice | NA |
| 102 | Paxton | beans | NA |
| 102 | Paxton | flour | 34 |
| 102 | Paxton | eggs | 21 |
| 103 | Paxton | rice | 15 |
| 103 | Paxton | beans | 28 |
| 103 | Paxton | flour | 32 |
| 103 | Paxton | eggs | NA |
| 104 | Hull | rice | NA |
| 104 | Hull | beans | 34 |
| 104 | Hull | flour | NA |
| 104 | Hull | eggs | 24 |
| 105 | Hull | rice | 18 |
| 105 | Hull | beans | 38 |
| 105 | Hull | flour | 36 |
| 105 | Hull | eggs | 26 |
| 106 | Hull | rice | NA |
| 106 | Hull | beans | NA |
| 106 | Hull | flour | 40 |
| 106 | Hull | eggs | NA |


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to replace missing values by mean of subgroup of a group

Boris Steipe
Pedestrian code, so you can analyze this easily. However entirely untested since I have no ambitionto recreate your input data as a data frame. This code assumes:
 - your data _is_ a data frame
 - the desired column is called food.price, not "food price" (cf. ?make.names )

# define a function that imputes NA values in the same city, for the same food
imputeFoodPrice <- function(DF, i) {
  sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
  imputed <- mean(DF$food.price[sel], na.rm = TRUE)
  if (is.nan(imputed)) { # careful, there might be no other match
    imputed <- NA
  }
  return(imputed)
}


# apply the function to replace NA values
for (iMissing in which(is.na(myDF$food.price))) {
  myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
}


B.



> On May 9, 2017, at 3:14 PM, Olu Ola via R-help <[hidden email]> wrote:
>
> Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated.
>
> | hhid | city | food | food price |
> | 101 | Paxton | rice | 10 |
> | 101 | Paxton | beans | 30 |
> | 101 | Paxton | flour | NA |
> | 101 | Paxton | eggs | 20 |
> | 102 | Paxton | rice | NA |
> | 102 | Paxton | beans | NA |
> | 102 | Paxton | flour | 34 |
> | 102 | Paxton | eggs | 21 |
> | 103 | Paxton | rice | 15 |
> | 103 | Paxton | beans | 28 |
> | 103 | Paxton | flour | 32 |
> | 103 | Paxton | eggs | NA |
> | 104 | Hull | rice | NA |
> | 104 | Hull | beans | 34 |
> | 104 | Hull | flour | NA |
> | 104 | Hull | eggs | 24 |
> | 105 | Hull | rice | 18 |
> | 105 | Hull | beans | 38 |
> | 105 | Hull | flour | 36 |
> | 105 | Hull | eggs | 26 |
> | 106 | Hull | rice | NA |
> | 106 | Hull | beans | NA |
> | 106 | Hull | flour | 40 |
> | 106 | Hull | eggs | NA |
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to replace missing values by mean of subgroup of a group

Boris Steipe
Great.

I am CC'ing the list - this is important so that others who may come across this thread in the archives know that this question has been resolved.

Cheers,

B.


> On May 9, 2017, at 5:18 PM, Olu Ola <[hidden email]> wrote:
>
> Thank you!!! It worked.
>
> Regards,
> Olu
>
>
> On Tuesday, May 9, 2017 4:20 PM, Boris Steipe <[hidden email]> wrote:
>
>
> Pedestrian code, so you can analyze this easily. However entirely untested since I have no ambitionto recreate your input data as a data frame. This code assumes:
> - your data _is_ a data frame
> - the desired column is called food.price, not "food price" (cf. ?make.names )
>
> # define a function that imputes NA values in the same city, for the same food
> imputeFoodPrice <- function(DF, i) {
>   sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
>   imputed <- mean(DF$food.price[sel], na.rm = TRUE)
>   if (is.nan(imputed)) { # careful, there might be no other match
>     imputed <- NA
>   }
>   return(imputed)
> }
>
>
> # apply the function to replace NA values
> for (iMissing in which(is.na(myDF$food.price))) {
>   myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
> }
>
>
> B.
>
>
>
> > On May 9, 2017, at 3:14 PM, Olu Ola via R-help <[hidden email]> wrote:
> >
> > Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated.
> >
> > | hhid | city | food | food price |
> > | 101 | Paxton | rice | 10 |
> > | 101 | Paxton | beans | 30 |
> > | 101 | Paxton | flour | NA |
> > | 101 | Paxton | eggs | 20 |
> > | 102 | Paxton | rice | NA |
> > | 102 | Paxton | beans | NA |
> > | 102 | Paxton | flour | 34 |
> > | 102 | Paxton | eggs | 21 |
> > | 103 | Paxton | rice | 15 |
> > | 103 | Paxton | beans | 28 |
> > | 103 | Paxton | flour | 32 |
> > | 103 | Paxton | eggs | NA |
> > | 104 | Hull | rice | NA |
> > | 104 | Hull | beans | 34 |
> > | 104 | Hull | flour | NA |
> > | 104 | Hull | eggs | 24 |
> > | 105 | Hull | rice | 18 |
> > | 105 | Hull | beans | 38 |
> > | 105 | Hull | flour | 36 |
> > | 105 | Hull | eggs | 26 |
> > | 106 | Hull | rice | NA |
> > | 106 | Hull | beans | NA |
> > | 106 | Hull | flour | 40 |
> > | 106 | Hull | eggs | NA |
> >
> >
> >     [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to replace missing values by mean of subgroup of a group

Bert Gunter-2
Of course, statistically, one should not do this. But that's another thread
on another site.

Cheers,
Bert



On May 9, 2017 3:36 PM, "Boris Steipe" <[hidden email]> wrote:

Great.

I am CC'ing the list - this is important so that others who may come across
this thread in the archives know that this question has been resolved.

Cheers,

B.


> On May 9, 2017, at 5:18 PM, Olu Ola <[hidden email]> wrote:
>
> Thank you!!! It worked.
>
> Regards,
> Olu
>
>
> On Tuesday, May 9, 2017 4:20 PM, Boris Steipe <[hidden email]>
wrote:
>
>
> Pedestrian code, so you can analyze this easily. However entirely
untested since I have no ambitionto recreate your input data as a data
frame. This code assumes:
> - your data _is_ a data frame
> - the desired column is called food.price, not "food price" (cf.
?make.names )
>
> # define a function that imputes NA values in the same city, for the same
food

> imputeFoodPrice <- function(DF, i) {
>   sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
>   imputed <- mean(DF$food.price[sel], na.rm = TRUE)
>   if (is.nan(imputed)) { # careful, there might be no other match
>     imputed <- NA
>   }
>   return(imputed)
> }
>
>
> # apply the function to replace NA values
> for (iMissing in which(is.na(myDF$food.price))) {
>   myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
> }
>
>
> B.
>
>
>
> > On May 9, 2017, at 3:14 PM, Olu Ola via R-help <[hidden email]>
wrote:
> >
> > Hello,I have the following food data with some NA values in the food
prices. I will like to replace the NA values in the food price column for
each food item by the mean price of the specific food item for each city.
For example, the price of bean for the household with hhid 102 in the data
set is missing. I will like to replace the missing value with the mean
price of bean for the households living in Paxton city (that is households
101 and 103). the data set is presented below. Any help will be greatly
appreciated.

> >
> > | hhid | city | food | food price |
> > | 101 | Paxton | rice | 10 |
> > | 101 | Paxton | beans | 30 |
> > | 101 | Paxton | flour | NA |
> > | 101 | Paxton | eggs | 20 |
> > | 102 | Paxton | rice | NA |
> > | 102 | Paxton | beans | NA |
> > | 102 | Paxton | flour | 34 |
> > | 102 | Paxton | eggs | 21 |
> > | 103 | Paxton | rice | 15 |
> > | 103 | Paxton | beans | 28 |
> > | 103 | Paxton | flour | 32 |
> > | 103 | Paxton | eggs | NA |
> > | 104 | Hull | rice | NA |
> > | 104 | Hull | beans | 34 |
> > | 104 | Hull | flour | NA |
> > | 104 | Hull | eggs | 24 |
> > | 105 | Hull | rice | 18 |
> > | 105 | Hull | beans | 38 |
> > | 105 | Hull | flour | 36 |
> > | 105 | Hull | eggs | 26 |
> > | 106 | Hull | rice | NA |
> > | 106 | Hull | beans | NA |
> > | 106 | Hull | flour | 40 |
> > | 106 | Hull | eggs | NA |
> >
> >
> >     [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to replace missing values by mean of subgroup of a group

Boris Steipe
Of course, and I neglected to point this out:

"The other thread" would be how to properly impute missing values,
"The other site" could be https://stats.stackexchange.com/
... there is _lots_ of information available if you search for it.

An applicable R Package is MICE and you can find an introduction here:
  https://datascienceplus.com/imputing-missing-data-with-r-mice-package/

(Besides the usual documentation.)


B.



> On May 9, 2017, at 8:44 PM, Bert Gunter <[hidden email]> wrote:
>
> Of course, statistically, one should not do this. But that's another thread on another site.
>
> Cheers,
> Bert
>
>
>
> On May 9, 2017 3:36 PM, "Boris Steipe" <[hidden email]> wrote:
> Great.
>
> I am CC'ing the list - this is important so that others who may come across this thread in the archives know that this question has been resolved.
>
> Cheers,
>
> B.
>
>
> > On May 9, 2017, at 5:18 PM, Olu Ola <[hidden email]> wrote:
> >
> > Thank you!!! It worked.
> >
> > Regards,
> > Olu
> >
> >
> > On Tuesday, May 9, 2017 4:20 PM, Boris Steipe <[hidden email]> wrote:
> >
> >
> > Pedestrian code, so you can analyze this easily. However entirely untested since I have no ambitionto recreate your input data as a data frame. This code assumes:
> > - your data _is_ a data frame
> > - the desired column is called food.price, not "food price" (cf. ?make.names )
> >
> > # define a function that imputes NA values in the same city, for the same food
> > imputeFoodPrice <- function(DF, i) {
> >   sel <- DF$city == DF$city[i] & DF$food == DF$food[i]
> >   imputed <- mean(DF$food.price[sel], na.rm = TRUE)
> >   if (is.nan(imputed)) { # careful, there might be no other match
> >     imputed <- NA
> >   }
> >   return(imputed)
> > }
> >
> >
> > # apply the function to replace NA values
> > for (iMissing in which(is.na(myDF$food.price))) {
> >   myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing)
> > }
> >
> >
> > B.
> >
> >
> >
> > > On May 9, 2017, at 3:14 PM, Olu Ola via R-help <[hidden email]> wrote:
> > >
> > > Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated.
> > >
> > > | hhid | city | food | food price |
> > > | 101 | Paxton | rice | 10 |
> > > | 101 | Paxton | beans | 30 |
> > > | 101 | Paxton | flour | NA |
> > > | 101 | Paxton | eggs | 20 |
> > > | 102 | Paxton | rice | NA |
> > > | 102 | Paxton | beans | NA |
> > > | 102 | Paxton | flour | 34 |
> > > | 102 | Paxton | eggs | 21 |
> > > | 103 | Paxton | rice | 15 |
> > > | 103 | Paxton | beans | 28 |
> > > | 103 | Paxton | flour | 32 |
> > > | 103 | Paxton | eggs | NA |
> > > | 104 | Hull | rice | NA |
> > > | 104 | Hull | beans | 34 |
> > > | 104 | Hull | flour | NA |
> > > | 104 | Hull | eggs | 24 |
> > > | 105 | Hull | rice | 18 |
> > > | 105 | Hull | beans | 38 |
> > > | 105 | Hull | flour | 36 |
> > > | 105 | Hull | eggs | 26 |
> > > | 106 | Hull | rice | NA |
> > > | 106 | Hull | beans | NA |
> > > | 106 | Hull | flour | 40 |
> > > | 106 | Hull | eggs | NA |
> > >
> > >
> > >     [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.