Data frame with Factor column missing data change to NA

Bill Poling
Good morning.

#I have df with a Factor column called "NonAcceptanceOther" that contains missing data.

#Not every record in the df is expected to have a value in this column.

# Typical values look like:
# ERS
# Claim paid without PHX recommended savings
# Claim paid without PHX recommended savings
# MRC Amount
# MRC Amount
# PPO per provider
#Or they are missing (blank)

#Example

df2 <- df[,c("PlaceOfService","ClaimStatusID","NonAcceptanceOther","RejectionCodeID","CPTCats","RevCodeCats","GCode2","ClaimTypeID")]
head(df2, n=20)

   PlaceOfService ClaimStatusID                         NonAcceptanceOther RejectionCodeID          CPTCats     RevCodeCats GCode2 ClaimTypeID
1              11             2                                                         NA          ResPSys NotValidRevCode      2           2
2              81             3                                                         53       PathandLab NotValidRevCode      2           2
3              11             3                                                         47         Medicine NotValidRevCode      1           2
4              09             2                                                         NA           NotCPT NotValidRevCode      1           2
5              11             2                                                         NA        Radiology NotValidRevCode      2           2
6              23             2                                                         NA       MusculoSys NotValidRevCode      2           2
7              12             3                                                         47           NotCPT NotValidRevCode      2           2
8              12             2                                                         NA         Medicine NotValidRevCode      2           2
9              11             3                                                         47         Medicine NotValidRevCode      1           2
10             21             2                                                         NA       Anesthesia NotValidRevCode      2           2
11             11             3                                        ERS              30      EvalandMgmt NotValidRevCode      2           2
12             81             2                                                         NA       PathandLab NotValidRevCode      2           2
13             21             2                                                         NA        Radiology NotValidRevCode      1           2
14             11             2                                                         NA         Medicine NotValidRevCode      1           2
15             99             3 Claim paid without PHX recommended savings              30 CardioHemLympSys             Lab      0           1
16             99             3 Claim paid without PHX recommended savings              30       PathandLab             Lab      0           1
17             99             3                                 MRC Amount              30           NotCPT          Pharma      2           1
18             99             3                                 MRC Amount              30       PathandLab             Lab      2           1
19             81             2                                                         NA       PathandLab NotValidRevCode      2           2
20             23             2                                                         NA         IntegSys NotValidRevCode      1           2

#I would like to set these missing to NA and have them reflected similarly to an NA in a numeric or integer column if possible.

#I have tried several approaches from Googled references:

NonAcceptanceOther <- df$NonAcceptanceOther
table(addNA(NonAcceptanceOther))

is.na <- df$NonAcceptanceOther

df[NonAcceptanceOther == '' | NonAcceptanceOther == 'NA'] <- NA

#However, when I go to use:

missingDF <- PlotMissing(df)

#Only the numeric or integer columns (e.g. RejectionCodeID) show their missing values; the "NonAcceptanceOther" column does not reflect or hold the NA values.

Thank you for any advice.

WHP

Confidentiality Notice This message is sent from Zelis. ...{{dropped:16}}

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Data frame with Factor column missing data change to NA

Jim Lemon-4
Hi Bill,
It may be that the NonAcceptanceOther, being a character value, has ""
(0 length string) rather than NA. You can convert that to NA like
this:

df2$NonAcceptanceOther[nchar(df2$NonAcceptanceOther) == 0]<-NA

Jim
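
Note for readers: nchar() expects a character vector, so on a factor column the same test needs as.character() first. A minimal sketch on a made-up data frame (the name toy and its values are hypothetical, not the poster's data):

```r
# Hypothetical toy data: a factor column where "" stands for missing.
toy <- data.frame(
  NonAcceptanceOther = factor(c("", "ERS", "", "MRC Amount")),
  RejectionCodeID    = c(NA, 30L, NA, 30L)
)

# Convert to character only for the length test; the column stays a factor.
blank <- nchar(as.character(toy$NonAcceptanceOther)) == 0
toy$NonAcceptanceOther[blank] <- NA

is.na(toy$NonAcceptanceOther)
```

The column itself is left as a factor here; only the entries that were empty strings become NA.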



Re: Data frame with Factor column missing data change to NA

Bill Poling
#Good morning Jim, thank you for your response and guidance.


So I ran the suggested code and got: Error in nchar(df2$NonAcceptanceOther) : 'nchar()' requires a character vector

So I ran this:

df2$NonAcceptanceOther[] <- lapply(df2$NonAcceptanceOther,as.character)

#Then tried again.

#But still getting the error?

#Because the column remains a factor?
names(df2)

#[1] "PlaceOfService"     "ClaimStatusID"      "NonAcceptanceOther" "RejectionCodeID"    "CPTCats"            "RevCodeCats"        "GCode2"             "ClaimTypeID"

classes <- as.character(sapply(df2, class))
classes


#[1] "factor"  "integer" "factor"  "integer" "factor"  "factor"  "integer" "integer"
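
A likely reason the column stayed a factor: assigning with `[] <-` fills the existing vector in place and keeps its attributes, including the factor class. Replacing the whole column with as.character() is the usual way to change its type. A small sketch, assuming a hypothetical data frame toy rather than the real df2:

```r
# Hypothetical toy data standing in for df2.
toy <- data.frame(NonAcceptanceOther = factor(c("", "ERS", "", "MRC Amount")))

# Replace the whole column (no `[]`), so the factor class is not carried over.
toy$NonAcceptanceOther <- as.character(toy$NonAcceptanceOther)
class(toy$NonAcceptanceOther)

# nchar() now accepts the column, and the blanks can be set to NA.
toy$NonAcceptanceOther[nchar(toy$NonAcceptanceOther) == 0] <- NA
```

If the column should end up as a factor again, factor() can be applied afterwards; NA entries are not turned into a level by default.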



#Not sure if this structure helps; I guess that the 1L's are the missing values

dput(head(df2$NonAcceptanceOther, 25))

#structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 118L, 1L,
#1L, 1L, 64L, 64L, 134L, 134L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
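
The integers in the dput() output are the factor's internal codes, each pointing at an entry of levels(). An empty string sorts first, which would fit the guess that code 1L is the blank level. As a sketch on a hypothetical factor f (not the real column), the blank level can be inspected and then set to NA through levels():

```r
# Hypothetical factor with "" as one of its levels.
f <- factor(c("", "ERS", "", "MRC Amount"))

as.integer(f)   # internal codes; the two blanks share one code
levels(f)       # the labels those codes refer to

# Setting a level to NA drops it and turns its entries into NA.
levels(f)[levels(f) == ""] <- NA

levels(f)
is.na(f)
```

This changes the level table once rather than touching each row, which can be convenient on a large column.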



#View from the CSV file original data


NonAcceptanceOther











ERS




Claim paid without PHX recommended savings

Claim paid without PHX recommended savings

MRC Amount

MRC Amount









Appreciate your help Sir.

WHP
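
Since the blanks come straight from the CSV file, another option is to declare them as missing at import time with the na.strings argument of read.csv(). The sketch below reads from an inline string instead of a file; csv_text and toy are made-up stand-ins, and stringsAsFactors = TRUE is set only to mirror the factor column in this thread:

```r
# Hypothetical CSV content with empty fields in the first column.
csv_text <- "NonAcceptanceOther,RejectionCodeID\n,NA\nERS,30\n,NA\nMRC Amount,30\n"

# Treat both empty fields and the literal string NA as missing.
toy <- read.csv(text = csv_text,
                na.strings = c("", "NA"),
                stringsAsFactors = TRUE)

colSums(is.na(toy))          # missing count per column
levels(toy$NonAcceptanceOther)
```

With this approach the factor never gets an empty-string level in the first place.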


Re: Data frame with Factor column missing data change to NA

Bill Poling
Jim,
Actually, I got this to work.

df$NonAcceptanceOther[df$NonAcceptanceOther==""]<- NA
df$NonAcceptanceOther

missingDF <- plot_missing(df)

# missingDF
#               feature num_missing   pct_missing  group
# 13 NonAcceptanceOther       26157 0.86859932257 Remove
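
One thing to be aware of with this approach: comparing the factor to "" and assigning NA changes the entries, but the empty string remains listed as an unused level until it is dropped. A short sketch on a hypothetical data frame toy (not the real df):

```r
# Hypothetical toy data with "" standing for missing.
toy <- data.frame(NonAcceptanceOther = factor(c("", "ERS", "", "MRC Amount")))

# Same idea as above: compare the factor to "" and assign NA.
toy$NonAcceptanceOther[toy$NonAcceptanceOther == ""] <- NA
sum(is.na(toy$NonAcceptanceOther))   # number of missing entries

levels(toy$NonAcceptanceOther)       # "" is still present as a level

# Remove levels that no longer occur in the data.
toy$NonAcceptanceOther <- droplevels(toy$NonAcceptanceOther)
levels(toy$NonAcceptanceOther)
```

Dropping the unused level is optional for counting NAs, but it keeps later tables and plots from showing an empty category.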


Good to go now, for the moment, big smile!

Thank you for your help Sir.


WHP



