Blanking out specific cells in a data frame


Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Hi R-Help,

I am a new R user.  I have used SAS for many years (just FYI on what I
am used to and possible obstacles it presents).

I have a data frame:

mydf <- data.frame(matrix(rnorm(102), ncol=6))

I would like to be able to delete ALL the information in column 6 for
rows 2 through r, where r=#rows.

How can I do that?

I want the resulting data frame to look like this:

X1 X2 X3 X4 X5 X6
Data data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data


(Sorry for using the word "data" and not having a cut and paste of my
actual data frame...the computer I am writing this email on does not
have R so I have to improvise.)

Thanks,

Jen

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Blanking out specific cells in a data frame

Stefan Grosse-2
On 19.05.2010 19:36, Sabatier, Jennifer F. (CDC/OID/NCHHSTP) wrote:
> mydf <- data.frame(matrix(rnorm(102), ncol=6))
>
You mean something like:
mydf[2:length(mydf[,1]), 6] <- NA

hth
Stefan

Re: Blanking out specific cells in a data frame

Ista Zahn
In reply to this post by Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Hi Jen,
You cannot have a dataframe with unequal column lengths, so you have two
options (well maybe more, but two that come to mind): set the values to
missing instead of deleting them, or store your data in a list instead of a
data frame.

For option 1 (recommended) all you need is
mydf[-1, 6] <- NA
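
A minimal sketch of both options, assuming the 17-row data frame from the original post (with its missing closing parenthesis restored). The names blanked, blanked2 and as_list are made up for this illustration.

```r
set.seed(1)  # only so the random numbers are reproducible
mydf <- data.frame(matrix(rnorm(102), ncol = 6))  # 17 rows, columns X1..X6

# Option 1: keep the data frame and set column 6 to missing
# in every row except the first.
blanked <- mydf
blanked[-1, 6] <- NA

# The same rows can also be addressed explicitly as 2..nrow.
blanked2 <- mydf
blanked2[2:nrow(blanked2), 6] <- NA

# Option 2: a plain list may hold components of different lengths.
as_list <- as.list(mydf)
as_list$X6 <- as_list$X6[1]  # keep only the first value of X6
```

The negative index -1 means "every row except the first", so it does not need the row count at all.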

Best,
Ista

Re: Blanking out specific cells in a data frame

Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
In reply to this post by Stefan Grosse-2
SWEET!

That's EXACTLY what I need.  Thanks!

I can now just use NAToUnknown to turn the NA into blanks.

Thanks a lot!

Jen



Re: Blanking out specific cells in a data frame

Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
In reply to this post by Ista Zahn
Hi Ista,

Thanks a lot for your response.  It looks like the solution Stefan
suggested is the same as yours, and it works great.

I do know that you can't actually have unequal column lengths.  The
reality is I am creating a pretty table to export to EXCEL and it
contains some summary statistics as well as the information from a
chi-square test (test stat, df, pvalue).  When I added the info from the
chi-square test to the data frame it populated all the rows and I just
wanted to know how to blank them out.

Now that I have Stefan's solution, which turns all the unneeded info
into NAs, I can use NAToUnknown() to blank them out completely.

Thanks, again, for getting back to me!

Jen




Re: Blanking out specific cells in a data frame

Stefan Grosse-2
On 19.05.2010 20:08, Sabatier, Jennifer F. (CDC/OID/NCHHSTP) wrote:
> I do know that you can't actually have unequal column lengths.  The
> reality is I am creating a pretty table to export to EXCEL and it
>
> Now that I have Stefan's solution, which turns all the un-needed info
> into NAs I can use NAToUnknown() to blank them out completely.
>
When I import data into Excel, NAs are imported as empty cells, so you
should not even need this function (which I do not know).
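
How an NA reaches Excel depends on the export route. As a sketch, assuming the table travels through a CSV file (the thread does not say which route is used): write.csv() writes the text NA by default, and its na argument can be set to an empty string so the cells arrive blank. The file path here is a temporary placeholder.

```r
mydf <- data.frame(matrix(rnorm(102), ncol = 6))
mydf[-1, 6] <- NA

# Placeholder output location; a real export would use a proper file name.
path <- tempfile(fileext = ".csv")

# na = "" writes missing values as empty fields instead of the text "NA".
write.csv(mydf, file = path, na = "", row.names = FALSE)

# Reading the file back shows the empty fields are treated as missing again.
back <- read.csv(path)
```

With this route the data frame keeps real NA values in R, and the blanking happens only at export time.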

Stefan

Using svychisq inside user-defined function

Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
In reply to this post by Ista Zahn
Hi R-help,

Yes, this is my second request for assistance in a single day....

I am attempting to use svychisq() inside a function  I made.  The goal
of this function is to produce a table of summary statistics that I can
later output to EXCEL (simple frequencies and sample sizes from regular
crosstabulation on dataset "data" but the chi-square using survey
methods on "audit").

Here's my code (I can't supply data for you as I am not that
sophisticated and the real data is not cleared for public consumption -
I really apologize):


# create my svydesign object

audit <- svydesign(id=~id, strata=~field, weights=~wt, data=data,
                   fpc=~AllocProportion)

# my function to create my table

mkMyCrossTable <- function(X, svyX, T) {

  tbl <- crosstab(X, data$SEX, prop.c=TRUE)
  tbl <- data.frame(cbind(tbl$t, tbl$prop.col))
  tbl$var <- rownames(tbl)

  chisq <- svychisq(~svyX + SEX, design=audit, statistic="adjWald",
                    round=4)
  chisq <- data.frame(do.call("cbind", chisq))
  chisq <- data.frame(chisq[,3])

  Table <- data.frame(tbl$var,
                      paste(formatC(tbl$X0.1*100, format="f", digits=1), "%", sep=""),
                      tbl$X0,
                      paste(formatC(tbl$X1.1*100, format="f", digits=1), "%", sep=""),
                      tbl$X1,
                      chisq[1])
  Table[2:length(Table[,1]), 6] <- NA
  Table <- NAToUnknown(Table, unknown = " ")
  colnames(Table) <- c(T, "Male (%)", "Male (n)", "Female (%)", "Female (n)", "p-value")
  Table

}

con3 <- mkMyCrossTable(data$con, con, "Constituency")





The error occurs with the "chisq <- svychisq(~svyX + SEX, design=audit,
statistic="adjWald", round=4)" part of my function.  I did debug() to
double check.

I get the error:  "Error in '[.data.frame'(design$variables, ,
as.character(rows)) :
                     Undefined columns selected"

My suspicion is that it doesn't like me referencing the variables in
"audit", but I don't know how to fix it.


Thanks,

Jen

PS.  I know my table-making function is terribly inelegant...


Re: Using svychisq inside user-defined function

Thomas Lumley
On Wed, 19 May 2010, Sabatier, Jennifer F. (CDC/OID/NCHHSTP) wrote:

> Hi R-help,
>
> Yes, this is my second request for assistance in a single day....
>
> I am attempting to use svychisq() inside a function  I made.  The goal
> of this function is to produce a table of summary statistics that I can
> later output to EXCEL (simple frequencies and sample sizes from regular
> crosstabulation on dataset "data" but the chi-square using survey
> methods on "audit").
>
> Here's my code (I can't supply data for you as I am not that
> sophisticated and the real data is not cleared for public consumption -
> I really apologize):
>

You could have checked to see if the same problem occurs using one of the built-in data sets, but it turns out that the diagnosis is fairly straightforward. Also, since you use the function 'crosstab' without saying which package you got it from, it might be hard to get your code to run anyway.

Your problem is that you are passing a bare name and trying to substitute it into a formula, and that isn't how R function arguments or formulas work.

Simplifying the problem to the essentials, your function would work only if

myFormulaMakingFunction<-function(svyX){
    ~svyX+SEX
}

myFormulaMakingFunction(con)

returned ~con+SEX.  It actually returns ~svyX+SEX.

There are various ways around this.  My preferred one would be

myPreferredWay<-function(formula){
  update(formula, ~.+SEX)
}

myPreferredWay(~con)

which does return ~con+SEX.
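
Another of the "various ways", sketched here as an assumption rather than as the approach recommended above, is to pass the variable name as a character string and build the formula with reformulate() from the stats package. The helper name addSex is made up for this illustration.

```r
# Hypothetical helper: takes the variable NAME as a string, not a bare symbol.
addSex <- function(varname) {
  reformulate(c(varname, "SEX"))  # builds the one-sided formula ~ varname + SEX
}

f1 <- addSex("con")

# For comparison, the update() approach from the post above.
f2 <- update(~con, ~ . + SEX)
```

Both f1 and f2 are formula objects that refer to con and SEX, so either could be handed to a function that expects a formula.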

Then, as an example:

data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

myIllustrativeExample<-function(X,formula){
    tab<-table(X,apiclus1$stype)
    chisq<-svychisq(update(formula, ~.+stype), statistic="adjWald",design=dclus1)
    list(tab,chisq)
}

myIllustrativeExample(apiclus1$comp.imp, ~comp.imp)


Now, this can be improved: it appears that the vector argument is supposed to be the same variable as the formula argument, so we shouldn't make the user supply both of them.

mySupererogatoryEffort <- function(formula, design){
    X<-model.frame(formula, model.frame(design))[[1]]
    tab<-table(X, model.frame(design)$stype)
    chisq<-svychisq(update(formula, ~.+stype), statistic="adjWald",design=design)
    list(table=tab, F=chisq$statistic, df=chisq$parameter, p=chisq$p.value)
}

mySupererogatoryEffort(~comp.imp, dclus1)


As a final note, there is no round= argument to svychisq().  There is a round= argument to svytable(), which is documented on the same help page, but 4 is not one of the allowed values.
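
Since the rounding cannot be requested from svychisq(), one option is to round the p-value after the test and to place it only in the first row of the output table, so nothing needs blanking afterwards. The sketch below is an illustration under stated assumptions: it uses the built-in mtcars data and base R's chisq.test() purely as stand-ins, because the original data are not available; chisq.test() is not survey-weighted. It relies only on the result having a p.value component, which is the same component the function above reads from the svychisq() result.

```r
# Stand-in data: cyl plays the row variable, am (coded 0/1) the column variable.
tab <- table(mtcars$cyl, mtcars$am)

# Stand-in test (NOT survey-weighted); small cell counts trigger a warning,
# which is suppressed here because only the object structure matters.
test <- suppressWarnings(chisq.test(tab))

# Round after the fact instead of passing round= to the test function.
p <- round(test$p.value, 4)

# Put the p-value in the first row only; the remaining rows are NA from the start.
out <- data.frame(
  level   = rownames(tab),
  n0      = as.vector(tab[, "0"]),
  n1      = as.vector(tab[, "1"]),
  p.value = c(p, rep(NA, nrow(tab) - 1))
)
```

Building the column as c(p, rep(NA, n - 1)) avoids filling every row with the same value and then overwriting most of them.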

      -thomas


Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
[hidden email]
