

Hi,
Newbie here. I read "R for Beginners" but I still don't get this.
I have the following data (this is just an example) in a CSV file:
courseid numstudents
101 209
141 13
246 140
263 8
321 10
361 10
364 28
365 25
366 23
367 34
I load my data using:
fs <- read.csv(file="C:\\num_students_inallmodules.csv", header=TRUE, sep=',')
I want to get the ECDF, so I looked at ?ecdf, which says usage: ecdf(x)
So I expected ecdf(fs$numstudents) to work
Instead it just returned:
Call: ecdf(fs$numstudents)
x[1:210] = 1, 2, 3, ..., 3717, 4538
After Googling, I got this to work:
ecdf(fs$numstudents)(unique(fs$numstudents))
But I don't understand why, if ?ecdf says the usage is ecdf(x), I
need to use ecdf(fs$numstudents)(unique(fs$numstudents)) to get this
to work.
Can somebody explain this to me?
Regards
Gawesh
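A short sketch of what is happening here, using only the ten example rows from the post typed in directly (the real CSV is not available, so the vector below is an assumption, and the name Fn is made up for illustration). ecdf() returns a function rather than a vector of values, which is why printing the result shows only the Call and a summary of the x values:

```r
# The ten example values from the post, typed in directly.
# Assumption: this stands in for fs$numstudents; the real file is larger.
numstudents <- c(209, 13, 140, 8, 10, 10, 28, 25, 23, 34)

# ecdf() does not return probabilities; it returns a step function.
Fn <- ecdf(numstudents)

is.function(Fn)        # TRUE: Fn can itself be called
inherits(Fn, "ecdf")   # TRUE: it also carries the class "ecdf"

# Calling the returned function gives the cumulative proportion,
# i.e. the share of courses with at most that many students.
Fn(10)
Fn(sort(unique(numstudents)))
```

Printing Fn on its own would reproduce the kind of output quoted above; the numbers only appear once the function is evaluated at some points.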
On Oct 16, 2011, at 11:31 AM, gj wrote:
Hi:
I don't understand what you're attempting to do. Wouldn't courseid be
a categorical variable with a numeric label? If that is so, why are
you trying to compute an EDF? An EDF computes cumulative relative
frequency of a random variable, which by definition is numeric. If we
were talking about EDFs for a distribution of student course grades on
a numeric point system by course, that would make some sense, but I
don't see how the course IDs themselves qualify as being on an
interval scale of measurement. Could you clarify your intent?
Dennis
On Oct 16, 2011, at 3:53 PM, Dennis Murphy wrote:
> Hi:
>
> I don't understand what you're attempting to do. Wouldn't courseid be
> a categorical variable with a numeric label? If that is so, why are
> you trying to compute an EDF? An EDF computes cumulative relative
> frequency of a random variable, which by definition is numeric. If we
> were talking about EDFs for a distribution of student course grades on
> a numeric point system by course, that would make some sense, but I
> don't see how the course IDs themselves qualify as being on an
> interval scale of measurement. Could you clarify your intent?
Huh? Gawesh asked for ecdf on numstudents (not courseid) ... pretty
clearly a numeric value for which an ECDF should make sense.

David.

> Dennis
>
David is right. I am looking for the ecdf for fs$numstudents. The
other column is just an id.
I guess I don't know how to read the R documentation when it comes to functions.
Looking at the documentation, I now notice that it says "Compute an
empirical cumulative distribution function", and not a vector.
But still I would have assumed that in ecdf(x) ... the x is the argument.
So ecdf(fs$numstudents)(unique(fs$numstudents))
   =================== ========================
   function            arguments
Yes? But I can't read that from the documentation? I suspect it has
something to do with those dots (...) in the arguments, which I don't
understand.
Why does it say the usage is ecdf(x) when that's clearly not the case?
I don't get it.
Gawesh
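One way to read the one-liner is to split it into two steps. This is only a sketch using the example values from the first post; the names Fn, v, two_step and one_line are made up here. As far as ?ecdf goes, the usage ecdf(x) is accurate: x (the data) is its only argument, and the dots in the Usage section belong to the methods for the returned object (plot, print, summary, quantile), not to ecdf() itself.

```r
# Example values only; stands in for fs$numstudents.
numstudents <- c(209, 13, 140, 8, 10, 10, 28, 25, 23, 34)

# Step 1: ecdf(x) takes the data and returns a function.
Fn <- ecdf(numstudents)

# Step 2: that returned function takes the points at which
# the ECDF should be evaluated.
v <- sort(unique(numstudents))
two_step <- Fn(v)

# The one-liner does both steps without naming the intermediate function.
one_line <- ecdf(numstudents)(v)

identical(two_step, one_line)   # TRUE

# Value vs. cumulative proportion, side by side.
data.frame(numstudents = v, cum_prop = two_step)
```

So in ecdf(fs$numstudents)(unique(fs$numstudents)), the first pair of parentheses holds the argument to ecdf(), and the second pair holds the argument to the function that ecdf() returned.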
Hi,
Thanks for the clarification. I stand corrected.

Dennis
Hi Sarah,
Thanks for your very lucid explanations.
Thanks also to David and Dennis.
I got it completely. I now have some nice ggplot plots of a couple of
ECDFs in my paper :)
Now on to do some matrix plots of correlation matrices and some lm().
I'm like a child in a candy shop. :)
I'm learning something about R every day.
Regards
Gawesh
