Changing values (factors) does not change levels of that value?!


Oliver-3
Hello,


 * I read in a server weblog with read.table.
   -> OK.

 * I looked at the download-size values (the file size of each download).
   -> OK

 * I found "-" entries and wanted to replace them with "0", so I
   used:  weblog$V8[ weblog$V8 == "-" ] <- 0
   -> OK

 * I checked the contents for "-" vs. "0" and found every "-" replaced
   by 0.
   -> OK

 * when I then looked at str(weblog),
   the "-" was still in the levels listed for the variable weblog$V8
   -> BAD!
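A minimal, self-contained reproduction of the steps above. The vector `v8` is a hypothetical stand-in for `weblog$V8`; the order in which `levels()` prints the levels may vary with the locale.

```r
# Hypothetical stand-in for weblog$V8 (download sizes read as a factor)
v8 <- factor(c("512", "-", "0", "2048", "-"))

# Replace "-" by 0; this works only because "0" is already a level
v8[v8 == "-"] <- 0

v8          # no "-" values remain ...
levels(v8)  # ... but "-" is still one of the levels
table(v8)   # and is tabulated with a count of 0

# If "0" were not an existing level, the assignment would produce NA
# (with the warning "invalid factor level, NA generated")
w <- factor(c("512", "-"))
w[w == "-"] <- 0
w
```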

Is this normal behaviour?
Do I have to throw out the unwanted level by myself?

Ciao,
   Oliver

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Changing values (factors) does not change levels of that value?!

Philipp Pagel-5
>  * when I then looked at str(weblog),
>    the "-" was still in the levels listed for the variable weblog$V8
>    -> BAD!
>
> Is this normal behaviour?

Yes, it is. The idea is that a factor has a given set of levels,
independent of how often they occur in your data, including the
case that a level is not observed at all. E.g. gender can take the
levels 'male' or 'female', but you may have a sample of only females.
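A small sketch of the gender example, assuming the full set of possible levels is declared up front through the `levels` argument of `factor()`:

```r
# A sample that happens to contain only females
gender <- factor(c("female", "female", "female"),
                 levels = c("female", "male"))

levels(gender)  # "female" "male": the unobserved level is kept
table(gender)   # male is reported with a count of 0
```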

> Do I have to throw out the unwanted level by myself?

Yes, and it's easy:

> x <- factor(c('A','B','C','A','C'))
> y <- x[x!='C']
> y
[1] A B A
Levels: A B C
> factor(y)
[1] A B A
Levels: A B

cu
        Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel


Re: Changing values (factors) does not change levels of that value?!

Bernd Weiss
Philipp Pagel wrote:

>>  * when I then looked at str(weblog),
>>    the "-" was still in the levels listed for the variable weblog$V8
>>    -> BAD!
>>
>> Is this normal behaviour?
>
> Yes, it is. The idea is that a factor has a given set of levels,
> independent of how often they occur in your data, including the
> case that a level is not observed at all. E.g. gender can take the
> levels 'male' or 'female', but you may have a sample of only females.
>
>> Do I have to throw out the unwanted level by myself?
>
> Yes, and it's easy:
>
>> x <- factor(c('A','B','C','A','C'))
>> y <- x[x!='C']
>> y
> [1] A B A
> Levels: A B C
>> factor(y)
> [1] A B A
> Levels: A B
>

Another solution might be:

 > x <- factor(c('A','B','C','A','C'))
 > y <- x[x!='C']
 > y
[1] A B A
Levels: A B C
 > y[drop = TRUE]
[1] A B A
Levels: A B
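Both suggestions give the same levels. The sketch below checks that on the example from this thread; `droplevels()`, which was added to base R after this thread was written, is a third equivalent.

```r
x <- factor(c("A", "B", "C", "A", "C"))
y <- x[x != "C"]       # "C" is no longer used but is still a level

a <- factor(y)         # re-create the factor from its current values
b <- y[drop = TRUE]    # subset with drop = TRUE
d <- droplevels(y)     # later base-R helper for the same job

levels(a)
levels(b)
levels(d)
```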


HTH,

Bernd


Re: Changing values (factors) does not change levels of that value?!

Oliver-3
Quoting "Weiss, Bernd" <[hidden email]>:

> Philipp Pagel wrote:
> >>  * when I then looked at str(weblog),
> >>    the "-" was still in the levels listed for the variable weblog$V8
> >>    -> BAD!
> >>
> >> Is this normal behaviour?
> >
> > Yes, it is. The idea is that a factor has a given set of levels,
> > independent of how often they occur in your data, including the
> > case that a level is not observed at all. E.g. gender can take the
> > levels 'male' or 'female', but you may have a sample of only females.

OK, but I thought that modifying the data would make R
recalculate the levels. Now I see that it does not.
I found a function "relevel", but it does not help me.
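`relevel()` does not help here because it only moves the chosen reference level to the front of the level list; it never removes levels. A quick check on a small hypothetical factor:

```r
f <- factor(c("512", "-", "0"))
f[f == "-"] <- "0"         # "-" is now unused but still a level

r <- relevel(f, ref = "0")
levels(r)                  # "0" comes first, but "-" is still listed
```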


> >
> >> Do I have to throw out the unwanted level by myself?
> >
> > Yes, and it's easy:
> >
> >> x <- factor(c('A','B','C','A','C'))
> >> y <- x[x!='C']
> >> y
> > [1] A B A
> > Levels: A B C
> >> factor(y)
> > [1] A B A
> > Levels: A B

Sorry, but this looks to me as if you throw out all the values
that have the unwanted level. (?!)
That is not what I meant. Or at least it's confusing, because
you work on a single vector, not on a data frame as I do.

After some experimentation I found out the following solution:

========================
weblog <- read.table("web.log") # reading the log

weblog$V8[ weblog$V8 == "-" ] <- 0  # substituting "-" by 0

# and now changing the levels-attribute to the new values !!
attr(weblog$V8, "levels") <- levels( factor( as.vector(weblog$V8) ) )
========================


But after finding that, I realized it was a detour from what I had
tried when I started, and now I do the following:

========================
weblog <- read.table("web.log") # read in the weblog

weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0

weblog$V8 <- as.numeric( as.vector(weblog$V8) ) # changing it to numeric

tapply( weblog$V8, weblog$V1, sum) # do my calculations
========================


Ciao,
   Oliver


Re: Changing values (factors) does not change levels of that value?!

Philipp Pagel-5
On Sun, Nov 16, 2008 at 02:52:10PM +0100, Oliver Bandel wrote:
> OK, but I thought, when touching the data, it will
> recalculate the levels. Now I see, it does not.

No, it doesn't, for the reasons given in my explanation.

> > >> x <- factor(c('A','B','C','A','C'))
> > >> y <- x[x!='C']
> > >> y
> > > [1] A B A
> > > Levels: A B C
> > >> factor(y)
> > > [1] A B A
> > > Levels: A B
>
> Sorry, but this looks to me as if you throw out all the values
> that have the unwanted level. (?!)

Correct, that's what my example does to create a factor with
missing levels.

> That is not what I meant.

I know, but it does not matter how you got a factor with missing
levels: both the problem and the solution are the same.

> Or at least it's confusing, because
> you work on a single vector, not on a data frame as I do.

Not a real difference either: a data.frame is just a collection
of vectors and/or factors. So all you need to do is apply this to
whatever column holds the factor in question:

foo$bar <- factor(foo$bar)

You may want to have a look at "An Introduction to R", especially
the section on data frames.
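Applied to a data-frame column, the one-liner looks like this. The data frame is a hypothetical miniature of `weblog`; `droplevels()` on a whole data frame, available in later R versions, drops unused levels from every factor column at once.

```r
# Hypothetical miniature of the weblog data frame
weblog <- data.frame(V1 = c("host1", "host2", "host1"),
                     V8 = factor(c("512", "-", "0")))

weblog$V8[weblog$V8 == "-"] <- "0"   # "-" stays behind as an unused level
weblog$V8 <- factor(weblog$V8)       # re-create the factor column

levels(weblog$V8)                    # "-" is gone

# Alternative: drop unused levels in all factor columns of the data frame
weblog <- droplevels(weblog)
```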


> After some experimentation I found out the following solution:
>
> ========================
> weblog <- read.table("web.log") # reading the log
>
> weblog$V8[ weblog$V8 == "-" ] <- 0  # substituting "-" by 0
>
> # and now changing the levels-attribute to the new values !!
> attr(weblog$V8, "levels") <- levels( factor( as.vector(weblog$V8) ) )

weblog$V8 <- factor(weblog$V8)

is all you need.
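Why `factor()` is the safer choice: overwriting the `levels` attribute renames the levels but does not renumber the integer codes underneath, so values can end up with the wrong labels whenever the dropped level is not the last one. A sketch with a hypothetical factor whose level order is fixed explicitly, so the result does not depend on locale sorting:

```r
f <- factor(c("-", "0", "12", "5"), levels = c("-", "0", "12", "5"))
f[f == "-"] <- "0"                  # codes are now 2 2 3 4; "-" is unused

# Overwriting the levels attribute directly (the detour above)
g <- f
attr(g, "levels") <- levels(factor(as.vector(f)))   # "0" "12" "5"
# The codes still run 2 2 3 4, so they now point at the wrong labels
# (and code 4 points past the end of the shortened level list)
levels(g)[as.integer(g)]

# Re-creating the factor keeps each value with its own label
h <- factor(f)
as.character(h)
```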

> But after finding that, I realized it was a detour from what I had
> tried when I started, and now I do the following:
>
> ========================
> weblog <- read.table("web.log") # read in the weblog
>
> weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0
>
> weblog$V8 <- as.numeric( as.vector(weblog$V8) ) # changing it to numeric

Dangerous:

> x <- factor(c(0,1,3,4,5,7))
> x
[1] 0 1 3 4 5 7
Levels: 0 1 3 4 5 7
> as.numeric(x)
[1] 1 2 3 4 5 6

See "7.10 How do I convert factors to numeric?" in the R-FAQ for
details.
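The conversions recommended in the R FAQ, applied to the example above. `as.vector()` on a factor also returns the labels as a character vector, which is why `as.numeric(as.vector(...))` gives the actual values rather than the codes.

```r
x <- factor(c(0, 1, 3, 4, 5, 7))

as.numeric(x)                 # internal integer codes: 1 2 3 4 5 6
as.numeric(as.character(x))   # the values: 0 1 3 4 5 7
as.numeric(as.vector(x))      # same: as.vector() of a factor is character
as.numeric(levels(x))[x]      # same values; each level is converted only once
```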

As you are reading the data from a file anyway, the simplest
solution would probably be to use the colClasses argument of
read.table in order to get numeric values in the first place.
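A sketch of the `colClasses` route. Because the size column contains "-", declaring it numeric alone would make `read.table()` stop with a scan error, so the sketch also passes `na.strings = "-"` and then turns the resulting `NA`s into 0. The two-column log text is hypothetical; a real weblog has more columns, and `na.strings` applies to all of them.

```r
# Hypothetical two-column stand-in for the weblog: host and download size
log_text <- "host1 512
host2 -
host1 2048
host3 -"

weblog <- read.table(text = log_text,
                     colClasses = c("character", "numeric"),
                     na.strings = "-")   # read "-" as NA instead of failing

weblog$V2[is.na(weblog$V2)] <- 0       # count missing sizes as 0 bytes
sizes <- tapply(weblog$V2, weblog$V1, sum)
sizes
```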

cu
        Philipp



Re: Changing values (factors) does not change levels of that value?!

David Winsemius

On Nov 16, 2008, at 9:25 AM, Philipp Pagel wrote:
>
snip
>
> As you are reading the data from a file anyway, the simplest
> solution would probably be to use the colClasses argument of
> read.table in order to get numeric values in the first place.

Or use stringsAsFactors = FALSE.

If you have more than 10 variables, setting up
colClasses can be fairly laborious and error-prone, at least for this
newbie.
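The `stringsAsFactors = FALSE` route on the same kind of hypothetical log text: the size column comes back as character, so "-" can be replaced and the column converted without any factor codes involved. Since R 4.0.0, `FALSE` is the default.

```r
# Hypothetical two-column stand-in for the weblog: host and download size
log_text <- "host1 512
host2 -
host1 2048
host3 -"

weblog <- read.table(text = log_text, stringsAsFactors = FALSE)
class(weblog$V2)                       # "character": "-" prevents numeric
weblog$V2[weblog$V2 == "-"] <- "0"
weblog$V2 <- as.numeric(weblog$V2)     # converts the labels themselves
tapply(weblog$V2, weblog$V1, sum)
```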

--
David Winsemius, MD
Heritage Labs



Re: Changing values (factors) does not change levels of that value?!

Oliver-3
Philipp Pagel <p.pagel <at> wzw.tum.de> writes:

[...]
> As you are reading the data from a file anyway, the simplest
> solution would probably be to use the colClasses argument of
> read.table in order to get numeric values in the first place.

Oh, ok, this looks interesting.
I hadn't used that option so far.

It seems to be very convenient!

Thanks for the hint.

Ciao,
   Oliver


Re: Changing values (factors) does not change levels of that value?!

Oliver-3
Philipp Pagel <p.pagel <at> wzw.tum.de> writes:

[...]
> As you are reading the data from a file anyway, the simplest
> solution would probably be to use the colClasses argument of
> read.table in order to get numeric values in the first place.
[...]

Hey, I tried this colClasses option.
It's really fine! :)

Ciao,
   Oliver


Re: Changing values (factors) does not change levels of that value?!

Oliver-3
Philipp Pagel <p.pagel <at> wzw.tum.de> writes:

[...]
>
> foo$bar <- factor(foo$bar)

This was my first attempt, before posting here,
and somehow it did not work...
...now it works.... so maybe I was too tired
when I tried it and messed something up. :(



[...]
> > x <- factor(c(0,1,3,4,5,7))
> > x
> [1] 0 1 3 4 5 7
> Levels: 0 1 3 4 5 7
> > as.numeric(x)
> [1] 1 2 3 4 5 6

I know, and that's the reason why I first used
as.vector() before passing the results to as.numeric().

Without as.vector() one gets the integer codes of the factor,
not the values of the column.

Ciao,
   Oliver
