On Sun, Nov 16, 2008 at 02:52:10PM +0100, Oliver Bandel wrote:

> OK, but I thought, when touching the data, it will

> recalculate the levels. Now I see, it does not.

No it doesn't - for the reasons given in my explanation.

> > >> x <- factor(c('A','B','C','A','C'))

> > >> y <- x[x!='C']

> > >> y

> > > [1] A B A

> > > Levels: A B C

> > >> factor(y)

> > > [1] A B A

> > > Levels: A B

>

> Sorry, this looks to me like you throw out all the values,

> where the unwanted attribute is. (?!)

Correct, that's what my example does to create a factor with

missing levels.

> That is not what I meant.

I know, but it does not matter how you got a factor with missing

levels - both problem and solution are the same.

> Or at least it's disturbing because

> you use one value, not working on a data-frame, as I do.

Not a real difference either - a data.frame is just a collection

of vectors and/or factors. So all you need to do is apply this to

whatever column holds the factor in question:

foo$bar <- factor(foo$bar)
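
A minimal sketch of the same thing on a data frame. The names foo and bar are only the placeholders from the line above, and the column n is made up for illustration:

```r
# Hypothetical data frame; "foo" and "bar" are placeholder names, "n" is invented.
foo <- data.frame(bar = factor(c("A", "B", "C", "A", "C")), n = 1:5)

foo <- foo[foo$bar != "C", ]   # drop the rows, the level "C" stays behind
print(levels(foo$bar))         # still lists A, B and C

foo$bar <- factor(foo$bar)     # rebuild the factor from the values present
print(levels(foo$bar))         # now only A and B
```

Subsetting the rows does not touch the levels attribute; only re-creating the factor does.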

You may want to have a look at "An Introduction to R" - especially

the section on data frames.

> After some experimentation I found out the following solution:

>

> ========================

> weblog <- read.table("web.log") # reading the log

>

> weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0

>

> # and now changing the levels-attribute to the new values !!

> attr(weblog$V8, "levels") <- levels( factor( as.vector(weblog$V8) ) )

weblog$V8 <- factor(weblog$V8)

is all you need.
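
A small sketch with made-up values standing in for your V8 column (I am assuming that "0" already occurs in the column, otherwise the assignment would produce NA and a warning):

```r
# Invented stand-in for weblog$V8; the real column comes from read.table().
v8 <- factor(c("-", "0", "512", "-", "1024"))

v8[v8 == "-"] <- "0"   # fine here only because "0" is already a level
print(levels(v8))      # "-" is still listed although no value uses it

v8 <- factor(v8)       # re-create the factor
print(levels(v8))      # "-" is gone
```

Note that the result is still a factor whose levels are character strings, not numbers.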

> But after I found that, I saw, that this was a detour from what I

> tried when I started, and now using I do the following:

>

> ========================

> weblog <- read.table("web.log") # read in the weblog

>

> weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0

>

> weblog$V8 <- as.numeric( as.vector(weblog$V8) ) # changing it to numeric

Dangerous:

> x <- factor(c(0,1,3,4,5,7))

> x

[1] 0 1 3 4 5 7

Levels: 0 1 3 4 5 7

> as.numeric(x)

[1] 1 2 3 4 5 6

See "7.10 How do I convert factors to numeric?" in the R-FAQ for

details.
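
For reference, a short sketch of the two conversions described in that FAQ entry, using the same x as above:

```r
x <- factor(c(0, 1, 3, 4, 5, 7))

codes <- as.numeric(x)                      # internal integer codes, not the values
via_char <- as.numeric(as.character(x))     # go through the labels
via_levels <- as.numeric(levels(x))[as.integer(x)]  # convert the levels once, then index

print(codes)
print(via_char)
print(via_levels)
```

Both via_char and via_levels recover the original numbers; the second form only converts each level once, which can matter for long vectors.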

As you are reading the data from a file anyway, the simplest

solution would probably be to use the colClasses argument to

read.table in order to get numeric values in the first place.
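
A rough sketch of what I mean. The three-column layout and the values are invented (your log has more columns), and treating "-" as NA via na.strings is my assumption about how you want to handle it; without that, a "-" in a numeric column would make read.table stop with an error:

```r
# Invented mini log; replace text = ... with "web.log" for the real file.
log_lines <- c("a 200 512",
               "b 304 -",
               "c 200 1024")

weblog <- read.table(text = log_lines,
                     colClasses = c("character", "integer", "numeric"),
                     na.strings = "-")   # read "-" as NA instead of failing

weblog$V3[is.na(weblog$V3)] <- 0         # same substitution as before: "-" becomes 0
print(weblog$V3)
```

This way the column never becomes a factor, so there are no levels to clean up afterwards.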

cu

Philipp

--

Dr. Philipp Pagel

Lehrstuhl für Genomorientierte Bioinformatik

Technische Universität München

Wissenschaftszentrum Weihenstephan

85350 Freising, Germany

http://mips.gsf.de/staff/pagel