Scatter plot - using colour to group points?


SarahH
Dear All,

I am very new to R - trying to teach myself it for some MSc coursework.

I am plotting temperature data for two different sites over the same time period which I have downloaded from a university weather station data archive.

I am using the following code to create the plot

plot(x = TEMP3[,"TIME"], y = TEMP3[,"TEMP"], type = "p", col = TEMP3[,"SITE"], pch = 3, main = "Temperature changes", xlab = "Date", ylab = "Temperature [C]")

I managed to use col = TEMP3[,"SITE"] to plot the two different sites (BG1 and EA7) in different colours, but I am struggling to change the colours.

I wanted to set up a colour scheme to match the site, so I tried

BG1 <- "blue"
EA7 <- "green"

before the plot function, but the graphic just came out with red and black as before.

There are other datasets in which there are more than two sites so I would really like to learn how to use colour to distinguish between them on a plot.

Any direction would be very greatly received!

Thank you very much

Sarah




Re: Scatter plot - using colour to group points?

David Winsemius

On Nov 21, 2011, at 2:17 PM, SarahH wrote:

> I wanted to set up a colour scheme to match the site, so tried

Instead try

num.site <- as.numeric(TEMP3[,"SITE"])
plot(x = TEMP3[,"TIME"], y = TEMP3[,"TEMP"], type = "p", col = num.site,
     pch = 3, main = "Temperature changes", xlab = "Date",
     ylab = "Temperature [C]")

That creates a vector of integer values that are specific to the sites
and then offers it as the argument to col=. (This relies on SITE being
a factor.)


>
> BG1 <- "blue"
> EA7 <- "green"

That would only have created two new objects with those names (unless of
course you were following someone's misguided directions to use
attach()). plot() never looks those objects up, so the colours stay the
same.

> --
> View this message in context: http://r.789695.n4.nabble.com/Scatter-plot-using-colour-to-group-points-tp4092794p4092794.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
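
One way to get the "colour per site name" scheme the original question was reaching for is a named character vector used as a lookup table. The following is only a sketch: the TEMP3 data frame here is a made-up stand-in (the real data are not shown in the thread), and the names site.cols and point.cols are invented for illustration.

```r
# Hypothetical stand-in for TEMP3; the real data are not shown in the thread.
TEMP3 <- data.frame(
  TIME = rep(1:5, times = 2),
  TEMP = c(10.1, 11.3, 12.0, 11.2, 9.8, 8.4, 9.0, 10.5, 9.9, 8.1),
  SITE = factor(rep(c("BG1", "EA7"), each = 5))
)

# One named entry per site: the names are the site labels, the values the colours.
site.cols <- c(BG1 = "blue", EA7 = "green")

# Index by the site label (as character) to get one colour per row.
point.cols <- unname(site.cols[as.character(TEMP3[, "SITE"])])

plot(x = TEMP3[, "TIME"], y = TEMP3[, "TEMP"], type = "p", col = point.cols,
     pch = 3, main = "Temperature changes", xlab = "Date",
     ylab = "Temperature [C]")
legend("topright", legend = names(site.cols), col = site.cols, pch = 3)
```

Adding another site then only means adding another name = "colour" entry to site.cols.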


Re: Scatter plot - using colour to group points?

Michael Weylandt
I think the easiest way to do this is to set up a color vector with
ifelse and hand that off to the plot command: something like

# Syntax is ifelse(TEST, OUT_IF_TRUE, OUT_IF_FALSE)
col = ifelse(TEMP3[,"SITE"] == "BG1", "blue", "green")

For more complicated schemes, a set of nested ifelse()'s can get you
what you need. There are some other tricks with factors as well, but
they require a little more advanced use of R. Just for the record,
they'd look something like this:

X = letters[c(1,2,3,3,1,2,1,3,3,1,2,2,1)]

colX = c("red","green","blue")[as.factor(X)]

Hope this helps,
Michael
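
For more than two sites, the two approaches mentioned above (nested ifelse() and indexing a colour vector by a factor) might look roughly like this. It is a sketch only: the third site label "CD4" and the object names are invented for illustration.

```r
# Hypothetical site labels; "CD4" is invented for illustration.
site <- c("BG1", "EA7", "CD4", "EA7", "BG1", "CD4")

# Nested ifelse(): test one site at a time, falling through to the last colour.
col.nested <- ifelse(site == "BG1", "blue",
              ifelse(site == "EA7", "green", "red"))

# Factor indexing: fix the level order explicitly so colour i goes with level i.
site.f <- factor(site, levels = c("BG1", "EA7", "CD4"))
col.factor <- c("blue", "green", "red")[site.f]
```

Both give the same colour vector; the factor version tends to scale better, since each extra site is one more level and one more colour rather than another layer of nesting.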


Re: Scatter plot - using colour to group points?

SarahH
I got the colour vector with ifelse to work, great! Thank you.

Is it possible to use the ifelse colour vector with other plot types? For example with type = "l"? I tried, but the graphic came back with blue lines for both sites and also a straight line connecting the start and end points of the data.

Thanks
Sarah






Re: Scatter plot - using colour to group points?

John Kane-2
Another approach would be to use ggplot2.
The code can look a bit daunting to begin with, but ggplot2 is a
very versatile graphing package and well worth learning.

Simple example
=============================================================
library(ggplot2)

# Toy data: two sites (A and B), six time points, two temperature series
mydata <- data.frame(site = c("A","A","A","B","B","B"), time1 = 1:6,
                     t1 = c(23,24,13,7,19,12),
                     t2 = c(7,4,6,8,5,9))

# One layer of points per temperature series, coloured by site
p <- ggplot(mydata, aes(x = time1)) +
    geom_point(aes(y = t1, colour = site)) +
    geom_point(aes(y = t2, colour = site))

# Axis labels
p <- p + scale_x_continuous('Time') +
         scale_y_continuous('Temperature')

p
=============================================================


Re: Scatter plot - using colour to group points?

Michael Weylandt
I don't think you can do different colors for a single line (not an
ifelse thing, just a "what would that mean?" sort of thing), but a plot
type like "b", "o", or "h" will work the same way.

Michael


Re: Scatter plot - using colour to group points?

David Winsemius

On Nov 21, 2011, at 10:18 PM, R. Michael Weylandt wrote:

> I don't think you can do different colors for a single line (not an
> ifelse thing, just a what would that mean sort of thing), but a plot
> type like "b" "o" or "h" will work the same way.

I think Jim Lemon has a multicolored line function in package:plotrix.

--  
David.

Re: Scatter plot - using colour to group points?

Jim Lemon
On 11/22/2011 05:00 PM, David Winsemius wrote:
>
> On Nov 21, 2011, at 10:18 PM, R. Michael Weylandt wrote:
>
>> I don't think you can do different colors for a single line (not an
>> ifelse thing, just a what would that mean sort of thing), but a plot
>> type like "b" "o" or "h" will work the same way.
>
> I think Jim Lemon has a multicolored line function in package:plotrix.
>
Hi David (and everybody else),
The color.scale.lines function will display multicolored lines; just
force the colors to what you want using the "col" argument.

Jim


Re: Scatter plot - using colour to group points?

SarahH
Thanks all for suggestions.

I now have a nice plot showing the temperature of 6 different sites, each site distinguished by different coloured points, using nested ifelse. My apologies, I thought I could change the type to "l" and the same arguments would be applied to a line graph, with a different line for each of the 6 sites...?
I wanted to try lines as I think they might show the trends more clearly.
I have just found the plotrix package manual and will try that to achieve this, and look at ggplot too.

Re: Scatter plot - using colour to group points?

Michael Weylandt
There's also the lines() command which takes a col argument if you want to do multiple lines (I usually wind up wrapping it in a for loop though there might be something smarter)

ggplot2 is great, though the learning curve is a little rough: you can get good help here but if you go down that path, there's also a dedicated ggplot2 list that's worth checking out.

Glad to have you as a new useR!

Michael
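
A rough sketch of the lines()-in-a-for-loop idea follows. With type = "l", plot() joins every row in order as one line, so when the sites are stacked in one data frame the end of one site gets connected to the start of the next. Drawing each site with its own lines() call avoids that. The TEMP3 data frame, the site label "CD4" and the names site.cols and one.site are all invented for illustration.

```r
# Hypothetical stand-in for TEMP3 with rows stacked by site.
TEMP3 <- data.frame(
  TIME = rep(1:5, times = 3),
  TEMP = c(10, 11, 12, 11, 10,  8, 9, 10, 9, 8,  14, 15, 13, 14, 16),
  SITE = factor(rep(c("BG1", "EA7", "CD4"), each = 5))
)

sites <- levels(TEMP3$SITE)
site.cols <- c(BG1 = "blue", CD4 = "red", EA7 = "green")

# Empty plot first (type = "n") so the axes cover every site.
plot(x = TEMP3$TIME, y = TEMP3$TEMP, type = "n",
     main = "Temperature changes", xlab = "Date", ylab = "Temperature [C]")

# One lines() call per site, so no segment joins two different sites.
for (s in sites) {
  one.site <- TEMP3[TEMP3$SITE == s, ]
  lines(x = one.site$TIME, y = one.site$TEMP, col = site.cols[s])
}
legend("topleft", legend = sites, col = site.cols[sites], lty = 1)
```

Calling points() inside the same loop would give a combined point-and-line plot.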


Re: Scatter plot - using colour to group points?

SarahH
Success with the lines command and col argument! I have some nice point and line plots.
Thanks so much for your help. Ongoing project - I will probably be back!

Sarah

Re: Scatter plot - using colour to group points?

Ian Robertson-3
Hello all,

Yesterday I wrote Michael Weylandt to ask for some help in understanding
a line of code he used responding to SarahH's query about controlling
colours in scatter plots. He wrote an excellent explanation that
deserves to be shared here. Below I include the code I wrote while
experimenting with the problem (indicating the specific line of code I
asked him about) followed by Michael's thoughtful reply.

Saludos - Ian

--
Ian G. Robertson
Department of Anthropology
Building 50, 450 Serra Mall
Stanford University, CA 94305-2034
e:    [hidden email]

#the code:
##########################################
x1 <- rnorm(13)
y1 <- rnorm(13)

#these two lines from R. Michael Weylandt
X = letters[c(1,2,3,3,1,2,1,3,3,1,2,2,1)]
colX = c("red","green","blue")[as.factor(X)] #?? How does this work? Ask RMW

table(colX)
plot(x1, y1, col=colX, pch=20, cex=2)
##########################################
#Michael Weylandt's explanation:

In short, there are two key bits to follow:

1) What happens when you "factorize" something -- R stores factors
internally as integers with special labels and a few special behaviors
for some calculations that won't come up here: the labels aren't so
important for our purpose, but the key is that each unique value of X
gets assigned to its own level. The sorted order of the unique values
(not the order in which they appear in X) determines the integers they
get, not their "real" values (if they were already integers or
doubles). As a side point this means that floating point trouble can
sometimes show up so if you want to bin real numbers, it's safer to use
cut() for the factoring step.

2) What happens when you use a factor to subset -- R simply tosses out
the "factor"-ness and only uses the internal integer representation.
If we wanted to be more explicit, we'd write
colVec[as.integer(as.factor(X))] but the as.integer happens
automatically.

So the whole path is: assign integers to each unique value of X and
subset by those integers: if there are as many unique values as there
are elements of the color vector, the end result is a direct matching:
if there are too many, the extra values get NA (no color) rather than
an error: too few and some colors go unused:

something like:

c("red","green","blue")[as.factor(letters[1:4])] ## fourth element is NA

c("red","green","blue")[as.factor(letters[1:2])] ## blue not used.

Hope this helps,

Michael
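
The two steps described above can be checked directly at the console. This is a small sketch with made-up values; the names f, colVec, implicit, explicit and too.many are only for illustration.

```r
X <- c("c", "a", "b", "a", "c")
f <- as.factor(X)
levels(f)       # levels are sorted, not in order of appearance
as.integer(f)   # the integer codes used when f is an index

colVec <- c("red", "green", "blue")
implicit <- colVec[f]              # factor used directly as the index
explicit <- colVec[as.integer(f)]  # same thing, spelled out

# Four levels but only three colours: the fourth lookup is NA, not an error.
too.many <- colVec[as.factor(letters[1:4])]
```

Because the levels are sorted, "a" picks up the first colour even though "c" appears first in X.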
