Linear Discriminant Analysis in R

cobbler_squad
Dear R gurus,

Thank you all for continuous support and guidance -- learning without you would not be efficient.

I have a question regarding LD analysis and how to best code it up in R.

I have one file with 52 columns (V1 to V52) and 671 time points (rows), and another file of phonetic features in which each vowel is aligned with a distinct binary sequence, e.g.
E 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 and so on. I need to run lda, at first for just one of the features, meaning a single column extracted from the "binary" file mentioned above. I have very little code so far, but here are short examples of both files:
V57 file:

              V27       V28           V29       V30           V31       V32           V33       V34
1   -2.515000e-03 -0.203858  6.531000e-03  0.248686  6.760000e-04  0.084677 -1.262000e-03
2   -2.406000e-03 -0.194943  6.248000e-03  0.237851  6.470000e-04  0.081001 -1.207000e-03
3   -4.860000e-04 -0.039288  1.263000e-03  0.047980  1.300000e-04  0.016292 -2.430000e-04

and "binary" file

    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1    E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
2    o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
3    I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0

thus in code I have the following:

library(MASS)

vowel_features <- read.table(file = "mappings_for_vowels.txt")
mask_features <- read.table(file = "3dmaskdump_ICA_37_Combined.txt")

#scale the mask_features file

scaled_features <- scale(mask_features, center = FALSE, scale = apply(abs(mask_features, 2, median)))

#input vowel feature, lda

lda(ROI_values ~ mappings_for_vowels[15]...)

not sure what is the correct approach to use for lda

any pointers would be greatly appreciated

thanks again all!

Cobbler

Re: Linear Discriminant Analysis in R

Joris FA Meys
Why exactly do you need lda and not another method? For lda to be
applicable, you should check :
1) whether the regressors are normally distributed within the classes
2) whether the variance-covariance matrices are equal for all classes

Essentially, this means that the boundary between both classes is a
hyperplane (or in 2 dimensions, a straight line). Otherwise you can try qda,
or go to other supervised learning methods.
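
A minimal sketch of how these two checks could be run in R. The built-in iris data is only a stand-in for the real data, and the Shapiro-Wilk test is one possible choice (an assumption for illustration); it is a univariate test, so it only looks at each predictor separately within each class.

```r
# Illustration only: iris stands in for the real data.
predictors <- iris[, 1:4]
groups     <- iris$Species

# 1) Shapiro-Wilk test for every predictor within every class.
#    Small p-values suggest a departure from normality.
normality_p <- sapply(predictors, function(v) {
  tapply(v, groups, function(z) shapiro.test(z)$p.value)
})
print(round(normality_p, 3))

# 2) Variance-covariance matrix of the predictors within each class;
#    roughly similar matrices are what lda assumes.
cov_by_class <- lapply(split(predictors, groups), cov)
print(cov_by_class)
```

If the per-class covariance matrices look very different, qda (also in MASS) may be the more natural choice.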

How to use lda is explained rather well in the help files (see ?lda). If it doesn't
work, provide us with self-contained code (i.e. code that can be run without
needing extra information such as your data frames) that reproduces the error.

Cheers
Joris

PS : There's an error in your code.
scaled_features <- scale(mask_features, center = FALSE, scale =
apply(abs(mask_features, 2, median)))

should be
scaled_features <- scale(mask_features, center = FALSE, scale =
apply(abs(mask_features), 2, median))
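
To see what the corrected call does, here is a small sketch on made-up numbers (toy is a hypothetical stand-in for mask_features). Each column is divided by the median of its absolute values, so after scaling the median absolute value of every column is 1.

```r
# Hypothetical stand-in for mask_features
toy <- data.frame(a = c(-2, 4, 6), b = c(10, -20, 30))

# Median of the absolute values, computed per column (MARGIN = 2)
col_scale <- apply(abs(toy), 2, median)

# Divide each column by its scale factor, without centering
scaled_toy <- scale(toy, center = FALSE, scale = col_scale)

# Check: the median absolute value of every scaled column is 1
apply(abs(scaled_toy), 2, median)
```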


--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Linear Discriminant Analysis in R

cobbler_squad
Joris,

You are a life saver. Based on two sample files above, I think lda should go something like this:

vowel_features <- read.table(file = "mappings_for_vowels.txt")
mask_features <- data.frame(as.matrix(read.table(file = "3dmaskdump_ICA_37_Combined.txt")))
G <- vowel_features[15]

cvc_lda <- lda(G~ vowel_features[15], data=mask_features, na.action="na.omit", CV=TRUE)

ERROR: Error in model.frame.default(formula = G ~ vowel_features[15], data = mask_features,  :
  invalid type (list) for variable 'G'

I am clearly doing something wrong declaring G (how should I declare grouping in R when I need to use one column from vowel_feature file)? Sorry for stupid questions and thank you for being so helpful!

---------
again, sample files that I am working with:

mappings_for_vowels.txt:

    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1    E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
2    o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
3    I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0
4    ^  0  0  0  0  0  0  0  0   0   0   0   0   1   0   1   0   0   1   0   0   0   0   0   0   0
5    @  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0   0   0

and the mask_features file is:

              V42          V43          V44          V45          V46          V47          V48          V49
  [1,]  2.890891625  2.881188521  2.887784444 -2.882606612 -2.888877341  2.879834384  2.886483229  2.883815864
  [2,]  2.763404707  2.756198683  2.761863881 -2.756827983 -2.762268531  2.754305072  2.760017050  2.758399799
  [3,]  0.556614506  0.556377530  0.556247414 -0.556300910 -0.556098321  0.557495060  0.557383073  0.556867424
  [4,]  0.367065248  0.366962036  0.366870087 -0.366794442 -0.366644148  0.366613343  0.366537320  0.366953464
  [5,]  0.423692393  0.421835623  0.421741829 -0.421897460 -0.421659824  0.421567705  0.421465738  0.422407838

Re: Linear Discriminant Analysis in R

Joris FA Meys
Could you provide us with data to test the code? use dput (and limit the
size!!!!!)

eg:
dput(vowel_features)
dput(mask_features)

Without this information, it's impossible to say what's going wrong. It
looks like you're doing something wrong in the selection. What should
vowel_features[15] return? Did you check it's actually what you want? Did
you use str(G) to check the type?
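
One way to keep such output small is to dput() only the first few rows. A sketch with a made-up data frame (demo is hypothetical, standing in for vowel_features):

```r
# Hypothetical stand-in for vowel_features
demo <- data.frame(V1 = c("E", "o", "I"),
                   V2 = c(0L, 1L, 1L),
                   V3 = c(1L, 0L, 0L))

# Inspect the structure: object class, dimensions and column types
str(demo)

# Print a compact, copy-pasteable representation of the first two rows only
small <- head(demo, 2)
dput(small)
```

The text printed by dput() can be pasted straight into a mail, and anyone can paste it back into R to recreate the object.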

Cheers
Joris


Re: Linear Discriminant Analysis in R

Liaw, Andy
cobbler_squad needs more basic help than doing lda.  The data input just
doesn't make sense.

If vowel_features is a data frame, then G <- vowel_features[15] creates
another data frame containing the 15th variable in vowel_features, so "G"
is the name of a data frame, not a variable in a data frame.  The lda()
call makes even less sense.  I wonder if he had tried to go through the
examples in the help file and try to understand how it is used?
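
The difference between the indexing forms can be seen on a tiny made-up data frame (vf is a hypothetical stand-in for vowel_features):

```r
# Hypothetical stand-in for vowel_features
vf <- data.frame(V1 = c("E", "o", "I"), V2 = c(0L, 1L, 1L))

sub_df   <- vf[2]    # single bracket: still a data frame (a list) with one column
sub_vec  <- vf[, 2]  # row/column indexing: the column itself, an integer vector
sub_vec2 <- vf[[2]]  # double bracket: also the column itself

class(sub_df)                 # "data.frame"
class(sub_vec)                # "integer"
identical(sub_vec, sub_vec2)  # TRUE

# A grouping variable for lda() is usually turned into a factor
G <- factor(sub_vec)
levels(G)                     # "0" "1"
```

A one-column data frame on the left-hand side of a formula is what produces the "invalid type (list)" error.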

Andy


Re: Linear Discriminant Analysis in R

cobbler_squad
In reply to this post by Joris FA Meys
Thanks for being patient with me.

I guess my problem is with understanding how grouping is used in this particular case:

one of the sample codes I found online (http://www.statmethods.net/advstats/discriminant.html)
library(MASS)
fit <- lda(G ~ x1 + x2 + x3, data=mydata, na.action="na.omit", CV=TRUE)

the "mydata" file in my case is the 3dmaskdump file with 52 columns and 671 rows (all values range between 0 and 1 after they're scaled)

The other file, which I assumed was the "grouping file" (the "vowel_feature" file), defines features for the vowels: column 1 is the vowel name (a, i, u), and the remaining columns hold a distinct combination of 0's and 1's defining the vowel, so each of those columns represents a particular "feature" of that vowel. This file has 26 columns and 254 rows. (Hope this makes sense!!)

So, the reason I wanted to return G <- vowel_feature[15] in my previous post is because I need to extract a column that represents "backness" of the vowel  (while other columns represent "roundedness", "nasalization" features, etc). So what (in my mind) G <- vowel_feature[15] would return is 1 column which is 254 rows long with 0's and 1's in it.
i.e.

1       0
2       1
3       1
4       0
...
..
.
254    1

However, when I was trying to run the code as I pasted above, it was giving me the above error (i.e.
ERROR: Error in model.frame.default(formula = G ~ vowel_features[15], data = mask_features,  :
  invalid type (list) for variable 'G' )

I am a novice with R (so I know my questions are pretty dumb!), but I really hope I clarified my confusion a bit better.  I very much appreciate your help.

Looking forward to your replies.

Thank you again,
Cobbler


Re: Linear Discriminant Analysis in R

Joris FA Meys
It's not your questions, Cobbler, but could you PLEASE just do what we asked
for?
Copy-paste the following in R and copy-paste ALL output you get in your next
mail.

test.vowel <- vowel_features[,1:10]
test.mask <- mask_features[,1:10]
dput(test.vowel)
dput(test.mask)

I don't know whether your vowel_features is a list or a data-frame (which is
technically also a list). But I know for sure that vowel_features[15] is NOT
giving you a column. Probably it has to be vowel_features[,15]. So start
with that one, and I'll take a look at the rest to get your lda running.

Cheers
Joris


Re: Linear Discriminant Analysis in R

cobbler_squad
Hi Joris,

As you suggested, below is the output for the following:

test.vowel <- vowel_features[,1:10]
test.mask <- mask_features[,1:10]  
dput(test.vowel)
dput(test.mask)

--- NOTE: outputs are limited

>>test_vowel  ---- first 12 columns are all zero (total of 26 columns)
    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1    0  0  0  0  0  0  0  0  0   0
2    0  0  0  0  0  0  0  0  0   0
3    0  0  0  0  0  0  0  0  0   0
4    0  0  0  0  0  0  0  0  0   0
5    0  0  0  0  0  0  0  0  0   0
6    0  0  0  0  0  0  0  0  0   0
7    0  0  0  0  0  0  0  0  0   0
8    0  0  0  0  0  0  0  0  0   0
9    0  0  0  0  0  0  0  0  0   0
10   0  0  0  0  0  0  0  0  0   0

>>test_mask (sample output for first 6 columns and 5 rows)

             V1          V2                    V3                  V4          V5          V6
1   0.034495155 0.990218632 0.601464511 0.014837676 0.058299799 0.818202398
2   0.683688879 0.541566798 0.898061753 0.008456439 0.800863858 0.381366477
3   0.464978895 0.844494807 0.281241401 0.290183593 0.552412608 0.158107894
4   0.200058599 0.270115497 0.179173377 0.341301213 0.672338934 0.322934948
5   0.595020534 0.633111358 0.861024861 0.811241462 0.326562913 0.363330793


>>dput(test.vowel)
structure(list(V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V8 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), V9 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,  
0L, 0L, 0L, 0L), V10 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L)), .Names = c("V1", "V2", "V3", "V4", "V5",
"V6", "V7", "V8", "V9", "V10"), class = "data.frame", row.names = c(NA,
-254L))

>>dput(test.mask)
structure(list(V1 = c(0.034495155, 0.683688879, 0.464978895,
0.877838275, 0.943014871, 0.163438168), V2 = c(0.990218632, 0.541566798,
0.025567579, 0.159811845, 0.13874224, 0.752357297, 0.669662897,
0.854803677, 0.28129096, 0.858919573, 0.98992922, 0.980733255,
0.452405459, 0.376828532, 0.901208552), V3 = c(0.601464511, 0.898061753,
0.38395498, 0.923324665, 0.529832526, 0.182135661), V4 = c(0.014837676,
0.166132726, 0.893089168, 0.45962114, 0.018438501, 0.667720635
), V5 = c(0.058299799, 0.800863858, 0.552412608, 0.672338934,
0.185407787, 0.691367432), V6 = c(0.818202398, 0.381366477, 0.158107894,
0.322934948, 0.363330793, 0.161321704, 0.052999774, 0.513440813,
0.402895033, 0.201576687, 0.076826481), V7 = c(0.642136394, 0.099776129,
0.148801865, 0.603051825, 0.440594157, 0.215038249, 0.531623479,
0.534920743, 0.45784502, 0.080887221), V8 = c(0.016004048, 0.519115043,
0.149317949, 0.088362708, 0.705002368, 0.185590863, 0.434963787,
0.847410734, 0.78777694, 0.443995646, 0.53903599), V9 = c(0.400620271,
0.918472003, 0.446820588, 0.310981412, 0.734013866, 0.172112916
), V10 = c(0.532136091, 0.350028839, 0.40424688, 0.607395545,
0.392450857, 0.306530929, 0.756277707, 0.63606622, 0.718866192,
0.258778101)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6",
"V7", "V8", "V9", "V10"), class = "data.frame", row.names = c(NA,
-671L))


Thank you once more for your help. I really can not say it enough.

ps. original files i work with are attached.

Cobbler.

3dMaskDump.txt
vowel_features.txt


Re: Linear Discriminant Analysis in R

Joris FA Meys
I checked your data. Now I have to get some sense out of your code. You do :
G <- vowel_features[15]

cvc_lda <- lda(G~ vowel_features[15], data=mask_features,
na.action="na.omit", CV=TRUE)

Firstly, as I suspected, you need to select a column by using
vowel_features[,15] . Mind the comma! Essentially, your data frame is a list
of columns that can also be indexed like a matrix: you select with [x,y], x
being the row number and y being the column number.

Essentially, your code says :
cvc_lda=lda(vowel_features[,15]~vowel_features[,15]...)

You're modelling a variable on itself which gives an error. What do you want
to do in fact? If I take a look at your first code, it appears as if you
want to do this :

cvc_lda <- lda(G~ ., data=mask_features,na.action="na.omit", CV=TRUE)

The dot indicates you want to model G as a function of all variables in the
dataset mask_features. Ain't going to work, as the dimensions are completely
wrong.

> dim(mask_features)
[1] 671  52
> dim(vowel_features)
[1] 254  26
>

For lda, you need a dataset that has following structure :
mydata
group    V1   V2   V3   V4 ...
0           x1    y1   z1   q1 ...
1           x2    y2   z2   q2 ...
...

So you can do lda(group~V1+V2+V3+V4+..., data=mydata,...)

For example :

# make some random data
library(MASS)   # lda() lives in the MASS package
set.seed(1)     # arbitrary seed so the random data is reproducible
x <- rep(c(0,1),50)
y1 <- rnorm(100,x)
y2 <- rnorm(100,1-x)

# combine it in a dataframe
mydata <- data.frame(x,y1,y2)
str(mydata) # look at the structure, you should have something similar
head(mydata) # look at the values, this shows you whether it all worked

# example of lda function (CV=TRUE gives leave-one-out cross-validated predictions)
my.lda <- lda(x~y1+y2,data=mydata,CV=TRUE)
summary(my.lda)

Take a look at your data again, and first figure out which data you actually
want to use. Basically, for every observation in G you need a set of linked
observations in some variables. But as it is now, it's impossible to link
one dataframe with the other.
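
A hedged sketch of what the linked layout could look like once every grouping value has its own row of predictors. All data here is simulated, and the names features, labels and fit_cv are hypothetical; it assumes one label per observation, which is exactly what the two files above do not yet provide (671 versus 254 rows).

```r
library(MASS)

set.seed(1)  # arbitrary seed for reproducible toy data

# Hypothetical predictors: 60 observations, 3 variables
n <- 60
features <- data.frame(V1 = rnorm(n), V2 = rnorm(n), V3 = rnorm(n))

# Hypothetical binary grouping, one label per observation
labels <- rep(c(0, 1), length.out = n)
features$V1 <- features$V1 + 2 * labels  # make V1 informative

# The two pieces can only be combined if the row counts agree
stopifnot(nrow(features) == length(labels))
mydata <- data.frame(group = factor(labels), features)

# With CV = TRUE, lda() returns leave-one-out predictions
# (class and posterior) rather than a fitted model object
fit_cv <- lda(group ~ ., data = mydata, CV = TRUE)

# Cross-tabulate true groups against cross-validated predictions
confusion <- table(truth = mydata$group, predicted = fit_cv$class)
print(confusion)

# Proportion classified correctly
sum(diag(confusion)) / sum(confusion)
```

With CV = TRUE, the confusion table is usually more informative than summary(), which only lists the components of the returned list.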

Cheers
Joris


Re: Linear Discriminant Analysis in R

cobbler_squad
Joris,

Thank you, I have corrected my mistakes. I very much appreciate your time and patience.

All my best,
Cobbler.

Re: Linear Discriminant Analysis in R

dylan
This post has NOT been accepted by the mailing list yet.
In reply to this post by Joris FA Meys
Hi, how can I do a normality test in R?
And how do I calculate the variance-covariance matrix?