Dataframes in PLS package


Dataframes in PLS package

westland
I have been working with the pls procedure and have problems getting the procedure to work with matrix or frame data.  I suspect the problem lies in my understanding of frames, but can't find anything in the documentation that will help.

Here is what I have done:

I read in a 10000 x 8 table of data and assign the first four columns to matrix A and the second four to matrix B:

pls <-    read.table("C:/Users/Chris/Desktop/SEM Book/SEM Stat Example/Simple Header Data for SEM.csv",    header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
A <- c(pls[1],pls[2],pls[3],pls[4])
B <- c(pls[5],pls[6],pls[7],pls[8])

I then put these into the data.frame C, retaining the matrix structure per the guidelines in the JSS article on the pls package:

C <- data.frame(h = I(as.matrix(A)), c = I(as.matrix(B)))
showData(C, placement='-20+200', font=getRcmdr('logFont'), maxwidth=80,    maxheight=30)

63  55 1 0  44  37200 4 0
145  52 1 1  33  69300 4 1
104  32 0 1  68  56900 3 1
109  69 1 1  94  44300 6 1
221  61 0 1  72  79800 6 0
110  40 1 1  48  17600 5 1
194  41 0 0  85  58100 4 0
120  76 1 1  19  76700 3 0
210  61 0 0  41  37600 1 0
243 101 1 1  57  40800 5 1
163  62 0 1  64    400 3 0


So the h. and the c. columns should be matrices that I can regress in plsr function:

apls <- plsr(h ~ c, data = C)
summary(apls)

But this gives me:   [34] ERROR:    invalid type (list) for variable 'h'

I can get the plsr function to work with scalars for both predictor and response.   Can anyone tell me where I have gone wrong on the pls input?

apls <- plsr(w ~ h, data = pls)
summary(apls)

Data: X dimension: 10000 1
        Y dimension: 10000 1
Fit method: kernelpls
Number of components considered: 1
TRAINING: % variance explained
   1 comps
X  100.000
w    2.293
Chris Westland
University of Illinois  Chicago, IL    60607-7124
westland@uic.edu
http://uic.edu/~westland

Re: Dataframes in PLS package

Bjørn-Helge Mevik-3
westland <[hidden email]> writes:

> Here is what I have done:
>
> I read in an 10000 x 8 table of data, and assign the first four columns to
> matrix A and the second four to matrix B
>
> pls <-    read.table("C:/Users/Chris/Desktop/SEM Book/SEM Stat
> Example/Simple Header Data for SEM.csv",    header=TRUE, sep=",",
> na.strings="NA", dec=".", strip.white=TRUE)

The problem is here:

> A <- c(pls[1],pls[2],pls[3],pls[4])
> B <- c(pls[5],pls[6],pls[7],pls[8])

This creates lists A and B, not data frames.

Either use cbind() instead of c(), or simply say

A <- pls[,1:4]
B <- pls[,5:8]

Then the rest should work.

Btw. it is probably a good idea to avoid single-character names for
variables.  Especially c and C, because they are names of functions in R.
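
To see the difference concretely, here is a minimal sketch. The toy data frame `dat` and its values are made up for illustration and only stand in for the 10000 x 8 table; nothing beyond base R is assumed.

```r
# Toy stand-in for the real table (hypothetical values, not the poster's data)
dat <- data.frame(w = c(63, 145, 104), h = c(55, 52, 32),
                  a = c(44, 33, 68), i = c(37200, 69300, 56900))

as_list  <- c(dat[1], dat[2])       # c() of one-column data frames: a plain list
as_frame <- dat[, 1:2]              # column indexing: still a data frame
as_bound <- cbind(dat[1], dat[2])   # cbind() of data frames: a data frame

class(as_list)    # "list"
class(as_frame)   # "data.frame"
class(as_bound)   # "data.frame"

# as.matrix() on the list does not give a numeric matrix, which is
# consistent with the "invalid type (list)" error reported above
is.numeric(as.matrix(as_list))    # FALSE
is.numeric(as.matrix(as_frame))   # TRUE
```

Checking class() on intermediate objects like this is usually the quickest way to find where a list sneaked in.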

--
Regards,
Bjørn-Helge Mevik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Dataframes in PLS package

westland
Dear  Bjørn-Helge Mevik:

Thank you so much (!!) for your speedy help.   This works perfectly, no
fuss.   My understanding of data.frames is still fuzzy, so I knew it was
some flaw in my understanding.

I enjoyed your very informative writeups on the package in JSS and R-News;
they told me a lot about the method and its objectives.  

Thanks again for your help on this

Chris








--
J. Christopher Westland
Professor, Information & Decision Sciences, University of Illinois - Chicago
601 S. Morgan Street (UH2400) Chicago, IL    60607-7124
Telephone       +1.312.860.0587
Google Voice  +1.209.757.8849
westland@uic.edu
http://uic.edu/~westland


Re: Dataframes in PLS package

westland
I am still/again having trouble getting PLSR to recognize the input data frames.   Here is what I have done:
 
I read a 10000 x 8 table of data into 'pls' and assign the first four columns to matrix 'dep' and the second four to matrix 'ind' with the following commands:

dep <- pls[,1:4]
ind <- pls[,5:8]

I create the data.frame 'eqn' :

eqn <- data.frame(depy = I(as.matrix(dep)), indx = I(as.matrix(ind)))

And run the PLSR package

apls <- plsr(depy ~ indx, data=eqn)

I seem to be getting one of two error messages:

[12] ERROR:  
  invalid type (list) for variable 'dep'
[13] ERROR:  
  object of type 'closure' is not subsettable

I'm sure now that this is a problem in my creation of data.frames, but I can't seem to find anything that describes the problem.

Re: Dataframes in PLS package

Michael Weylandt
Can you post dput(head(eqn, 30)) so we can take a look at your data?
It's something of a cryptic error and that would go a long way in
helping us help you.
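
For readers unfamiliar with dput(), a small sketch of why it is the preferred way to share data on the list: its output is R code that rebuilds the object. The toy data frame `dat` below is hypothetical and only stands in for 'eqn'.

```r
# Hypothetical toy data frame standing in for 'eqn'
dat <- data.frame(w = c(63L, 145L, 104L), h = c(55L, 52L, 32L))

# dput() prints R code that recreates the object
txt <- paste(capture.output(dput(head(dat, 2))), collapse = "\n")
cat(txt, "\n")

# Anyone can paste that text back into R to rebuild the same data
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, head(dat, 2))
```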

Without that, though, I'm not sure you need the I(as.matrix(dep)) and
I(as.matrix(ind)). I would imagine (untested) that eqn <-
data.frame(depy = dep, indx = ind) would work (probably better, as I()
changes things just a little).

I have a hunch that the colnames of eqn are not actually depy and indx
and that's what ultimately leads to the error. Can you look at
colnames(eqn) and use those exactly in the formula to plsr? That might
fix it.

Michael




Re: Dataframes in PLS package

Kazuo Ishii
In reply to this post by westland
2012/3/4, westland <[hidden email]>:

> I am still/again having trouble getting PLSR to recognize the input data
> frames.   Here is what I have done:
>
>  I read in an 10000 x 8 table of data to 'pls'
>
> assign the first four columns to  matrix 'dep' and the second four to matrix
> 'ind' with the following commands:
>
> dep <- pls[,1:4]
> ind <- pls[,5:8]
>
> I create the data.frame 'eqn' :
>
> eqn <- data.frame(depy = I(as.matrix(dep)), indx = I(as.matrix(ind)))

Please type the following after you create eqn:

attach(eqn)


I think it will work well.



--
Kazuo Ishii, Ph.D., Professor of Genome Science,
Tokyo University of Agriculture and Technology
3-5-8 Saiwai-cho, Fuchu, Tokyo 183-8509, JAPAN
Email: [hidden email]


Re: Dataframes in PLS package

westland
The attach(eqn) seems to change some things in the format, but doesn't solve the problem ... here is my script


> dep <- pls[,1:4]

> ind <- pls[,5:8]

> eqn <- data.frame(depy = dep, indx = ind)

> attach(eqn)
The following object(s) are masked from 'eqn (position 3)':

    depy.d, depy.h, depy.s, depy.w, indx.a, indx.i, indx.r, indx.x

> dput(eqn)
structure(list(depy.w = c(63L, 145L, 104L, 109L, 221L, 110L,
194L, 120L, 210L, 243L, 163L, 93L, 167L, 232L, 112L, 185L, 103L,
…. a lot of formatting information ….
    0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L,
    0L, 1L, 0L, 1L, 1L, 1L, 1L)), .Names = c("depy.w", "depy.h",
"depy.d", "depy.s", "indx.a", "indx.i", "indx.r", "indx.x"), row.names = c(NA,
-10000L), class = "data.frame")


'eqn'  still doesn't seem to be a data.frame that the PLSR package will recognize

Chris Westland

Re: Dataframes in PLS package

westland
In reply to this post by Michael Weylandt
Thanks Michael.  I had tried to drop the I(as.matrix(...)) conversions, and fiddled with a number of other permutations of code ... I still can't seem to get it right.  

The col names appear to be depy and indx ... here is the output (and the rows are just line numbers)

 
> colnames(eqn)
[1] "depy.w" "depy.h" "depy.d" "depy.s" "indx.a" "indx.i" "indx.r" "indx.x"
 
> rownames(eqn)
    [1] "1"     "2"     "3"     "4"     "5"     "6"     "7"     "8"     "9"     "10"    "11"    "12"    "13"    "14"    "15"    "16"    "17"    "18"    "19"    "20"    "21"    "22"    "23"    "24"    "25"    "26"    "27"    "28"  
   [29] "29"    "30"    "31"….etc.





Here is the dput(eqn)  and showData for the file 'eqn':

 
> dput(head(eqn, 30))
structure(list(depy.w = c(63L, 145L, 104L, 109L, 221L, 110L,
194L, 120L, 210L, 243L, 163L, 93L, 167L, 232L, 112L, 185L, 103L,
202L, 203L, 207L, 239L, 109L, 112L, 176L, 126L, 145L, 125L, 191L,
110L, 92L), depy.h = c(55L, 52L, 32L, 69L, 61L, 40L, 41L, 76L,
61L, 101L, 62L, 55L, 61L, 65L, 52L, 52L, 43L, 87L, 57L, 37L,
74L, 44L, 45L, 52L, 54L, 51L, 66L, 53L, 43L, 36L), depy.d = c(1L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L), depy.s = c(0L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), indx.a = c(44L,
33L, 68L, 94L, 72L, 48L, 85L, 19L, 41L, 57L, 64L, 27L, 64L, 32L,
31L, 88L, 80L, 70L, 68L, 58L, 42L, 87L, 69L, 52L, 45L, 25L, 66L,
80L, 17L, 70L), indx.i = c(37200L, 69300L, 56900L, 44300L, 79800L,
17600L, 58100L, 76700L, 37600L, 40800L, 400L, 33400L, 6000L,
7400L, 94000L, 84200L, 0L, 0L, 43300L, 0L, 68600L, 47300L, 16100L,
95900L, 69200L, 12200L, 7500L, 70600L, 11400L, 0L), indx.r = c(4L,
4L, 3L, 6L, 6L, 5L, 4L, 3L, 1L, 5L, 3L, 3L, 5L, 1L, 6L, 4L, 2L,
1L, 4L, 1L, 4L, 6L, 1L, 6L, 4L, 2L, 2L, 5L, 3L, 4L), indx.x = c(0L,
1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L)), .Names = c("depy.w",
"depy.h", "depy.d", "depy.s", "indx.a", "indx.i", "indx.r", "indx.x"
), row.names = c(NA, 30L), class = "data.frame")
 
 


> showData(eqn)
 
depy.w depy.h depy.d depy.s indx.a indx.i indx.r indx.x
  63     55      1      0     44  37200      4      0
   145     52      1      1     33  69300      4      1
   104     32      0      1     68  56900      3      1
   109     69      1      1     94  44300      6      1
   221     61      0      1     72  79800      6      0
   110     40      1      1     48  17600      5      1
   194     41      0      0     85  58100      4      0
   120     76      1      1     19  76700      3      0
   210     61      0      0     41  37600      1      0 ... etc.



Initially, I had input a file 'pls' with the script:

dep <- pls[,1:4]
ind <- pls[,5:8]
eqn <- data.frame(depy = dep, indx = ind)
apls <- plsr(depy ~ indx, data=eqn)

.... and this gives me   [7] ERROR:  object 'depy' not found

Note that the original input comes from a matrix 'pls' and my intent is to convert this to data.frames that the plsr package can parse ...  a dput(pls) gives me ...


  .... lots and lots of leading line information ...  0L, 0L, 1L, 2L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 2L, 1L, 0L, 1L,
    2L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L,
    0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L,
    1L, 0L, 1L, 1L, 1L, 1L)), .Names = c("w", "h", "d", "s",
"a", "i", "r", "x"), class = "data.frame", row.names = c(NA,
-10000L))


If you have any other suggestions concerning how I might fiddle the files to get them into a format that PLSR package would like, that would be great

Chris Westland

Re: Dataframes in PLS package

Michael Weylandt
In reply to this post by Kazuo Ishii
No. Do not do that. While it looks nice at first, it quickly becomes
the source of innumerable errors.
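
One common way attach() goes wrong, as a minimal sketch with a made-up data frame `dat`: the attached columns are a copy, so later changes to the data frame are not seen through the bare names. Using with() (or the data= argument of the modelling function) avoids this.

```r
dat <- data.frame(x = 1:3)   # hypothetical toy data

attach(dat)                  # puts a copy of the columns on the search path
dat$x <- dat$x * 10          # later change to the data frame itself
stale <- x                   # still finds the attached copy: 1 2 3
detach(dat)

fresh <- with(dat, x)        # with() evaluates against the current data: 10 20 30
stale
fresh
```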

Michael

On Sun, Mar 4, 2012 at 1:56 AM, Kazuo Ishii <[hidden email]> wrote:

> Please type the following after you create eqn:
>
> attach(eqn)
>
>
> I think it will work well.
>

Re: Dataframes in PLS package

Michael Weylandt
In reply to this post by Michael Weylandt
It's nice to cc the list for archival reasons -- it also usually gets
you a faster response as more folks can see how the thread develops.

The problem is that the colnames aren't actually depy and indx: they
are depy.w, depy.h, etc. If you want to model, you need to use those
as is, e.g., with your code:

eqn <- structure(list(depy.w = c(63L, 145L, 104L, 109L, 221L, 110L,
194L, 120L, 210L, 243L, 163L, 93L, 167L, 232L, 112L, 185L, 103L,
202L, 203L, 207L, 239L, 109L, 112L, 176L, 126L, 145L, 125L, 191L,
110L, 92L), depy.h = c(55L, 52L, 32L, 69L, 61L, 40L, 41L, 76L,
61L, 101L, 62L, 55L, 61L, 65L, 52L, 52L, 43L, 87L, 57L, 37L,
74L, 44L, 45L, 52L, 54L, 51L, 66L, 53L, 43L, 36L), depy.d = c(1L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L), depy.s = c(0L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), indx.a = c(44L,
33L, 68L, 94L, 72L, 48L, 85L, 19L, 41L, 57L, 64L, 27L, 64L, 32L,
31L, 88L, 80L, 70L, 68L, 58L, 42L, 87L, 69L, 52L, 45L, 25L, 66L,
80L, 17L, 70L), indx.i = c(37200L, 69300L, 56900L, 44300L, 79800L,
17600L, 58100L, 76700L, 37600L, 40800L, 400L, 33400L, 6000L,
7400L, 94000L, 84200L, 0L, 0L, 43300L, 0L, 68600L, 47300L, 16100L,
95900L, 69200L, 12200L, 7500L, 70600L, 11400L, 0L), indx.r = c(4L,
4L, 3L, 6L, 6L, 5L, 4L, 3L, 1L, 5L, 3L, 3L, 5L, 1L, 6L, 4L, 2L,
1L, 4L, 1L, 4L, 6L, 1L, 6L, 4L, 2L, 2L, 5L, 3L, 4L), indx.x = c(0L,
1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L)), .Names = c("depy.w",
"depy.h", "depy.d", "depy.s", "indx.a", "indx.i", "indx.r", "indx.x"
), row.names = c(NA, 30L), class = "data.frame")

library(pls)

apls <- plsr(depy.w + depy.h + depy.d + depy.s ~ ., data=eqn) # Works like a charm

I don't believe there's a way to do wildcard names in general
(something like depy.*), but I'd welcome correction. You can save
some keystrokes by using the "." term to mean "everything else I
haven't already used".
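
In the absence of wildcards, one workaround is to build the formula from the column names. The sketch below is only an illustration: the toy data frame reuses the depy./indx. prefixes from this thread with a few made-up rows, and it uses cbind() on the left-hand side to get a multi-column response. Only base R functions are used, and model.frame() stands in for the fitting function.

```r
# Toy data with the same naming pattern as 'eqn' (values are illustrative)
eqn <- data.frame(depy.w = c(63, 145, 104, 109), depy.h = c(55, 52, 32, 69),
                  indx.a = c(44, 33, 68, 94),
                  indx.i = c(37200, 69300, 56900, 44300))

# Pick columns by prefix, a stand-in for the missing wildcard
resp <- grep("^depy\\.", names(eqn), value = TRUE)
pred <- grep("^indx\\.", names(eqn), value = TRUE)

# Assemble "cbind(<responses>) ~ <predictors>" as text, then convert
f <- as.formula(paste0("cbind(", paste(resp, collapse = ", "), ") ~ ",
                       paste(pred, collapse = " + ")))
print(f)

# Check what a modelling function would see as the response
mf <- model.frame(f, data = eqn)
dim(model.response(mf))   # 4 rows, 2 response columns
```

The resulting formula `f` could then be passed to plsr(f, data = eqn) in place of a hand-typed one.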

Hope this helps,

Michael




Re: Dataframes in PLS package

westland

R still doesn't seem to recognize the data.frame ...  I get a [6] ERROR:  object 'depy.w' not found from the following code:

dep <- pls[,1:4]
ind <- pls[,5:8]
eqn <- data.frame(depy = dep, indx = ind)
apls <- plsr(depy.w + depy.h + depy.d + depy.s ~ indx.a + indx.i + indx.r + indx.x,  data=eqn)


BUT .... I DID try to cbind() these after add-concatenating them (not sure exactly what I am doing) like so ...

apls <- plsr(cbind(depy.w ,depy.h , depy.d , depy.s) ~ cbind(indx.a , indx.i , indx.r,indx.x), data=eqn)

And this seems to do the trick ... here is my output.


> summary(apls)
Data: X dimension: 10000 4
        Y dimension: 10000 4
Fit method: kernelpls
Number of components considered: 4
TRAINING: % variance explained
          1 comps    2 comps    3 comps    4 comps
X       1.000e+02  1.000e+02  1.000e+02  100.00000
depy.w  9.138e-03  9.362e-03  1.087e-02    0.01160
depy.h  4.844e-04  5.010e-03  5.484e-03    0.01304
depy.d  1.900e-02  1.915e-02  1.919e-02    0.01963
depy.s  2.532e-03  1.010e-02  1.104e-02    0.01171


Unfortunately, I still don't understand what I'm doing ... I believe cbind(...) forced the two data.frames depy and indx.   Can anyone perhaps give me a clearer explanation?

Re: Dataframes in PLS package

Bjørn-Helge Mevik-3
In reply to this post by Michael Weylandt
"R. Michael Weylandt" <[hidden email]> writes:

> Without that though, I'm not sure you need the I(as.matrix.(dep)) and
> I(as.matrix(ind)), I would imagine (untested) that eqn <-
> data.frame(depy = dep, indx = ind) would work (probably better as I()
> changes things just a little).

The I() must be there to prevent data.frame() from separating the
columns of the matrices into individual variables in the data frame.
Without I() there will be no variables depy and indx in the data frame.

Try this:

> A <- matrix(1:4, ncol=2)
> B <- matrix(2:5, ncol=2)
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> B
     [,1] [,2]
[1,]    2    4
[2,]    3    5

> ## With I():
> d1 <- data.frame(A = I(A), B = I(B))
> d1
  A.1 A.2 B.1 B.2
1   1   3   2   4
2   2   4   3   5
> names(d1)
[1] "A" "B"
> d1$A
     [,1] [,2]
[1,]    1    3
[2,]    2    4

> ## Without I():
> d2 <- data.frame(A = A, B = B)
> d2
  A.1 A.2 B.1 B.2
1   1   3   2   4
2   2   4   3   5
> names(d2)
[1] "A.1" "A.2" "B.1" "B.2"
> d2$A
NULL
> d2$A.1
[1] 1 2


--
Regards,
Bjørn-Helge Mevik


Re: Dataframes in PLS package

Bjørn-Helge Mevik-3
In reply to this post by westland
westland <[hidden email]> writes:

> Here is the dput(eqn)  and showData for the file 'eqn':
[...]

>> showData(eqn)
>  
> depy.w depy.h depy.d depy.s indx.a indx.i indx.r indx.x
>   63     55      1      0     44  37200      4      0
>    145     52      1      1     33  69300      4      1
>    104     32      0      1     68  56900      3      1
>    109     69      1      1     94  44300      6      1
>    221     61      0      1     72  79800      6      0
>    110     40      1      1     48  17600      5      1
>    194     41      0      0     85  58100      4      0
>    120     76      1      1     19  76700      3      0
>    210     61      0      0     41  37600      1      0 ... etc.

Okay, let me guess: you took the data in the file pls, created a data
frame eqn with two matrices in it, then used write.table() to write
eqn to a file, and then read it back with read.table().

If that is so, the problem you have is that write.table() will separate
the columns of the matrices into separate columns in the file (it
really has no other choice), and then read.table() will of course read
those in as separate columns again.

You have two solutions:

1) Repeat the commands to recreate the eqn data frame as a data frame
with matrices, after reading it in from file:
   eqn <- data.frame(depy = I(as.matrix(eqn[,1:4])),
                     indx = I(as.matrix(eqn[,5:8])))

2) Save the data frame in an .RData file with save() instead of as a
text file with write.table().  That will keep the structure of the
variable.
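
A small sketch of the difference between the two storage routes, using temporary files and a made-up four-column data frame `dat` (hypothetical values, base R only):

```r
# Toy data and a data frame holding two matrix columns (illustrative values)
dat <- data.frame(w = c(63, 145, 104), h = c(55, 52, 32),
                  a = c(44, 33, 68), i = c(37200, 69300, 56900))
eqn <- data.frame(depy = I(as.matrix(dat[, 1:2])),
                  indx = I(as.matrix(dat[, 3:4])))

# Text round trip: the matrix columns come back as separate columns
txt_file <- tempfile(fileext = ".txt")
write.table(eqn, txt_file)
from_text <- read.table(txt_file, header = TRUE)
ncol(from_text)             # 4
"depy" %in% names(from_text)   # FALSE

# .RData round trip: the structure is kept
rda_file <- tempfile(fileext = ".RData")
save(eqn, file = rda_file)
rm(eqn)
load(rda_file)
names(eqn)                  # "depy" "indx"
is.matrix(eqn$depy)         # TRUE
```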


>
>
>
> Initially, I had input a file 'pls' with the script:
>
> dep <- pls[,1:4]
> ind <- pls[,5:8]
> eqn <- data.frame(depy = dep, indx = ind)
> apls <- plsr(depy ~ indx, data=eqn)
>
> .... and this gives me   [7] ERROR:  object 'depy' not found

because you are missing the I(as.matrix()).
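
Putting the pieces together, a minimal end-to-end sketch. The data here are random numbers generated only so the example is self-contained; the object name `dat` is a stand-in for the table read from file, and the plsr() call is guarded so that the snippet also runs where the pls package is not installed.

```r
set.seed(1)
# Random stand-in for the 10000 x 8 table (illustrative only)
dat <- data.frame(w = rnorm(20), h = rnorm(20), d = rnorm(20), s = rnorm(20),
                  a = rnorm(20), i = rnorm(20), r = rnorm(20), x = rnorm(20))

# I() keeps each block of columns together as one matrix-valued variable
eqn <- data.frame(depy = I(as.matrix(dat[, 1:4])),
                  indx = I(as.matrix(dat[, 5:8])))
names(eqn)      # "depy" "indx"
dim(eqn$depy)   # 20 4

# Fit only if the pls package is available
if (requireNamespace("pls", quietly = TRUE)) {
  fit <- pls::plsr(depy ~ indx, data = eqn)
  summary(fit)
}
```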

--
Regards,
Bjørn-Helge Mevik


Re: Dataframes in PLS package

Bjørn-Helge Mevik-3
In reply to this post by westland
westland <[hidden email]> writes:

> R still doesn't seem to recognize the data.frame ...  I get a [6] ERROR:
> object 'depy.w' not found from the following code:
>
> dep <- pls[,1:4]
> ind <- pls[,5:8]
> eqn <- data.frame(depy = dep, indx = ind)
> apls <- plsr(depy.w + depy.h + depy.d + depy.s ~ indx.a + indx.i + indx.r +
> indx.x,  data=eqn)
>
>
> BUT .... I DID try to cbind() these after add-concatenating them (not sure
> exactly what I am doing) like so ...
>
> apls <- plsr(cbind(depy.w ,depy.h , depy.d , depy.s) ~ cbind(indx.a , indx.i
> , indx.r,indx.x), data=eqn)

For creating multi-column responses on-the-fly, using cbind() like this
works.  However, you don't need that for the predictors; there you can
get by with just using '+'.

If you only have a few predictors/responses, this will work okay, but if
you have many, it will take a lot of typing, and make the
formula handling part of plsr() take _ages_.  Then using matrices is
easier and faster.
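
Why cbind() matters on the response side can be seen with base R's model.frame(), sketched below on made-up numbers. In standard R formula handling, '+' on the left-hand side is ordinary arithmetic, so it yields a single summed response rather than several columns. The assumption here is that plsr() builds its response through this same model-frame machinery.

```r
# Illustrative toy data (hypothetical values)
eqn <- data.frame(y1 = c(1, 2, 3, 4), y2 = c(10, 20, 30, 40),
                  x1 = c(5, 3, 8, 1), x2 = c(2, 7, 4, 9))

# '+' on the left: one response, the element-wise sum y1 + y2
summed <- model.response(model.frame(y1 + y2 ~ x1 + x2, data = eqn))

# cbind() on the left: a two-column response matrix
stacked <- model.response(model.frame(cbind(y1, y2) ~ x1 + x2, data = eqn))

unname(summed)   # 11 22 33 44
dim(stacked)     # 4 2
```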

--
Bjørn-Helge Mevik
