

Hello,
I am trying to run an ordinal logistic regression (polr) using the package 'MASS'.
I have successfully run other regression classes (glm, multinom) without much problem, but with the 'polr' class I get the following error:
" Error in svd(X) : infinite or missing values in 'x' "
which appears when I run the "summary" command.
The data file is large (585000 rows) and has no NA, 9999 or blank values.
My script (in brief) is as follows, with results:
############
> library(MASS)
>
> ## ADD DATA
> Jdata <- read.delim("/Analysis/20120709 JLittle data file.txt", header=T)
>
> attach(Jdata)
> names(Jdata)
[1] "POINTID" "Lat_Y_pos" "JVeg5" "Subregion" "Rock_U_Nam" "Rock_Name" "Elevation" "Slope" "Aspect" "Hillshade" "Stream_dist" "Coast_dist" "Coast_SE"
[14] "Coast_E" "Wind_310" "TPI" "Landform"
>
> Global <- polr(JVeg5 ~ Elevation + Lat_Y_pos + Coast_dist + Stream_dist, data=Jdata)
>
> summary(Global)
Error in svd(X) : infinite or missing values in 'x'
>
##Try with omit NA command
> Global <- polr(JVeg5 ~ Elevation + Lat_Y_pos + Coast_dist + Stream_dist, data=Jdata, na.action = na.omit, Hess = TRUE)
>
> summary(Global)
Error in svd(X) : infinite or missing values in 'x'
############
Does this imply an 'infinite value' and what would this mean?
If anyone has any idea how to address this error, I would very much appreciate your response.
Thank you in advance.
Jeremy
Data File Attachment (200 rows):
20120709_JLittle_data_file.txt


Since it's something about the Hessian, and it occurs in the vcov() call, have you thought about this note?
"The vcov method uses the approximate Hessian: for reliable results the model matrix should be sensibly scaled with all columns having range the order of one."
I'm sorry I can't help you much further here; I have no idea about ordinal logistic regression ;)
On 09.07.2012, at 11:55, Jeremy Little wrote:
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Hi Jessica,
thank you for your prompt response. Yes I had deduced it had to do with the Hessian.
However, I am not clear what "all columns having range the order of one" actually means, or what it implies for my data. Does it mean removing decimals (i.e. by shifting the decimal place)?
I would have thought that the data matrix was already "sensibly scaled".
Any further insight greatly appreciated.
Regards


Hi Jeremy,
newData <- data.frame(JVeg5 = factor(Jdata[,"JVeg5"]), scale(Jdata[,c("Elevation","Lat_Y_pos","Coast_dist","Stream_dist")]))
Global <- polr(JVeg5 ~ Elevation + Lat_Y_pos + Coast_dist + Stream_dist,
data=newData, na.action = na.omit, Hess = TRUE)
summary(Global)
Does this still do what you want? At least it no longer produces the error this way.
greetings Jessi
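
To illustrate what the scale() call above does to predictors of very different magnitudes, here is a minimal sketch. It uses made-up data (the data frame toy and its value ranges are assumptions chosen to resemble the magnitudes in the thread, not the real Jdata) and compares the width of each column's range before and after scaling.

```r
# Synthetic stand-in for three of the thread's predictors (values are made up).
set.seed(1)
toy <- data.frame(
  Elevation  = runif(200, 0, 1500),    # hundreds to thousands
  Lat_Y_pos  = runif(200, 15, 20),     # tens
  Coast_dist = runif(200, 0, 80000)    # tens of thousands
)

# Width of each column's range before scaling: several orders of magnitude apart.
raw_width <- sapply(toy, function(x) diff(range(x)))

# scale() centers each column to mean 0 and divides it by its standard deviation.
scaled <- as.data.frame(scale(toy))
scaled_width <- sapply(scaled, function(x) diff(range(x)))

print(round(raw_width, 1))
print(round(scaled_width, 2))
print(round(sapply(scaled, sd), 6))    # every column now has standard deviation 1
```

After scaling, every column spans only a few units, which is one plausible reading of "range the order of one". Note that the fitted coefficients then refer to standard deviations of each predictor rather than to the original units.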


Hi Jeremy,
I think Jessica is right that you could probably make polr converge
and produce a Hessian if the data are better scaled, but there might
also be other things preventing you from getting the Hessian/vcov. It
could be insightful if you showed us the result of
str(Jdata)
Also, I am thinking that perhaps another implementation of ordinal
regression models might avoid the problem. You could try the ordinal
package (of which I am the author); the following should reproduce
the MASS::polr results:
install.packages("ordinal")
library(ordinal)
Global <- clm(JVeg5 ~ Elevation + Lat_Y_pos + Coast_dist +
Stream_dist, data=Jdata)
summary(Global)
Another option would be Frank Harrell's lrm function in the rms package.
HTH,
Rune

Rune Haubo Bojesen Christensen
Ph.D. Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK2800 Kgs. Lyngby, Denmark
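
One way to check whether the Hessian itself is the problem is to fit with Hess = TRUE and inspect the stored matrix before calling summary(). The sketch below is an illustration only: it uses the housing data that ships with MASS as a stand-in for Jdata (which is not available here), and the names fit, H and hessian_ok are chosen for this example.

```r
library(MASS)

# 'housing' ships with MASS; it is used here only as a stand-in for Jdata.
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
            data = housing, Hess = TRUE)

# With Hess = TRUE the fitted object stores the Hessian in fit$Hessian.
H <- fit$Hessian

# TRUE when every entry is finite (no NA, NaN, Inf or -Inf).
hessian_ok <- all(is.finite(H))
print(dim(H))
print(hessian_ok)
```

If hessian_ok came back FALSE on the real data, that would suggest the non-finite values reported by svd() come from the Hessian rather than from the data file itself.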


Dear Jessica
thank you for the scale solution to my problem.
I tried to manually scale my data (scaling up and removing decimals), however, this resulted in the same error message.
It remains vague to me what
"the model matrix should be sensibly scaled with all columns having range the order of one"
actually means.
Regardless, the script you have supplied works for my data and looks (at this point) like a suitable solution.
Many many thanks for your time in resolving this issue.
############
Dear Rune,
thank you for your valuable input.
I used the package 'ordinal' originally for this ordinal logistic regression; it was straightforward and worked fine without any errors.
However, I switched to using 'MASS', as I need to run these models through an additional package 'AICcmodavg', which requires the 'MASS' inputs. Hence, I have needed the models to work in 'MASS'.
The str(Jdata) call gives the following:
> str(Jdata)
'data.frame': 552270 obs. of 17 variables:
$ POINTID : int 582358 582360 582361 582359 35289 582357 582362 411225 584336 584493 ...
$ Lat_Y_pos : num 19.7 19.7 19.7 19.7 16 ...
$ JVeg5 : Factor w/ 5 levels "1RF","2WSFEG",..: 5 5 5 5 1 5 5 5 5 5 ...
$ Subregion : Factor w/ 47 levels "AUBF","AUBK",..: 16 16 46 16 8 16 46 28 46 46 ...
$ Rock_U_Nam : Factor w/ 607 levels "Adler Hill Basalt",..: 33 33 33 33 173 33 33 585 112 112 ...
$ Rock_Name : Factor w/ 32 levels "ALLUVIUM","ARENITE",..: 13 13 13 13 25 13 13 10 23 23 ...
$ Elevation : num 317 230 180 317 107 ...
$ Slope : num 35.5 44.7 39.1 43.5 23 ...
$ Aspect : num 25.3 3.68 30.83 4.02 254.66 ...
$ Hillshade : int 182 211 167 212 200 218 216 245 214 27 ...
$ Stream_dist: num 4241 4288 4330 4252 2160 ...
$ Coast_dist : num 39497 39128 38883 39312 5601 ...
$ Coast_SE : num 404751 404821 404468 404680 28426 ...
$ Coast_E : int 78000 77500 77250 77750 15550 78250 77000 55650 77000 76800 ...
$ Wind_310 : int 10 10 10 10 10 10 10 10 2 10 ...
$ TPI : num 122.6 109 95.9 94.7 76.6 ...
$ Landform : int 1 1 1 1 1 1 1 1 1 1 ...
Does this provide any insight?
Thank you
##########
Thank you both for your generous time and support, it is greatly appreciated.
kind regards
Jeremy


I'm not sure either; the wording is strange and English isn't my primary language.
I would GUESS that it means all column values should lie between 0 and 1, or 1 and 2, or -0.5 and 0.5, or something like that. The scale function centers each column and divides it by its standard deviation, which makes the columns comparable with each other, so that's not exactly it, though it seems to fix the problem.
Something like this:
normalize <- function(dataVec, n = 1) {
  lo <- min(dataVec, na.rm = TRUE)  # smallest non-missing value
  hi <- max(dataVec, na.rm = TRUE)  # largest non-missing value
  (dataVec - lo)^n / (hi - lo)^n    # with n = 1 this maps the column onto [0, 1]
}
should scale a column to [0,1]. But as I said, I'm not sure if that is what's wanted here ;)
Now that I'm at my normalize function, another idea would be to scale the column vector so that it has a length of 1
(in the sqrt(sum(vec^2)) sense).
In the end I don't know exactly what is meant either.
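
The candidate readings of "range the order of one" mentioned above can be compared side by side. This is a minimal sketch on made-up distances (coast_m and its range are assumptions that only mimic the magnitude of Coast_dist in the thread); it is not a statement about which option polr expects.

```r
# Synthetic distances in metres (made-up values, similar magnitude to Coast_dist).
set.seed(42)
coast_m <- runif(100, 0, 80000)

# Option 1: a plain change of units, metres to hundreds of kilometres.
coast_100km <- coast_m / 1e5

# Option 2: min-max rescaling onto [0, 1].
coast_01 <- (coast_m - min(coast_m)) / (max(coast_m) - min(coast_m))

# Option 3: unit Euclidean length, so that sqrt(sum(x^2)) equals 1.
coast_unit <- coast_m / sqrt(sum(coast_m^2))

print(range(coast_100km))
print(range(coast_01))
print(sqrt(sum(coast_unit^2)))
```

Options 1 and 2 both leave the column with a range of roughly one. Option 3 makes individual values shrink as the number of rows grows, so on a data set with hundreds of thousands of rows it would probably not give a range of order one.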
On 11.07.2012, at 05:11, Jeremy Little wrote:


Thanks Jessi,
your insights are extremely helpful.
If you would indulge me one more quick question on your script.
You have written...
newData <- data.frame(JVeg5 = factor(Jdata[,"JVeg5"]), scale(Jdata[,c("Elevation","Lat_Y_pos","Coast_dist","Stream_dist")]))
I wish to expand this analysis for all other variables in my data matrix, of which one is a factor (and therefore cannot be 'scaled').
Adding these variables to your script...
newData <- data.frame(JVeg5 = factor(Jdata[,"JVeg5"]), scale(Jdata[,c("Elevation", "Slope", "Aspect", "Hillshade", "Lat_Y_pos", "Coast_dist", "Coast_SE", "Coast_E", "Wind_310", "Stream_dist", "TPI", "Landform", "Rock_Name")]))
...returns the error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
because "Rock_Name" must be numeric to be scaled.
I've tried a couple of options for incorporating this factor (Rock_Name) into the script without success.
For example:
"newData <- data.frame(JVeg5=factor(Jdata[,"JVeg5"], Rock_Name=factor(Jdata[,"Rock_Name"]), scale(Jdata[,c("Elevation", "Slope", "Aspect", "Hillshade", "Lat_Y_pos", "Coast_dist", "Coast_SE", "Coast_E", "Wind_310", "Stream_dist", "TPI", "Landform")]))"
Do you have a suggestion which might work for this analysis?
Thank you for your support with this, I really appreciate it.
kind regards


You could probably make them numeric, like
> v <- c("a","a","b","c")
> f <- factor(v)
> as.numeric(f)
[1] 1 1 2 3
to get a numeric "rock_id", but I wouldn't per se recommend it.
You should ask someone who knows more about the scientific side of this method to tell you how factor (categorical) data is properly treated.
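
An alternative to converting the factor to numbers is to scale only the numeric columns and leave factor columns untouched, since model formulas in R handle factor predictors through dummy coding. The following is a sketch under assumptions: toy, its column values and the level "BASALT" are made up for illustration, and is_num and newToy are names chosen here, not part of the original script.

```r
# Synthetic data frame with one factor and two numeric predictors (made-up values).
set.seed(7)
toy <- data.frame(
  Rock_Name  = factor(sample(c("ALLUVIUM", "ARENITE", "BASALT"), 50, replace = TRUE)),
  Elevation  = runif(50, 0, 1500),
  Coast_dist = runif(50, 0, 80000)
)

# Flag the numeric columns; factor columns are FALSE here.
is_num <- sapply(toy, is.numeric)

# Scale only the numeric columns; as.numeric() drops the matrix attributes
# that scale() attaches, so each column stays a plain numeric vector.
newToy <- toy
newToy[is_num] <- lapply(toy[is_num], function(x) as.numeric(scale(x)))

str(newToy)
```

The factor column keeps its levels, so it can go into the model formula as it is, while the numeric columns end up with mean 0 and standard deviation 1. Integer-coded columns that are really categories would need to be converted with factor() first, otherwise they are treated as numeric and scaled too.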
On 12.07.2012, at 03:35, Jeremy Little wrote:

