
phantom NA/NaN/Inf in foreign function call (or something altogether different?)

Cecile De Cat
Dear experts,

Please forgive the puzzled title and the length of this message - I
thought it would be best to be as complete as possible and to show the
avenues I have explored.

I'm trying to fit a logistic regression model to data with a binary
dependent variable (i.e. Target.ACC: accuracy of response) using lrm,
and thought I would start from the most complex model (of which
"sample1.lrm1" is a trimmed version).  I got the error shown below.
(sample1 is available at http://tinyurl.com/bwqq7ya)

For info:

> str(sample1)
'data.frame':   14022 obs. of  5 variables:
 $ Target.ACC : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ Word.Order : Factor w/ 2 levels "HeadMod*","ModHead": 1 1 1 1 1 1 1 1 1 1 ...
 $ Target.RESP: Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ L1         : Factor w/ 3 levels "English","German",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Relation   : Factor w/ 4 levels "For","From","MadeOf",..: 1 1 1 1 1 1 1 1 1 1 ...

Commands and error message:

> sample1.dd = datadist(sample1)
> options(datadist="sample1.dd")
> sample1.lrm = lrm(Target.ACC ~ (L1 + Relation + Target.RESP + Word.Order)^2, sample1, x=T, y=T)
Error in lrm(Target.ACC ~ (L1 + Relation + Target.RESP + Word.Order)^2,  :
  Unable to fit model using “lrm.fit”

So I tried to narrow down the error by looking at all the combinations
manually, and the problem appears to be specifically with the
interaction between Word.Order and Target.RESP.  Models including
interaction of these variables with other variables (e.g. L1,
Relation) can be fitted without problem.

> sample1.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample1, x=T, y=T)
Error in lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, dat, x = T, y = T) :
  Unable to fit model using “lrm.fit”

unproblematic:
> sample1.lrm1 = lrm(Target.ACC ~ (L1 + Relation + Target.RESP)^2, sample1, x=T, y=T)
> sample1.lrm2 = lrm(Target.ACC ~ (L1 + Relation + Word.Order)^2, sample1, x=T, y=T)

When running the problematic analysis on a smaller sample of the same
data, I get a different (more precise?) error message:

> sample2 <- sample1[1:500,]
> sample2.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample2, x=T, y=T)
Error in fitter(X, Y, penalty.matrix = penalty.matrix, tol = tol, weights = weights,  :
  NA/NaN/Inf in foreign function call (arg 1)

But I cannot find any NA in the data:
> table(complete.cases(sample2))
TRUE
 500

Some portions of the data don't appear to contain any of the offending "bit":
> sample3 <- sample1[12500:13000,]
> sample3.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample3, x=T, y=T)


Could one of you shine a light on this puzzle, please? If that
includes pointing me towards some background reading, that would be
great too.

Many thanks in advance.

Cecile De Cat
Linguistics - University of Leeds

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: phantom NA/NaN/Inf in foreign function call (or something altogether different?)

Michael Weylandt
On Tue, Jul 31, 2012 at 5:26 AM, Cecile De Cat <[hidden email]> wrote:

> Dear experts,
>
> Please forgive the puzzled title and the length of this message - I
> thought it would be best to be as complete as possible and to show the
> avenues I have explored.
>
> I'm trying to fit a logistic regression model to data with a binary
> dependent variable (i.e. Target.ACC: accuracy of response) using lrm,
> and thought I would start from the most complex model (of which
> "sample1.lrm1" is a trimmed version).  I got the error shown below.
> (sample1 is available at http://tinyurl.com/bwqq7ya)
>
> For info:
>
>> str(sample1)
> 'data.frame':   14022 obs. of  5 variables:
>  $ Target.ACC : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
>  $ Word.Order : Factor w/ 2 levels "HeadMod*","ModHead": 1 1 1 1 1 1 1 1 1 1 ...
>  $ Target.RESP: Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
>  $ L1         : Factor w/ 3 levels "English","German",..: 2 2 2 2 2 2 2 2 2 2 ...
>  $ Relation   : Factor w/ 4 levels "For","From","MadeOf",..: 1 1 1 1 1 1 1 1 1 1 ...
>
> Commands and error message:
>
>> sample1.dd = datadist(sample1)
>> options(datadist="sample1.dd")
>> sample1.lrm = lrm(Target.ACC ~ (L1 + Relation + Target.RESP + Word.Order)^2, sample1, x=T, y=T)
> Error in lrm(Target.ACC ~ (L1 + Relation + Target.RESP + Word.Order)^2,  :
>   Unable to fit model using “lrm.fit”
>
> So I tried to narrow down the error by looking at all the combinations
> manually, and the problem appears to be specifically with the
> interaction between Word.Order and Target.RESP.  Models including
> interaction of these variables with other variables (e.g. L1,
> Relation) can be fitted without problem.
>
>> sample1.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample1, x=T, y=T)
> Error in lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, dat, x = T, y = T) :
>   Unable to fit model using “lrm.fit”
>
> unproblematic:
>> sample1.lrm1 = lrm(Target.ACC ~ (L1 + Relation + Target.RESP)^2, sample1, x=T, y=T)
>> sample1.lrm2 = lrm(Target.ACC ~ (L1 + Relation + Word.Order)^2, sample1, x=T, y=T)
>
> When running the problematic analysis on a smaller sample of the same
> data, I get a different (more precise?) error message:
>
>> sample2 <- sample1[1:500,]
>> sample2.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample2, x=T, y=T)
> Error in fitter(X, Y, penalty.matrix = penalty.matrix, tol = tol, weights = weights,  :
>   NA/NaN/Inf in foreign function call (arg 1)
>
> But I cannot find any NA in the data:
>> table(complete.cases(sample2))
> TRUE
>  500

Not a complete answer, but complete.cases() won't pick up +/- Inf.

x <- data.frame(1:5, letters[1:5], c(NA, NaN, Inf, -Inf, 0))

x[complete.cases(x),]

You could perhaps use something like

sapply(x, is.finite)

with any()/all() to hunt them down (is.finite requires "real" numbers:
it gives false for NA, NaN, Inf, and -Inf).
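
A minimal sketch of that hunt, assuming a toy data frame with made-up column names (`id`, `grp`, `val`): restricting the check to numeric columns avoids the spurious FALSEs that is.finite() returns for character columns.

```r
# Toy data frame with hypothetical column names.
x <- data.frame(id  = 1:5,
                grp = letters[1:5],
                val = c(NA, NaN, Inf, -Inf, 0),
                stringsAsFactors = FALSE)

# complete.cases() drops the NA and NaN rows but keeps +/- Inf.
kept <- x[complete.cases(x), ]

# Check only the numeric columns.
num_cols   <- vapply(x, is.numeric, logical(1))
non_finite <- !sapply(x[num_cols], is.finite)  # logical matrix, rows x columns

# Number of non-finite values per column, and the rows that contain any.
bad_per_col <- colSums(non_finite)
bad_rows    <- which(rowSums(non_finite) > 0)
```

Here `bad_per_col` points at the offending column and `bad_rows` at the offending rows, which is usually enough to inspect them by hand.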

Best,
Michael

>
> Some portions of the data don't appear to contain any of the offending "bit":
>> sample3 <- sample1[12500:13000,]
>> sample3.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample3, x=T, y=T)
>
>
> Could one of you shine a light on this puzzle, please? If that
> includes pointing me towards some background reading, that would be
> great too.
>
> Many thanks in advance.
>
> Cecile De Cat
> Linguistics - University of Leeds


Re: phantom NA/NaN/Inf in foreign function call (or something altogether different?)

Cecile De Cat
Sorry.  I've used:
> library(rms)

I realise I still have a lot to learn to ask questions well - it took me a
long time to compile this one, but I've obviously missed important things.
Please see below for the session info.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rms_3.5-0        Hmisc_3.9-3      survival_2.36-12

loaded via a namespace (and not attached):
[1] cluster_1.14.2 grid_2.15.0    lattice_0.20-6 tools_2.15.0


Many thanks for your help.

Cecile


On 2 August 2012 01:00, R. Michael Weylandt <[hidden email]> wrote:

> What package(s) are the functions in question from?
>
> This might also help:
>
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> Michael
>
> On Wed, Aug 1, 2012 at 2:57 AM, Cecile De Cat <[hidden email]> wrote:
> > You're right, it's just the 2 columns that are characters that return
> > false.  But I don't use them in the analysis (it's the experiments'
> > names and the participants' names).
> >
> > So I guess I'm back to my original question (although I can discard
> > one possible cause thanks to you): there appear to be only "real
> > numbers" in the data used for the lrm analysis, and yet it falls over.
> >
> > Thanks a lot for your help.
> >
> > Cecile
> >
> >
> > On 31 July 2012 16:42, R. Michael Weylandt <[hidden email]> wrote:
> >> What classes are the columns of your data frame?
> >>
> >> Note that
> >>
> >> is.finite("a") # FALSE
> >> is.finite(factor("a")) # TRUE
> >>
> >> M
> >>
> >> On Tue, Jul 31, 2012 at 10:34 AM, Cecile De Cat <[hidden email]> wrote:
> >>> Thank you.  This is very useful.  I do indeed get the following:
> >>>> table(sapply(dat, is.finite))
> >>>  FALSE   TRUE
> >>>  28164 253476
> >>>
> >>> But the number of observations returned baffles me, as there should
> >>> only be 14082 in the data.  And when I look at each variable
> >>> individually, none appear to violate "is.finite": e.g.
> >>>
> >>>> table(sapply(dat$Proficiency, is.finite))
> >>>  TRUE
> >>> 14082
> >>>
> >>> Sorry if this is a dumb question, but can you help me understand
> >>> what's going on?
> >>>
> >>> Many thanks.
> >>>
> >>> Cecile
>
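
On the baffling counts quoted above: sapply(dat, is.finite) returns one logical value per cell of the data frame, not one per row, so table() counts cells. The totals are consistent with that reading: 28164 + 253476 = 281640 = 14082 x 20, and 28164 = 2 x 14082, which matches the two character columns mentioned in the thread. The sketch below reproduces the effect on a small, hypothetical stand-in for `dat` (`dat_toy`, with invented columns).

```r
# Hypothetical stand-in for `dat`: 4 rows, 3 columns, one of them character.
dat_toy <- data.frame(score = c(1.5, 2, 3.25, 4),
                      item  = c("a", "b", "c", "d"),
                      block = factor(c("x", "y", "x", "y")),
                      stringsAsFactors = FALSE)

# One logical value per cell: a 4 x 3 matrix.
cell_flags <- sapply(dat_toy, is.finite)

# table() therefore counts cells (12 here), not rows (4).
cell_counts <- table(cell_flags)

# Counting FALSEs per column shows that they all come from the
# character column; the factor column counts as finite.
false_per_col <- colSums(!cell_flags)
```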
