unlist errors on a nested list of empty lists


Steven Nydick
Reproducible example:

x <- list(list(list(), list()))
unlist(x)

> Error in as.character.factor(x) : malformed factor

What should happen:

unlist(x)
> NULL

R.version
platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          5.0
year           2018
month          04
day            23
svn rev        74626
language       R
version.string R version 3.5.0 (2018-04-23)
nickname       Joy in Playing
--
Steven Nydick
PhD, Quantitative Psychology
M.A., Psychology
M.S., Statistics
--
"Beware of the man who works hard to learn something, learns it, and finds
himself no wiser than before, Bokonon tells us. He is full of murderous
resentment of people who are ignorant without having come by their
ignorance the hard way."
-Kurt Vonnegut


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: unlist errors on a nested list of empty lists

Duncan Murdoch-2
On 08/05/2018 1:48 PM, Steven Nydick wrote:
> Reproducible example:
>
> x <- list(list(list(), list()))
> unlist(x)
>
> > Error in as.character.factor(x) : malformed factor

The error comes from the line

structure(res, levels = lv, names = nm, class = "factor")

which is called because unlist() thinks that some entry is a factor,
with NULL levels and NULL names.  It's not legal for a factor to have
NULL levels.  Probably it should never get here; the earlier test

if (.Internal(islistfactor(x, recursive))) {

should have been false, and then the result would have been

.Internal(unlist(x, recursive, use.names))

(with both recursive and use.names being TRUE), which returns NULL.

Duncan Murdoch



Re: unlist errors on a nested list of empty lists

Duncan Murdoch-2
On 08/05/2018 2:58 PM, Duncan Murdoch wrote:

> On 08/05/2018 1:48 PM, Steven Nydick wrote:
>> Reproducible example:
>>
>> x <- list(list(list(), list()))
>> unlist(x)
>>
>> > Error in as.character.factor(x) : malformed factor
>
> The error comes from the line
>
> structure(res, levels = lv, names = nm, class = "factor")
>
> which is called because unlist() thinks that some entry is a factor,
> with NULL levels and NULL names.  It's not legal for a factor to have
> NULL levels.  Probably it should never get here; the earlier test
>
> if (.Internal(islistfactor(x, recursive))) {
>
> should have been false, and then the result would have been
>
> .Internal(unlist(x, recursive, use.names))
>
> (with both recursive and use.names being TRUE), which returns NULL.

And the problem is in the islistfactor function in src/main/apply.c,
which looks like this:

static Rboolean islistfactor(SEXP X)
{
     int i, n = length(X);

     switch(TYPEOF(X)) {
     case VECSXP:
     case EXPRSXP:
         if(n == 0) return NA_LOGICAL;
         for(i = 0; i < LENGTH(X); i++)
             if(!islistfactor(VECTOR_ELT(X, i))) return FALSE;
         return TRUE;
         break;
     }
     return isFactor(X);
}

One of those deeply nested lists has length 0, so at the lowest level it
returns NA_LOGICAL.  But the caller then does C-style logical testing on the
result.  NA_LOGICAL is a nonzero integer (INT_MIN), so C treats it as true,
and at the next level up we get the wrong answer.
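
The truthiness point can be checked in isolation. The following is a minimal sketch, assuming NA_LOGICAL is represented as INT_MIN as in R's headers. The macro NA_LOGICAL_SKETCH and the helper functions are hypothetical stand-ins so that the snippet compiles without R; they are not part of R's API.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: stand-in for R's NA_LOGICAL, which R defines as INT_MIN. */
#define NA_LOGICAL_SKETCH INT_MIN

/* C-style test as used in the original loop: "!child" is 1 only when child == 0. */
static int rejected_by_negation(int child)
{
    return !child;
}

/* Explicit three-way test: 0 is false, 1 is true, anything else counts as NA. */
static int is_na(int child)
{
    return child != 0 && child != 1;
}

/* Small self-check of the claims above. */
static void self_check(void)
{
    assert(rejected_by_negation(0) == 1);                 /* FALSE is rejected */
    assert(rejected_by_negation(1) == 0);                 /* TRUE passes */
    assert(rejected_by_negation(NA_LOGICAL_SKETCH) == 0); /* NA passes as well */
    assert(is_na(NA_LOGICAL_SKETCH) == 1);
}
```

Calling self_check() from a test program should complete without an assertion failure, which is the behaviour described above: the negation test cannot tell NA apart from TRUE.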

A fix would be to rewrite it like this:

static Rboolean islistfactor(SEXP X)
{
     int i;
     /* stays NA_LOGICAL until some leaf is found to be a factor */
     Rboolean result = NA_LOGICAL, childresult;
     switch(TYPEOF(X)) {
     case VECSXP:
     case EXPRSXP:
         for(i = 0; i < LENGTH(X); i++) {
             childresult = islistfactor(VECTOR_ELT(X, i));
             /* compare explicitly: NA_LOGICAL is nonzero, so !childresult
                would treat it as true */
             if(childresult == FALSE) return FALSE;
             else if(childresult == TRUE) result = TRUE;
         }
         return result;
     }
     return isFactor(X);
}
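
The difference between the two versions can be tried on a toy tree without building R. This is only a sketch under stated assumptions: Node, NA_SKETCH, all_factors_old(), all_factors_new() and eval_nested_empty() are hypothetical names that model the logic of the two functions above, and they do not use R's SEXP types.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NA_SKETCH INT_MIN   /* assumption: mirrors R's NA_LOGICAL */

/* Hypothetical toy node: a leaf (factor or not) or a list of children. */
typedef struct Node {
    int is_list;                     /* 1 = list, 0 = leaf */
    int is_factor;                   /* only meaningful for leaves */
    size_t n;                        /* number of children if a list */
    const struct Node *const *kids;  /* children, or NULL */
} Node;

/* Models the original logic: NA from an empty list survives the "!" test. */
static int all_factors_old(const Node *x)
{
    if (!x->is_list) return x->is_factor;
    if (x->n == 0) return NA_SKETCH;
    for (size_t i = 0; i < x->n; i++)
        if (!all_factors_old(x->kids[i])) return 0;
    return 1;
}

/* Models the proposed fix: NA is propagated unless a real factor is seen. */
static int all_factors_new(const Node *x)
{
    if (!x->is_list) return x->is_factor;
    int result = NA_SKETCH;
    for (size_t i = 0; i < x->n; i++) {
        int child = all_factors_new(x->kids[i]);
        if (child == 0) return 0;
        if (child == 1) result = 1;
    }
    return result;
}

/* Builds the equivalent of list(list(list(), list())) and evaluates it. */
static int eval_nested_empty(int use_new)
{
    const Node e1 = {1, 0, 0, NULL}, e2 = {1, 0, 0, NULL};
    const Node *inner_kids[] = {&e1, &e2};
    const Node inner = {1, 0, 2, inner_kids};
    const Node *outer_kids[] = {&inner};
    const Node outer = {1, 0, 1, outer_kids};
    return use_new ? all_factors_new(&outer) : all_factors_old(&outer);
}
```

In this model, eval_nested_empty(0) yields 1 (the nested empty lists are wrongly reported as all factors), while eval_nested_empty(1) yields NA_SKETCH.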


Re: unlist errors on a nested list of empty lists

Steven Nydick
unlist() also fails in the same way if the factor is not on the first level of
the list. This seems to be because islistfactor() is recursive, whereas once a
list is judged to be a list-factor, only its first-level elements are coerced
into character strings.

> x <- list(list(factor(LETTERS[1])))
> unlist(x)
Error in as.character.factor(x) : malformed factor

However, if one of the factors is at the top level, and one is nested, then
the result is:

> x <- list(list(factor(LETTERS[1])), factor(LETTERS[2]))
> unlist(x)

[1] <NA> B
Levels: B

... which does not seem to me to be desired behavior.



Re: unlist errors on a nested list of empty lists

Duncan Murdoch-2
On 08/05/2018 4:50 PM, Steven Nydick wrote:

> unlist() also fails in the same way if the factor is not on the first level
> of the list. This seems to be because islistfactor() is recursive, whereas
> once a list is judged to be a list-factor, only its first-level elements
> are coerced into character strings.
>
>  > x <- list(list(factor(LETTERS[1])))
>  > unlist(x)
> Error in as.character.factor(x) : malformed factor
>
> However, if one of the factors is at the top level, and one is nested,
> then the result is:
>
>  > x <- list(list(factor(LETTERS[1])), factor(LETTERS[2]))
>  > unlist(x)
>
> [1] <NA> B
> Levels: B
>
> ... which does not seem to me to be desired behavior.

The patch I suggested doesn't help with either of these.  I'd suggest
collecting examples, and posting a bug report to bugs.r-project.org.

Duncan Murdoch



Re: unlist errors on a nested list of empty lists

Steven Nydick
I do not have access to the bug reporting system. If somebody can get me
access, I can create a formal bug report.

The latter issues seem like duplicates of
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=12572 (with slightly
different output), but as that bug was reported nearly 10 years ago, it
might be worth filing an update under R version 3. I could not find the
first issue (which I ran into when trying to parse JSON files) when
searching the bug reports, which is why I posted on r-devel.


Re: unlist errors on a nested list of empty lists

Martin Maechler
>>>>> Steven Nydick <[hidden email]>
>>>>>     on Wed, 9 May 2018 13:25:11 +0000 writes:

    > I do not have access to the bug reporting system. If somebody can get me
    > access, I can create a formal bug report.

    > The latter issues seem like duplicates of
    > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=12572 (with slightly
    > different output), but as that bug was reported nearly 10 years ago, it
    > might be worth filing an update under R version 3. I could not find the
    > first issue (which I ran into when trying to parse JSON files) when
    > searching the bug reports, which is why I posted on r-devel.

Indeed, thanks a lot Steven (and Duncan!),  I've found the
following:

1. The first issue is a new bug, present in R "only" since version
   3.4.0, i.e. it works up to R 3.3.3.
   Duncan's patch basically fixes it.
   I've found that the C code there can be simplified and
   deconvoluted, and after that, I will commit what is basically
   Duncan Murdoch's bug fix.

2. The second set of issues is indeed an entirely different bug, and I
   would say it actually points to a "design problem" of the whole thing.
   The C code in islistfactor() talks about arbitrary trees with
   all leaves factors, whereas the R code -- in the case where
   islistfactor() is TRUE -- actually only deals correctly with
   simple trees, namely those of depth exactly 1. Those are the ones
   you typically get from e.g. lapply(), and so this old design bug
   is triggered relatively rarely.
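
The mismatch in point 2 can be illustrated on a toy tree. This is a sketch only, not R's code and not the fix that was adopted: Node, all_leaves_factors(), depth(), is_flat_factor_list() and nested_factor_mismatch() are hypothetical names, and the "depth exactly 1" guard is an assumption about what a stricter check could look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node, not R's SEXP: a leaf or a list of children. */
typedef struct Node {
    int is_list;                     /* 1 = list, 0 = leaf */
    int is_factor;                   /* only meaningful for leaves */
    size_t n;                        /* number of children if a list */
    const struct Node *const *kids;  /* children, or NULL */
} Node;

/* 1 if every leaf at any depth is a factor (the "arbitrary tree" view). */
static int all_leaves_factors(const Node *x)
{
    if (!x->is_list) return x->is_factor;
    for (size_t i = 0; i < x->n; i++)
        if (!all_leaves_factors(x->kids[i])) return 0;
    return 1;
}

/* Nesting depth: 0 for a leaf, 1 for a flat list of leaves, and so on. */
static int depth(const Node *x)
{
    if (!x->is_list) return 0;
    int d = 0;
    for (size_t i = 0; i < x->n; i++) {
        int c = depth(x->kids[i]);
        if (c > d) d = c;
    }
    return d + 1;
}

/* Assumed stricter condition: a non-empty flat list whose elements are factors. */
static int is_flat_factor_list(const Node *x)
{
    return x->is_list && x->n > 0 && depth(x) == 1 && all_leaves_factors(x);
}

/* Builds the equivalent of list(list(factor)); returns 1 if the two views disagree. */
static int nested_factor_mismatch(void)
{
    const Node f = {0, 1, 0, NULL};
    const Node *inner_kids[] = {&f};
    const Node inner = {1, 0, 1, inner_kids};
    const Node *outer_kids[] = {&inner};
    const Node outer = {1, 0, 1, outer_kids};
    return all_leaves_factors(&outer) && !is_flat_factor_list(&outer);
}
```

For the nested case the recursive predicate says "all factors" while the depth is 2, so nested_factor_mismatch() returns 1; a flat list of factors satisfies both conditions.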

Last but not least: I have created an account for you, Steven,
on the bugzilla site.

Given we have holidays till the weekend and private duties of
mine, I won't get to more for now.

Best
Martin Maechler
