Inconsistent behavior for the C API's R_ParseVector()?


lgautier
Hi,

The behavior of
```
SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
```
defined in `src/include/R_ext/Parse.h` appears to be inconsistent depending
on the string to be parsed.

Trying to parse a string such as `"list(''=1+"` sets the
`ParseStatus` to an incomplete-parsing error, but trying to parse
`"list(''=123"` results in R sending a message to the console
(followed by a crash):

```
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT

*** stack smashing detected ***: <unknown> terminated
```

Is there a reason for the difference in behavior, and is there a workaround?

Thanks,


Laurent


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Inconsistent behavior for the C API's R_ParseVector()?

lgautier
I found the following code comment in `src/main/gram.c`:

```

/* Memory leak

yyparse(), as generated by bison, allocates extra space for the parser
stack using malloc(). Unfortunately this means that there is a memory
leak in case of an R error (long-jump). In principle, we could define
yyoverflow() to relocate the parser stacks for bison and allocate say on
the R heap, but yyoverflow() is undocumented and somewhat complicated
(we would have to replicate some macros from the generated parser here).
The same problem exists at least in the Rd and LaTeX parsers in tools.
*/

```

Could this be related to the issue?

Re: Inconsistent behavior for the C API's R_ParseVector()?

lgautier
Hi again,

Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
zero-length named elements does not seem consistent either:

```
> lst <- list()
> lst[[""]] <- 1
> names(lst)
[1] ""
> list("" = 1)
Error: attempt to use zero-length variable name
```

Should the parser be made to accept as valid what is otherwise possible
when using `[[<-`?


Best,

Laurent



Re: Inconsistent behavior for the C API's R_ParseVector()?

Tomas Kalibera
Dear Laurent,

could you please provide a complete reproducible example where parsing
results in a crash of R? Calling parse(text="list(''=123") from R works
fine for me (gives Error: attempt to use zero-length variable name).

I don't think the problem you observed could be related to the memory
leak. The leak is on the heap, not stack.

Zero-length names of elements in a list are allowed. They are not the
same thing as zero-length variables in an environment. If you try to
convert "lst" from your example to an environment, you would get the
error (attempt to use zero-length variable name).

Best
Tomas


Re: Inconsistent behavior for the C API's R_ParseVector()?

lgautier
Thanks for the quick response Tomas.

The same error does indeed happen when trying to use a zero-length
variable name in an environment. The surprising bit is then "why is this
happening during parsing" (that is, why are variables being assigned to an
environment)?

We are otherwise aware that the error does not occur in the R console,
but it can be traced to a call to R_ParseVector() in R's C API
(https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).

Our specific setup is calling an embedded R from Python, using the cffi
library. An error on our end was the first possibility considered, but the
puzzling specificity of the error (as shown below, other parsing errors are
handled properly) and the difficulty of tracing what is happening in
R_ParseVector() made me ask whether someone on this list had a suggestion
about the possible issue:

```

>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> e = ri.parse("list(''=1+")
---------------------------------------------------------------------------
RParsingError                             Traceback (most recent call last)
>>> e = ri.parse("list(''=123")
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT

*** stack smashing detected ***: <unknown> terminated
```


Re: Inconsistent behavior for the C API's R_ParseVector()?

Tomas Kalibera
On 12/7/19 10:32 PM, Laurent Gautier wrote:
> Thanks for the quick response Tomas.
>
> The same error is indeed happening when trying to have a zero-length
> variable name in an environment. The surprising bit is then "why is
> this happening during parsing" (that is why are variables assigned to
> an environment) ?

The emitted R error (in the R console) is not a parse (syntax) error,
but an error emitted during parsing when the parser tries to intern a
name, that is, look it up in a symbol table. An empty string is not
allowed as a symbol name, hence the error. In the call "list(''=1)", the
empty name is what could eventually become the name of a local variable
inside list(), even though not yet during parsing.

There is probably some error in how the external code is handling R
errors (Fatal error: unable to initialize the JIT, stack smashing, etc.)
and possibly also in how R is initialized before calling R_ParseVector().
You would probably get the same problem when running, say,
"stop('myerror')". Please note that R errors are implemented as long-jumps,
so care has to be taken when calling into R; Writing R Extensions has
more details (and section 8 specifically about embedding R). This is
unlike parse (syntax) errors, which R_ParseVector() signals through the
status it returns to the caller.
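
A minimal C sketch of the two error channels described above, using only the standard library. It is not R's implementation: `toy_parse`, `toy_parse_protected`, `ToyStatus`, and the string checks are hypothetical stand-ins chosen to mimic the two inputs from this thread. What it tries to show is that a status code comes back through an ordinary return, whereas a long jump needs a target that the caller has set up beforehand.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, loosely modelled on the idea of a parse status. */
typedef enum { TOY_OK, TOY_INCOMPLETE } ToyStatus;

/* Jump target for errors; it must be set with setjmp() before parsing. */
static jmp_buf toy_error_target;

/* Toy "parser" with two error channels:
 *  - input ending in '+' is reported through *status (ordinary return);
 *  - an empty quoted name (''=) raises an error with longjmp(). */
void toy_parse(const char *text, ToyStatus *status)
{
    size_t n = strlen(text);
    if (n > 0 && text[n - 1] == '+') {
        *status = TOY_INCOMPLETE;
        return;
    }
    if (strstr(text, "''=") != NULL)
        longjmp(toy_error_target, 1);   /* does not return to the caller */
    *status = TOY_OK;
}

/* Installs the jump target, then parses.
 * Returns 0 if toy_parse() returned normally, 1 if it long-jumped. */
int toy_parse_protected(const char *text, ToyStatus *status)
{
    if (setjmp(toy_error_target) != 0)
        return 1;                       /* reached via longjmp() */
    toy_parse(text, status);
    return 0;
}
```

Calling `toy_parse()` directly, without the `setjmp()` in `toy_parse_protected()`, would jump to an uninitialized target, which is undefined behavior; that is presumably the kind of situation an embedding application has to guard against.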

Best,
Tomas

Re: Inconsistent behavior for the C API's R_ParseVector()?

lgautier
On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <[hidden email]> wrote:

> The emitted R error (in the R console) is not a parse (syntax) error, but
> an error emitted during parsing when the parser tries to intern a name -
> look it up in a symbol table. Empty string is not allowed as a symbol name,
> and hence the error. In the call "list(''=1)" , the empty name is what
> could eventually become a name of a local variable inside list(), even
> though not yet during parsing.
>

Thanks Tomas.

I guess this has to do with R expressions being lazily evaluated, and names of
arguments in a call are also part of the expression. Now the puzzling part
is why that is at all part of the parsing: I would have expected
R_ParseVector() to be restricted to parsing... Now it feels like
R_ParseVector() is performing parsing, and a first level of evaluation for
expressions that "should never work" (the empty name).

> There is probably some error in how the external code is handling R errors
> (Fatal error: unable to initialize the JIT, stack smashing, etc) and
> possibly also how R is initialized before calling ParseVector. Probably you
> would get the same problem when running say "stop('myerror')". Please note
> R errors are implemented as long-jumps, so care has to be taken when
> calling into R, Writing R Extensions has more details (and section 8
> specifically about embedding R). This is unlike parse (syntax) errors
> signaled via return value to ParseVector()
>

The issue is that the segfault (because of stack smashing, therefore
because of what I also suspect to be an uncontrolled jump) is happening
within the execution of R_ParseVector(). I would think that an issue with
the initialization of R is less likely because the project is otherwise
used a fair bit and is well covered by automated continuous tests.

After looking more into R's gram.c I suspect that an execution context is
required for R_ParseVector() to work properly (to know where to jump
in case of error) when the parsing code decides to fail outside what it
thinks is a syntax error. If that is the case, this would make R_ParseVector()
function well when called from, say, a C extension to an R package, but fail
the way I am seeing it fail when called from an embedded R.

Best,

Laurent

Re: Inconsistent behavior for the C API's R_ParseVector()?

Tomas Kalibera
On 12/9/19 2:54 PM, Laurent Gautier wrote:

> Thanks Tomas.
>
> I guess this has to do with R expressions being lazily evaluated, and
> names of arguments in a call are also part of the expression. Now the
> puzzling part is why that is at all part of the parsing: I would have
> expected R_ParseVector() to be restricted to parsing... Now it feels
> like R_ParseVector() is performing parsing, and a first level of
> evaluation for expressions that "should never work" (the empty name).

Think of it as an exception in, say, Python. Some failures during parsing
result in an exception (called an error in R and implemented using a long
jump). Any time you are calling into R you can get an error; out of
memory is also signalled as an R error.

> The issue is that the segfault (because of stack smashing, therefore
> because of what I also suspect to be an uncontrolled jump) is
> happening within the execution of R_ParseVector(). I would think that
> an issue with the initialization of R is less likely because the
> project is otherwise used a fair bit and is well covered by automated
> continuous tests.
>
> After looking more into R's gram.c I suspect that an execution context
> is required for R_ParseVector() to work properly (to know where
> to jump in case of error) when the parsing code decides to fail
> outside what it thinks is a syntax error. If that is the case, this
> would make R_ParseVector() function well when called from, say, a
> C extension to an R package, but fail the way I am seeing it fail
> when called from an embedded R.

Yes, contexts are used internally to handle errors. For external use
please see Writing R Extensions, section 6.12.
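
As a rough illustration of the idea of contexts (not R's actual code), the following C sketch keeps a stack of jump targets so that an error always lands in the innermost protected call. All names (`ToyContext`, `toy_error`, `toy_toplevel_exec`, and the sample callbacks) are hypothetical; for the real API, Writing R Extensions remains the reference.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context record: one jump target per protected call. */
typedef struct ToyContext {
    jmp_buf target;
    struct ToyContext *previous;   /* enclosing context, or NULL */
} ToyContext;

static ToyContext *toy_current = NULL;   /* innermost active context */

/* Raise an error: jump to the innermost context. Never returns. */
void toy_error(void)
{
    if (toy_current == NULL)
        abort();   /* no context installed: there is nowhere valid to jump */
    longjmp(toy_current->target, 1);
}

/* Run fn(data) inside a fresh context.
 * Returns 1 if fn returned normally, 0 if it raised an error. */
int toy_toplevel_exec(void (*fn)(void *), void *data)
{
    ToyContext ctx;
    ctx.previous = toy_current;
    toy_current = &ctx;
    if (setjmp(ctx.target) != 0) {
        toy_current = ctx.previous;   /* error path: pop the context */
        return 0;
    }
    fn(data);
    toy_current = ctx.previous;       /* normal path: pop the context */
    return 1;
}

/* Sample callbacks for illustration. */
void toy_store_answer(void *data) { *(int *)data = 42; }
void toy_fail(void *data) { (void)data; toy_error(); }

/* Nested use: the inner error is caught by the inner context only. */
void toy_nested(void *data)
{
    *(int *)data = toy_toplevel_exec(toy_fail, NULL);
}
```

In this sketch, an error raised with no context installed simply aborts; a caller that wraps every entry into the library in something like `toy_toplevel_exec()` always gets an ordinary return value instead.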

Best
Tomas

> Best,
>
> Laurent
>
>     Best,
>     Tomas
>
>>
>>     We are otherwise aware that the error is not occurring in the R
>>     console, but can be traced to a call to R_ParseVector() in R's C
>>     API:(https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
>>
>>     Our specific setup is calling an embedded R from Python, using
>>     the cffi library. An error on end was the first possibility
>>     considered, but the puzzling specificity of the error (as shown
>>     below other parsing errors are handled properly) and the
>>     difficulty tracing what is in happening in R_ParseVector() made
>>     me ask whether someone on this list had a suggestion about the
>>     possible issue"
>>
>>     ```
>>     >>>  import  rpy2.rinterface  as  ri
>>     >>>  ri.initr()
>>     >>>  e  =  ri.parse("list(''=1+")  
>>     ---------------------------------------------------------------------------
>>     RParsingError                              Traceback  (most  recent  call  last)>>> e = ri.parse("list(''=123") R[write to console]: Error:
>>     attempt to use zero-length variable name R[write to console]:
>>     Fatal error: unable to initialize the JIT *** stack smashing
>>     detected ***: <unknown> terminated ```
>>



Why does INT 3 (opcode 0xCC) SIGTRAP break to debugger (gdb) in Rgui.exe and Rterm.exe but NOT in R.exe on Windows (64 bit)?

nospam@altfeld-im.de
I am developing a package to improve the debugging of Rcpp (C++) and SEXP based C code in gdb
by providing convenience print, subset and other functions:

https://github.com/aryoda/R_CppDebugHelper

I also want to solve the Windows-only problem that you can break into the debugger from R
only via Rgui.exe (menu "Misc > break to debugger") by supporting breakpoints for R.exe.

I want breakpoints support in R.exe because debugging in Rgui.exe has an unwanted side effect:

https://stackoverflow.com/questions/59236579/gdb-prints-output-stdout-to-rgui-console-instead-of-gdb-console-on-windows-whe

My idea is to break into the debugger from R.exe by calling a little C(++) code that contains an INT 3 (opcode 0xCC) SIGTRAP code:

// break_to_debugger.cpp
// [[Rcpp::export]]
int break_to_debugger()
{
  int a = 3;
  asm("int $3");  // this code line shall break into the debugger
  // Idea taken from "Rgui > break into debugger":
  // https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/gnuwin32/rui.c#L431
  a++;
  return a;
}

# breakpoint.R
#' breaks the execution into the debugger
#'
#' @return
#' @export
breakpoint <- function() {
  break_to_debugger()
}

Surprisingly this works not only on Linux but also on Windows (v10, x64 architecture = 64 bit) in Rgui.exe and Rterm.exe,
but NOT in R.exe (64 bit):

- Rgui.exe:  Works
- Rterm.exe: Works
- R.exe:     Does not work: R.exe exits with:
             [Inferior 1 (process 20704) exited with code 020000000003]

Can you please help me to understand why it works for Rgui.exe and Rterm.exe but not for R.exe?

Why is int 3 exiting R.exe?

And: How could I make it also work with R.exe?

Thanks a lot for sharing your ideas and experiences!

Jürgen

PS 1: My sessionInfo():
        R version 3.6.1 (2019-07-05)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows 10 x64 (build 17134)

PS 2: My package "CppDebugHelper" was compiled with -g -O0 -std=c++11

PS 3: Here is my captured gdb output for the three test cases:

1. Rgui.exe ------------------------------------------------------------------------

>gdb --quiet --args Rgui.exe --silent --vanilla
Reading symbols from Rgui.exe...(no debugging symbols found)...done.
(gdb) run
Starting program: C:\R\bin\x64\Rgui.exe --silent --vanilla
[New Thread 14476.0x3710]
[New Thread 14476.0x284c]
[New Thread 14476.0x50ec]
[New Thread 14476.0x2d24]
warning: Invalid parameter passed to C runtime function.
[In RGui's R console:]
library(CppDebugHelper)
breakpoint()
[in gdb again:]
Program received signal SIGTRAP, Trace/breakpoint trap.
break_to_debugger () at break_to_debugger.cpp:33
33        a++;
(gdb) b debug_example_rcpp
Breakpoint 1 at 0x66ac6846: file debug_example_rcpp.cpp, line 13.
(gdb) continue
Continuing.
[In RGui's R console:]
debug_example_rcpp()
[in gdb again:]
Breakpoint 1, debug_example_rcpp () at debug_example_rcpp.cpp:13
13          CharacterVector cv   = CharacterVector::create("foo", "bar", NA_STRING, "hello")  ;
(gdb) next
14          NumericVector nv     = NumericVector::create(0.0, 1.0, NA_REAL, 10) ;
(gdb) n
16          DateVector dv        = DateVector::create( 14974, 14975, 15123, NA_REAL); // TODO how to use real dates instead?
(gdb) n
17          DateVector dv2       = DateVector::create(Date("2010-12-31"), Date("01.01.2011", "%d.%m.%Y"), Date(2011, 05, 29), NA_REAL);
(gdb) n
18          DatetimeVector dtv   = DatetimeVector::create(1293753600, Datetime("2011-01-01"), Datetime("2011-05-29 10:15:30"), NA_REAL);
(gdb) n
19          DataFrame df         = DataFrame::create(Named("name1") = cv, _["value1"] = nv, _["dv2"] = dv2);  // Named and _[] are the same
(gdb) n
20          CharacterVector col1 = df["name1"];          // get the first column
(gdb) call dbg_print(df)
(gdb) call dbg_str(df)
(gdb) continue
Continuing.

[Output for the dbg_* function calls is printed to Rgui's R console (NOT the gdb terminal!):]

  name1 value1        dv2
1   foo      0 2010-12-31
2   bar      1 2011-01-01
3  <NA>     NA 2011-05-29
4 hello     10       <NA>

'data.frame':   4 obs. of  3 variables:
$ name1 : Factor w/ 3 levels "bar","foo","hello": 2 1 NA 3
$ value1: num  0 1 NA 10
$ dv2   : Date, format: "2010-12-31" "2011-01-01" ...



2. R.exe ------------------------------------------------------------------------

>gdb --quiet --args R.exe --silent --vanilla
Reading symbols from R.exe...(no debugging symbols found)...done.
(gdb) r
Starting program: C:\R\bin\x64\R.exe --silent --vanilla
[New Thread 20704.0x2b20]
[New Thread 20704.0x4c08]
[New Thread 20704.0x425c]
[New Thread 20704.0x45f8]
> library(CppDebugHelper)
> breakpoint()
[Thread 20704.0x45f8 exited with code 2147483651]
[Thread 20704.0x425c exited with code 2147483651]
[Thread 20704.0x4c08 exited with code 2147483651]
[Inferior 1 (process 20704) exited with code 020000000003]
(gdb) bt
No stack.
(gdb)
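
A note on the exit codes in the R.exe session above: gdb shows the process exit code in octal (leading 0) and the thread exit codes in decimal. The short Python sketch below only checks that both are the same number and that it equals the value of the Windows status code STATUS_BREAKPOINT (0x80000003). Reading this as "the breakpoint exception went unhandled" is an interpretation, not something confirmed in this thread.

```python
# Exit codes copied from the gdb session for R.exe shown above.
PROCESS_EXIT_OCTAL = "020000000003"   # "[Inferior 1 ... exited with code 020000000003]"
THREAD_EXIT_DECIMAL = 2147483651      # "[Thread ... exited with code 2147483651]"

# Value of the Windows NTSTATUS code STATUS_BREAKPOINT.
STATUS_BREAKPOINT = 0x80000003

# Interpret gdb's process exit code as an octal number.
process_exit = int(PROCESS_EXIT_OCTAL, 8)

print(hex(process_exit))                     # the same number in hexadecimal
print(process_exit == THREAD_EXIT_DECIMAL)   # octal and decimal forms agree
print(process_exit == STATUS_BREAKPOINT)     # matches the breakpoint status code
```

One possible explanation, stated here only as an assumption: if R.exe is a small front-end that starts Rterm.exe as a separate process, the INT 3 would be executed in a process that gdb is not attached to, so nothing would handle the exception.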



3. Rterm.exe ------------------------------------------------------------------------

>gdb --quiet --args Rterm.exe --silent --vanilla
Reading symbols from Rterm.exe...(no debugging symbols found)...done.
(gdb) run
Starting program: C:\R\bin\x64\Rterm.exe --silent --vanilla
[New Thread 8132.0x3ee8]
[New Thread 8132.0x3828]
[New Thread 8132.0x4f1c]
[New Thread 8132.0x4ff4]
warning: Invalid parameter passed to C runtime function.
[New Thread 8132.0x4dc8]
> library(CppDebugHelper)
> breakpoint()
Program received signal SIGTRAP, Trace/breakpoint trap.
break_to_debugger () at break_to_debugger.cpp:33
33        a++;
(gdb) b debug_example_rcpp
Breakpoint 1 at 0x66ac6846: file debug_example_rcpp.cpp, line 13.
(gdb) c
Continuing.
[1] 4
> debug_example_rcpp()
Breakpoint 1, debug_example_rcpp () at debug_example_rcpp.cpp:13
13          CharacterVector cv   = CharacterVector::create("foo", "bar", NA_STRING, "hello")  ;
(gdb) n
14          NumericVector nv     = NumericVector::create(0.0, 1.0, NA_REAL, 10) ;
(gdb) n
16          DateVector dv        = DateVector::create( 14974, 14975, 15123, NA_REAL); // TODO how to use real dates instead?
(gdb) call dbg_print(nv)
[1]  0  1 NA 10
(gdb) call dbg_print(dbg_subset(nv, 1, 2))
[1]  1 NA
(gdb)


Re: Inconsistent behavior for the C AP's R_ParseVector() ?

lgautier
In reply to this post by Tomas Kalibera
On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <[hidden email]> wrote:

> On 12/9/19 2:54 PM, Laurent Gautier wrote:
>
>
>
> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <[hidden email]> wrote:
>
>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>
>> Thanks for the quick response Tomas.
>>
>> The same error is indeed happening when trying to have a zero-length
>> variable name in an environment. The surprising bit is then "why is this
>> happening during parsing" (that is why are variables assigned to an
>> environment) ?
>>
>> The emitted R error (in the R console) is not a parse (syntax) error, but
>> an error emitted during parsing when the parser tries to intern a name -
>> look it up in a symbol table. Empty string is not allowed as a symbol name,
>> and hence the error. In the call "list(''=1)" , the empty name is what
>> could eventually become a name of a local variable inside list(), even
>> though not yet during parsing.
>>
>
> Thanks Tomas.
>
> I guess this has to do with R expressions being lazily evaluated, and names
> of arguments in a call are also part of the expression. Now the puzzling
> part is why that is part of the parsing at all: I would have expected
> R_ParseVector() to be restricted to parsing... Now it feels like
> R_ParseVector() is performing parsing, and a first level of evaluation for
> expressions that "should never work" (the empty name).
>
> Think of it as an exception in say Python. Some failures during parsing
> result in an exception (called error in R and implemented using a long
> jump). Any time you are calling into R you can get an error; out of memory
> is also signalled as R error.
>


The surprising bit for me was that I had expected the function to solely
perform parsing. I did not expect an exception (and a jmp smashing the stack)
when the function concerned is in the C-API, is parsing a string, and is
using a parameter (pointer) to store whether parsing was a failure or a
success.

Since you are making a comparison with Python, the distinction I am making
between parsing and evaluation seems to apply there. For example:

```
>>> import parser
>>> parser.expr('1+')
  Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    1+
     ^
SyntaxError: unexpected EOF while parsing
>>> p = parser.expr('list(""=1)')
>>> p
<parser.st at 0x7f360e5329f0>
>>> eval(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object

>>> list(""=1)
  File "<stdin>", line 1
SyntaxError: keyword can't be an expression
```
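
The `parser` module used in the session above was removed in Python 3.10. As a rough sketch of the same parse-versus-evaluate distinction with the standard `ast` module (the helper `try_parse` and the name `undefined_name_for_demo` are made up for this illustration): a syntax problem is reported by the parser, while a syntactically valid expression parses fine and only fails once it is evaluated.

```python
import ast


def try_parse(source):
    """Return (tree, None) on success or (None, error) on a syntax error."""
    try:
        return ast.parse(source, mode="eval"), None
    except SyntaxError as err:
        return None, err


# Incomplete input is rejected by the parser itself.
tree, err = try_parse("1+")
print(type(err).__name__)

# Syntactically valid input parses even though it cannot be evaluated:
# no name is looked up and nothing is executed at this stage.
tree, err = try_parse("undefined_name_for_demo + 1")
print(type(tree).__name__)

# The failure only shows up at evaluation time.
try:
    eval(compile(tree, "<demo>", "eval"))
except NameError as exc:
    print(type(exc).__name__)
```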


> There is probably some error in how the external code is handling R
>> errors  (Fatal error: unable to initialize the JIT, stack smashing, etc)
>> and possibly also how R is initialized before calling ParseVector. Probably
>> you would get the same problem when running say "stop('myerror')". Please
>> note R errors are implemented as long-jumps, so care has to be taken when
>> calling into R, Writing R Extensions has more details (and section 8
>> specifically about embedding R). This is unlike parse (syntax) errors
>> signaled via return value to ParseVector()
>>
>
> The issue is that the segfault (because of stack smashing, and therefore
> because of what I also suspect to be an uncontrolled jump) is happening
> within the execution of R_ParseVector(). I would think that an issue with
> the initialization of R is less likely because the project is otherwise
> used a fair bit and is well covered by automated continuous tests.
>
> After looking more into R's gram.c I suspect that an execution context is
> required for R_ParseVector() to work properly (to know where to jump
> in case of error) when the parsing code decides to fail outside what it
> thinks is a syntax error. If that is the case, this would make R_ParseVector()
> function well when called from, say, a C extension to an R package, but fail
> the way I am seeing it fail when called from an embedded R.
>
> Yes, contexts are used internally to handle errors. For external use
> please see Writing R Extensions, section 6.12.
>

I have wrapped my call to R_ParseVector() in a R_tryCatchError(), and this
seems to help me overcome the issue. Thanks for the pointer.
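
To illustrate the idea only, here is a toy model in Python, not R's API: `toy_parse`, `guarded_parse` and `ToyRError` are hypothetical names, and `ParseStatus` merely borrows the name of the C enum. The point is that a function can report syntax problems through a status value and still fail through a second channel (an R error, i.e. a long jump in C), which the caller has to guard against separately. The order of the checks is chosen only to reproduce the two outcomes reported in this thread.

```python
from enum import Enum


class ParseStatus(Enum):
    OK = "ok"
    INCOMPLETE = "incomplete"
    ERROR = "error"


class ToyRError(Exception):
    """Stands in for an R error (a long jump in the real C code)."""


def toy_parse(text):
    """Hypothetical parser: syntax problems come back as a status,
    but a failure that is not a syntax error is raised instead."""
    stripped = text.rstrip()
    if stripped.endswith("+"):
        # Unfinished expression: reported through the status channel.
        return None, ParseStatus.INCOMPLETE
    if "''=" in stripped or '""=' in stripped:
        # Empty argument name: reported through the error channel.
        raise ToyRError("attempt to use zero-length variable name")
    if stripped.count("(") != stripped.count(")"):
        return None, ParseStatus.INCOMPLETE
    return stripped, ParseStatus.OK


def guarded_parse(text):
    """Wrap toy_parse so that both failure channels come back as values."""
    try:
        expr, status = toy_parse(text)
        return expr, status, None
    except ToyRError as err:
        return None, ParseStatus.ERROR, str(err)


for src in ("list(a=1)", "list(''=1+", "list(''=123"):
    print(src, guarded_parse(src))
```

Without the guard, the third input would propagate `ToyRError` to the caller, which is the toy counterpart of the unprotected long jump discussed above.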

Best,


Laurent




Re: Inconsistent behavior for the C AP's R_ParseVector() ?

Simon Urbanek
Laurent,

the main point here is that ParseVector(), just like any other R API function, has to be called in a correct context since it can raise errors. So the issue was that your C code has a bug: it does not set up R correctly (my guess would be that you're not creating the initial context necessary in embedded R). There are many different errors; yours is just one of many that can occur. Any R API call that does allocation (and parsing obviously does) can cause errors. Note that this is true for pretty much all R API functions.

Cheers,
Simon





Re: Inconsistent behavior for the C AP's R_ParseVector() ?

lgautier
Hi Simon,

Widespread errors would have caught my attention earlier, as that code
uses only one initialization of the embedded R, is used quite a bit, and
is covered by quite a few unit tests. This is the only situation I am aware
of in which an error occurs.

What is a "correct context", or initial context, that the code should be called from?
Searching for "context" in the R-exts manual does not return much.

Best,

Laurent


On Sat, Dec 14, 2019 at 12:20, Simon Urbanek <[hidden email]> wrote:

> Laurent,
>
> the main point here is that ParseVector(), just like any other R API
> function, has to be called in a correct context since it can raise errors.
> So the issue was that your C code has a bug: it does not set up R correctly
> (my guess would be that you're not creating the initial context necessary
> in embedded R). There are many different errors; yours is just one of many
> that can occur. Any R API call that does allocation (and parsing obviously
> does) can cause errors. Note that this is true for pretty much all R API
> functions.
>
> Cheers,
> Simon
>
>
>
> > On Dec 14, 2019, at 11:25 AM, Laurent Gautier <[hidden email]>
> wrote:
> >
> > On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <[hidden email]> wrote:
> >
> >> On 12/9/19 2:54 PM, Laurent Gautier wrote:
> >>
> >>
> >>
> >> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <[hidden email]> wrote:
> >>
> >>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
> >>>
> >>> Thanks for the quick response Tomas.
> >>>
> >>> The same error is indeed happening when trying to have a zero-length
> >>> variable name in an environment. The surprising bit is then "why is
> this
> >>> happening during parsing" (that is why are variables assigned to an
> >>> environment) ?
> >>>
> >>> The emitted R error (in the R console) is not a parse (syntax) error,
> but
> >>> an error emitted during parsing when the parser tries to intern a name
> -
> >>> look it up in a symbol table. Empty string is not allowed as a symbol
> name,
> >>> and hence the error. In the call "list(''=1)" , the empty name is what
> >>> could eventually become a name of a local variable inside list(), even
> >>> though not yet during parsing.
> >>>
> >>
> >> Thanks Tomas.
> >>
> >> I guess this has to do with R expressions being lazily evaluated, and names
> >> of arguments in a call are also part of the expression. Now the puzzling
> >> part is why that is part of the parsing at all: I would have expected
> >> R_ParseVector() to be restricted to parsing... Now it feels like
> >> R_ParseVector() is performing parsing, and a first level of evaluation for
> >> expressions that "should never work" (the empty name).
> >>
> >> Think of it as an exception in say Python. Some failures during parsing
> >> result in an exception (called error in R and implemented using a long
> >> jump). Any time you are calling into R you can get an error; out of
> memory
> >> is also signalled as R error.
> >>
> >
> >
> > The surprising bit for me was that I had expected the function to solely
> > perform parsing. I did expect an exception (and a jmp smashing the stack)
> > when the function concerned is in the C-API, is parsing a string, and is
> > using a parameter (pointer) to store whether parsing was a failure or a
> > success.
> >
> > Since you are making a comparison with Python, the distinction I am
> making
> > between parsing and evaluation seem to apply there. For example:
> >
> > ```
> >>>> import parser
> >>>> parser.expr('1+')
> >  Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >  File "<string>", line 1
> >    1+
> >     ^
> > SyntaxError: unexpected EOF while parsing
> >>>> p = parser.expr('list(""=1)')
> >>>> p
> > <parser.st at 0x7f360e5329f0>
> >>>> eval(p)
> > Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> > TypeError: eval() arg 1 must be a string, bytes or code object
> >
> >>>> list(""=1)
> >  File "<stdin>", line 1
> > SyntaxError: keyword can't be an expression
> > ```
> >
> >
> >> There is probably some error in how the external code is handling R
> >>> errors  (Fatal error: unable to initialize the JIT, stack smashing,
> etc)
> >>> and possibly also how R is initialized before calling ParseVector.
> Probably
> >>> you would get the same problem when running say "stop('myerror')".
> Please
> >>> note R errors are implemented as long-jumps, so care has to be taken
> when
> >>> calling into R, Writing R Extensions has more details (and section 8
> >>> specifically about embedding R). This is unlike parse (syntax) errors
> >>> signaled via return value to ParseVector()
> >>>
> >>
> >> The issue is that the segfault (because of stack smashing, therefore
> >> because of what also suspected to be an incontrolled jump) is happening
> >> within the execution of R_ParseVector(). I would think that an issue
> with
> >> the initialization of R is less likely because the project is otherwise
> >> used a fair bit and is well covered by automated continuous tests.
> >>
> >> After looking more into R's gram.c I suspect that an execution context
> is
> >> required for R_ParseVector() to know to properly work (know where to
> jump
> >> in case of error) when the parsing code decides to fail outside what it
> >> thinks is a syntax error. If the case, this would make R_ParseVector()
> >> function well when called from say, a C-extension to an R package, but
> fail
> >> the way I am seeing it fail when called from an embedded R.
> >>
> >> Yes, contexts are used internally to handle errors. For external use
> >> please see Writing R Extensions, section 6.12.
> >>
> >
> > I have wrapped my call to R_ParseVector() in a R_tryCatchError(), and
> this
> > is seems to help me overcome the issue. Thanks for the pointer.
> >
> > Best,
> >
> >
> > Laurent
> >
> >
> >> Best
> >> Tomas
> >>
> >>
> >> Best,
> >>
> >> Laurent
> >>
> >>> Best,
> >>> Tomas
> >>>
> >>>
> >>> We are otherwise aware that the error is not occurring in the R
> console,
> >>> but can be traced to a call to R_ParseVector() in R's C API:(
> >>>
> https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509
> >>> ).
> >>>
> >>> Our specific setup is calling an embedded R from Python, using the cffi
> >>> library. An error on end was the first possibility considered, but the
> >>> puzzling specificity of the error (as shown below other parsing errors
> are
> >>> handled properly) and the difficulty tracing what is in happening in
> >>> R_ParseVector() made me ask whether someone on this list had a
> suggestion
> >>> about the possible issue"
> >>>
> >>> ```
> >>>
> >>>>>> import rpy2.rinterface as ri>>> ri.initr()>>> e =
> ri.parse("list(''=1+")
> ---------------------------------------------------------------------------RParsingError
>                            Traceback (most recent call last)>>> e =
> ri.parse("list(''=123") R[write to console]: Error: attempt to use
> zero-length variable name
> >>> R[write to console]: Fatal error: unable to initialize the JIT
> >>>
> >>> *** stack smashing detected ***: <unknown> terminated
> >>> ```
> >>>
> >>>
> >>> Le lun. 2 déc. 2019 à 06:37, Tomas Kalibera <[hidden email]>
> a
> >>> écrit :
> >>>
> >>>> Dear Laurent,
> >>>>
> >>>> could you please provide a complete reproducible example where parsing
> >>>> results in a crash of R? Calling parse(text="list(''=123") from R
> works
> >>>> fine for me (gives Error: attempt to use zero-length variable name).
> >>>>
> >>>> I don't think the problem you observed could be related to the memory
> >>>> leak. The leak is on the heap, not stack.
> >>>>
> >>>> Zero-length names of elements in a list are allowed. They are not the
> >>>> same thing as zero-length variables in an environment. If you try to
> >>>> convert "lst" from your example to an environment, you would get the
> >>>> error (attempt to use zero-length variable name).
> >>>>
> >>>> Best
> >>>> Tomas
> >>>>
> >>>>
> >>>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
> >>>>> Hi again,
> >>>>>
> >>>>> Beside R_ParseVector()'s possible inconsistent behavior, R's handling
> >>>> of
> >>>>> zero-length named elements does not seem consistent either:
> >>>>>
> >>>>> ```
> >>>>>> lst <- list()
> >>>>>> lst[[""]] <- 1
> >>>>>> names(lst)
> >>>>> [1] ""
> >>>>>> list("" = 1)
> >>>>> Error: attempt to use zero-length variable name
> >>>>> ```
> >>>>>
> >>>>> Should the parser be made to accept as valid what is otherwise
> possible
> >>>>> when using `[[<` ?
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Laurent
> >>>>>
> >>>>>
> >>>>>
> >>>>> Le sam. 30 nov. 2019 à 17:33, Laurent Gautier <[hidden email]> a
> >>>> écrit :
> >>>>>
> >>>>>> I found the following code comment in `src/main/gram.c`:
> >>>>>>
> >>>>>> ```
> >>>>>>
> >>>>>> /* Memory leak
> >>>>>>
> >>>>>> yyparse(), as generated by bison, allocates extra space for the
> parser
> >>>>>> stack using malloc(). Unfortunately this means that there is a
> memory
> >>>>>> leak in case of an R error (long-jump). In principle, we could
> define
> >>>>>> yyoverflow() to relocate the parser stacks for bison and allocate
> say
> >>>> on
> >>>>>> the R heap, but yyoverflow() is undocumented and somewhat
> complicated
> >>>>>> (we would have to replicate some macros from the generated parser
> >>>> here).
> >>>>>> The same problem exists at least in the Rd and LaTeX parsers in
> tools.
> >>>>>> */
> >>>>>>
> >>>>>> ```
> >>>>>>
> >>>>>> Could this be related to be issue ?
> >>>>>>
> >>>>>> Le sam. 30 nov. 2019 à 14:04, Laurent Gautier <[hidden email]>
> a
> >>>>>> écrit :
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> The behavior of
> >>>>>>> ```
> >>>>>>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
> >>>>>>> ```
> >>>>>>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
> >>>>>>> depending on the string to be parsed.
> >>>>>>>
> >>>>>>> Trying to parse a string such as `"list(''=1+"` sets the
> >>>>>>> `ParseStatus` to incomplete parsing error but trying to parse
> >>>>>>> `"list(''=123"` will result in R sending a message to the console
> >>>> (followed but a crash):
> >>>>>>>
> >>>>>>> ```
> >>>>>>> R[write to console]: Error: attempt to use zero-length variable
> >>>> nameR[write to console]: Fatal error: unable to initialize the JIT***
> stack
> >>>> smashing detected ***: <unknown> terminated
> >>>>>>> ```
> >>>>>>>
> >>>>>>> Is there a reason for the difference in behavior, and is there a
> >>>> workaround ?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>>
> >>>>>>> Laurent
> >>>>>>>
> >>>>>>>
> >>>>>      [[alternative HTML version deleted]]
> >>>>>
> >>>>> ______________________________________________
> >>>>> [hidden email] mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>


Re: Inconsistent behavior for the C AP's R_ParseVector() ?

Simon Urbanek
Laurent,


> On Dec 14, 2019, at 5:29 PM, Laurent Gautier <[hidden email]> wrote:
>
> Hi Simon,
>
> Widespread errors would have caught my attention earlier, as that code
> uses only one initialization of the embedded R, is used quite a bit, and
> is covered by quite a few unit tests. This is the only situation I am aware
> of in which an error occurs.
>

It may or may not be "widespread" - almost all R API functions can raise errors (e.g., unable to allocate). You'll only find out once they do and that's too late ;).


> What is a "correct context", or initial context, that the code should be
> called from? Searching for "context" in the R-exts manual does not return much.
>

It depends on which embedding API you use. See R-exts section 8.1: the two options are run_Rmainloop() and R_ReplDLLinit(), both of which set up the top-level context with SETJMP. If you use neither, then you have to use one of the advanced R APIs that do it, such as R_ToplevelExec() or R_UnwindProtect(); otherwise the point to abort to on error doesn't exist. Embedding R is much more complex than many think ...
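
To make the mechanism concrete, here is a minimal toy sketch in plain C of what "setting up a context with SETJMP" buys you. It is not R's implementation, and every name in it (toy_error, toy_intern, toy_toplevel_exec, current_context) is hypothetical; it only assumes, as described above, that an error is a long jump to the most recently established jump target.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: none of these names belong to R's API. */

/* Innermost active jump target; NULL means no context was ever set up. */
static jmp_buf *current_context = NULL;

/* Stand-in for an error raised inside the library: jump to the context. */
static void toy_error(void)
{
    if (current_context == NULL)
        abort();                      /* nowhere valid to jump to */
    longjmp(*current_context, 1);
}

/* Stand-in for a call that fails by raising an error, not by returning
   a status: an empty name is rejected, loosely like the case discussed. */
static void toy_intern(void *data)
{
    const char *name = data;
    if (name[0] == '\0')
        toy_error();
}

/* Establish a jump target, run fun(data), and report the outcome:
   1 if fun returned normally, 0 if it raised an error. */
static int toy_toplevel_exec(void (*fun)(void *), void *data)
{
    jmp_buf ctx;
    jmp_buf *saved = current_context; /* set before setjmp, never modified */
    int ok;

    if (setjmp(ctx) == 0) {
        current_context = &ctx;
        fun(data);
        ok = 1;
    } else {
        ok = 0;                       /* reached via longjmp from toy_error */
    }
    current_context = saved;          /* restore the enclosing context */
    return ok;
}
```

With the wrapper, toy_toplevel_exec(toy_intern, "") returns 0 and the caller carries on. Calling toy_intern("") directly, with no context established, has no valid place to land, which is the situation an embedding application is in when it calls into R without a top-level context.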

Cheers,
Simon



