regular expression help

Ashim Kapoor
Dear All,

My query is:

Do we always need to use the perl = TRUE option when using ignore.case = TRUE?

A small example:

my_text =
"RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n  RC No. 162/2015\nSBI
VS RAMESH GUPTA.\n    Dated: 01.03.2016                   Item no.01\n
Present:   Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel
for ARCIL.\n                None for the CDs.\n  The counsel for the CHFI
submitted that the matter has been assigned to ARCIL and deed of
assignment, application for substituting the name and vakalatnama has been
filed vide diary no. 1454 dated 08.02.2016\nIn the application it has been
prayed that ARCIL may be substituted in place of SBI for the purpose of
further proceedings in the matter. Request allowed.\nThe proxy counsel for
CHFI further requested to issue demand notice thereby mentioning the name
of ARCIL. Request allowed.\nRegistry is directed to issue fresh demand
notice mentioning the name of ARCIL.\nCHFI is directed to file status of
the mortgaged property as well as other assets of the CDs.\nList the case
on 28.03.2016.\n  (SUJEET KUMAR)\nRECOVERY OFFICER-II."

My regular expression is:

parties_present_start_1 =
  regexpr("\n.*Present.*\n.*\n", my_text, ignore.case = TRUE, perl = TRUE)

parties_present_start_2 =
  regexpr("\n.*Present.*\n.*\n", my_text, ignore.case = TRUE)

> parties_present_start_1
[1] 138
attr(,"match.length")
[1] 123
attr(,"useBytes")
[1] TRUE
> parties_present_start_2
[1] 20
attr(,"match.length")
[1] 949
attr(,"useBytes")
[1] TRUE
>

Why do I see the correct result only in the first case?

Best Regards,
Ashim

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: regular expression help

Enrico Schumann

Quoting Ashim Kapoor <[hidden email]>:

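With perl = TRUE (Perl-compatible regular expressions), '.' matches anything but a newline, as in Perl.

With R's default engine (TRE), '.' matches any character, including a newline.

So ignore.case is not the issue: without perl = TRUE, each '.*' in your pattern can run across line breaks, which is why the second match starts at the first newline and is so much longer.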

   test <- "hello\n1"
   regexpr(".*[0-9]", test)
   ## [1] 1
   ## attr(,"match.length")
   ## [1] 7
   ## attr(,"useBytes")
   ## [1] TRUE

   regexpr(".*[0-9]", test, perl = TRUE)
   ## [1] 7
   ## attr(,"match.length")
   ## [1] 1
   ## attr(,"useBytes")
   ## [1] TRUE
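
A minimal sketch (not from the original replies) applying this to the pattern from the question. It uses a short hypothetical string, txt, shaped like my_text rather than the original data, and assumes the goal is to capture the line containing "Present" plus the line after it. It shows that ignore.case plays no part, that writing [^\n] instead of '.' makes both engines stop at line ends, and that PCRE's inline (?s) flag does the reverse, letting '.' match a newline.

```r
# Hypothetical short text in the shape of my_text (not the original data)
txt <- "HEADER\nDated: 01.03.2016\nPresent: counsel for ARCIL.\nNone for the CDs.\nList the case."

# The pattern from the question: whether '.*' crosses newlines depends on the engine
pat_dot <- "\n.*Present.*\n.*\n"
m_tre  <- regexpr(pat_dot, txt, ignore.case = TRUE)               # default (TRE) engine
m_pcre <- regexpr(pat_dot, txt, ignore.case = TRUE, perl = TRUE)  # PCRE engine
c(m_tre, attr(m_tre, "match.length"))    # 7 65: from the first "\n", across lines
c(m_pcre, attr(m_pcre, "match.length"))  # 25 47: the "Present" line and the next one

# Dropping ignore.case gives the same default-engine result
identical(regexpr(pat_dot, txt), m_tre)  # TRUE

# Option 1: exclude newlines explicitly, so both engines agree
pat_line <- "\n[^\n]*Present[^\n]*\n[^\n]*\n"
m_line <- regexpr(pat_line, txt, ignore.case = TRUE)
regmatches(txt, m_line)  # "\nPresent: counsel for ARCIL.\nNone for the CDs.\n"

# Option 2 (the reverse): (?s) makes PCRE's '.' match newlines, like the default engine
m_dotall <- regexpr(paste0("(?s)", pat_dot), txt, perl = TRUE)
c(m_dotall, attr(m_dotall, "match.length"))  # 7 65
```

The [^\n] form is the more portable choice here, since it does not depend on which engine is selected.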


--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net

Re: regular expression help

Ashim Kapoor
Dear Enrico,

Many thanks and Best Regards,

Ashim.

On Thu, Jun 8, 2017 at 5:11 PM, Enrico Schumann <[hidden email]>
wrote:
