String comparison, trailing blanks make a difference.


John McKown
Well, this was a shock to me. And I don't really see any documentation
about it, but perhaps I just can't see it.

>"abc" == "abc "
[1] FALSE

I guess I thought of strings in R as I do in some other
languages, where the shorter value is padded with blanks to the length
of the longer value and then compared, i.e. trailing blanks don't
matter.

The best solution that I have found is to use the str_trim() function
from the stringr package to remove the trailing blanks after I get the
data from the SQL database. I cannot change the SQL schema to make
the column a varchar instead of a char column; it is a vendor DB. And
I don't know an ANSI SQL standard way to remove trailing blanks in the
SELECT command. PostgreSQL has "trim(trailing ' ' from column)", but
MS-SQL upchucks on that syntax.
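
For anyone hitting the same thing, here is a minimal sketch of that post-fetch clean-up in base R. The data frame `df` and its column names are made up for illustration (they stand in for whatever the database query returns), and sub() with a trailing-blank pattern is used instead of stringr's str_trim() only so that the example needs no extra package.

```r
# Hypothetical data standing in for a result set with fixed-length CHAR columns.
df <- data.frame(code = c("abc  ", "de   "),
                 qty  = c(1L, 2L),
                 stringsAsFactors = FALSE)

# Comparison is exact, so the blank-padded value does not match "abc".
df$code == "abc"

# Strip trailing blanks from every character column; other columns are untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(x) sub(" +$", "", x))

# After trimming, the first row matches.
df$code == "abc"
```

Only blanks at the end of each value are removed here; leading or embedded blanks are left alone, which seems to be what is wanted for padded CHAR columns.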

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: String comparison, trailing blanks make a difference.

William Dunlap
>>"abc" == "abc "
> [1] FALSE

R does no interpretation of strings when doing comparisons, so you
have to do your own canonicalization.  That may involve removing
trailing, leading, or all white space or punctuation, converting to
lower or upper case, mapping nicknames to official names, trimming to
a fixed number of characters, etc.
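
A rough sketch of what such a canonicalization step might look like follows. The function name `canonicalize`, the choice and order of steps, and the nickname table are all assumptions made for illustration; which steps are appropriate depends entirely on the data.

```r
# Hypothetical helper: the steps and their order are assumptions, not a standard.
canonicalize <- function(x, nicknames = character(0)) {
  x <- gsub("[[:punct:]]", "", x)                  # drop punctuation
  x <- gsub("^[[:space:]]+|[[:space:]]+$", "", x)  # drop leading/trailing white space
  x <- tolower(x)                                  # fold to lower case
  # Map nicknames to official names; names(nicknames) hold the nicknames.
  hit <- x %in% names(nicknames)
  x[hit] <- unname(nicknames[x[hit]])
  x
}

nick <- c(bill = "william")   # made-up lookup table
canonicalize(c("Bill ", " WILLIAM", "bill."), nick)
```

Both sides of a comparison need to go through the same function, e.g. `canonicalize(a) == canonicalize(b)`, otherwise the result is no more reliable than the raw `==`.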

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: String comparison, trailing blanks make a difference.

Hervé Pagès
In reply to this post by John McKown
Hi John,

On 07/18/2014 09:17 AM, John McKown wrote:

> Well, this was a shock to me. And I don't really see any documentation
> about it, but perhaps I just can't see it.
>
>> "abc" == "abc "
> [1] FALSE
>
> I guess that I thought of strings in R as I do in some other
> languages where the shorter value is padded with blanks to the length
> of the longer value, then compared. I.e. that trailing blanks didn't
> matter.

The shock to me is to learn that some programming languages consider
strings "abc" and "abc " to be the same. Please name them so I can stay
away from them ;-)

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: String comparison, trailing blanks make a difference.

John McKown
In reply to this post by John McKown
Well, here I am - talking to myself ... again.

My "problem" was, of course, of my own making. I am getting my data
via RODBC from MS-SQL Server, basically doing a "SELECT * FROM
TABLE". I normally use PostgreSQL, not MS-SQL, and I tend to use the
"TEXT" data type instead of CHAR or VARCHAR, so when I do the SELECT
I get back my data without trailing blanks. The data I am reading now
is created by a software vendor. I guess that, in order to be database
independent, the vendor designed the tables to hold only fixed-length
CHAR and INT values. The fixed-length CHAR values are, naturally,
padded on the right with blanks. Now that I understand this (weird as
it is to me), I know to use a SELECT which specifically lists the
columns that I want _and_ does a TRIM() on them to remove trailing
blanks. This will reduce the size, in bytes, of my data.frame and make
it easier to use the comparison operators.

Given how the vendor saves the data, I am quite surprised that they
didn't use SQLite. The tables are simple. There are no "stored
procedures", no VIEWs, no use of SCHEMAs to make subsets. Basically
they just want a simple data store, with the ability to do _simple_
joins. SQLite seems, to me, to be a better fit than requiring the user
to have a full-blown RDBMS such as MS-SQL or Oracle.
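
The column-by-column SELECT described above could be assembled in R along these lines. This is only a sketch: the table name `orders` and the column names are invented, since the vendor schema is not shown in this thread, and RTRIM() is used on the assumption that the server accepts it (as far as I know both MS-SQL and PostgreSQL do, but check your own database). The query is only built as a string here, not executed.

```r
# Hypothetical table and column names; the real vendor schema is not shown.
char_cols <- c("cust_id", "cust_name")   # fixed-length CHAR columns
int_cols  <- c("qty")                    # INT columns need no trimming

# Wrap each CHAR column in RTRIM() and alias it back to its own name.
select_list <- c(sprintf("RTRIM(%s) AS %s", char_cols, char_cols), int_cols)
query <- sprintf("SELECT %s FROM %s",
                 paste(select_list, collapse = ", "),
                 "orders")
query

# With an open RODBC channel `ch` (assumed, not created here) it would be run as:
# df <- RODBC::sqlQuery(ch, query)
```

Trimming on the server keeps the padded bytes from ever crossing the connection, whereas trimming in R after the fetch works with any database but transfers the padding first.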

Well, thanks for the whack on the head to wake me up and make me
really look at my data.

Re: String comparison, trailing blanks make a difference.

hadley wickham
In reply to this post by William Dunlap
If you have Unicode strings, you may need to do even more, because
there are often multiple ways of representing the same glyph. I made a
little demo at http://rpubs.com/hadley/unicode-normalisation, since
any Unicode characters are likely to get mangled by email.
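
To give a flavour of the issue without relying on the mail system to carry the characters, here is a small sketch using escape sequences. It is not a copy of the demo linked above, just an illustration; the variable names are made up, and the normalisation step only runs if the stringi package happens to be installed.

```r
composed   <- "\u00e9"    # e-acute as a single code point
decomposed <- "e\u0301"   # plain "e" followed by a combining acute accent

# Both display as the same glyph, but the comparison is exact, so they differ.
composed == decomposed

# If stringi is available, NFC normalisation brings both strings to the
# same representation before they are compared.
if (requireNamespace("stringi", quietly = TRUE)) {
  stringi::stri_trans_nfc(composed) == stringi::stri_trans_nfc(decomposed)
}
```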

Hadley

--
http://had.co.nz/
