Problem with reading data from an UTF-16 database

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Problem with reading data from an UTF-16 database

Thierry Onkelinx
Dear all,

We have a problem with reading some characters correctly from an UTF-16
encoded database. The code below givens the correct characters on Ubuntu
with the_driver = {ODBC Driver 13 for SQL Server}. On Windows (with
the_driver = {SQL Server}), special characters like '♂' and '♀' are
returned as '?'. I've added the sessionInfo() output from both machines.

Any suggestions on how to fix the problem?

Best regards,

Thierry

library(DBI)
con <- dbConnect(odbc::odbc(), .connection_string =
"Driver=the_drive;Server=our_server;Database=the_database;Trusted_Connection=Yes;")
dbGetQuery(con, sql_statement)


R version 3.4.2 (2017-09-28)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RODBC_1.3-15 DBI_0.8

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14    R6_2.2.2        odbc_1.1.5      magrittr_1.5
pillar_1.1.0    rlang_0.2.0     testthat_2.0.0
 [8] blob_1.1.0      drat_0.1.3      fortunes_1.5-4  tools_3.4.2
 bit64_0.9-7     bit_1.1-12      hms_0.4.0
[15] yaml_2.1.14     compiler_3.4.2  pkgconfig_2.0.1 tibble_1.4.2


R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=nl_BE.UTF-8       LC_NUMERIC=C
 LC_TIME=nl_BE.UTF-8        LC_COLLATE=nl_BE.UTF-8
 [5] LC_MONETARY=nl_BE.UTF-8    LC_MESSAGES=nl_BE.UTF-8
LC_PAPER=nl_BE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
 LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DBI_0.7

loaded via a namespace (and not attached):
 [1] drat_0.1.3     bit_1.1-12     odbc_1.1.2     compiler_3.4.3 hms_0.3
    tools_3.4.3    pillar_1.2.1   tibble_1.4.2
 [9] yaml_2.1.17    Rcpp_0.12.14   bit64_0.9-7    blob_1.1.0
 rlang_0.2.0    fortunes_1.5-4



ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
[hidden email]
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Problem with reading data from an UTF-16 database

Duncan Murdoch-2
On 14/03/2018 7:15 AM, Thierry Onkelinx wrote:
> Dear all,
>
> We have a problem with reading some characters correctly from an UTF-16
> encoded database. The code below givens the correct characters on Ubuntu
> with the_driver = {ODBC Driver 13 for SQL Server}. On Windows (with
> the_driver = {SQL Server}), special characters like '♂' and '♀' are
> returned as '?'. I've added the sessionInfo() output from both machines.
>
> Any suggestions on how to fix the problem?

I haven't tried it, but the RODBC package includes an argument
DBMSencoding in the odbcDriverConnect function.  So maybe you could use
that instead of DBI and odbc.

Duncan Murdoch

>
> Best regards,
>
> Thierry
>
> library(DBI)
> con <- dbConnect(odbc::odbc(), .connection_string =
> "Driver=the_drive;Server=our_server;Database=the_database;Trusted_Connection=Yes;")
> dbGetQuery(con, sql_statement)
>
>
> R version 3.4.2 (2017-09-28)
> Platform: i386-w64-mingw32/i386 (32-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
> LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
> [5] LC_TIME=Dutch_Belgium.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] RODBC_1.3-15 DBI_0.8
>
> loaded via a namespace (and not attached):
>   [1] Rcpp_0.12.14    R6_2.2.2        odbc_1.1.5      magrittr_1.5
> pillar_1.1.0    rlang_0.2.0     testthat_2.0.0
>   [8] blob_1.1.0      drat_0.1.3      fortunes_1.5-4  tools_3.4.2
>   bit64_0.9-7     bit_1.1-12      hms_0.4.0
> [15] yaml_2.1.14     compiler_3.4.2  pkgconfig_2.0.1 tibble_1.4.2
>
>
> R version 3.4.3 (2017-11-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.4 LTS
>
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>
> locale:
>   [1] LC_CTYPE=nl_BE.UTF-8       LC_NUMERIC=C
>   LC_TIME=nl_BE.UTF-8        LC_COLLATE=nl_BE.UTF-8
>   [5] LC_MONETARY=nl_BE.UTF-8    LC_MESSAGES=nl_BE.UTF-8
> LC_PAPER=nl_BE.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>   LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] DBI_0.7
>
> loaded via a namespace (and not attached):
>   [1] drat_0.1.3     bit_1.1-12     odbc_1.1.2     compiler_3.4.3 hms_0.3
>      tools_3.4.3    pillar_1.2.1   tibble_1.4.2
>   [9] yaml_2.1.17    Rcpp_0.12.14   bit64_0.9-7    blob_1.1.0
>   rlang_0.2.0    fortunes_1.5-4
>
>
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> [hidden email]
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.