using read.csv2()

Voirin Pascale
Hello,

I have a problem with the variable type that is assigned when reading a CSV file with read.csv2.

Here is a test file, saved as test.csv:
var1;var2;var3
TI;1995;4.5
VD;1990;4.8
FR;1994;3.9
VS;1993;5.1
FR;1995;4.7
FR;1992;5.8

I read it into R with:
don <- read.csv2("test.csv")
don
don$var3
## [1] 4.5 4.8 3.9 5.1 4.7 5.8
## Levels: 3.9 4.5 4.7 4.8 5.1 5.8

as.double(don$var3)
## [1] 2 4 1 5 3 6

Why is it a <levels> type by default? And how can I get the decimal values of var3?

Thanks a lot for your answer.
With my best regards,

Pascale Voirin

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: using read.csv2()

Rainer Krug-3
Voirin Pascale <[hidden email]> writes:

> Why is it a <levels> type by default? And how can I get the decimal values of var3?

You very likely have a non-numeric character somewhere in the column
named var3. Check the levels after the import, and you should see it.
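
One way to carry out this check, as a minimal sketch. The test file from the original post is inlined through the text argument instead of being read from disk, and stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later no longer convert strings to factors by default. Both choices, and the names csv_text and var3_levels, are assumptions made for illustration and are not part of the original call.

```r
# Test file from the original post, inlined so the example is self-contained.
csv_text <- "var1;var2;var3
TI;1995;4.5
VD;1990;4.8
FR;1994;3.9
VS;1993;5.1
FR;1995;4.7
FR;1992;5.8"

# stringsAsFactors = TRUE reproduces the factor column seen in the thread.
don <- read.csv2(text = csv_text, stringsAsFactors = TRUE)

str(don)                          # shows the type of each column
var3_levels <- levels(don$var3)   # the distinct strings read for var3
print(var3_levels)
```

A stray entry such as a letter or a misplaced symbol would show up among the levels. In this test file every level looks like an ordinary number written with a period, which suggests looking at the decimal mark setting instead.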

Cheers,

Rainer

--
Rainer M. Krug
email: Rainer<at>krugs<dot>de
PGP: 0x0F52F982

Re: using read.csv2()

Duncan Murdoch-2
In reply to this post by Voirin Pascale
On 29/09/2016 4:59 AM, Voirin Pascale wrote:

> Why is it a <levels> type by default? And how can I get the decimal values of var3?

It's a "factor".  read.csv2() defaults to a decimal separator of ","
rather than ".", so the values in the last column don't look like
numbers.  They are read as character strings and then automatically
converted to a factor.  Reading the file with

read.csv2("test.csv", dec = ".")

should give you what you want, or you can convert after the fact with

as.numeric(as.character(don$var3))
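
The 2 4 1 5 3 6 from the original post are the factor's internal integer codes, that is, the position of each value among the sorted levels. A small sketch of the difference between the codes and the values, using a hand-built factor (the name var3 is only a stand-in for don$var3, and codes, values and values_alt are made up for the example):

```r
# A factor stores integer codes that index into its sorted levels.
var3 <- factor(c("4.5", "4.8", "3.9", "5.1", "4.7", "5.8"))

# Converting the factor directly returns the codes, not the numbers.
codes <- as.integer(var3)
print(codes)

# Going through the character labels recovers the actual values.
values <- as.numeric(as.character(var3))
print(values)

# Equivalent: convert each distinct level once, then index by the codes.
values_alt <- as.numeric(levels(var3))[var3]
print(values_alt)
```

The last form is the one recommended in the help page ?factor; it converts each distinct level once rather than every element.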

Duncan Murdoch


Re: using read.csv2()

Alain Guillet-2
In reply to this post by Voirin Pascale
Hello,

The defaults in read.csv2 are ";" as the field separator and "," as the
decimal symbol. The file you are importing does not follow a single CSV
convention: it mixes the two, with ";" between fields and "." in the
numbers.

You can solve your problem by setting the dec argument to ".":

don <- read.csv2("test.csv", dec = ".")
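
To illustrate the effect of these defaults, here is a hedged sketch with two inlined two-row files: one written in the semicolon/comma convention that read.csv2 expects, and one in the mixed style of test.csv. The strings and variable names are invented for the example.

```r
# File in the convention read.csv2 expects: ";" between fields, "," in numbers.
comma_dec <- "var1;var2;var3
TI;1995;4,5
VD;1990;4,8"

# Mixed style, as in test.csv: ";" between fields, "." in numbers.
hybrid <- "var1;var2;var3
TI;1995;4.5
VD;1990;4.8"

expected_style <- read.csv2(text = comma_dec)         # var3 is numeric
hybrid_default <- read.csv2(text = hybrid)            # var3 is not numeric
hybrid_fixed   <- read.csv2(text = hybrid, dec = ".") # var3 is numeric again

print(class(expected_style$var3))
print(class(hybrid_default$var3))
print(class(hybrid_fixed$var3))
```

With the defaults, the non-numeric column comes back as a factor in older versions of R and as a character vector from R 4.0.0 onwards; in both cases it is not usable as a number until dec is set.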


Alain

--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
http://www.uclouvain.be/smcs

Bureau c.316
Voie du Roman Pays, 20 (bte L1.04.01)
B-1348 Louvain-la-Neuve
Belgium

Tel: +32 10 47 30 50

Directions: http://www.uclouvain.be/323631.html

Re: using read.csv2()

Peter Dalgaard-2
In reply to this post by Duncan Murdoch-2
> On 29 Sep 2016, at 11:40 , Duncan Murdoch <[hidden email]> wrote:
>
>
> It's a "factor".  read.csv2() defaults to a decimal separator of "," rather than ".", so the last column doesn't look like numbers, and they're being read as character strings, and then automatically converted to a factor.  Reading as
>

Yep, that's the whole point of read.csv2: someone stupidly decided (back in the 1990s) that the use of comma as a decimal separator in some languages should extend to storage file formats.  That, of course, ruined the CSV standard use of comma as a field separator and prompted the double-standard situation where .csv files can be comma/period or semicolon/comma style, often depending on language settings, which in turn can make data transfer between different languages an ungodly mess.

As Duncan points out, R provides settings that will (mostly) let you handle csv files that are written in hybrid formats like semicolon/period.
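
To see the two conventions side by side, one can write the same small data frame with write.csv and write.csv2 and look at the raw lines. This is only an illustrative sketch; the data frame df, the temporary files and the other variable names are made up for the example.

```r
df <- data.frame(var1 = c("TI", "VD"), var3 = c(4.5, 4.8))

comma_file <- tempfile(fileext = ".csv")
semicolon_file <- tempfile(fileext = ".csv")

write.csv(df, comma_file, row.names = FALSE)       # comma/period style
write.csv2(df, semicolon_file, row.names = FALSE)  # semicolon/comma style

comma_lines <- readLines(comma_file)
semicolon_lines <- readLines(semicolon_file)
print(comma_lines)
print(semicolon_lines)

# Each file reads back cleanly with the matching reader.
back_comma <- read.csv(comma_file)
back_semicolon <- read.csv2(semicolon_file)
```

A hybrid file such as semicolon/period matches neither writer, which is why the reader needs an explicit sep or dec argument.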

Re: using read.csv2()

Nordlund, Dan (DSHS/RDA)
In reply to this post by Alain Guillet-2
Or, you can just use read.csv with sep=';'

don <- read.csv("test.csv", sep = ";")


Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services



Re: using read.csv2()

Rui Barradas
Hello,

No one has mentioned that read.csv2 and read.csv are special cases of
read.table.


don <- read.table(text = "
var1;var2;var3
TI;1995;4.5
VD;1990;4.8
FR;1994;3.9
VS;1993;5.1
FR;1995;4.7
FR;1992;5.8
", header = TRUE, sep = ";", dec = ".")
str(don)
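
As a rough check that read.csv and read.csv2 are wrappers around read.table, the sketch below reads the same inlined text three ways and compares the results. The variable names are invented for the example, and the agreement is only claimed for this small test file, since read.table has different defaults for arguments such as fill, quote and comment.char.

```r
csv_text <- "var1;var2;var3
TI;1995;4.5
VD;1990;4.8
FR;1994;3.9
VS;1993;5.1
FR;1995;4.7
FR;1992;5.8"

# Three routes to the same settings: header, ";" separator, "." decimal mark.
via_table <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ".")
via_csv   <- read.csv(text = csv_text, sep = ";")
via_csv2  <- read.csv2(text = csv_text, dec = ".")

print(all.equal(via_table, via_csv))
print(all.equal(via_csv, via_csv2))
print(sapply(via_table, class))   # column types after import
```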


Rui Barradas

