about data problem

about data problem

lily li
Hi R users,

I have a problem reading data.
For example, part of my data frame looks like this:

df
month day year Discharge
    3   1 2010       6.4
    3   2 2010      7.58
    3   3 2010      6.82
    3   4 2010      8.63
    3   5 2010      8.16
    3   6 2010      7.58

When I type summary(df), why does it show the Discharge data as factor
levels? I ran into the same problem when reading some other csv files. How
can I solve this problem? Thanks.

Discharge
7.58     :2
6.4       :1
6.82     :1
8.63     :1
8.16     :1


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: about data problem

glsnow
This indicates that your Discharge column has been stored as a factor
(run str(df) to verify, and check the other columns too).  This
usually happens when functions like read.table are left to work out
what each column is and find something in that column that cannot be
converted to a number (possibly a letter O instead of a zero, a letter
l instead of a one, or just a stray letter or punctuation mark in the
file).  You can either find the error in your original data, fix it,
and reread the data, or specify that the column should be numeric
using the colClasses argument to read.table or a related function.
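
A minimal sketch of the diagnosis described above. The CSV text is made up for illustration (a letter "O" typed in place of a zero on the last row), and the names csv_text, raw, and bad are hypothetical. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the argument is set to TRUE here only to reproduce the factor behaviour seen in this thread.

```r
# Hypothetical CSV content: the last Discharge value ends in a letter "O".
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82
3,4,2010,8.63
3,5,2010,8.16
3,6,2010,7.5O"

# stringsAsFactors = TRUE mimics the default used before R 4.0.0.
df <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(df)  # Discharge is reported as a Factor, not num

# Re-read the suspect column as plain text so the raw entries can be inspected.
raw <- read.csv(text = csv_text, colClasses = c(Discharge = "character"))

# Flag entries that are present but cannot be parsed as numbers.
bad <- !is.na(raw$Discharge) &
  is.na(suppressWarnings(as.numeric(raw$Discharge)))
raw[bad, ]  # the rows that need fixing in the source file
```

Once the offending rows are corrected in the source file, rereading it should give a numeric column without any extra arguments.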



(In reply to lily li's message of Tue, Sep 20, 2016 at 3:46 PM.)



--
Gregory (Greg) L. Snow Ph.D.
[hidden email]


Re: about data problem

lily li
Yes, it is stored as a factor. I can't find any problem in the original
data, and rereading the data doesn't help either. I use read.csv to read in
the data; do you think it would be better to use read.table? Thanks again.

(In reply to Greg Snow's message of Tue, Sep 20, 2016 at 3:55 PM.)

Re: about data problem

Jianling Fan
Add "stringsAsFactors = FALSE" when you read the data, and then
convert the column to numeric.
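
A short sketch of that suggestion, using made-up CSV text with one non-numeric entry ("n/a"). The names csv_text and discharge_num are hypothetical. The second half illustrates a common pitfall when a column has already been read as a factor: as.numeric() applied directly to a factor returns the internal level codes, so the factor is converted to character first.

```r
# Hypothetical CSV content with one entry that is not a number.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,n/a"

# With stringsAsFactors = FALSE the mixed column arrives as character.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(df$Discharge)

# Convert to numeric; entries that cannot be parsed become NA.
# (Without suppressWarnings(), R warns that NAs were introduced by coercion.)
discharge_num <- suppressWarnings(as.numeric(df$Discharge))
which(is.na(discharge_num))  # positions of the unparseable entries

# If the column is already a factor, go through character first.
f <- factor(c("6.4", "7.58", "6.82"))
as.numeric(f)                # level codes, not the measurements
as.numeric(as.character(f))  # the measurements themselves
```

The conversion silently turns unparseable entries into NA, so it is worth checking which positions became NA before using the result.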

(In reply to lily li's message of 20 September 2016 at 16:00.)



--
Jianling Fan
樊建凌


Re: about data problem

lily li
I reread the data, using 'na.rm = T' when reading it. This time there is
no such problem. It seems that the presence of NAs converts the integer
column to a factor. Thanks for your help.


(In reply to Jianling Fan's message of Tue, Sep 20, 2016 at 4:09 PM.)


Re: about data problem

Jeff Newmiller
I suppose you can do what works for your data, but I wouldn't recommend na.rm=TRUE because it hides problems rather than clarifying them.

If in fact your data includes true NA values (the letters NA or simply nothing between the commas are typical ways this information may be indicated), then read.csv will NOT change from integer to factor (particularly if you have specified which markers represent NA using the na.strings argument documented under read.table)... so you probably DO have unexpected garbage still in your data which could be obscuring valuable information that could affect your conclusions.
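
A small sketch of the point about missing values, with made-up CSV text; the names csv_text, csv_text2, df, df2, and df3 are hypothetical, and the "-" marker is only an assumed example of a non-standard missing-value code. stringsAsFactors = TRUE is set to mimic the pre-4.0.0 default, so any factor that appears is caused by the data and not by the R version.

```r
# Hypothetical CSV content: one literal NA and one empty field.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,NA
3,3,2010,
3,4,2010,8.63"

df <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(df)  # Discharge stays numeric; the two gaps are read as NA

# Hypothetical file that marks missing values with "-" instead.
csv_text2 <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,-
3,3,2010,6.82"

# Declaring the marker through na.strings keeps the column numeric.
df2 <- read.csv(text = csv_text2, na.strings = c("NA", "-"),
                stringsAsFactors = TRUE)
str(df2)

# Without na.strings, "-" is ordinary text and the column becomes a factor.
df3 <- read.csv(text = csv_text2, stringsAsFactors = TRUE)
str(df3)
```

If a column still comes back as a factor after the known missing-value markers are declared, something else in that column is not a number.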
--
Sent from my phone. Please excuse my brevity.

(In reply to lily li's message of September 20, 2016 3:11:42 PM PDT.)


Re: about data problem

lily li
Thanks. Then what should I do to solve the problem?

(In reply to Jeff Newmiller's message of Tue, Sep 20, 2016 at 4:30 PM.)


Re: about data problem

lily li
Is there an argument to read.csv that I can use to avoid converting numeric
columns to factors? Thanks a lot.



(In reply to lily li's message of Tue, Sep 20, 2016 at 4:42 PM.)


Re: about data problem

jCeradini
read.csv("your_data.csv", stringsAsFactors=FALSE)
(I'm just reiterating what Jianling said...)

Joe

(In reply to lily li's message of Tue, Sep 20, 2016 at 4:56 PM.)




--
Cooperative Fish and Wildlife Research Unit
Zoology and Physiology Dept.
University of Wyoming
[hidden email] / 914.707.8506
wyocoopunit.org


Re: about data problem

Jeff Newmiller
In reply to this post by lily li
Find the offending data. One approach is to look at the input data with your image sensors and neural pattern processor (eyes and brain). One way to reduce the load on those tools is to read in the data with the stringsAsFactors=FALSE argument and then manually convert the resulting character strings into numeric values. Any string that cannot be parsed as a number becomes NA, so you can then use the is.na function to find which rows failed to convert and use indexing to review the strings that had trouble.

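# I recommend against using df as a variable name, since it is the name of a function in base R
dta <- read.csv( "your_data.csv", stringsAsFactors=FALSE )  # placeholder file name
# Strings that are not valid numbers become NA (with a coercion warning)
dta$DischargeNum <- as.numeric( dta$Discharge )
# Show the original strings that failed to convert
dta[ is.na( dta$DischargeNum ), "Discharge" ]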
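
For anyone who wants to try the approach above without the original file, here is a small self-contained sketch. The inline CSV text, the letter O typed in place of a zero in the third row, and the object names are all made up for illustration; they are not the poster's actual data.

```r
# Hypothetical data: the third row has the letter O instead of a zero.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,8.O6
3,4,2010,8.63"

# Read strings as character so nothing is silently turned into a factor.
dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Entries that cannot be parsed as numbers become NA;
# suppressWarnings() hides the expected coercion warning.
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Rows that failed to convert, with the original strings for review.
bad <- dta[is.na(dta$DischargeNum), c("day", "Discharge")]
print(bad)
```

With real data, the rows listed in bad are the ones to check against the source file.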
--
Sent from my phone. Please excuse my brevity.

On September 20, 2016 3:42:39 PM PDT, lily li <[hidden email]> wrote:

>Thanks. Then what should I do to solve the problem?


Re: about data problem

lily li
In reply to this post by jCeradini
Yes, I tried to add this statement when reading the dataset.
But when I use summary(df), it shows:
Discharge
Length:
Class  :character
Mode  :character


On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini <[hidden email]> wrote:

> read.csv("your_data.csv", stringsAsFactors=FALSE)
> (I'm just reiterating Jianling said...)
>
> Joe


Re: about data problem

Jeff Newmiller
Which means it avoided converting to factor... Success!

Note that the column apparently has garbage characters in one or more of the rows, which should be evident when you LOOK AT THE CHARACTERS in the column. They should all be digits, plus or minus signs, and perhaps decimal points. If they are not, the affected entries will become NA when you convert the column to numeric. See my other message. You can either edit the file (which may raise traceability concerns) or write R code that removes the garbage characters using gsub.
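
A rough sketch of the gsub route follows. The sample strings are invented, and the pattern (keep only digits, signs and the decimal point) is an assumption about what counts as garbage. Blindly stripping characters can also hide a real data problem, such as a letter O that should have been a zero, so it seems wise to review the before and after values rather than trust the result.

```r
# Invented examples of what the character column might contain.
discharge_raw <- c("6.4", "7.58 ", "6.82*", "8.63", "n/a")

# Remove everything that is not a digit, a sign or a decimal point.
discharge_clean <- gsub("[^0-9.+-]", "", discharge_raw)

# Strings with nothing numeric left are now empty and convert to NA.
discharge_num <- as.numeric(discharge_clean)

# Side-by-side view for checking what the cleanup did.
check <- data.frame(raw = discharge_raw, clean = discharge_clean,
                    num = discharge_num, stringsAsFactors = FALSE)
print(check)
```

Doing the cleanup in code rather than in the file keeps the original data untouched and leaves a record of what was changed.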
--
Sent from my phone. Please excuse my brevity.

On September 20, 2016 4:09:02 PM PDT, lily li <[hidden email]> wrote:

>Yes, I tried to add this statement when reading the dataset.
>But when I use summary(df), it shows:
>Discharge
>Length:
>Class  :character
>Mode  :character


Re: about data problem

lily li
Thanks. The former method works. I confused character with factor.

Besides, I should use: dta$DischargeNum <- as.numeric( dta$Discharge )
instead of: dta$Discharge <- as.numeric( dta$Discharge )


On Tue, Sep 20, 2016 at 5:18 PM, Jeff Newmiller <[hidden email]>
wrote:

> Which means it avoided converting to factor... Success!
>
> Note that the column apparently has garbage characters in one or more of
> the rows, which should be evident when you LOOK AT THE CHARACTERS in the
> column. They should all be numeric symbols, plus or minus, and perhaps
> decimal points. If they are not, then the conversion to numeric will be
> incomplete. See my other message. You have the choice of editing the file
> (may have concerns with traceability), or you can write R code that removes
> the garbage characters using gsub.


Re: about data problem

Jeff Newmiller
You can use the latter IF you know there are no problems with the input data. If you need to troubleshoot, then you need separate columns so you can compare the original strings with the converted values.
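
To make the trade-off concrete, here is a small sketch with invented data (a decimal comma typed in the third value). The object names failed and overwritten are just labels for this illustration.

```r
# Invented column, as it would look when read with stringsAsFactors = FALSE.
dta <- data.frame(Discharge = c("6.4", "7.58", "6,82", "8.63"),
                  stringsAsFactors = FALSE)

# Option 1: separate column. The original strings stay available.
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))
failed <- dta$Discharge[is.na(dta$DischargeNum)]
print(failed)  # the strings that need attention

# Option 2: overwrite in place. Fine for clean data, but here the
# text "6,82" is replaced by NA and can no longer be inspected.
overwritten <- dta
overwritten$Discharge <- suppressWarnings(as.numeric(overwritten$Discharge))
print(overwritten$Discharge)
```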
--
Sent from my phone. Please excuse my brevity.

On September 20, 2016 4:22:41 PM PDT, lily li <[hidden email]> wrote:

>Thanks. The former method works. I confused character with factor.
>
>Besides, I should use: dta$DischargeNum <- as.numeric( dta$Discharge )
>instead of: dta$Discharge <- as.numeric( dta$Discharge )
>
>

Re: about data problem

Martin Maechler
In reply to this post by jCeradini
>>>>> Joe Ceradini <[hidden email]>
>>>>>     on Tue, 20 Sep 2016 17:06:17 -0600 writes:

    > read.csv("your_data.csv", stringsAsFactors=FALSE)
    > (I'm just reiterating what Jianling said...)

If you do not have very many columns, and want to become more
efficient and knowledgeable, I strongly recommend using the
'colClasses' argument to read.csv or read.table instead (the two
functions are the same apart from the defaults for their arguments!)
and setting "numeric" for the numeric columns.

This has a similar effect to the *combination* of
 1)  stringsAsFactors = FALSE
 2)  foo <- as.numeric(foo) # for respective columns
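
A short sketch of the colClasses approach, assuming a four-column file laid out like the example in the original post. The inline CSV text and the "7.5B" entry are invented for illustration. One difference from the two-step combination is worth knowing: with colClasses set to "numeric", a non-numeric entry makes the read fail with an error rather than producing an NA.

```r
# Hypothetical data laid out like the example in the original post.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82"

# One class per column, in file order.
classes <- c("integer", "integer", "integer", "numeric")

dta <- read.csv(text = csv_text, colClasses = classes)
str(dta)

# With a stray character in the column, the read stops with an error
# instead of quietly storing the column as factor or character.
bad_text <- sub("7.58", "7.5B", csv_text, fixed = TRUE)
result <- tryCatch(
  read.csv(text = bad_text, colClasses = classes),
  error = function(e) conditionMessage(e)
)
print(result)
```

The error message names the offending text, which makes colClasses a quick way to detect garbage in a column that is supposed to be numeric.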

Martin

