Date in dataframe manipulation

Dan Chan
Hi,

I have a dataframe with many columns, including date and I want to keep
only a few of the columns including date column.

I used the following command:
with(FireDataAppling, cbind(STARTDATE, County, TOTAL, CAUSE))

It works, but the date becomes days from Jan 1, 2001.  

FireDataAppling$STARTDATE[1] gives
[1] 2001-01-04 00:00:00  
1703 Levels: .........

After the cbind command, the entry becomes a 4.  

I want to get 2001-01-04.  What command should I use?  

Thank you.

Daniel Chan
Meteorologist
Georgia Forestry Commission
P O Box 819
Macon, GA
31202
Tel: 478-751-3508
Fax: 478-751-3465

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Date in dataframe manipulation

Don MacQueen
Try

    FireDataAppling[,c('STARTDATE','County','TOTAL','CAUSE')]

or perhaps the subset() function.
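A minimal sketch of the column-selection approach. The real FireDataAppling data are not shown in the thread, so the small data frame below is a made-up stand-in (the extra OWNER column and all values are hypothetical). Selecting columns by name with [ returns a data frame, so each column keeps its own class instead of being forced into a single matrix type:

```r
# Hypothetical stand-in for FireDataAppling; the values and the OWNER
# column are invented for illustration only.
FireDataAppling <- data.frame(
  STARTDATE = factor(c("2001-01-04 00:00:00", "2001-02-10 00:00:00")),
  County    = factor(c("Appling", "Appling")),
  TOTAL     = c(1.5, 12),
  CAUSE     = factor(c("Debris", "Lightning")),
  OWNER     = c("A", "B")
)

# Column selection by name with [ keeps a data frame (OWNER is dropped)
Selected <- FireDataAppling[, c("STARTDATE", "County", "TOTAL", "CAUSE")]

is.data.frame(Selected)               # TRUE
sapply(Selected, class)               # per-column classes: factor, factor, numeric, factor
as.character(Selected$STARTDATE[1])   # "2001-01-04 00:00:00"
```

The subset() call suggested in the thread should return the same four columns with the same classes.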

The way you are doing it is more complex than necessary (internally
complex, I mean, not how the code looks).

-Don


--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA


Re: Date in dataframe manipulation

Marc Schwartz (via MN)
In reply to this post by Dan Chan
On Fri, 2006-03-24 at 15:29 -0500, Dan Chan wrote:

> Hi,
>
> I have a dataframe with many columns, including date and I want to keep
> only a few of the columns including date column.
>
> I used the following command:
> with(FireDataAppling, cbind(STARTDATE, County, TOTAL, CAUSE))
>
> It works, but the date becomes days from Jan 1, 2001.  
>
> FireDataAppling$STARTDATE[1] gives
> [1] 2001-01-04 00:00:00  
> 1703 Levels: .........

This output suggests that STARTDATE is a factor, rather than a Date-related
data type. Did you read this data in via one of the read.table() family of
functions? If these values are stored as text (for example, quoted character
fields) in the imported file, they will be converted to factors by default.
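A sketch of that conversion, using a few invented lines of CSV text in place of the real file (whose layout the thread does not show). Note that R 4.0.0 changed the default of stringsAsFactors to FALSE, so on current versions the factor conversion must be requested explicitly to reproduce the behavior described here:

```r
# Invented CSV text standing in for the real input file
csv_text <- "STARTDATE,County,TOTAL,CAUSE
2001-01-04 00:00:00,Appling,1.5,Debris
2001-02-10 00:00:00,Appling,12,Lightning"

# stringsAsFactors = TRUE mimics the read.table() default before R 4.0.0
fires_f <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(fires_f$STARTDATE)   # "factor"
str(fires_f)               # one line per column, showing its type

# stringsAsFactors = FALSE keeps the text as character
fires_c <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(fires_c$STARTDATE)   # "character"
```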

> After the cbind command, the entry becomes a 4.  
>
> I want to get 2001-01-04.  What command should I use?  
>
> Thank you.

You might want to review the "Note" section in ?cbind, relative to the
result of cbind()ing vectors of differing data types. By using with(),
you are effectively taking the data frame columns as individual vectors
and the resultant _matrix_ will be coerced to a single data type, in
this case, presumably numeric. I am guessing that 'County' and 'CAUSE'
are also factors, whereas 'TOTAL' is numeric.
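The coercion can be reproduced with two hypothetical vectors (the dates and totals below are invented). Because a factor's levels are sorted, the label "2001-01-04 00:00:00" ends up as level 4 in this example, which is one plausible way to get the '4' reported above:

```r
# Hypothetical factor column: levels sort as 01, 02, 03, 04,
# so "2001-01-04 00:00:00" becomes level 4.
STARTDATE <- factor(c("2001-01-04 00:00:00", "2001-01-01 00:00:00",
                      "2001-01-02 00:00:00", "2001-01-03 00:00:00"))
TOTAL <- c(1.5, 12, 3, 0.25)

# cbind() on plain vectors builds a matrix, which holds a single type;
# the factor contributes its integer codes, not its labels.
m <- cbind(STARTDATE, TOTAL)
is.matrix(m)                           # TRUE
m[1, "STARTDATE"]                      # 4
levels(STARTDATE)[m[1, "STARTDATE"]]   # "2001-01-04 00:00:00"
```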

Using str(FireDataAppling) will give you some insight into the structure
of your data frame.

The '4' that you are getting is the integer code of the factor level for the
entry above, not the number of days since Jan 1, 2001. In any case, Jan 1, 2001
is not a default 'origin' date in R; Jan 1, 1970 is.
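A quick check of both points using only base R: Date values are stored as days since 1970-01-01, while a factor's integer value is just its position among the sorted levels (the factor here is a made-up example):

```r
# Date values are stored as days since the origin 1970-01-01
as.numeric(as.Date("1970-01-05"))   # 4
as.numeric(as.Date("2001-01-04"))   # 11326
as.Date(4, origin = "1970-01-01")   # "1970-01-05"

# A factor's integer code is its position among the sorted levels
f <- factor(c("2001-01-04", "2001-01-01"))
as.integer(f)                       # 2 1
```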

You might want to look at ?factor for more insight here.

If you want to retain only a _subset_ of the columns in a data frame,
use the subset() function:

  subset(FireDataAppling, select = c(STARTDATE, County, TOTAL, CAUSE))

This will return a data frame and retain the original data types. If you
then want to perform actual Date-based operations on those values, take
a look at ?DateTimeClasses, paying attention to the "See Also" section
for the associated functions.
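A sketch of subset() followed by a conversion to the Date class. The data frame is again a hypothetical stand-in, and the format string assumes the "yyyy-mm-dd hh:mm:ss" layout shown in the question. Converting through as.character() makes sure the level labels, not the integer codes, are parsed:

```r
# Hypothetical stand-in for FireDataAppling (values invented for illustration)
FireDataAppling <- data.frame(
  STARTDATE = factor(c("2001-01-04 00:00:00", "2001-02-10 00:00:00")),
  County    = factor(c("Appling", "Appling")),
  TOTAL     = c(1.5, 12),
  CAUSE     = factor(c("Debris", "Lightning")),
  OWNER     = c("A", "B")
)

# subset() with select= returns a data frame, so column classes are kept
Output <- subset(FireDataAppling, select = c(STARTDATE, County, TOTAL, CAUSE))

# Parse the factor labels into a Date column (assumed input layout)
Output$Date <- as.Date(as.character(Output$STARTDATE),
                       format = "%Y-%m-%d %H:%M:%S")

format(Output$Date, "%Y-%m-%d")   # "2001-01-04" "2001-02-10"
as.numeric(diff(Output$Date))     # 37 (days between the two start dates)
```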

HTH,

Marc Schwartz


Re: Date in dataframe manipulation

Dan Chan
In reply to this post by Dan Chan
Thank you, Marc and Don, for your help (especially Marc's).

This worked:

  Output <- subset(FireDataAppling, select = c(STARTDATE, County, TOTAL, CAUSE))

STARTDATE IS a factor, and I used the following command to get the date
in yyyy-mm-dd format:

  Output$Date <- as.POSIXct(Output$STARTDATE)
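For reference, a sketch of what that conversion does, using an invented factor and an explicit tz = "UTC" (an assumption made here for reproducibility; without tz, as.POSIXct() uses the local time zone). as.POSIXct() accepts a factor by parsing its labels. When every time is midnight, printed POSIXct values show only the date part, which is probably why the result looked like yyyy-mm-dd. If no time of day is needed, as.Date() gives a plain Date column:

```r
# Invented factor in the same "yyyy-mm-dd hh:mm:ss" layout as STARTDATE
STARTDATE <- factor(c("2001-01-04 00:00:00", "2001-02-10 00:00:00"))

# as.POSIXct() parses the factor's labels; tz = "UTC" is an assumption
ct <- as.POSIXct(STARTDATE, tz = "UTC")
format(ct, "%Y-%m-%d")   # "2001-01-04" "2001-02-10"

# A Date column drops the time of day and the time zone
d <- as.Date(ct)
class(d)                 # "Date"
```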

Thank you!

Daniel Chan

-----Original Message-----
From: Marc Schwartz (via MN) [mailto:[hidden email]]
Sent: Friday, March 24, 2006 9:22 PM
To: Dan Chan
Cc: [hidden email]
Subject: Re: [R] Date in dataframe manipulation
