
help subsetting data based on date AND time


help subsetting data based on date AND time

Steve E.
Dear R Community,

I am new to R, and have a question that I suspect may be quite simple but is proving a formidable roadblock for me.  I have a large data set that includes water-quality measurements collected over many 24-hour periods.  The date and time of sample collection are in a combined Date/Time field in the format yyyy-mm-dd hh:mm:ss.  I need to be able to subset the data for analysis of different date and time windows.  Thus far, I have tried casting the Date/Time field using several approaches, such as:

DataSet$NewDateTime <- strptime(DataSet$DateTime, '%Y-%m-%d %H:%M:%S')
DataSet$NewDateTime <- as.POSIXlt(strptime(DataSet$DateTime, '%Y-%m-%d %H:%M:%S'))

These instructions seem to cast the NewDateTime field correctly (at least it appears to be in the correct format, and I assume R sees the field as a date and a time) but I am then unable to subset the data using instructions such as:

with(DataSet, subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:15:00'))
DataSubset <- subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:00:00', select = DataSet)

I have tried also separating the date and time fields in the input file, and casting with instructions such as:

DataSet$NewTime <- strptime(DataSet$Time, '%H:%M:%S')
DataSet$NewTime <- as.POSIXct(strptime(DataSet$Time, '%H:%M:%S'))

but these seem to generate a NewTime field that contains today's date + the time data, and also will not subset based on date/time.

I appreciate greatly any help and advice,
Steve

Re: help subsetting data based on date AND time

Luke Miller
Try altering your subset operation from this:

with(DataSet, subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:15:00'))

to this:

with(DataSet, subset(DataSet, DataSet$NewDateTime < as.POSIXct('2004-08-05 14:15:00')))

and see if you get the desired effect.

The statement DataSet$NewDateTime < '2004-08-05 14:15:00' asks R to
compare every value in DataSet$NewDateTime with the *character* value
'2004-08-05 14:15:00'. You need to convert that *character* value to a
POSIX time value first, using as.POSIXct(). Then you can carry out the
comparison between the date-time values in DataSet$NewDateTime and your
newly created POSIXct value.
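
As a minimal sketch of that conversion: the data frame below is a made-up
stand-in for DataSet (the column values are invented), and tz = "UTC" is
set only so the result does not depend on the local time zone.

```r
# Hypothetical stand-in for DataSet; the values are invented for illustration.
DataSet <- data.frame(
  DateTime = c("2004-08-05 13:45:00", "2004-08-05 14:15:00", "2004-08-05 14:30:00"),
  Value    = c(7.1, 7.3, 7.0),
  stringsAsFactors = FALSE
)

# Store the parsed times as POSIXct, the class that the R documentation
# describes as more convenient for data frame columns than POSIXlt.
DataSet$NewDateTime <- as.POSIXct(DataSet$DateTime, tz = "UTC")

# Convert the cutoff to POSIXct explicitly before comparing.
cutoff <- as.POSIXct("2004-08-05 14:15:00", tz = "UTC")

# Keep only the rows strictly before the cutoff.
DataSubset <- subset(DataSet, NewDateTime < cutoff)
```

Inside subset() the column can be named directly, so neither with() nor
DataSet$ is needed in the condition.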

Because your character time value is listed in the standard POSIX format
(yyyy-mm-dd HH:MM:SS), you don't need to include the format information
(%Y-%m-%d %H:%M:%S) in the as.POSIXct() function, which saves a little
typing. If it were in another format (e.g. mm-dd-yyyy), you'd need to use
the format argument in as.POSIXct() to make the character-to-POSIXct
conversion correctly.
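
For illustration, here is a small sketch of the format argument with a
cutoff written as mm-dd-yyyy. The variable names are made up, and
tz = "UTC" is only there to keep the example reproducible. Note the
capital %Y for a four-digit year; lowercase %y means a two-digit year.

```r
# A cutoff written as mm-dd-yyyy is not in the default layout,
# so the format has to be spelled out.
cutoff_us <- as.POSIXct("08-05-2004 14:15:00",
                        format = "%m-%d-%Y %H:%M:%S", tz = "UTC")

# The same instant written in the standard layout needs no format.
cutoff_std <- as.POSIXct("2004-08-05 14:15:00", tz = "UTC")

# Both describe the same instant, so either works as a cutoff.
cutoff_us == cutoff_std
```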


On Thu, Sep 8, 2011 at 4:03 PM, Steve E. <[hidden email]> wrote:

> [...]
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/help-subsetting-data-based-on-date-AND-time-tp3799933p3799933.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
___________________________
Luke Miller
Postdoctoral Researcher
Marine Science Center
Northeastern University
Nahant, MA
(781) 581-7370 x318


Re: help subsetting data based on date AND time

MacQueen, Don
Steve,

Just below are some examples that I hope will help.
With regard to what you've tried, I don't see any reason for using with(),
or the select argument to subset(). They both look unnecessary to me.

## examples of subsetting date-time values

## create fake data: 42 evenly spaced times
tmp <- seq(as.POSIXct('2011-08-01 13:00'), as.POSIXct('2011-08-05 03:00'), length.out=42)
df <- data.frame(tm=tmp, x=seq(42))

## subset examples

## on or before the 2nd at 01:30
df1 <- subset(df, tm <= as.POSIXct('2011-08-02 01:30'))

## everything on the 3rd
df2 <- subset(df, format(tm,'%d')=='03')

## everything in hours 11am through 1pm inclusive (11:00 to 13:59)
df3 <- subset(df, format(tm,'%H') %in% c('11','12','13'))

## 11 am through 3:59 pm on the 2nd (i.e. before 4 pm)
df4 <- subset(df, tm >= as.POSIXct('2011-08-02 11:00') & tm < as.POSIXct('2011-08-02 16:00'))

## just for reference, a sequence of every 15 minutes
tmp <- seq(as.POSIXct('2011-08-01 13:00'), as.POSIXct('2011-08-02 03:00'), by='15 min')

Note that every comparison against a point in time uses POSIXct class
objects, converting character to POSIXct where needed; the format()
examples instead compare character strings with character strings. As Luke
mentioned, if the character strings are in standard format,
'yyyy-mm-dd HH:MM:SS', just use as.POSIXct() without any additional args.
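
For a window that repeats every day (say 2 pm to 4 pm regardless of date),
one possible approach is to compare the time of day as text. This is only
a sketch: the data and the names readings, tod and afternoon are invented,
and tz = "UTC" is chosen purely for reproducibility. It also avoids
building a time-only POSIX value, which is what picks up today's date.

```r
# Fake data: one reading every 15 minutes over three days.
tm <- seq(as.POSIXct("2011-08-01 00:00", tz = "UTC"),
          as.POSIXct("2011-08-03 23:45", tz = "UTC"),
          by = "15 min")
readings <- data.frame(tm = tm, x = seq_along(tm))

# Time of day as zero-padded text; "HH:MM:SS" strings sort in clock order,
# so plain character comparison is enough.
tod <- format(readings$tm, "%H:%M:%S")

# Every reading from 14:00:00 up to (not including) 16:00:00, on any day.
afternoon <- readings[tod >= "14:00:00" & tod < "16:00:00", ]
```

A date condition can be combined with this in the same way, for example
by adding a comparison of readings$tm against an as.POSIXct() cutoff.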

-Don



--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062




