Stringr / Regular Expressions advice

VINCENT DEAN BOYCE
Hello,

Using R, I've loaded a .csv file comprising several hundred rows and 3
columns of data. The data map the output of a triaxial accelerometer, a
sensor which measures an object's acceleration along the x, y and z axes.
The values in each column oscillate and range numerically from 100 to 500.

I want to create a function that parses the data and detects patterns across
the three columns.

For instance, I would like to detect instances when the values for the x,y
and z columns equal 150, 200, 300 respectively. Additionally, when a match
is detected, I would like to know how many times the pattern appears.

I have been successful using str_detect to provide a Boolean; however, it
seems to only work on a single value, e.g. "400", not a range of values,
e.g. "400 - 450". See below:


# this works
> vals <- str_detect(string = data_log$x_reading, pattern = "400")

# this also works, but doesn't detect the particular range, rather the
# existence of the numbers
> vals <- str_detect(string = data_log$x_reading, pattern = "[400-450]")

Also, it appears that I can only apply it to a single column, not to all
three columns. However I may be mistaken.

Any advice on my current approach or alternatives I should consider is
greatly appreciated.

Many thanks,

Vincent


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Stringr / Regular Expressions advice

Adams, Jean
You could define a simple function to detect whether a value is within a
given range.  For example,

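inrange <- function(vec, range) {
  # TRUE where the value is not missing and lies within range[1]..range[2]
  !is.na(vec) & vec >= range[1] & vec <= range[2]
}
x <- 1:30
inrange(x, c(5, 20))

If you wanted to apply this function to all three columns at once, you
could use apply(), passing the range as an extra argument.  For example,
apply(data_log, 2, inrange, range = c(400, 450))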
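
To make the apply() suggestion concrete, here is a minimal sketch, not a definitive solution. The data frame `demo` and the range 440 to 460 are made up for illustration (neither comes from the original post), and inrange() is repeated so that the block runs on its own. Since apply() calls the function once per column, the range has to be passed along as an extra argument.

```r
# Range check, repeated here so this block is self-contained
inrange <- function(vec, range) {
  !is.na(vec) & vec >= range[1] & vec <= range[2]
}

# Hypothetical sample values, for illustration only
demo <- data.frame(
  x_reading = c(455, 451, 470, 445),
  y_reading = c(450, 503, 455, 441),
  z_reading = c(454, 454, 452, NA)
)

# Arguments after the function name are passed on to inrange()
hits <- apply(demo, 2, inrange, range = c(440, 460))

# Rows in which every column falls inside the range
all_in <- rowSums(hits) == ncol(hits)
which(all_in)
sum(all_in)
```

apply() returns a logical matrix with one column per reading, so rowSums() is one way to find the rows where all three columns are in range at once.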

Jean




Re: Stringr / Regular Expressions advice

Sarah Goslee
In reply to this post by VINCENT DEAN BOYCE
Hi,

On Thu, Jun 26, 2014 at 12:17 PM, VINCENT DEAN BOYCE
<[hidden email]> wrote:
> Hello,
>
> Using R,  I've loaded a .cvs file comprised of several hundred rows and 3
> columns of data. The data within maps the output of a triaxial
> accelerometer, a sensor which measures an object's acceleration along the
> x,y and z axes. The data for each respective column sequentially
> oscillates, and ranges numerically from 100 to 500.

If your data are numeric, why are you using stringr?

It would be easier to provide you with an answer if we knew what your
data looked like.

dput(head(yourdata, 20))

and paste that into your non-HTML email.

> I want create a function that parses the data and detects patterns across
> the three columns.
>
> For instance, I would like to detect instances when the values for the x,y
> and z columns equal 150, 200, 300 respectively. Additionally, when a match
> is detected, I would like to know how many times the pattern appears.

That's easy enough:

fakedata <- data.frame(matrix(c(
100, 100, 200,
150, 200, 300,
100, 350, 100,
400, 200, 300,
200, 500, 200,
150, 200, 300,
150, 200, 300),
ncol=3, byrow=TRUE))

v.to.match <- c(150, 200, 300)

v.matches <- apply(fakedata, 1, function(x)all(x == v.to.match))

# which rows match
which(v.matches)

# how many rows match
sum(v.matches)

> I have been successful using str_detect to provide a Boolean, however it
> seems to only work on a single vector, i.e, "400" , not a range of values
> i.e "400 - 450". See below:

This is where I get confused, and where we need sample data. Are your
data numeric, as you state above, or some other format?

If your data are character, and like "400 - 450", you can still match
them with the code I suggested above.

> # this works
>> vals <- str_detect (string = data_log$x_reading, pattern = "400")
>
> # this also works, but doesn't detect the particular range, rather the
> existence of the numbers
>> vals <- str_detect (string = data_log$x_reading, pattern = "[400-450]")

Are you trying to match any numeric value in the range 400-450? Again,
actual data.
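
For what it is worth, a short sketch of why the "[400-450]" pattern behaves the way the original post describes. The vector `readings` is hypothetical, and base R's grepl() is used instead of str_detect() only so that the example has no package dependency. In a regular expression, square brackets define a character class, so the pattern is read as "any one of the characters 4, 0, 0 to 4, 5, 0", not as a numeric range.

```r
# Hypothetical readings stored as character strings
readings <- c("400", "425", "450", "451", "399", "105", "678")

# "[400-450]" is a character class: it matches any string that contains
# at least one digit from 0 to 5, regardless of the number's size
regex_hits <- grepl("[400-450]", readings)
regex_hits

# A numeric comparison tests the actual range 400 to 450
values <- as.numeric(readings)
range_hits <- values >= 400 & values <= 450
range_hits
```

Only "678" fails the regular expression, because it contains no digit between 0 and 5, whereas the numeric comparison flags just the first three values.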

> Also, it appears that I can only apply it to a single column, not to all
> three columns. However I may be mistaken.

You answer your own question unwittingly - apply().

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org


Re: Stringr / Regular Expressions advice

arun kirshna
In reply to this post by VINCENT DEAN BOYCE


Hi,
Maybe you can use ?cut or ?findInterval for the range:

dat1 <- read.table(text="100, 100, 200
250, 300, 350
100, 350, 100
400, 250, 300
200, 450, 200
150, 501, 300
150, 250, 300", sep=",", header=FALSE)
sapply(dat1, findInterval, c(400, 500)) == 1
#        V1    V2    V3
#[1,] FALSE FALSE FALSE
#[2,] FALSE FALSE FALSE
#[3,] FALSE FALSE FALSE
#[4,]  TRUE FALSE FALSE
#[5,] FALSE  TRUE FALSE
#[6,] FALSE FALSE FALSE
#[7,] FALSE FALSE FALSE
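
One detail worth checking before relying on findInterval() for a range: with its default arguments the intervals are closed on the left and open on the right, so the upper break itself is not counted as inside. A small sketch with made-up values (the vector `x` is hypothetical):

```r
breaks <- c(400, 500)
x <- c(399, 400, 450, 499, 500, 501)

# 0 = below 400, 1 = in [400, 500), 2 = 500 or above
idx <- findInterval(x, breaks)
idx

# With the defaults, 500 is not flagged as being in the range
in_half_open <- idx == 1
in_half_open

# An explicit comparison includes both end points
in_closed <- x >= 400 & x <= 500
in_closed
```

If both end points should count, the explicit comparison is probably the simpler choice.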

A.K.




Re: Stringr / Regular Expressions advice

Sarah Goslee
In reply to this post by Sarah Goslee
Hi,

It's a good idea to copy back to the list, not just to me, to keep the
discussion all in one place.

On Thursday, June 26, 2014, VINCENT DEAN BOYCE <[hidden email]>
wrote:

> Sarah,
>
> Great feedback and direction. Here is the data I am working with*:
>
> > dput(head(data_log, 20))
>
> structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
> 448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
> 439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
> 505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
> 469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
> 446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
> 455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
> "z_reading"), row.names = c(NA, 20L), class = "data.frame")
>
> *however, I am unsure why the letter "L" has been appended to each
> numerical string.
>

It denotes values stored as integers, and is nothing you need to worry
about.
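
A quick way to see this for yourself (a small sketch; the vectors `x` and `y` are just illustrative):

```r
x <- c(455L, 451L)   # integer literals, as printed by dput()
y <- c(455, 451)     # the same values stored as doubles

typeof(x)   # "integer"
typeof(y)   # "double"

# Comparison with == coerces as needed, so matching still works
same_values <- all(x == y)
same_values

# identical() also compares the storage type, so it tells the two apart
same_object <- identical(x, y)
same_object
```

So code that compares the readings with == behaves the same whether the columns are stored as integers or as doubles.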


> In any event, as you can see there are three columns of data named
> x_reading, y_reading and z_reading. I would like to detect patterns among
> them.
>
> For instance, let's say the pattern I wish to detect is 455, 502, 454
> across the three columns respectively. As you can see in the data, this is
> found in the first row. This particular string recurs numerous times
> within the dataset, and that is what I wish to quantify: how many times
> the string 455, 502, 454 appears.
>
> Your thoughts?
>

Did you try the code I provided? It does what I think you're looking for.

Sarah


> Many thanks,
>
> Vincent
>

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org


Re: Stringr / Regular Expressions advice

VINCENT DEAN BOYCE
Sarah,

Yes, I modified the code that you provided and it worked quite well. Here
is the revised code:

.....

accel_data <- data
# pattern to be identified
v.to.match <- c(438, 454, 459)
# rerun the line below any time the "v.to.match" criteria change, to
# ensure the match is updated
v.matches <- apply(accel_data, 1, function(x) all(x == v.to.match))
which(v.matches)
# [1] 405
sum(v.matches)
# [1] 1

......

Again, here is the dataset:

> dput(head(accel_data, 20))

structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
"z_reading"), row.names = c(NA, 20L), class = "data.frame")

My next goal is to extend the range for each column. For instance:

v.to.match <- c(438:445, 454:460, 459:470)

Your thoughts?

Many thanks,

Vincent





Re: Stringr / Regular Expressions advice

arun kirshna
#or

res <- mapply(`%in%`, accel_data, v.to.match)

res1 <- sapply(seq_len(ncol(accel_data)), function(i)
  accel_data[[i]] >= v.to.match[[i]][1] & accel_data[[i]] <= tail(v.to.match[[i]], 1))

all.equal(res, res1, check.attributes = FALSE)
#[1] TRUE

A.K.

On Tuesday, July 1, 2014 10:56 PM, arun <[hidden email]> wrote:
Hi Vincent,

You could try:
v.to.match <- list(438:445, 454:460, 459:470)

sapply(seq_len(ncol(accel_data)), function(i)
  accel_data[[i]] >= v.to.match[[i]][1] & accel_data[[i]] <= tail(v.to.match[[i]], 1))

#or use ?cut or ?findInterval
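
To go from the per-column TRUE/FALSE values to "which rows match all three ranges, and how many", something along these lines might do. It is only a sketch: it uses the 20 rows posted via dput() (rebuilt here with data.frame() so the block runs on its own), and the names `in_range` and `row_matches` are mine, not from the thread.

```r
# The 20 sample rows posted earlier in the thread
accel_data <- data.frame(
  x_reading = c(455L, 451L, 458L, 463L, 462L, 460L, 448L, 449L, 450L, 451L,
                445L, 440L, 439L, 445L, 448L, 447L, 440L, 439L, 440L, 434L),
  y_reading = c(502L, 503L, 502L, 502L, 495L, 505L, 480L, 483L, 489L, 488L,
                489L, 456L, 497L, 476L, 470L, 474L, 469L, 482L, 484L, 477L),
  z_reading = c(454L, 454L, 452L, 452L, 446L, 459L, 456L, 451L, 451L, 455L,
                438L, 462L, 437L, 455L, 470L, 455L, 460L, 463L, 458L, 458L)
)

# One set of allowed values per column, kept separate in a list
v.to.match <- list(438:445, 454:460, 459:470)

# One logical column per reading: is the value among the allowed values?
in_range <- mapply(`%in%`, accel_data, v.to.match)

# A row matches when all three columns are in range
row_matches <- rowSums(in_range) == ncol(in_range)
which(row_matches)
sum(row_matches)
```

Note that %in% tests membership in the integer sequences, which suits integer readings like these; for non-integer values a >= / <= comparison against the end points would be the safer route.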

A.K.






