

Hi all,
In a data set I have a group variable (GR) and two variables, x and y. I want to
remove any group in which every row has the same value of x.
DM <- read.table( text='GR x y
A 25 125
A 23 135
A 14 145
A 12 230
B 25 321
B 25 512
B 25 123
B 25 451
C 11 521
C 14 235
C 15 258
C 10 654',header = TRUE, stringsAsFactors = FALSE)
In this example the output should contain groups A and C, because group B
has the same value of x in every row.
The result will be
A 25 125
A 23 135
A 14 145
A 12 230
C 11 521
C 14 235
C 15 258
C 10 654
How do I do it in R?
Thank you.
> On Dec 6, 2017, at 3:15 PM, Ashta < [hidden email]> wrote:
Try:
DM[ !duplicated(DM$x) , ]
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' Gehm's Corollary to Clarke's Third Law
Thank you David.
This will not work; it removes only duplicate records.
DM[ !duplicated(DM$x) , ]
My goal is to remove the group if all elements of x in that group have
the same value.
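A minimal sketch of why `!duplicated(DM$x)` is not a group-level test. It rebuilds the first `DM` with `data.frame()` (assumed equivalent to the `read.table()` call above). `duplicated()` scans the whole `x` column and ignores `GR`, so with this data it happens to drop all of group B (because 25 first occurs in group A), but it also drops the row `C 14 235`, since 14 already occurred in group A.

```r
# First example data from the question
DM <- data.frame(
  GR = rep(c("A", "B", "C"), each = 4),
  x  = c(25, 23, 14, 12, 25, 25, 25, 25, 11, 14, 15, 10),
  y  = c(125, 135, 145, 230, 321, 512, 123, 451, 521, 235, 258, 654),
  stringsAsFactors = FALSE
)

# duplicated() compares each x with all earlier x values, regardless of GR
dedup <- DM[!duplicated(DM$x), ]
print(dedup)

# Rows removed: all of group B, plus row 10 (group C, x = 14)
removed <- setdiff(rownames(DM), rownames(dedup))
print(removed)  # "5" "6" "7" "8" "10"
```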
subset( DM, "B" != GR )
This is covered in the Introduction to R document that comes with R.

Thank you Jeff,
subset( DM, "B" != GR ) works only if I already know the group.
But if I don't know the group (in this case "B"), how do I identify the
group(s) in which all elements of x have the same value?
Hi Ashta,
There are many ways to do it. Here is one:
vars <- sapply(split(DM$x, DM$GR), var)
DM[DM$GR %in% names(vars[vars > 0]), ]
Best
Ista
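Ista's test works because a group whose `x` values are all equal has zero variance. A variant of the same idea, sketched here with the first `DM` rebuilt via `data.frame()`, counts distinct values per group with `tapply()`; because it only uses `unique()`, it does not depend on `x` being numeric. The names `n_distinct_x` and `keep_groups` are illustrative.

```r
# First example data from the question
DM <- data.frame(
  GR = rep(c("A", "B", "C"), each = 4),
  x  = c(25, 23, 14, 12, 25, 25, 25, 25, 11, 14, 15, 10),
  y  = c(125, 135, 145, 230, 321, 512, 123, 451, 521, 235, 258, 654),
  stringsAsFactors = FALSE
)

# Number of distinct x values in each group (a 1-d array named by group)
n_distinct_x <- tapply(DM$x, DM$GR, function(v) length(unique(v)))

# Groups whose x values are not all identical
keep_groups <- names(n_distinct_x)[n_distinct_x > 1]

# Keep every row belonging to one of those groups
result <- DM[DM$GR %in% keep_groups, ]
print(result)
```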
Thank you Ista! Worked fine.
> On Dec 6, 2017, at 4:27 PM, Ashta < [hidden email]> wrote:
>
> Thank you Ista! Worked fine.
Here's another (possibly more direct in its logic?):
DM[ as.logical(ave(DM$x, DM$GR, FUN = function(x) length(unique(x)) > 1)), ]
GR x y
1 A 25 125
2 A 23 135
3 A 14 145
4 A 12 230
9 C 11 521
10 C 14 235
11 C 15 258
12 C 10 654

David
Hi David, Ista and all,
I have one related question. Within each group I want to keep records
conditionally. For example:
within group A, I want to keep rows whose x values are between 15 and 30;
within group B, I want to keep rows whose x values are between 40 and 50;
within group C, I want to keep rows whose x values are between 60 and 75.
DM <- read.table( text='GR x y
A 25 125
A 23 135
A 14 145
A 35 230
B 45 321
B 47 512
B 53 123
B 55 451
C 61 521
C 68 235
C 85 258
C 80 654',header = TRUE, stringsAsFactors = FALSE)
The end result will be
A 25 125
A 23 135
B 45 321
B 47 512
C 61 521
C 68 235
Thank you
> On Dec 8, 2017, at 4:48 PM, Ashta < [hidden email]> wrote:
When you have a problem with multiple "parallel" parameters, the function to "reach for" is `mapply`:
mapply( your_selection_func, group_vec, min_vec, max_vec)
... and this will probably return the values as a list (of data frames, if you build the function correctly), so you may need to then do:
do.call(rbind, ...)

David.
library(dplyr)
DM <- read.table( text='GR x y
A 25 125
A 23 135
.
.
.
)
DM %>% filter((GR == "A" & (x >= 15) & (x <= 30)) |
(GR == "B" & (x >= 40) & (x <= 50)) |
(GR == "C" & (x >= 60) & (x <= 75)))
In this case I cannot see an advantage to using dplyr over subset, other
than if dplyr is your hammer then the problem will look like a nail, or if
this is one step in a larger context where dplyr is more useful.
Nor do I think this is a good use for mapply (or dplyr::group_by) because
the groups are handled differently... better to introduce a data-driven
columnar approach than to have three separate algorithms and bind the data
frames together again.
Here are three ways I came up with. I sometimes use a variation of method
3 when the logical tests are rather more complicated than this and I want
to characterize those tests in the final report.
####### reprex
DM <- read.table( text =
"GR x y
A 25 125
A 23 135
A 14 145
A 35 230
B 45 321
B 47 512
B 53 123
B 55 451
C 61 521
C 68 235
C 85 258
C 80 654", header = TRUE, stringsAsFactors = FALSE )
# 1 Hardcoded logic
DM1 <- subset( DM
, "A" == GR & 15 <= x & x <= 30
| "B" == GR & 40 <= x & x <= 50
| "C" == GR & 60 <= x & x <= 75
)
DM1
#> GR x y
#> 1 A 25 125
#> 2 A 23 135
#> 5 B 45 321
#> 6 B 47 512
#> 9 C 61 521
#> 10 C 68 235
# 2 relational approach
cond <- read.table( text =
"GR minx maxx
A 15 30
B 40 50
C 60 75
", header = TRUE )
DM2 <- merge( DM, cond, by = "GR" )
DM2 <- subset( DM2, minx <= x & x <= maxx, select = -c( minx, maxx ) )
DM2
#> GR x y
#> 1 A 25 125
#> 2 A 23 135
#> 5 B 45 321
#> 6 B 47 512
#> 9 C 61 521
#> 10 C 68 235
# 3 Construct selection vector
sel <- rep( FALSE, nrow( DM ) )
for ( i in seq.int( nrow( cond ) ) ) {
  sel <- sel | ( cond$GR[ i ] == DM$GR
                 & cond$minx[ i ] <= DM$x
                 & DM$x <= cond$maxx[ i ]
               )
}
DM3 <- DM[ sel, ]
DM3
#> GR x y
#> 1 A 25 125
#> 2 A 23 135
#> 5 B 45 321
#> 6 B 47 512
#> 9 C 61 521
#> 10 C 68 235
#######
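Jeff does not show his variation of method 3, so the following is only an assumed illustration: instead of OR-ing into a single `sel` vector, each rule's logical vector is kept as a column of a matrix, which makes it easy to tally how many rows each rule kept. It reuses the second `DM` and the `cond` table, both rebuilt with `data.frame()`; the name `tests` is hypothetical.

```r
# Second example data and the per-group limits from method 2
DM <- data.frame(
  GR = rep(c("A", "B", "C"), each = 4),
  x  = c(25, 23, 14, 35, 45, 47, 53, 55, 61, 68, 85, 80),
  y  = c(125, 135, 145, 230, 321, 512, 123, 451, 521, 235, 258, 654),
  stringsAsFactors = FALSE
)
cond <- data.frame(GR = c("A", "B", "C"), minx = c(15, 40, 60),
                   maxx = c(30, 50, 75), stringsAsFactors = FALSE)

# One logical column per rule: does each row of DM satisfy rule i?
tests <- sapply(seq_len(nrow(cond)), function(i)
  cond$GR[i] == DM$GR & cond$minx[i] <= DM$x & DM$x <= cond$maxx[i])
colnames(tests) <- cond$GR

# A row is kept if it satisfies at least one rule (same result as the loop)
sel <- rowSums(tests) > 0
DM3 <- DM[sel, ]

# Rows kept by each rule, e.g. for a report (each rule keeps 2 rows here)
print(colSums(tests))
```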

Jeff Newmiller
DCN: <[hidden email]>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Hello,
Try the following.
keep <- list(A = c(15, 30), B = c(40, 50), C = c(60, 75))
sp <- split(DM$x, DM$GR)
inx <- unlist(lapply(seq_along(sp), function(i) keep[[i]][1] <= sp[[i]] &
  sp[[i]] <= keep[[i]][2]))
DM[inx, ]
# GR x y
#1 A 25 125
#2 A 23 135
#5 B 45 321
#6 B 47 512
#9 C 61 521
#10 C 68 235
Hope this helps,
Rui Barradas
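Rui's version pairs `keep[[i]]` with `sp[[i]]` by position, and `unlist()` returns the flags in group order rather than row order, so the result lines up with `DM` only because its rows are already sorted by `GR`. A sketch that does not rely on that ordering looks the limits up by group name with `Map()` and puts the flags back in row order with `unsplit()`; the reversed copy `DMr` is a hypothetical reordering used to exercise it.

```r
# Second example data from the question
DM <- data.frame(
  GR = rep(c("A", "B", "C"), each = 4),
  x  = c(25, 23, 14, 35, 45, 47, 53, 55, 61, 68, 85, 80),
  y  = c(125, 135, 145, 230, 321, 512, 123, 451, 521, 235, 258, 654),
  stringsAsFactors = FALSE
)
keep <- list(A = c(15, 30), B = c(40, 50), C = c(60, 75))

# Reverse the rows so the groups are no longer stored in A, B, C order
DMr <- DM[nrow(DM):1, ]

sp <- split(DMr$x, DMr$GR)
# Look up each group's limits by name instead of by position
inx_list <- Map(function(v, g) keep[[g]][1] <= v & v <= keep[[g]][2],
                sp, names(sp))
# Reassemble the per-group flags in the row order of DMr
inx <- unsplit(inx_list, DMr$GR)
result <- DMr[inx, ]
print(result)
```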
Hi,
How about this one? It produces the desired result. If you have more
conditions, you can put them in a matrix or data frame and subset as
suggested in one of the previous replies.
DM[(DM$GR=="A" & DM$x>=15 & DM$x<=30) | (DM$GR=="B" & DM$x>=40 & DM$x<=50) |
(DM$GR=="C" & DM$x>=60 & DM$x<=75), ]
EK
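EK's suggestion of putting the conditions in a data frame can be sketched without `merge()` by looking up each row's limits with `match()`, which also preserves the original row order. The `cond` table here mirrors the one in Jeff's method 2 and is an assumption of this sketch; rows whose group has no entry in `cond` are simply dropped.

```r
# Second example data from the question
DM <- data.frame(
  GR = rep(c("A", "B", "C"), each = 4),
  x  = c(25, 23, 14, 35, 45, 47, 53, 55, 61, 68, 85, 80),
  y  = c(125, 135, 145, 230, 321, 512, 123, 451, 521, 235, 258, 654),
  stringsAsFactors = FALSE
)
# Conditions in data-frame form, one row per group
cond <- data.frame(GR = c("A", "B", "C"), minx = c(15, 40, 60),
                   maxx = c(30, 50, 75), stringsAsFactors = FALSE)

# For each row of DM, the index of its group's row in cond (NA if none)
idx <- match(DM$GR, cond$GR)

# Rows with no matching rule become FALSE rather than NA
sel <- !is.na(idx) & cond$minx[idx] <= DM$x & DM$x <= cond$maxx[idx]
result <- DM[sel, ]
print(result)
```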
> On Dec 8, 2017, at 6:16 PM, David Winsemius < [hidden email]> wrote:
>
> When you have a problem with multiple "parallel" parameters, the function to "reach for" is `mapply`.
>
> mapply( your_selection_func, group_vec, min_vec, max_vec)
>
> ... and this will probably return the values as a list (of data frames, if you build the function correctly), so you may need to then do:
>
> do.call(rbind, ...)
do.call( rbind,
mapply( function(dat, grp, minx, maxx) {dat[ dat$GR==grp & dat$x >= minx & dat$x <= maxx, ]},
grp=LETTERS[1:3], minx=c(15,40,60), maxx=c(30,50,75) ,
MoreArgs=list(dat=DM),
SIMPLIFY=FALSE))
GR x y
A.1 A 25 125
A.2 A 23 135
B.5 B 45 321
B.6 B 47 512
C.9 C 61 521
C.10 C 68 235
David Winsemius
You could make numeric vectors of the constraints, named by the group
identifier, and subscript them by group name:
> DM <- read.table( text='GR x y
+ A 25 125
+ A 23 135
+ A 14 145
+ A 35 230
+ B 45 321
+ B 47 512
+ B 53 123
+ B 55 451
+ C 61 521
+ C 68 235
+ C 85 258
+ C 80 654',header = TRUE, stringsAsFactors = FALSE)
>
> GRmin <- c(A=15, B=40, C=60)
> GRmax <- c(A=30, B=50, C=75)
> subset(DM, x>=GRmin[GR] & x <=GRmax[GR])
GR x y
1 A 25 125
2 A 23 135
5 B 45 321
6 B 47 512
9 C 61 521
10 C 68 235
Or, if you want to completely avoid nonstandard evaluation:
> DM[ DM$x >= GRmin[DM$GR] & DM$x <= GRmax[DM$GR], ]
GR x y
1 A 25 125
2 A 23 135
5 B 45 321
6 B 47 512
9 C 61 521
10 C 68 235
Bill Dunlap
TIBCO Software
wdunlap tibco.com
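Bill's two versions can differ if `DM` contains a group with no entry in `GRmin`/`GRmax`: indexing a named vector by a missing name gives `NA`, and while `subset()` treats an `NA` condition as FALSE, `[` returns a row of NAs for it. A sketch with a hypothetical extra row in a group `D` appended to the second example data; wrapping the condition in `which()` is one way to make the `[` version behave like `subset()`.

```r
# Second example data, plus a hypothetical row in a group with no limits
DM <- data.frame(
  GR = rep(c("A", "B", "C"), each = 4),
  x  = c(25, 23, 14, 35, 45, 47, 53, 55, 61, 68, 85, 80),
  y  = c(125, 135, 145, 230, 321, 512, 123, 451, 521, 235, 258, 654),
  stringsAsFactors = FALSE
)
DM <- rbind(DM, data.frame(GR = "D", x = 99, y = 0, stringsAsFactors = FALSE))

GRmin <- c(A = 15, B = 40, C = 60)
GRmax <- c(A = 30, B = 50, C = 75)

# GRmin["D"] is NA, so the test is NA for the "D" row
test <- DM$x >= GRmin[DM$GR] & DM$x <= GRmax[DM$GR]

via_subset  <- subset(DM, x >= GRmin[GR] & x <= GRmax[GR])  # NA treated as FALSE
via_bracket <- DM[test, ]         # the NA produces an all-NA row
via_which   <- DM[which(test), ]  # which() drops the NA
```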
Thank you all!
Now I have plenty of options to choose from.
