

Hi! Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
new to statistics (have had grad-level courses and work experience in
statistics) or vectorized programming syntax (have extensive experience
with MatLab, Python/NumPy, and IDL, and even a smidgen, a long time ago, of
experience w/ S-PLUS).
In exploring the use of is.na in the context of logical indexing, I've come
across the following puzzling-to-me result:
> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
[1] 0.3534253 1.6731597 NA 0.2079209
[1] TRUE TRUE FALSE
[1] 0.3534253 1.6731597 0.2079209
As you can see, y is a four element vector, the third element of which is
NA; the next line gives what I would expect, T T F, because the first two
elements are not NA but the third element is. The third line is what
confuses me: why is the result not the two element vector consisting of
simply the first two elements of the vector (or, if vectorized indexing in
R is implemented to return a vector the same length as the logical index
vector, which appears to be the case, at least the first two elements and
then either NA or NaN in the third slot, where the logical indexing vector
is FALSE): why does the implementation "go looking" for an element whose
index in the "original" vector, 4, is larger than BOTH the largest index
specified in the innermost subsetting index AND the size of the resulting
indexing vector? (Note: at first I didn't even understand why the result
wasn't simply
0.3534253 1.6731597 NA
but then I realized that the third logical index being FALSE, there was no
reason for *any* element to be there; but if there is, due to some
overriding rule regarding the length of the result relative to the length
of the indexer, shouldn't it revert back to *something* that indicates the
"FALSE"-ness of that indexing element?)
Thanks!
DLG
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ISwR_2.0-7
loaded via a namespace (and not attached):
[1] compiler_3.5.2 tools_3.5.2
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


From ?Arithmetic
the elements of shorter
vectors are recycled as necessary (with a ‘warning’ when they are
recycled only _fractionally_).
> tmp <- !is.na(y[1:3])
> tmp
[1] TRUE TRUE FALSE
> c(tmp, tmp)
[1] TRUE TRUE FALSE TRUE TRUE FALSE
> c(tmp, tmp)[1:4]
[1] TRUE TRUE FALSE TRUE
> y[c(tmp, tmp)[1:4]]
[1] 0.3534253 1.6731597 0.2079209
>
The behavior is as documented. I am surprised that there is no
warning about partial recycling.
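For what it's worth, the same recycling can be spelled out with rep_len(), which makes the "hidden" fourth index visible. This is only a minimal sketch: y is assumed to hold the values printed in the original post, and the names idx and full_idx are just for illustration.

```r
# y as printed in the original post (values assumed from the output shown)
y <- c(0.3534253, 1.6731597, NA, 0.2079209)

idx <- !is.na(y[1:3])                 # TRUE TRUE FALSE (length 3)
full_idx <- rep_len(idx, length(y))   # recycled to length 4: TRUE TRUE FALSE TRUE

# Indexing with the short vector and with its recycled form gives the same result
identical(y[idx], y[full_idx])        # TRUE
```

The fourth TRUE is just the first element of idx being used again, which is why the fourth element of y shows up in the result.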


On 3/10/19 2:36 PM, David Goldsmith wrote:
>> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
> [1] 0.3534253 1.6731597 NA 0.2079209
> [1] TRUE TRUE FALSE
> [1] 0.3534253 1.6731597 0.2079209
It happens because R is eco-conscious and recycles. :)
Try:
ok <- c(TRUE,TRUE,FALSE)
(1:4)[ok]
In general in R if there is an operation involving two vectors then
the shorter one gets recycled to provide sufficiently many entries to
match those of the longer vector.
Thus in the foregoing example the first entry of "ok" gets used again,
to make a length 4 vector to match up with 1:4. The result is the same
as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].
If you did (1:7)[ok] you'd get the same result as that from
(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
recycled 2 and 1/3 times.
Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .
Note that in the first two instances you get warnings, but in the third
you don't, since 6 is an integer multiple of 3.
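A quick way to check which of these expressions warn, without reading the console output by eye. This is a sketch only; warns() is a hypothetical helper written for this example, not a base R function.

```r
# Hypothetical helper: TRUE if evaluating expr raises a warning, FALSE otherwise
warns <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

warns(10 * (1:3) + 1:4)   # TRUE: 4 is not a multiple of 3
warns(10 * (1:3) + 1:5)   # TRUE: 5 is not a multiple of 3
warns(10 * (1:3) + 1:6)   # FALSE: 6 is a multiple of 3

# Logical indexing recycles fractionally without any warning
ok <- c(TRUE, TRUE, FALSE)
warns((1:7)[ok])          # FALSE
(1:7)[ok]                 # 1 2 4 5 7
```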
Why aren't there warnings when logical indexing is used? I guess
because it would be annoying. Maybe.
Note that integer indices behave differently: they are not recycled,
since each one simply names a position to extract. So
(1:4)[1:3] just (sensibly) gives
[1] 1 2 3
and *not*
[1] 1 2 3 1
Perhaps a bit subtle, but it gives what you'd actually *want* rather
than being pedantic about rules with a result that you wouldn't want.
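A small sketch contrasting the two kinds of index. As far as I can tell, ?Extract documents recycling only for logical indices, while an integer index is treated simply as a list of positions; the vector x here is made up for the example.

```r
x <- 1:4

x[1:3]          # 1 2 3: positions 1, 2 and 3, nothing more
x[c(1, 1, 2)]   # 1 1 2: repeats happen only if you ask for them
x[6]            # NA: a position past the end yields NA

# A logical index is first recycled to length(x); which() shows the positions used
which(rep_len(c(TRUE, TRUE, FALSE), length(x)))   # 1 2 4
```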
cheers,
Rolf Turner
P.S. If you do
y[1:3][!is.na(y[1:3])]
i.e. if you're careful to match the length of the vector and that of
the indices, you get what you initially expected.
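Two ways of keeping the vector and its index the same length, sketched under the assumption that y holds the values from the original post (head3 is just an illustrative name):

```r
# y as printed in the original post (values assumed from the output shown)
y <- c(0.3534253, 1.6731597, NA, 0.2079209)

head3 <- y[1:3]            # take the subset once
head3[!is.na(head3)]       # 0.3534253 1.6731597

# Alternative: turn the logical index into positions, so nothing is recycled
y[which(!is.na(y[1:3]))]   # 0.3534253 1.6731597
```

The which() form works here only because the subset starts at position 1, so positions within y[1:3] coincide with positions within y.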
R. T.
P^2.S. To the younger and wiser heads on this list: the help on "["
does not mention that the index vectors can be logical. I couldn't find
anything about logical indexing in the R help files. Is something
missing here, or am I just not looking in the right place?
R. T.

Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +6493737599 ext. 88276


Regarding the mention of logical indexing, under ?Extract I see:
For [-indexing only: i, j, ... can be logical vectors, indicating elements/slices to select. Such vectors are recycled if necessary to match the corresponding extent. i, j, ... can also be negative integers, indicating elements/slices to leave out of the selection.
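Both behaviours mentioned in that passage are easy to try out. A minimal sketch with a made-up named vector v (the names only make it easier to see which elements survive):

```r
v <- c(a = 10, b = 20, c = 30, d = 40)

v[c(TRUE, FALSE)]   # recycled to TRUE FALSE TRUE FALSE: elements a and c
v[-2]               # negative integer: everything except element b
v[-c(1, 4)]         # drops a and d, leaving b and c
```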

Sent from my phone. Please excuse my brevity.


On 3/10/19 6:07 PM, Jeff Newmiller wrote:
> Regarding the mention of logical indexing, under ?Extract I see:
>
> For [-indexing only: i, j, ... can be logical vectors, indicating
> elements/slices to select. Such vectors are recycled if necessary to
> match the corresponding extent. i, j, ... can also be negative
> integers, indicating elements/slices to leave out of the selection.
Dang! It was staring me in the face all the time, and I didn't see it!
Grrrrrr.
Thanks Jeff.
cheers,
Rolf

Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +6493737599 ext. 88276


Thanks, all. I had read about recycling, but I guess I didn't fully
appreciate all the "weirdness" it might produce. :/
With this explained, I'm going to ask a follow-up, which is only
contextually related: the impetus for this discovery was checking "corner
cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to
determine equality of two vectors containing NA's. Between the above
result; my related discovery that this indexing preserves relative
positional info but not absolute positional info; and the performance
penalty when comparing long vectors that may be unequal "early on"; I've
concluded that, if it (can be made to) "short circuit", it would probably
be better to use an implicit loop. So that's my Q: will (or can) an
implicit loop (be made to) "exit early" if a specified condition is met
before all indices have been checked?
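On the early-exit part: as far as I know, the apply-style implicit loops (sapply(), vapply(), etc.) always visit every element, whereas an explicit for loop can return as soon as a mismatch is found. A rough sketch of that idea; same_with_na() is a hypothetical helper, and it treats NAs in matching positions as equal, which is an assumption about what "equal" should mean here.

```r
# Hypothetical helper: TRUE if x and y have the same length, NAs in the same
# positions, and equal non-NA values; returns at the first mismatch found.
same_with_na <- function(x, y) {
  if (length(x) != length(y)) return(FALSE)
  for (i in seq_along(x)) {
    xi_na <- is.na(x[i])
    yi_na <- is.na(y[i])
    if (xi_na != yi_na) return(FALSE)            # NA on one side only
    if (!xi_na && x[i] != y[i]) return(FALSE)    # both present but different
  }
  TRUE
}

same_with_na(c(1, NA, 3), c(1, NA, 3))   # TRUE
same_with_na(c(1, NA, 3), c(1, 3, NA))   # FALSE, stops at element 2
same_with_na(c(1, NA, 3), c(2, NA, 3))   # FALSE, stops at element 1
```

An R-level loop is typically much slower per element than vectorized code, so this probably only pays off when mismatches tend to occur early.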
Thanks again!
DLG


On 10/03/2019 1:15 a.m., David Goldsmith wrote:
> Thanks, all. I had read about recycling, but I guess I didn't fully
> appreciate all the "weirdness" it might produce. :/
>
> With this explained, I'm going to ask a follow-up, which is only
> contextually related: the impetus for this discovery was checking "corner
> cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to
> determine equality of two vectors containing NA's. Between the above
> result; my related discovery that this indexing preserves relative
> positional info but not absolute positional info; and the performance
> penalty when comparing long vectors that may be unequal "early on"; I've
> concluded that, if it (can be made to) "short circuit", it would probably
> be better to use an implicit loop. So that's my Q: will (or can) an
> implicit loop (be made to) "exit early" if a specified condition is met
> before all indices have been checked?
You could use the identical() function. When I have vectors of length 1
million, all(x == y) takes about 3 milliseconds when the difference is
in the last value, 2 milliseconds when it comes first. identical(x, y)
takes about 5 milliseconds when the difference comes last, but 0.006
milliseconds when it comes first. Of course, all(x == y) and
identical(x, y) do slightly different tests: read the docs!
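To illustrate the "slightly different tests" with NAs in play, here is a small sketch with made-up vectors; it only shows the semantics, not the timings.

```r
x <- c(1, 2, NA, 4)
y <- c(1, 2, NA, 4)

all(x == y)                  # NA: the comparison at the NA position is unknown
all(x == y, na.rm = TRUE)    # TRUE, but only the non-NA comparisons are checked
identical(x, y)              # TRUE: NAs in matching positions count as equal

# identical() is also strict about type
identical(1:3, c(1, 2, 3))   # FALSE (integer vs double)
all(1:3 == c(1, 2, 3))       # TRUE
```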
Duncan Murdoch


Hi
Do you want something like this?
> x <- c(1,2,NA, 3, 4, 5, NA, 6,7,8, NA, NA, 9,10)
> y <- c(1,2,NA, NA, 3, 4, 5, 6, NA, 7,8, NA, NA, 9,10)
> identical(x[which(!is.na(x))], y[which(!is.na(y))])
[1] TRUE
If I expect NAs and want to extract or compare something, I tend to use which() to select only the non-NA elements.
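One caveat worth keeping in mind: dropping the NAs first compares only the surviving values in order, so two vectors with NAs in different positions can still come out as identical. A short sketch with made-up vectors, in case the NA positions matter too:

```r
x <- c(1, NA, 2, 3)
y <- c(1, 2, NA, 3)   # same non-NA values, NA in a different position

# Dropping NAs first compares only the surviving values, in order
identical(x[!is.na(x)], y[!is.na(y)])     # TRUE

# Also requiring the NA pattern to match restores the positional check
identical(is.na(x), is.na(y)) &&
  identical(x[!is.na(x)], y[!is.na(y)])   # FALSE
```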
Cheers
Petr
> -----Original Message-----
> From: R-help < [hidden email]> On Behalf Of David Goldsmith
> Sent: Sunday, March 10, 2019 7:16 AM
> Cc: [hidden email]
> Subject: Re: [R] [FORGED] Q re: logical indexing with is.na
>
> Thanks, all. I had read about recycling, but I guess I didn't fully appreciate all
> the "weirdness" it might produce. :/
>
> With this explained, I'm going to ask a follow-up, which is only contextually
> related: the impetus for this discovery was checking "corner cases" to
> determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to determine equality of
> two vectors containing NA's. Between the above result; my related discovery
> that this indexing preserves relative positional info but not absolute positional
> info; and the performance penalty when comparing long vectors that may be
> unequal "early on"; I've concluded that, if it (can be made to) "short circuit", it
> would probably be better to use an implicit loop. So that's my Q: will (or can)
> an implicit loop (be made to) "exit early" if a specified condition is met before
> all indices have been checked?
>
> Thanks again!
>
> DLG
>
> On Sat, Mar 9, 2019 at 9:07 PM Jeff Newmiller < [hidden email]>
> wrote:
>
> > Regarding the mention of logical indexing, under ?Extract I see:
> >
> > For [indexing only: i, j, ... can be logical vectors, indicating
> > elements/slices to select. Such vectors are recycled if necessary to
> > match the corresponding extent. i, j, ... can also be negative
> > integers, indicating elements/slices to leave out of the selection.
> >
> > On March 9, 2019 6:57:05 PM PST, Rolf Turner < [hidden email]>
> > wrote:
> > >On 3/10/19 2:36 PM, David Goldsmith wrote:
> > >> Hi! Newbie (self)learning R using P. Dalgaard's "Intro Stats w/
> > >> R";
> > >not
> > >> new to statistics (have had grad-level courses and work experience
> > >> in statistics) or vectorized programming syntax (have extensive
> > >> experience with MatLab, Python/NumPy, and IDL, and even a smidgen,
> > >> a long time ago, of experience w/ S-plus).
> > >>
> > >> In exploring the use of is.na in the context of logical indexing,
> > >> I've come across the following puzzling-to-me result:
> > >>
> > >> > y; !is.na(y[1:3]); y[!is.na(y[1:3])]
> > >> [1] 0.3534253 1.6731597 NA 0.2079209
> > >> [1] TRUE TRUE FALSE
> > >> [1] 0.3534253 1.6731597 0.2079209
> > >>
> > >> As you can see, y is a four-element vector, the third element of
> > >> which is NA; the next line gives what I would expect, T T F,
> > >> because the first two elements are not NA but the third element is.
> > >> The third line is what confuses me: why is the result not the
> > >> two-element vector consisting of simply the first two elements of
> > >> the vector (or, if vectorized indexing in R is implemented to
> > >> return a vector the same length as the logical index vector, which
> > >> appears to be the case, at least the first two elements and then
> > >> either NA or NaN in the third slot, where the logical indexing
> > >> vector is FALSE): why does the implementation "go looking" for an
> > >> element whose index in the "original" vector, 4, is larger than
> > >> BOTH the largest index specified in the innermost subsetting index
> > >> AND the size of the resulting indexing vector? (Note: at first I
> > >> didn't even understand why the result wasn't simply
> > >>
> > >> 0.3534253 1.6731597 NA
> > >>
> > >> but then I realized that the third logical index being FALSE, there
> > >> was no reason for *any* element to be there; but if there is, due
> > >> to some overriding rule regarding the length of the result relative
> > >> to the length of the indexer, shouldn't it revert back to
> > >> *something* that indicates the "FALSE"-ness of that indexing
> > >> element?)
> > >>
> > >> Thanks!
> > >
> > >It happens because R is eco-conscious and recycles. :)
> > >
> > >Try:
> > >
> > >ok <- c(TRUE,TRUE,FALSE)
> > >(1:4)[ok]
> > >
> > >In general in R if there is an operation involving two vectors then
> > >the shorter one gets recycled to provide sufficiently many entries to
> > >match those of the longer vector.
> > >
> > >Thus in the foregoing example the first entry of "ok" gets used
> > >again, to make a length 4 vector to match up with 1:4. The result is
> > >the same as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].
> > >
> > >If you did (1:7)[ok] you'd get the same result as that from
> > >(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
> > >recycled 2 and 1/3 times.
> > >
> > >Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .
> > >
> > >Note that in the first two instances you get warnings, but in the
> > >third you don't, since 6 is an integer multiple of 3.
> > >
> > >Why aren't there warnings when logical indexing is used? I guess
> > >because it would be annoying. Maybe.
> > >
> > >Note that integer indices get recycled too, but the recycling is
> > >limited so as not to produce redundancies. So
> > >
> > >(1:4)[1:3] just (sensibly) gives
> > >
> > >[1] 1 2 3
> > >
> > >and *not*
> > >
> > >[1] 1 2 3 1
> > >
> > >Perhaps a bit subtle, but it gives what you'd actually *want* rather
> > >than being pedantic about rules with a result that you wouldn't want.
> > >
> > >cheers,
> > >
> > >Rolf Turner
> > >
> > >P.S. If you do
> > >
> > >y[1:3][!is.na(y[1:3])]
> > >
> > >i.e. if you're careful to match the length of the vector and that
> > >of the indices, you get what you initially expected.
> > >
> > >R. T.
> > >
> > >P^2.S. To the younger and wiser heads on this list: the help on "["
> > >does not mention that the index vectors can be logical. I couldn't
> > >find anything about logical indexing in the R help files. Is
> > >something missing here, or am I just not looking in the right place?
> > >
> > >R. T.
> >
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Logical indexing expects the logical index to be the same length as the
vector being indexed. If it is shorter, the index is recycled (wrapped
around) until it is long enough. The result on line 3 is therefore
y[c(TRUE, TRUE, FALSE, TRUE)], where the last TRUE was
originally the first component of !is.na(y[1:3]).
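
The recycling described above can be checked directly. The following is only a minimal sketch: the values of y are the rounded ones printed in the question, and the names idx and recycled are made up for illustration. It makes the recycled index explicit with rep_len() and then shows two ways to keep the result within the first three elements.

```r
# Values as printed in the question (rounded; assumed for illustration)
y <- c(0.3534253, 1.6731597, NA, 0.2079209)

# Logical index built from only the first three elements: TRUE TRUE FALSE
idx <- !is.na(y[1:3])

# A logical index shorter than the vector is recycled to the vector's
# length; rep_len() makes that recycling explicit: TRUE TRUE FALSE TRUE
recycled <- rep_len(idx, length(y))
same <- identical(y[idx], y[recycled])
print(same)

# Option 1: subset first, so the vector and the index have equal length
first_two_a <- y[1:3][idx]

# Option 2: convert the logical index to integer positions with which();
# integer positions select elements directly and are not recycled
first_two_b <- y[which(idx)]

print(first_two_a)
print(first_two_b)
```

Both options return just the first two elements, which is the result the original question expected.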
Grant Izmirlian, Ph.D.
Mathematical Statistician
[hidden email]
Delivery Address:
9609 Medical Center Dr, RM 5E130
Rockville MD 20850
Postal Address:
BG 9609 RM 5E130 MSC 9789
9609 Medical Center Dr
Bethesda, MD 208929789
ofc: 2402767025
cell: 2408887367
fax: 2402767845
________________________________
From: David Goldsmith < [hidden email]>
Sent: Saturday, March 9, 2019 8:36 PM
To: [hidden email]
Subject: [R] Q re: logical indexing with is.na
Hi! Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
new to statistics (have had grad-level courses and work experience in
statistics) or vectorized programming syntax (have extensive experience
with MatLab, Python/NumPy, and IDL, and even a smidgen, a long time ago,
of experience w/ S-plus).

In exploring the use of is.na in the context of logical indexing, I've come
across the following puzzling-to-me result:

> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
[1] 0.3534253 1.6731597 NA 0.2079209
[1] TRUE TRUE FALSE
[1] 0.3534253 1.6731597 0.2079209

As you can see, y is a four-element vector, the third element of which is
NA; the next line gives what I would expect, T T F, because the first two
elements are not NA but the third element is. The third line is what
confuses me: why is the result not the two-element vector consisting of
simply the first two elements of the vector (or, if vectorized indexing in
R is implemented to return a vector the same length as the logical index
vector, which appears to be the case, at least the first two elements and
then either NA or NaN in the third slot, where the logical indexing vector
is FALSE): why does the implementation "go looking" for an element whose
index in the "original" vector, 4, is larger than BOTH the largest index
specified in the innermost subsetting index AND the size of the resulting
indexing vector? (Note: at first I didn't even understand why the result
wasn't simply

0.3534253 1.6731597 NA

but then I realized that the third logical index being FALSE, there was no
reason for *any* element to be there; but if there is, due to some
overriding rule regarding the length of the result relative to the length
of the indexer, shouldn't it revert back to *something* that indicates the
"FALSE"-ness of that indexing element?)

Thanks!
DLG
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ISwR_2.0-7

loaded via a namespace (and not attached):
[1] compiler_3.5.2 tools_3.5.2

