I would very much appreciate help with the following:
I need to calculate the mean of different lat/long points that should be
grouped.
However, I would like R to exclude values that differ only in the
last decimal.
So instead of 4 values in the group, it would calculate the mean for only 3
(excluding the ones that differ in only one decimal).
# construct the dataframe
`TK-QUADRANT` <- c(9161,9162,9163,9164,10152,10154,10161,10163)
LAT <- c(55.07496,55.07496,55.02495,55.02496,
         54.97496,54.92495,54.97496,54.92496)
LON <- c(8.37477,8.458109,8.37477,8.45811,8.291435,8.291437,8.374774,8.374774)
# data.frame() converts the name to the syntactic TK.QUADRANT
df <- data.frame(`TK-QUADRANT`=`TK-QUADRANT`,LAT=LAT,LON=LON)
I would like to group the data and calculate means by group, but in a way that
excludes every number that differs only in the last decimal.
Please also see the attached PDF example.
Use round() with the appropriate "digits" argument. Then use unique() to define your groups.
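A minimal sketch of this suggestion, using the latitudes from the original post. The choice of 3 digits is an assumption (the post does not say which digit counts as the "last decimal"), and the names TK, lat_key and tk_mean are made up for illustration. Note that round() can split a pair of values that straddle a rounding boundary at the chosen digit, so the result depends on that choice.

```r
# Example data from the original post (column name shortened to TK)
df <- data.frame(
  TK  = c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163),
  LAT = c(55.07496, 55.07496, 55.02495, 55.02496,
          54.97496, 54.92495, 54.97496, 54.92496)
)

# Round latitude to 3 decimals (an assumed choice) to build a grouping key
lat_key <- round(df$LAT, 3)

# unique() lists the distinct groups; tapply() averages TK within each group
groups  <- unique(lat_key)
tk_mean <- tapply(df$TK, lat_key, mean)
tk_mean
```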
> On 2018-11-15, at 11:48, sasa kosanic < [hidden email]> wrote:
>
> Dear All,
On Thu, Nov 15, 2018 at 10:40 AM Boris Steipe < [hidden email]> wrote:
>
> Use round() with the appropriate "digits" argument. Then use unique() to define your groups.
No.
> round(c(.124,.126),2)
[1] 0.12 0.13
As I understand it, the OP said he wanted the last decimal to be ignored.
The OP also did not specify what he wanted to calculate means of. I
assume TK-QUADRANT. It is also not clear whether the calculations are
to be done separately by latitude and longitude, or both together.
I'll assume separately. In that case, the TK-QUADRANT means, grouped
for example by latitude truncated to 4 decimal digits, could be
calculated as follows (using the provided example data):
(Note: ignore all that follows if my interpretation is incorrect)
> with(df, tapply(TK.QUADRANT, floor(1e4*LAT),mean))
 549249  549749  550249  550749
10158.5 10156.5  9163.5  9161.5
## Note that this assumes positive values of latitude, because:
> floor(c(-1.2,1.2))
[1] -2  1
This could easily be modified if both positive and negative values were
used, e.g.:
> x <- c(-1.2,1.2)
> sign(x)*floor(abs(x))
[1] -1  1
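As a side note, base R's trunc() rounds toward zero for either sign, so it gives the same result as the sign(x)*floor(abs(x)) idiom. A small sketch; the helper name trunc_dec and its digits argument are hypothetical, not something from this thread:

```r
# Truncate x toward zero, keeping `digits` decimal places.
# trunc() is base R and drops the fractional part for either sign.
trunc_dec <- function(x, digits = 4) {
  scale <- 10^digits
  trunc(x * scale) / scale
}

x <- c(-1.2, 1.2)
# trunc() agrees with the sign/floor/abs construction
stopifnot(identical(trunc(x), sign(x) * floor(abs(x))))

# Works the same way for southern (negative) and northern latitudes
trunc_dec(c(-54.92496, 54.92496))
```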
Confession: I suspect that this multiply-by-a-power-of-ten and floor()
procedure might fail with lots of decimal places due to the usual issues
of binary representations of decimals. But maybe it fails even here. If
so, I would appreciate someone pointing this out and, if possible,
providing a better strategy.
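To illustrate the concern with a value that is not from the example data: 0.29 cannot be stored exactly in binary, so 100 * 0.29 ends up just below 29 and floor() returns 28. The example latitudes scale to values ending in .5 or .6, well away from an integer boundary, so they are not affected. The helper floor_tol and its eps tolerance below are assumptions for illustration, one possible workaround rather than a recommended fix:

```r
# 0.29 has no exact binary representation, so 100 * 0.29 is stored as
# a value just below 29 and floor() drops it to 28.
floor(100 * 0.29)

# Hedged workaround: nudge by a small tolerance before flooring.
# `eps` is an assumption; it must be much smaller than the last
# decimal that matters in the data.
floor_tol <- function(x, digits = 4, eps = 1e-9) {
  floor(x * 10^digits + eps)
}
floor_tol(0.29, digits = 2)
```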
On further thought, and subject to my prior interpretation, I
think a foolproof way of truncating to 4 decimal digits is to treat
them as character strings rather than numerics and use regex
operations:
> with(df,tapply(TK.QUADRANT, sub("(\\.[[:digit:]]{4}).*","\\1",as.character(LAT)),mean))
54.9249 54.9749 55.0249 55.0749
10158.5 10156.5  9163.5  9161.5
I should have realized this before!!!!
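If the grouping is meant to use latitude and longitude together (the other reading mentioned above), the same string truncation can be applied to both columns. This is only a sketch: the names TK, LAT4, LON4, trunc_chr and joint are invented here, and it relies on as.character() printing the values in plain decimal notation, which holds for these data but not for every number.

```r
df <- data.frame(
  TK  = c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163),
  LAT = c(55.07496, 55.07496, 55.02495, 55.02496,
          54.97496, 54.92495, 54.97496, 54.92496),
  LON = c(8.37477, 8.458109, 8.37477, 8.45811,
          8.291435, 8.291437, 8.374774, 8.374774)
)

# Keep at most `digits` decimals of the printed value; nothing is rounded.
trunc_chr <- function(x, digits = 4) {
  pattern <- sprintf("(\\.[[:digit:]]{%d}).*", digits)
  sub(pattern, "\\1", as.character(x))
}

df$LAT4 <- trunc_chr(df$LAT)
df$LON4 <- trunc_chr(df$LON)

# Mean of TK for every distinct (LAT4, LON4) pair
joint <- aggregate(TK ~ LAT4 + LON4, data = df, FUN = mean)
joint
```

With the example data every (LAT4, LON4) pair is distinct, so the joint grouping leaves eight groups of one value each; grouping by latitude alone gives the four groups shown above.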
On Thu, Nov 15, 2018 at 12:19 PM Bert Gunter < [hidden email]> wrote:
> Cheers,
> Bert
Hi Sasa,
Those latitudes look equidistant, with a separation of 0.05.
I guess you want to calculate the zonal mean along the latitude, right?
#estimate the lower and upper latitude for the cut
lat.dist=0.05 #equidistant spacing along latitude
lat.min=min(df$LAT,na.rm=TRUE)-lat.dist/2
lat.max=max(df$LAT,na.rm=TRUE)+lat.dist/2
cat.lat=cut(df$LAT,breaks=seq(lat.min,lat.max,by=lat.dist));cat.lat
#just show which indices are grouped
tapply(df$TK.QUADRANT,cat.lat, paste,collapse=",")
#calculate the mean of whatever column. The lat.mean will have NA for any latitude cell where the df column has no data
lat.mean=tapply(df$TK.QUADRANT,cat.lat, mean)
#if you need to remove any potential NAs
lat.mean[!is.na(lat.mean)]
