help with grouping data and calculating the means

SasaK
Dear All,

I would very much appreciate help with the following:
I need to calculate the mean of different lat/long points that should be
grouped.
However, I would like R to leave out values that differ only in the
last decimal.
So instead of 4 values in a group, it would calculate the mean for only 3
(excluding the ones that differ only in the last decimal).
# construct the data frame
`TK-QUADRANT` <- c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163)
LAT <- c(55.07496, 55.07496, 55.02495, 55.02496,
         54.97496, 54.92495, 54.97496, 54.92496)
LON <- c(8.37477, 8.458109, 8.37477, 8.45811,
         8.291435, 8.291437, 8.374774, 8.374774)
# note: with the default check.names = TRUE, data.frame() renames the
# column TK-QUADRANT to TK.QUADRANT
df <- data.frame(`TK-QUADRANT` = `TK-QUADRANT`, LAT = LAT, LON = LON)


I would like to group the data and calculate means by group, but in a way
that excludes every number that differs only in the last decimal.


Please also see the attached PDF example.

Many thanks!
Best wishes,
Sasha

--

Dr Sasha Kosanic
Ecology Lab (Biology Department)
Room M644
University of Konstanz
Universitätsstraße 10
D-78464 Konstanz
Phone: +49 7531 883321 & +49 (0)175 9172503

http://cms.uni-konstanz.de/vkleunen/
https://tinyurl.com/y8u5wyoj
https://tinyurl.com/cgec6tu

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Attachment: dataset example.pdf (315K)

Re: help with grouping data and calculating the means

Boris Steipe
Use round() with the appropriate "digits" argument. Then use unique() to define your groups.
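A minimal sketch of that suggestion on the example data from the original post. The choice of digits = 3 is an assumption (the suggestion only says "appropriate"), picked because the latitudes that should merge differ in the fifth decimal. The column name TK.QUADRANT is what data.frame() produces from TK-QUADRANT by default.

```r
# Example data from the original post
df <- data.frame(
  TK.QUADRANT = c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163),
  LAT = c(55.07496, 55.07496, 55.02495, 55.02496,
          54.97496, 54.92495, 54.97496, 54.92496),
  LON = c(8.37477, 8.458109, 8.37477, 8.45811,
          8.291435, 8.291437, 8.374774, 8.374774)
)

# Assumption: 3 digits is the "appropriate" precision for these latitudes
lat_key <- round(df$LAT, digits = 3)

# The distinct rounded values define the groups
groups <- sort(unique(lat_key))

# Mean of TK.QUADRANT within each rounded-latitude group
group_means <- tapply(df$TK.QUADRANT, lat_key, mean)

print(groups)
print(group_means)
```

Note that rounding and truncating are different operations, so values close to a rounding boundary can land in different groups depending on which one is used.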

HTH,
B.


Re: help with grouping data and calculating the means

Bert Gunter-2
On Thu, Nov 15, 2018 at 10:40 AM Boris Steipe <[hidden email]> wrote:
>
> Use round() with the appropriate  "digits" argument. Then use unique() to define your groups.

No.
> round(c(.124,.126),2)
[1] 0.12 0.13

As I understand it, the OP said he wanted the last decimal to be ignored.

The OP also did not specify what he wanted to calculate means of. I
assume TK-QUADRANT. It is also not clear whether the calculations are
to be done separately by latitude and longitude, or both together.
I'll assume separately. In that case, the TK-QUADRANT means, grouped
e.g. by latitude truncated to 4 decimal digits, could be calculated
as follows (using the provided example data):
(Note: ignore all that follows if my interpretation is incorrect)

> with(df, tapply(TK.QUADRANT, floor(1e4*LAT),mean))
 549249  549749  550249  550749
10158.5 10156.5  9163.5  9161.5
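For the other interpretation mentioned above (latitude and longitude taken together), a hedged sketch of the same floor() idea with two keys follows. The names lat_key and lon_key are made up for illustration, and the 4-digit truncation is carried over as an assumption. On this sample every latitude/longitude pair is distinct after truncation, so each group contains a single row.

```r
# Example data from the original post
df <- data.frame(
  TK.QUADRANT = c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163),
  LAT = c(55.07496, 55.07496, 55.02495, 55.02496,
          54.97496, 54.92495, 54.97496, 54.92496),
  LON = c(8.37477, 8.458109, 8.37477, 8.45811,
          8.291435, 8.291437, 8.374774, 8.374774)
)

# Hypothetical key columns: coordinates truncated to 4 decimals
# (assumes non-negative values, as floor() is used)
df$lat_key <- floor(1e4 * df$LAT)
df$lon_key <- floor(1e4 * df$LON)

# Mean of TK.QUADRANT for every observed (lat_key, lon_key) combination
joint_means <- aggregate(TK.QUADRANT ~ lat_key + lon_key, data = df, FUN = mean)
print(joint_means)
```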

## Note that this assumes positive values of latitude, because:
> floor(c(-1.2,1.2))
[1] -2  1

This could be easily modified if both positive and negative values were
used, e.g.:
> x <-c(-1.2,1.2)
> sign(x)*floor(abs(x))
[1] -1  1
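As a side note, base R's trunc() drops the fractional part toward zero, which gives the same result as the sign()/floor()/abs() combination. The helper truncate_digits() below is a hypothetical name used only for this sketch.

```r
x <- c(-1.2, 1.2)

# trunc() rounds toward zero, so it handles both signs
print(trunc(x))
print(sign(x) * floor(abs(x)))

# Hypothetical helper: truncate to a given number of decimal digits
truncate_digits <- function(x, digits) {
  trunc(x * 10^digits) / 10^digits
}

print(truncate_digits(c(-55.02496, 55.02496), 4))
```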

Confession: I suspect that this multiply-and-floor() procedure
might fail with lots of decimal places due to the usual issues of
binary representations of decimals. But maybe it fails even here. If
so, I would appreciate someone pointing this out and, if possible,
providing a better strategy.
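A small illustration of the kind of failure suspected here, using a value that is not from the example data: 0.29 cannot be stored exactly in binary, and 100 * 0.29 ends up just below 29. The helper floor_digits() and its guard argument are hypothetical. It is one possible workaround (round the scaled value to a few extra digits before flooring), not a general fix, since it also snaps values that are within about 1e-8 of the next integer.

```r
x <- 0.29

# The scaled value is stored slightly below 29, so floor() gives 28
print(sprintf("%.17g", 100 * x))
print(floor(100 * x))

# Hypothetical helper: round away representation noise, then floor
# (assumes non-negative x)
floor_digits <- function(x, digits, guard = 8) {
  floor(round(x * 10^digits, guard)) / 10^digits
}

print(floor_digits(0.29, 2))
print(floor_digits(55.02496, 4))
```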

Cheers,
Bert



Re: help with grouping data and calculating the means

Bert Gunter-2
On further thought -- and subject to my prior interpretation -- I
think a foolproof way of truncating to 4 decimal digits is to treat
them as character strings rather than numerics and use regex
operations:

> with(df,tapply(TK.QUADRANT, sub("(\\.[[:digit:]]{4}).*","\\1",as.character(LAT)),mean))
54.9249 54.9749 55.0249 55.0749
10158.5 10156.5  9163.5  9161.5
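To make the string step easier to inspect, here is the same sub() call applied to a plain vector, as a sketch. The values 54.9 and 1e-05 are added for illustration and are not from the example data. One case worth checking before relying on this: as.character() may return scientific notation for very small or very large numbers, and such strings have no decimal part for the pattern to match.

```r
lat <- c(55.07496, 55.02495, 55.02496, 54.9)

# Keep the decimal point plus the first 4 digits after it; drop the rest
keys <- sub("(\\.[[:digit:]]{4}).*", "\\1", as.character(lat))
print(keys)

# Values with fewer than 4 decimals are left unchanged (here "54.9")

# Illustrative edge case: scientific notation is passed through untouched
small <- as.character(1e-05)
print(sub("(\\.[[:digit:]]{4}).*", "\\1", small))
```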

I should have realized this before!!!!

Cheers,
Bert





Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
Re: help with grouping data and calculating the means

Anthoni, Peter (IMK)
In reply to this post by SasaK
Hi Sasa,

Those latitudes look equidistant, with a separation of 0.05.
I guess you want to calculate the zonal mean along the latitude, right?

#estimate the lower and upper latitude for the cut
lat.dist=0.05 #equidistant spacing along latitude
lat.min=min(df$LAT,na.rm=TRUE)-lat.dist/2
lat.max=max(df$LAT,na.rm=TRUE)+lat.dist/2
cat.lat=cut(df$LAT,breaks=seq(lat.min,lat.max,by=lat.dist));cat.lat

#just show which indices are grouped
tapply(df$TK.QUADRANT,cat.lat, paste,collapse=",")

#calculate the mean of whatever column. The lat.mean will have NA for any latitude cell where the df column has no data
lat.mean=tapply(df$TK.QUADRANT,cat.lat, mean)

#if you need to remove any potential NAs
lat.mean[!is.na(lat.mean)]
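If the quantities to average are the coordinates themselves (one possible reading of the original question, stated here as an assumption), the same latitude bands can be passed to aggregate() to get the mean of several columns at once. This is only a sketch: band and band_means are illustrative names, and the 0.05 spacing is taken from the estimate above.

```r
# Example data from the original post
df <- data.frame(
  TK.QUADRANT = c(9161, 9162, 9163, 9164, 10152, 10154, 10161, 10163),
  LAT = c(55.07496, 55.07496, 55.02495, 55.02496,
          54.97496, 54.92495, 54.97496, 54.92496),
  LON = c(8.37477, 8.458109, 8.37477, 8.45811,
          8.291435, 8.291437, 8.374774, 8.374774)
)

# Latitude bands of width 0.05, centred on the observed latitudes
lat.dist <- 0.05
breaks <- seq(min(df$LAT) - lat.dist / 2, max(df$LAT) + lat.dist / 2, by = lat.dist)
band <- cut(df$LAT, breaks = breaks)

# Mean of each column within each band; bands without data are dropped
band_means <- aggregate(df[c("LAT", "LON", "TK.QUADRANT")],
                        by = list(band = band), FUN = mean)
print(band_means)
```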

cheers/beste Grüße
Peter
