How to make our data normally distributed in R

NehaBologna
Hi

I have a regression problem where my models give these RMSE results:

SVM=3500
ANN=4600
R.Forest=2900

I would like to know how to rescale these values so that they fall in 0-1.

I drew a boxplot of the RMSE values. It works for values like 3500, but
when I set ylim = c(0, 1) all the boxplots suddenly disappear. What should I
do about this?

Thanks

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to make our data normally distributed in R

Rui Barradas
Hello,

To rescale data so that their values are between 0 and 1, use this function:


# Min-max rescaling: maps min(x) to 0 and max(x) to 1
scale01 <- function(x, na.rm = FALSE){
   (x - min(x, na.rm = na.rm))/(max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

x <- c(SVM=3500,
        ANN=4600,
        R.Forest=2900)

scale01(x)
#      SVM       ANN  R.Forest
#0.3529412 1.0000000 0.0000000


See the base R function ?scale for another way of scaling data (by default
it centres on the mean and divides by the standard deviation).
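
As a rough illustration of how scale() differs from the 0-1 rescaling above, the sketch below applies it to the same three RMSE values. It only uses base R; the variable name z is chosen here for illustration.

```r
x <- c(SVM = 3500, ANN = 4600, R.Forest = 2900)

# scale() returns a one-column matrix; as.numeric() turns it back into a vector
z <- as.numeric(scale(x))
names(z) <- names(x)

round(mean(z), 10)  # centred on 0
sd(z)               # unit standard deviation
range(z)            # not restricted to [0, 1]; values below the mean are negative
```

So scale() standardises the values rather than mapping them onto 0-1, which may or may not be what is wanted for a plot axis.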

As for the second question, if your RMSE vector had values in the range
2900 to 4600 and the y axis limits are c(0, 1), how can you expect to
see anything?
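
A minimal sketch of what happens, assuming the boxplots are drawn from several RMSE values per model (for example one per cross-validation fold). The per-fold numbers below are simulated purely for illustration and are not from the original post.

```r
set.seed(1)  # hypothetical per-fold RMSE values, invented for illustration
rmse <- data.frame(
  SVM      = rnorm(10, mean = 3500, sd = 150),
  ANN      = rnorm(10, mean = 4600, sd = 150),
  R.Forest = rnorm(10, mean = 2900, sd = 150)
)

# With ylim = c(0, 1) every box lies far above the visible window,
# so the plot region is empty.
boxplot(rmse, ylim = c(0, 1))

# Letting ylim cover the data brings the boxes back.
boxplot(rmse, ylim = c(0, max(rmse)), ylab = "RMSE")

# Alternatively, rescale all values jointly to 0-1 before plotting.
rng <- range(rmse)
rmse01 <- (rmse - rng[1]) / (rng[2] - rng[1])
boxplot(rmse01, ylim = c(0, 1), ylab = "RMSE (rescaled to 0-1)")
```

Note that the joint rescaling keeps the relative positions of the models but discards the original units, so the axis label should say so.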

Hope this helps,

Rui Barradas




Re: How to make our data normally distributed in R

NehaBologna
Thanks Hasan and Rui

Rui, as you mentioned

> As for the second question, if your RMSE vector had values in the range
> 2900 to 4600 and the y axis limits are c(0, 1), how can you expect to
> see anything?

Then what should the values of ylim be in the boxplots? I need to show them
as boxplots on a scale of 0-1 or 1-10, or even 10-100; it would look very
awkward if the boxplots showed values like 3500.

Regards





Re: How to make our data normally distributed in R

Jin Li
In reply to this post by NehaBologna
Hi,
Why do you want to rescale RMSE to 0-1? You can change ylim = c(0, 1) to
ylim = c(0, 4600). Alternatively, you may use VEcv (variance explained by
predictive models based on cross-validation), which is expressed as a
percentage with a maximum of 100%. It can be calculated with the vecv
function in library(spm), or you can convert RMSE to VEcv with tovecv in spm.
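
For readers without the spm package, here is a base R sketch of VEcv as it is usually defined: one minus the ratio of the squared prediction error to the total squared deviation of the observations, times 100. The helper name vecv_manual and the simulated obs/pred vectors are hypothetical and only for illustration; this is not spm's own code.

```r
# Hypothetical observed and cross-validated predicted values (simulated)
set.seed(42)
obs  <- rnorm(500, mean = 20000, sd = 5000)
pred <- obs + rnorm(500, mean = 0, sd = 3500)

# VEcv as a percentage: 1 minus the ratio of squared prediction error
# to the total squared deviation of the observations from their mean
vecv_manual <- function(obs, pred) {
  (1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)) * 100
}

rmse <- sqrt(mean((obs - pred)^2))
rmse                    # in the units of the response
vecv_manual(obs, pred)  # unit-free percentage
```

By this formula perfect predictions give 100, predicting the mean of the observations gives 0, and predictions worse than the mean give a negative value.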
Hope this helps,
Jin




Re: How to make our data normally distributed in R

Rui Barradas
In reply to this post by NehaBologna
Hello,

Why would it be awkward to show values like 4600? If those are the
values, show them. When the values differ by orders of magnitude, you
can plot them on a log scale by setting the parameter log = "y", as in
boxplot(10^(0:5), log = "y")

But I don't see why having values in the range 2900-4600 (the same order
of magnitude) is a reason to alter ylim.
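
One rough way to check whether a log axis is worth it is to measure the spread in orders of magnitude. The helper spread below is a hypothetical name introduced only for this sketch.

```r
x    <- c(SVM = 3500, ANN = 4600, R.Forest = 2900)
wide <- 10^(0:5)

# Spread in orders of magnitude: log10 of the max/min ratio
spread <- function(v) diff(log10(range(v)))

spread(x)     # well below 1: same order of magnitude, a linear axis is fine
spread(wide)  # about 5: a log axis helps here
```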

Hope this helps,

Rui Barradas



Re: How to make our data normally distributed in R

NehaBologna
In reply to this post by Jin Li
Thanks a lot, Jin.

If my total number of observations is 500, is it right that
n will be 500,
mu will be the average of the 500 values,
s will be the sd of the 500 values,
and m will be the RMSE value, i.e. 4500 in this case?

tovecv(n = 500, mu = mean(obs), s = sd(obs), m = 4500, measure = "rmse")

(with obs holding the 500 observed values)




Re: How to make our data normally distributed in R

Jin Li
Please note that mu and s are the mean and standard deviation of the
validation samples. You may use pred.acc in spm to calculate a number of
error and accuracy measures, including RMSE and VEcv, directly from the
observed and predicted values.
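
To make the roles of n, s and m concrete, here is a base R sketch of the algebra behind converting an RMSE into VEcv. The data are simulated and the names obs and pred are hypothetical. Whether tovecv in spm uses exactly this expression internally is an assumption; the identity itself follows from the definition of VEcv.

```r
# Hypothetical validation data (simulated); obs/pred are illustrative names
set.seed(42)
obs  <- rnorm(500, mean = 20000, sd = 5000)
pred <- obs + rnorm(500, mean = 0, sd = 3500)

n <- length(obs)
s <- sd(obs)                     # standard deviation of the validation samples
m <- sqrt(mean((obs - pred)^2))  # RMSE

# sum((obs - pred)^2) = n * m^2 and sum((obs - mean(obs))^2) = (n - 1) * s^2
vecv_from_rmse <- (1 - n * m^2 / ((n - 1) * s^2)) * 100

# Direct calculation from observed and predicted values, for comparison
vecv_direct <- (1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)) * 100

all.equal(vecv_from_rmse, vecv_direct)
```

The mean does not appear in this particular identity; s must be the standard deviation of the observed validation values, not of the predictions or of the RMSE values.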


--
Jin
