Fwd: Distribution to use to calculate p values

Fwd: Distribution to use to calculate p values

Lalitha Viswanathan
Hi
I have a dataset as below
Price  Country  Reliability  Mileage  Type   Weight  Disp.  HP
8895   USA      4            33       Small  2560    97     113
(Hundreds of rows)

I am trying to find the best possible distribution to use, to find p-values
and compute which factors most influence efficiency.

Any starting points for the functions I could use, or similar examples I
could follow, would be a start.
I am a relative novice at R having used it many years ago and am now
getting back to it.
So looking for pointers

Thanks


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Distribution to use to calculate p values

David Winsemius

On Apr 27, 2015, at 10:50 AM, Lalitha Viswanathan wrote:

> Hi
> I have a dataset as below
> Price Country Reliability Mileage Type Weight Disp. HP
>
>
> 8895 USA 4 33 Small 2560 97 113
> (Hundreds of rows)
>
> I am trying to find the best possible distribution to use, to find p-values
> and compute which factors most influence efficiency.

"Finding p-values" is a task that requires research questions. You obviously have some sort of meaning attached to the word "efficiency" but have not stated what it is. This appears to be a request for a statistical tutorial on a topic that has not been described. (And if this is course homework, then it is off-topic for r-help.)

>
> Any starting points for the functions I could use, or similar examples I
> could follow, would be a start.
> I am a relative novice at R having used it many years ago and am now
> getting back to it.
> So looking for pointers
>
> Thanks
>
> [[alternative HTML version deleted]]

The Posting Guide suggests that you create a small example in R code and describe your question more clearly (if it's not homework.)
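
A sketch of what such a small example might look like. Only the first row is taken from the post; rows 2 to 4 are hypothetical values invented for illustration, and the data frame name `cars_small` is also an assumption.

```r
# Minimal reproducible example: a tiny data frame with the same columns
# as in the post. Row 1 is the row quoted in the post; rows 2-4 are
# made-up values, included only so there is something to model.
cars_small <- data.frame(
  Price       = c(8895, 9500, 12000, 15500),
  Country     = c("USA", "Japan", "Japan", "USA"),
  Reliability = c(4, 5, 3, 2),
  Mileage     = c(33, 32, 26, 21),
  Type        = c("Small", "Small", "Compact", "Large"),
  Weight      = c(2560, 2400, 2900, 3500),
  Disp.       = c(97, 90, 140, 200),
  HP          = c(113, 95, 125, 160)
)

str(cars_small)   # shows how R stores each column (numeric, character, ...)
dput(cars_small)  # prints R code that recreates the data; paste it into the email
```

Alongside the data, the question would then state which column is the outcome (for instance, whether "efficiency" means Mileage) and which columns are candidate predictors.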


David Winsemius
Alameda, CA, USA


Re: Distribution to use to calculate p values

Jim Lemon-4
Hi Lalitha,
If you want to find a reasonable model distribution for your data, try
plotting the histogram of the variable you want to predict and compare
this to the density curves of the distributions that you think will
fit. So for example:

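# plot a histogram of 100 evenly spaced values (a flat, uniform-like shape)
hist(seq(1,10,length.out=100))
# overlay a normal density with the same mean (sd defaults to 1);
# the factor 30 is an ad hoc rescaling onto the histogram's count axis
lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*30)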

Not a very good fit, but:

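# histogram of 100 random draws from a normal distribution with mean 5.5
hist(rnorm(100,5.5))
# the same normal density, again rescaled by an ad hoc factor (90)
lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*90)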

Much better. You can then perform a "goodness of fit" test if you need
it to justify your choice of distribution. In most cases, you will
have to find a "family" (an error distribution together with its link
function) to use in a generalized linear model (glm).
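
A minimal sketch of those two steps, on simulated data. The variables `weight` and `mileage` are hypothetical stand-ins for the poster's columns, not the real data. For a regression model the distributional assumption applies to the residuals rather than to the raw response, so the check here is run on the residuals.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical stand-ins for the poster's data: car weights, and a
# mileage that falls with weight plus normally distributed noise.
weight  <- runif(200, min = 1800, max = 3900)
mileage <- 50 - 0.008 * weight + rnorm(200, sd = 2)

# Gaussian family with identity link: the same model that lm() fits
fit <- glm(mileage ~ weight, family = gaussian(link = "identity"))
summary(fit)$coefficients  # estimates, standard errors, t values, p-values

# Histogram of the residuals on the density scale, so the fitted normal
# curve can be overlaid without an ad hoc scaling factor
res <- residuals(fit)
hist(res, freq = FALSE, main = "Residuals", xlab = "Residual")
curve(dnorm(x, mean = mean(res), sd = sd(res)), add = TRUE)

# Shapiro-Wilk test of normality as one possible goodness-of-fit check
sw <- shapiro.test(res)
sw$p.value
```

If the response were strictly positive and right-skewed, one could try `family = Gamma(link = "log")` instead and compare the two fits, for example with `AIC()`.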

Another approach is to use a non-parametric test if one gives an
appropriate answer to your question.
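
Two common rank-based tests, sketched on simulated data. The variables `type`, `mileage`, and `weight` and all numbers are hypothetical; whether either test answers the actual question depends on what "efficiency" means.

```r
set.seed(2)  # make the simulated data reproducible

# Hypothetical data: mileage for three car types, 30 cars each
type    <- factor(rep(c("Small", "Compact", "Large"), each = 30))
mileage <- c(rnorm(30, mean = 32, sd = 3),
             rnorm(30, mean = 26, sd = 3),
             rnorm(30, mean = 21, sd = 3))

# Kruskal-Wallis test: does mileage differ between the types?
# It works on ranks and does not assume a normal distribution.
kw <- kruskal.test(mileage ~ type)
kw$p.value

# Hypothetical weight that decreases as mileage increases
weight <- 4000 - 50 * mileage + rnorm(90, sd = 200)

# Spearman rank correlation: is there a monotonic relationship
# between mileage and weight?
sp <- cor.test(mileage, weight, method = "spearman")
sp$estimate
sp$p.value
```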

Jim


On Tue, Apr 28, 2015 at 5:07 AM, David Winsemius <[hidden email]> wrote:

> "Finding p-values" is a task that requires research questions. You obviously have some sort of meaning attached to the word "efficiency" but have not stated what it is. This appears to be a request for a statistical tutorial on a topic that has not been described. (And if this is course homework, then it is off-topic for r-help.)

Re: Fwd: Distribution to use to calculate p values

Bert Gunter
... Realizing, of course, that after such data dredging, any subsequent
inference is highly biased.
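
A small simulation can illustrate the point for one simple form of data dredging: choosing the "best" of several candidate predictors after looking at the data. Everything here is hypothetical. The response and all ten predictors are pure noise, so every "significant" result is a false positive.

```r
set.seed(3)  # make the simulation reproducible

n_sim  <- 500  # number of simulated data sets
n      <- 50   # observations per data set
n_pred <- 10   # candidate predictors, all unrelated to the response

min_p <- replicate(n_sim, {
  y <- rnorm(n)                                    # response: pure noise
  x <- matrix(rnorm(n * n_pred), nrow = n)         # predictors: pure noise
  # p-value of the slope in a separate simple regression per predictor
  p <- apply(x, 2, function(xj) summary(lm(y ~ xj))$coefficients[2, 4])
  min(p)                                           # keep only the "best" one
})

# Share of data sets in which the selected predictor has p < 0.05
mean(min_p < 0.05)
```

A single pre-specified test would cross the 0.05 threshold in about 5% of such data sets. After selecting the smallest of ten p-values, the share is far higher (roughly 40% would be expected if the ten tests were independent), which is why a p-value reported after the search cannot be read at face value.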

Cheers,
Bert

On Tuesday, April 28, 2015, Jim Lemon <[hidden email]> wrote:

> Hi Lalitha,
> If you want to find a reasonable model distribution for your data, try
> plotting the histogram of the variable you want to predict and compare
> this to the density curves of the distributions that you think will
> fit.


--

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll
