PLEASE EXCUSE: This discussion has diverged from R into discussing the

precise assumptions seemingly descriptive of an application that drove

the initial post to this thread. A reply by Abby Spurdle seemed to me

to raise questions, whose answers may not be intelligible without

material snipped from Spurdle's reply. I wish to thank Spurdle from the

reply and apologize to those who feel this is an abuse of this list. I

trust that those in the latter category will please not bother to read

further. For anyone still interested in this problem, below please find

my earlier analysis with corrections that attempt to respond to

Spurdle's most recent concerns. Thanks, Spencer Graves

On 2019-07-12 22:31, Abby Spurdle wrote:

> > The distribution of the randomly truncated variable has thus four

> > parameters: a, b, mu and sigma. I was able to write down the likelihood

> > and attempted to maximise it

>

> I read the Wikipedia article more carefully.

> The formula is relatively simple, and is based on the application of

> Bayes Theorem.

> If one doesn't want to work out the integral, numerical methods can be

> used.

>

> However, the problem needs to be defined *precisely* first.

>

Correct: In my case, I confess I hadn't thought this through completely

before posting. I tried Rseek, as Bert Gunter suggested. That led me

to the "truncreg" and "DTDA" packages, neither of which seemed to be

what I wanted; thanks to Bert, Rolf, and Abby for your comments.

I'm observing a random variable Y[i] = (x[i]'b+e[i]) given

Y[i]>(z[i]'c+f[i]) where the tick mark (') denotes transpose of a

vector, and e and f are normally distributed with mean 0 and standard

deviations s and t, respectively, i = 1:n. Thus, Y[i] follows a

truncated normal distribution with mean x[i]'b and standard deviation s,

with the truncation condition being that Y[i]>(z[i]'c+f[i]).

I want the total of all the Y's from the untruncated

distribution, i.e., including those truncated (and not observed).

I think the likelihood is the product of the density of Y[i]

given x[i] and given that Y[i] is actually observed. By substituting

Y[i] = (x[i]'b+e[i]) into the truncation condition Y[i]>(z[i]'c+f[i]),

we get the following:

(x[i]'b+e)>(z[i]c+f).

This occurs if and only if:

(x[i]'b-z[i]'c)>(f-e),

Therefore, the probability that Y[i] is observed (and not truncated) is

Pr{Y[i] observed} = Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2))

where Phi is the cdf of the standard normal.

And then the likelihood for observation i can be written as follows:

f(y[i]|x[i], z[i], b, c, s, t) = phi((y[i]-x[i]'b)/s) /

Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2)).

We may not be able to estimate "c" in this, because if one of the

z[i]'s is nonzero, we can pick "c" so z[i]'c is Inf. That makes the

denominator 0 and the likelihood Inf. (If all the z[i]'s are zero, we

still cannot estimate "c".) However, if "b" is estimable, ignoring the

truncation, then we can estimate "b", "s" and "t" given "c".

And then the desired total of all the Y's, observed and

unobserved, would be the sum of y[i] divided by Pr{Y[i] observed}.

This likelihood is simple enough, it can be easily programmed in

R and maximized over variations in "b", "s" and "t" given "c". I can

get starting values for "b" and "s" from "lm", ignoring the truncation.

And I can first fit the model assuming t = s, then test whether it's

different using likelihood ratio. And I can try to estimate "c", but I

should probably use values I can estimate from other sources until I'm

comfortable with the estimates I get for "b", "s" and "t" given an

assumed value for "c".

Comments?

Thanks so much.

Spencer Graves

______________________________________________

[hidden email] mailing list -- To UNSUBSCRIBE and more, see

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide

http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.