# random truncation

11 messages
Open this post in threaded view
|

## random truncation

 Hello:        What do you suggest I do about modeling random truncation?         I have data on a variable Y in strata S[0], S[1], ..., S[n], where Y is always observed in S[0] but is less often observed in the other strata.  I assume that the probability of observing Y is a monotonically increasing function of Y and a monotonically decreasing function of d[i] = the distance from S[0] to S[i].        There is a section on "random truncation" in the Wikipedia article on "Truncated distribution".[1]  It would be nice if I had an R package that would make it relatively easy to model the truncation as a function of "d" and / or publication that described someone doing it in R.  (I also have a couple of other variables that influence the distribution of Y.)        Thanks,        Spencer Graves [1] https://en.wikipedia.org/wiki/Truncated_distribution#Random_truncation______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

 > It would be nice if I had an R > package that would make it relatively easy to model the truncation as a > function of "d" I suspect that R has everything you need, already. However, I suspect you may need to reformulate your question to find what you need. > I assume that the probability of observing Y is a > monotonically increasing function of Y That's like flipping a coin, and saying the probability of observing heads is a monotonically increasing function of heads.         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

 Did you search on e.g. "model truncation" at rseek.org? Several packages came up that appear to deal with truncated data, though I have no clue whether in the way you specify. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Fri, Jul 12, 2019 at 4:26 PM Abby Spurdle <[hidden email]> wrote: > > It would be nice if I had an R > > package that would make it relatively easy to model the truncation as a > > function of "d" > > I suspect that R has everything you need, already. > However, I suspect you may need to reformulate your question to find what > you need. > > > I assume that the probability of observing Y is a > > monotonically increasing function of Y > > That's like flipping a coin, and saying the probability of observing heads > is a monotonically increasing function of heads. > >         [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. >         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

 In reply to this post by Spencer Graves-4 On 13/07/19 10:54 AM, Spencer Graves wrote: > Hello: > > >        What do you suggest I do about modeling random truncation? Good question!  Probably the best answer is "Give up and go to the pub!" :-) But seriously, there is a package DTDA on CRAN which purports to analyse randomly truncated data. > >         I have data on a variable Y in strata S[0], S[1], ..., S[n], > where Y is always observed in S[0] but is less often observed in the > other strata.  I assume that the probability of observing Y is a > monotonically increasing function of Y and a monotonically decreasing > function of d[i] = the distance from S[0] to S[i]. > > >        There is a section on "random truncation" in the Wikipedia > article on "Truncated distribution".[1]  It would be nice if I had an R > package that would make it relatively easy to model the truncation as a > function of "d" and / or publication that described someone doing it in > R.  (I also have a couple of other variables that influence the > distribution of Y.) > > >        Thanks, >        Spencer Graves > > > [1] https://en.wikipedia.org/wiki/Truncated_distribution#Random_truncationI'd just like to add that many many years ago, probably back before you were born, I struggled mightily for a while with random truncation, modelling the truncation variable to have a density that was uniform on some interval [a,b]. I took the underlying ("untruncated") variable to have a Gaussian distribution N(mu,sigma^2). The distribution of the randomly truncated variable has thus four parameters: a, b, mu and sigma.  I was able to write down the likelihood and attempted to maximise it, but this was a nightmare since the partial derivatives of the log likelihood are undefined for b=x_i where x_1, ..., x_n are the i.i.d. observations from the randomly truncated distribution.  I fought with this for a long while, could not get estimates based on simulated data to agree at all well with the true values, and eventually chucked it all in. This all happened back before there was R, or even S!!! Let us hope that the authors of DTDA are far clever than I.  (Not at all unlikely!) cheers, Rolf -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

 > The distribution of the randomly truncated variable has thus four > parameters: a, b, mu and sigma.  I was able to write down the likelihood > and attempted to maximise it I read the Wikipedia article more carefully. The formula is relatively simple, and is based on the application of Bayes Theorem. If one doesn't want to work out the integral, numerical methods can be used. However, the problem needs to be defined *precisely* first.         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

 On 13/07/19 3:31 PM, Abby Spurdle wrote: >  > The distribution of the randomly truncated variable has thus four >  > parameters: a, b, mu and sigma.  I was able to write down the likelihood >  > and attempted to maximise it > > I read the Wikipedia article more carefully. > The formula is relatively simple, and is based on the application of > Bayes Theorem. > If one doesn't want to work out the integral, numerical methods can be used. I am mystified as to what you are saying here.  What integral? As I said, I could write down the likelihood.  Explicitly.  Maximising it (numerically *of course*) turned out to be problematic.  Back then. Modern optimisers might help.  Presumably the DTDA package has a reasonable procedure.  (I haven't looked at it.) > However, the problem needs to be defined *precisely* first. Again I have no idea of what you are driving at here.  The concept of "random truncation" is quite precisely defined. cheers, Rolf -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

 In reply to this post by Abby Spurdle On 2019-07-12 22:31, Abby Spurdle wrote: > > The distribution of the randomly truncated variable has thus four > > parameters: a, b, mu and sigma.  I was able to write down the likelihood > > and attempted to maximise it > > I read the Wikipedia article more carefully. > The formula is relatively simple, and is based on the application of > Bayes Theorem. > If one doesn't want to work out the integral, numerical methods can be > used. > > However, the problem needs to be defined *precisely* first. > Correct:  In my case, I confess I hadn't thought this through completely before posting.  I tried Rseek, as Bert Gunter suggested.  That led me to the "truncreg" and "DTDA" packages, neither of which seemed to be what I wanted;  thanks to Bert, Rolf, and Abby for your comments.        I'm observing Y[i] = (X[i]'b+e) given Y[i]>(z[i]'c+f) where e and f are normally distributed with standard deviations s and t, respectively, i = 1:n.  I want the total of all the Y's, including those truncated (and not observed).        I think the likelihood is the product of the density of Y[i] given x[i] and given that Y[i] is actually observed.  The latter can be further written as follows:              (x[i]'b+e)>(z[i]c+f) iff              (x[i]'b-z[i]'c)>(f-e), Therefore, the probability that Y[i] is observed is              Pr{Y[i] observed} = Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2)) where Phi is the cdf of the standard normal.        And then the likelihood for observation i can be written as follows:              f(y[i]|x[i], z[i], b, c, s, t) = phi((y[i]-x[i]'b)/s) / Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2).        We may not be able to estimate "c" in this, because if one of the z[i]'s is nonzero, we can pick "c" so z[i]'c is Inf.  That makes the denominator 0 and the likelihood Inf.  (If all the z[i]'s are zero, we still cannot estimate "c".)  However, if "b" is estimable, ignoring the truncation, then we can estimate "b", "s" and "t" given "c".        And then the desired total of all the Y's, observed and unobserved, would be the sum of y[i] divided by Pr{Y[i] observed}.        This likelihood is simple enough, it can be easily programmed in R and maximized over variations in "b", "s" and "t" given "c".  I can get starting values for "b" and "s" from "lm", ignoring the truncation.  And I can first fit the model assuming t = s, then test whether it's different using likelihood ratio.  And I can try to estimate "c", but I should probably use values I can estimate from other sources until I'm comfortable with the estimates I get for "b", "s" and "t" given an assumed value for "c".        Comments?        Thanks so much.        Spencer Graves ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: random truncation

Open this post in threaded view
|

## Re: random truncation

 In reply to this post by Spencer Graves-4 Firstly, we don't really need all your working. Just the problem you want solve. However, I'm still having difficulty understanding this. > I'm observing Y[i] = (X[i]'b+e) given Y[i]>(z[i]'c+f) where e and > f are normally distributed with standard deviations s and t, > respectively, i = 1:n.  I want the total of all the Y's, including those > truncated (and not observed). > (x[i]'b+e)>(z[i]c+f) > (x[i]'b-z[i]'c)>(f-e), (1) What does the tick (or single quote) mean? (2) What do you mean by "X[i]" and "observing Y[i]"? (3) Are e and f random variables, or vectors of (observed) errors? My intuitive understanding of (2), in the given context, would be that X and Y are vectors of observations. However, if e and f are random variables, wouldn't that would make Y[i] a random variable too? In which case I'm not sure what you mean by "observing". Conversely, if e and f are vectors of (observed) errors, shouldn't they be indexed (for consistency)? And in which case, there's no random component in your expressions. (4) Is lower case "x[i]" the same as upper case "X[i]", and what's "z[i]"? (5) Is the mean of e and f, zero? (6) So, you want to estimate b and c? And wouldn't that make this a parameter estimation problem? (In which case, you don't necessary need to model any distributions). > Pr{Y[i] observed} = Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2)) > where Phi is the cdf of the standard normal. This implies that "Y[i]" is a is binary (true or false) random variable. Do you mean Pr (Y <= y) where Y is a random variable and y is a possible value (from Y's sample space), that Y can be less than or equal to? You can replace y with y[i] if you want, but the principle is the same.         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.