A paper I wrote together with Christine Amsler and Peter Schmidt (yes, I cannot resist saying, the Peter Schmidt of the KPSS time series stationarity test, and one of the founders of Stochastic Frontier Analysis) has just been approved for publication in a special issue of Empirical Economics dedicated to efficiency and productivity analysis. The paper is

Amsler C, A Papadopoulos and P Schmidt (2020). “Evaluating the CDF of the Skew Normal distribution.” Forthcoming in Empirical Economics. Download the full paper incl. the supplementary file.

ABSTRACT. In this paper we consider various methods of evaluating the cdf of the Skew Normal distribution. This distribution arises in the stochastic frontier model because it is the distribution of the composed error, which is the sum (or difference) of a Normal and a Half Normal random variable. The cdf must be evaluated in models in which the composed error is linked to other errors using a Copula, in some methods of goodness of fit testing, or in the likelihood of models with sample selection bias. We investigate the accuracy of the evaluation of the cdf using expressions based on the bivariate Normal distribution, and also using simulation methods and some approximations. We find that the expressions based on the bivariate Normal distribution are quite accurate in the central portion of the distribution, and we propose several new approximations that are accurate in the extreme tails. By a simulated example we show that the use of approximations instead of the theoretical exact expressions may be critical in obtaining meaningful and valid estimation results.

 

The paper computes values of the Skew Normal distribution using 17 different mathematical formulas (approximate or exact) and/or algorithms, and different software, with particular focus on the accuracy of computing the Skew Normal CDF through the Bivariate standard Normal CDF, since the latter is readily available, but also on what happens deep in the tails. There, the CDF values are so close to zero or unity that it would appear not to matter for empirical studies if one simply imposed a non-zero floor and a non-unity ceiling, and be ok. It is not ok. In Section 7 of the paper we show by a simulated example that using only the Bivariate standard Normal CDF (with or without floor/ceiling) may lead to failed estimation, while inserting an approximate expression in its place for the left tail solves the problem. This is a result we did not anticipate: it says that approximate mathematical expressions may perform better than exact formulas, due to computational limitations related to the latter.
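To make the idea of "evaluating the same CDF by different routes" concrete, here is a minimal Python sketch. It is not code from the paper. It assumes a standard Skew Normal with shape parameter alpha (density 2φ(z)Φ(αz)) and compares two routes: the textbook identity F(x; α) = Φ(x) − 2T(x, α), where T is Owen's T function (closely related to the bivariate Normal CDF), and brute-force integration of the density. The function names, the Simpson-rule quadrature and the lower integration limit of −12 are illustrative assumptions.

```python
import math


def norm_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard Normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def owens_t(h, a):
    """Owen's T function, T(h, a), by numerical integration of its definition."""
    integrand = lambda t: math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    return simpson(integrand, 0.0, a) / (2.0 * math.pi)


def sn_cdf_owen(x, alpha):
    """Skew Normal CDF through the identity Phi(x) - 2 T(x, alpha)."""
    return norm_cdf(x) - 2.0 * owens_t(x, alpha)


def sn_cdf_direct(x, alpha, lower=-12.0):
    """Skew Normal CDF by integrating the density 2 phi(z) Phi(alpha z)."""
    density = lambda z: 2.0 * norm_pdf(z) * norm_cdf(alpha * z)
    return simpson(density, lower, x)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 2.0):
        print(x, sn_cdf_owen(x, 2.0), sn_cdf_direct(x, 2.0))
```

In the central part of the distribution the two routes should agree closely. Deep in the left tail (for positive alpha) the first route subtracts two nearly equal numbers, which is one way in which an exact formula can lose accuracy in floating-point arithmetic.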

It only took 15 months and 3 revisions, but the paper

Papadopoulos, A and Roland B. Stark (2019). “Does Home Health Care increase the probability of 30-day hospital re-admissions? Interpreting coefficient sign reversals, or their absence, in binary logistic regression analysis”.

has now been accepted for publication in The American Statistician

…and is now (Dec 17, 2019) on-line at https://doi.org/10.1080/00031305.2019.1704873

The paper is very light on technical stuff, but heavy on concepts. The abstract reads: Data for 30-day readmission rates in American hospitals often show that patients that receive Home Health Care (HHC) have a higher probability of being readmitted to hospital than those that did not receive such services, but it is expected that when control variables are included in a regression we will obtain a “sign reversal” of the treatment effect. We map the real-world situation to the binary logistic regression model, and we construct a counterfactual probability metric that leads to necessary and sufficient conditions for the sign reversal to occur, conditions that show that logistic regression is an appropriate tool for this research purpose. This metric also permits us to obtain evidence related to the criteria used to assign HHC treatment. We examine seven data samples from different USA hospitals for the period 2011-2017. We find that in all cases the provision of HHC increased the probability of readmission of the treated patients. This casts doubt on the appropriateness of the 30-day readmission rate as an indicator of hospital performance and a criterion for hospital reimbursement, as it is currently used for Medicare patients.

The main contributions of the paper can be distilled down to the following two. First, we show how the familiar binary logistic regression model can be reliably used to glean information on whether assignment of Home Health Care (HHC) treatment to patients that are discharged from the hospital depends positively on the seriousness of their health status or not (in the latter case we would have statistical evidence that administrators go for an “easy win” by assigning HHC to less needy patients).

Second, we provide the theoretical framework to explain an ongoing “puzzle” in Healthcare, that HHC appears to increase the probability of hospital readmissions, even after risk-adjustment: in other words, we explain why the statement “Home Health Care is beneficial to the health of patients and it increases their probability of hospital readmission” is not a contradiction in terms.

In (counterfactual) Treatment Effects Analysis, we learn that a fundamental condition for estimating treatment effects reliably is that the treatment variable is “ignorable conditional on the control variables” (see Rosenbaum and Rubin 1983). When ignorability does not hold, as happens in most cases of observational, non-randomized data, various methods have been developed to obtain ignorability, or in more precise words, to construct a sample (through “risk adjustment”, “balancing on propensity scores”, etc.) that “imitates” a randomized one.

We are also told that ignorability is analogous to regressor exogeneity in the linear regression setup, so that when ignorability does not hold, we essentially have endogeneity and the estimation will produce inconsistent and therefore unreliable estimates; see e.g. Imbens (2004), or Guo and Fraser “Propensity Score Analysis” (2010), 1st ed., pp 30-35.

This is simply wrong. The treatment variable may not be ignorable and yet the estimator can be consistent. This means that we can estimate consistently the treatment effect even if the treatment is non-ignorable. We illustrate that non-ignorability does not necessarily imply inconsistency of the estimator, through the widely used Binary Logistic Regression model (BLR).

The BLR model starts properly with a latent-variable regression, usually linear,

y^{\ell}_i = \beta_0 + \beta_1T_i + \mathbf z'_i \gamma + u_i,\;\;\; i=1,...,n  \;\;\;\;(1)

where y^{\ell}_i is the unobservable (latent) variable, T_i is the treatment variable, \mathbf z_i is the vector of controls and u_i is the error term. We obtain the BLR model if we assume that the error term follows the standard Logistic distribution conditional on the regressors, u_i | \{T_i, \mathbf z_i\} \sim \Lambda (0, \pi^2/3). Then we define the indicator variable y_i \equiv I\{y^{\ell}_i >0\}, which is observable, and we ask what the probability distribution of y_i is, conditional on the regressors. We obtain

\Pr\left (y_i = 1 | \{T_i, \mathbf z_i\}\right) = \Lambda\left (\beta_0 + \beta_1T_i + \mathbf z'_i \gamma\right)\;\;\;\;(2)

and in general,

\Pr\left (y_i  | \{T_i, \mathbf z_i\}\right) = \left[\Lambda\left (\beta_0 + \beta_1T_i + \mathbf z'_i \gamma\right)\right]^{y_i}\cdot \left[1-\Lambda\left (\beta_0 + \beta_1T_i + \mathbf z'_i \gamma\right)\right]^{1-y_i}\;\;\;\;(3)

The parameters are estimated by maximizing the likelihood built from (3), i.e. by the maximum likelihood estimator (MLE).
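For completeness, the step from (1) to (2) uses only the symmetry of the standard Logistic distribution around zero. Writing x_i'\beta as shorthand (introduced here only for compactness) for \beta_0 + \beta_1T_i + \mathbf z'_i \gamma, and \Lambda(v) = 1/(1+e^{-v}), we have

\Pr\left (y_i = 1 | \{T_i, \mathbf z_i\}\right) = \Pr\left (u_i > -x_i'\beta \;|\; \{T_i, \mathbf z_i\}\right) = 1-\Lambda\left (-x_i'\beta\right) = \Lambda\left (x_i'\beta\right)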

Turning to ignorability, it can be expressed as

\Pr \left (y_i | \{T_i, \mathbf z_i\}\right) = \Pr \left (y_i |\mathbf z_i\right)\;\;\;\;(4)

Essentially ignorability means that the treatment variable is totally determined by the controls, or maybe, that if it is only partly determined by them, its other “part” is independent from the dependent variable/outcome.

Comparing (4) with (3), we see that ignorability of treatment in the context of the BLR model is equivalent to the assumption \beta_1=0.

“Great”, you could say. “So run the model and let the data decide whether ignorability holds or not”. Well, the issue is whether, when ignorability does not hold, the MLE remains a consistent estimator so that we can have confidence in the estimates that we will obtain. And the assertion that we find in the literature, is that non-ignorability destroys consistency.

Does it? Let’s see: in order for the MLE to be inconsistent, it must be the case that the regressors in the latent-variable regression (eq. 1) are correlated with the error term. The controls are assumed independent from the error term from the outset. What is argued is that if T_i is non-ignorable, then it is associated with u_i.

We have just seen that ignorability implies that \beta_1 =0. So if non-ignorability is the case, we have that \beta_1 \neq 0. How does this imply the inconsistency condition “T_i is not independent from u_i”?

It doesn’t. The (informal) argument is that if the treatment variable is not fully determined by the controls, it “must” be statistically associated with the unmodeled/random factors represented by u_i. But there is nothing here to support a priori this assertion. Whether the treatment variable is endogenous or not, must be argued per case, with respect to the actual situation that we analyze and model. Certainly, if the argument is that the treatment is ignorable, then, if the controls are exogenous to the error term (which is the maintained assumption), so will be the treatment variable also. But if it is non-ignorable, it does not follow automatically that it is endogenous.

Therefore, depending on the real-world phenomenon under study and the available sample, we may very well have a consistent MLE in the BLR model, and so

a) be able to test validly the ignorability assumption, and

b) estimate treatment effects reliably even if the treatment is non-ignorable.

ADDENDUM

In response to a comment, here is a quick Gretl script that simulates a situation where the Treatment is not ignorable but is independent from the error term, so that its effect can be consistently estimated. Play around with the sample size (now n=5000), or embed the script into a simple index loop (with matrices to hold the estimates for each run, then fill a series with the estimates from the matrix, then take basic statistics to see that the estimator is consistent).

<hansl>

nulldata 5000

set hc_version 2 #uses HC2 robust standard errors

#Data generation

genr U1 = randgen(U,0,1) #auxiliary variable
genr Er = -log((1-U1)/U1) #Logistic error term Λ(0,1)
genr X1 = randgen(G,1,2) # continuous regressor following Exponential
genr N1 = randgen (N,0,1) # codetermines the assignment of treatment
genr T = (X1+N1 >0) #Bernoulli treatment
genr yL = -0.5 + 0.5*T + X1 + Er # latent dependent variable

#The Treatment is not ignorable because it influences directly the latent dependent variable.
genr Depvar = (yL >0) #observable dependent variable

#Estimation

list Reglist = const T X1  #OLS estimation for starting values
ols Depvar Reglist --quiet

matrix bcoeff = $coeff  #OLS coefficients serve as starting values for the MLE

#This so that the names of the variables appear in the estimation output
string varblnames = varname(Reglist)
string allcoefnames = varblnames

#command for maximum likelihood estimation
catch mle logl = Depvar*log(CDF) + (1-Depvar)*log(1-CDF)
series g = lincomb(Reglist,bcoeff)

series CDF = 1/(1+ exp(-g)) #correct specification of the distribution of the error term

params bcoeff
param_names allcoefnames
end mle --hessian

</hansl>
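The index loop suggested above can be sketched as follows. This is a Python translation of the same data generating process, not Gretl code, and it replaces Gretl's mle block with a hand-written Newton-Raphson routine started from zero rather than from OLS values. The function names, the number of replications, the per-replication sample size and the seed are illustrative choices, not part of the original script.

```python
import math
import random


def sigmoid(v):
    """Numerically stable standard Logistic CDF."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def simulate(n, rng):
    """Same design as the script: T depends on X1 and N1, but not on the error."""
    rows, y = [], []
    for _ in range(n):
        x1 = rng.expovariate(0.5)           # Exponential with mean 2
        n1 = rng.gauss(0.0, 1.0)            # co-determines treatment assignment
        t = 1.0 if x1 + n1 > 0 else 0.0     # treatment indicator
        u = rng.random()
        while u == 0.0:                     # keep u strictly inside (0, 1)
            u = rng.random()
        er = math.log(u / (1.0 - u))        # standard Logistic error
        yl = -0.5 + 0.5 * t + x1 + er       # latent dependent variable
        rows.append((1.0, t, x1))
        y.append(1.0 if yl > 0 else 0.0)
    return rows, y


def logit_mle(rows, y, iters=25, tol=1e-10):
    """Logit MLE by Newton-Raphson on the concave log-likelihood."""
    k = len(rows[0])
    b = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(bj * xj for bj, xj in zip(b, x)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * x[a]
                for c in range(k):
                    hess[a][c] += w * x[a] * x[c]
        step = solve_linear(hess, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def monte_carlo(reps=20, n=2000, seed=12345):
    """Average the estimates (const, T, X1) over independent replications."""
    rng = random.Random(seed)
    est = [logit_mle(*simulate(n, rng)) for _ in range(reps)]
    return [sum(e[j] for e in est) / reps for j in range(3)]


if __name__ == "__main__":
    print(monte_carlo())
```

The averages across replications should lie close to the true values (-0.5, 0.5, 1), and increasing n should tighten the individual estimates around them.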

When \hat \theta \to_d D(\theta, v) \implies \hat\theta \to_p \theta ?

We know that in general, convergence in distribution does not imply convergence in probability. But for the case of most interest in econometrics, where we examine a sequence of estimators, convergence in distribution does imply convergence in probability to a constant, under two regularity conditions that are also satisfied in most cases of interest.

This post of mine in stats.stackexchange.com has the proof. Essentially, under these regularity conditions we are able to prove the even stronger result that convergence in distribution implies convergence in quadratic mean to a constant (which in turn implies convergence in probability to that constant).

Time and again I encounter people confused about marginal and joint Normality, and I could not find a single internet or book source that lists together the main cases of interest. So here they are:

  1. Subject to the usual regularity conditions, the statements below also hold for asymptotic (limiting) marginal/joint Normality.
  2. If two random variables are not each marginally Normal, then they are certainly not jointly Normal.
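  3. If two random variables are not each marginally Normal, then their linear combinations are in general not marginally Normal. (For independent variables this is guaranteed by Cramér's decomposition theorem; under dependence, special cases where a linear combination is Normal can be constructed.)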
  4. If X and Y have Normal marginals and they are independent, then they are also jointly Normal. By the Cramér-Wold theorem, it then follows that all linear combinations of them (like their sum, difference, etc) are also marginally Normal. If we want to consider more than two random variables, then for the above to hold, they must be jointly independent, and not only pair-wise independent. If they are only pair-wise independent, then they may be jointly Normal, may be not.
  5. If X and Y have Normal marginals but they are dependent, then it is not necessarily the case that they are jointly Normal. They may be, they may be not. It follows that a linear combination of them may be Normal, may be not. It must be proven based on something more than marginal Normality. This is important to remember when one wants to show asymptotic Normality of a test statistic that is a linear combination of two random variables that are each asymptotically Normal, but are dependent without the dependence dying out asymptotically. This is the case, for example, in all “Hausman endogeneity tests” in econometrics, where the test statistic is the difference of two estimators that use the same data, and so are in general dependent. Even if each is asymptotically Normal, the asymptotic Normality of their difference does not necessarily follow. In fact, in his original paper Hausman (1978) explicitly assumed asymptotic Normality of the test statistic; he did not prove it.
  6. If X and Y have Normal marginals, are uncorrelated (i.e. their covariance is zero), but they have some higher order/“non-linear” form of dependence, then they are certainly NOT jointly Normal (because in joint Normality, existence of dependence is always expressed also as non-zero covariance). Their linear combinations may be marginally Normal, may be not. Again, if a linear combination of them is the object of interest, its asymptotic Normality must be proven based on something more than marginal Normality.
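A classic textbook counterexample illustrates the last case. Take X standard Normal and Y = S·X, where S is a random sign (+1 or −1 with probability 1/2 each) independent of X. Then Y is also standard Normal, the covariance of X and Y is zero, yet X + Y equals exactly zero half of the time, so the sum is not Normal and the pair cannot be jointly Normal. The minimal Python sketch below simulates this; the function names, sample size and seed are illustrative assumptions.

```python
import random


def simulate(n=50_000, seed=2024):
    """Draw (X, Y) with X ~ N(0,1) and Y = S * X, S an independent random sign."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        s = 1.0 if rng.random() < 0.5 else -1.0
        xs.append(x)
        ys.append(s * x)
    return xs, ys


def summary(xs, ys):
    """Sample covariance of (X, Y), sample variance of Y, share of X + Y == 0."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    share_zero = sum(1 for x, y in zip(xs, ys) if x + y == 0.0) / n
    return cov, var_y, share_zero


if __name__ == "__main__":
    cov, var_y, share_zero = summary(*simulate())
    print("sample covariance of X and Y:", cov)
    print("sample variance of Y:", var_y)
    print("share of draws with X + Y exactly zero:", share_zero)
```

The sample covariance should be near zero and the variance of Y near one, while roughly half of the sums are exactly zero: a point mass that no Normal distribution can have.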

Changes

Posted: October 18, 2017 in Uncategorized

It has just been made official: my PhD focus has changed, and it will now be about the Two-tier Stochastic Frontier model, on which I have already published a paper. The thesis will contain new distributional specifications for the model, among them one that allows for statistical dependence, and two applications where I apply the model to new situations that the existing literature has not touched upon. The projection is that the whole thing will be finished in the next 6 months, since most of the work has been done in the past years, as a … recreational break.

L2 to consistency

E(\hat \theta_n -\mu)^2 \to 0 \implies \Pr\left(\big |\hat \theta_n -\mu \big| > \varepsilon\right) \to 0

That L2-convergence implies convergence in probability is a standard result, but I think I have come up with a pretty intuitive exposition as to why it is so. You can read it over at CrossValidated,

http://stats.stackexchange.com/a/270434/28746
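
For reference, the formal core of the result is one line. This is the textbook argument, which may differ from the exposition in the linked answer: applying Markov's inequality to the non-negative random variable (\hat \theta_n -\mu)^2 gives, for any \varepsilon > 0,

\Pr\left(\big |\hat \theta_n -\mu \big| > \varepsilon\right) = \Pr\left((\hat \theta_n -\mu)^2 > \varepsilon^2\right) \leq \frac{E(\hat \theta_n -\mu)^2}{\varepsilon^2} \to 0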