Ignorability and estimator consistency in binary Logistic regression

Posted: May 10, 2019 in Technical Reports, Uncategorized

In (counterfactual) Treatment Effects Analysis, we learn that a fundamental condition for estimating treatment effects reliably is that the treatment variable is “ignorable conditional on the control variables” (see Rosenbaum and Rubin 1983). When ignorability does not hold, as happens in most cases of observational, non-randomized data, various methods have been developed to obtain ignorability or, more precisely, to construct a sample (through “risk adjustment”, “balancing on propensity scores”, etc.) that “imitates” a randomized one.

We are also told that ignorability is analogous to regressor exogeneity in the linear regression setup, so that when ignorability does not hold we essentially have endogeneity, and the estimation will produce inconsistent and hence unreliable estimates; see e.g. Imbens (2004), or Guo and Fraser, “Propensity Score Analysis” (2010), 1st ed., pp. 30-35.

This is simply wrong. The treatment variable may not be ignorable and yet the estimator can be consistent. This means that we can estimate consistently the treatment effect even if the treatment is non-ignorable. We illustrate that non-ignorability does not necessarily imply inconsistency of the estimator, through the widely used Binary Logistic Regression model (BLR).

The BLR model starts properly with a latent-variable regression, usually linear,

y^{\ell}_i = \beta_0 + \beta_1T_i + \mathbf z'_i \gamma + u_i,\;\;\; i=1,...,n  \;\;\;\;(1)

where y^{\ell}_i is the unobservable (latent) variable, T_i is the treatment variable, \mathbf z_i is the vector of controls and u_i is the error term. We obtain the BLR model if we assume that the error term follows the standard Logistic distribution (mean zero, variance \pi^2/3) conditional on the regressors, u_i | \{T_i, \mathbf z_i\} \sim \Lambda (0, \pi^2/3). Then we define the indicator variable y_i \equiv I\{y^{\ell}_i >0\}, which is observable, and we ask what the probability distribution of y_i is conditional on the regressors. We obtain

\Pr\left (y_i = 1 | \{T_i, \mathbf z_i\}\right) = \Lambda\left (\beta_0 + \beta_1T_i + \mathbf z'_i \gamma\right)\;\;\;\;(2)
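
For readers who want the intermediate step, (2) follows from (1) and the definition of y_i through the symmetry of the Logistic distribution around zero. A short derivation, where x_i'\theta is shorthand introduced here (not in the text above) for \beta_0 + \beta_1T_i + \mathbf z'_i \gamma:

```latex
\begin{aligned}
\Pr\left(y_i = 1 \mid T_i, \mathbf z_i\right)
  &= \Pr\left(y^{\ell}_i > 0 \mid T_i, \mathbf z_i\right) \\
  &= \Pr\left(u_i > -x_i'\theta \mid T_i, \mathbf z_i\right) \\
  &= 1 - \Lambda\left(-x_i'\theta\right) \\
  &= \Lambda\left(x_i'\theta\right),
  \qquad \text{since } 1 - \Lambda(-a) = \Lambda(a).
\end{aligned}
```

The third line is where the distributional assumption enters: it uses the fact that u_i, conditional on the regressors, is standard Logistic.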

and in general,

\Pr\left (y_i  | \{T_i, \mathbf z_i\}\right) = \left[\Lambda\left (\beta_0 + \beta_1T_i + \mathbf z'_i \gamma\right)\right]^{y_i}\cdot \left[1-\Lambda\left (\beta_0 + \beta_1T_i + \mathbf z'_i \gamma\right)\right]^{1-y_i}\;\;\;\;(3)

The parameters of this likelihood are estimated by maximum likelihood (MLE).
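
Written out, and assuming the observations are independent across i (an assumption left implicit above), the log-likelihood built from (3) and the first-order conditions that the MLE solves look as follows. Here \Lambda_i is shorthand introduced for this sketch, standing for \Lambda(\beta_0 + \beta_1T_i + \mathbf z'_i \gamma):

```latex
\ln L(\beta_0,\beta_1,\gamma)
  = \sum_{i=1}^{n}\left[\, y_i \ln \Lambda_i + (1-y_i)\ln\left(1-\Lambda_i\right)\right]

\sum_{i=1}^{n}\left(y_i - \Lambda_i\right)
  \begin{pmatrix} 1 \\ T_i \\ \mathbf z_i \end{pmatrix} = \mathbf 0
```

The second display uses the Logistic identity \Lambda'(a) = \Lambda(a)\left[1-\Lambda(a)\right], which makes the derivative of each log-likelihood term collapse to (y_i - \Lambda_i) times the regressor vector.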

Turning to ignorability, it can be expressed as

\Pr \left (y_i | \{T_i, \mathbf z_i\}\right) = \Pr \left (y_i |\mathbf z_i\right)\;\;\;\;(4)

Essentially, ignorability means that the treatment variable is fully determined by the controls or, if it is only partly determined by them, that its remaining “part” is independent of the dependent variable/outcome.

Comparing (4) with (3), we see that ignorability of treatment in the context of the BLR model is equivalent to the assumption \beta_1=0.
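
To spell out the comparison, here is a sketch for a binary treatment T_i \in \{0,1\} (an assumption made here for concreteness). Under (2), condition (4) requires the conditional probability to be the same at both treatment values, and because \Lambda is strictly increasing this pins down \beta_1:

```latex
\Lambda\left(\beta_0 + \beta_1 + \mathbf z'_i \gamma\right)
  = \Lambda\left(\beta_0 + \mathbf z'_i \gamma\right)
\;\Longleftrightarrow\;
\beta_0 + \beta_1 + \mathbf z'_i \gamma = \beta_0 + \mathbf z'_i \gamma
\;\Longleftrightarrow\;
\beta_1 = 0
```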

“Great”, you could say. “So run the model and let the data decide whether ignorability holds or not”. Well, the issue is whether, when ignorability does not hold, the MLE remains a consistent estimator, so that we can have confidence in the estimates that we will obtain. And the assertion that we find in the literature is that non-ignorability destroys consistency.

Does it? Let’s see: for the MLE to be inconsistent, it must be the case that the regressors in the latent-variable regression (eq. 1) are correlated with the error term. The controls are assumed independent from the error term from the outset. What is argued is that if T_i is non-ignorable, then it is associated with u_i.

We have just seen that ignorability implies \beta_1 =0. So under non-ignorability we have \beta_1 \neq 0. How does this imply the inconsistency condition “T_i is not independent from u_i”?

It doesn’t. The (informal) argument is that if the treatment variable is not fully determined by the controls, it “must” be statistically associated with the unmodeled/random factors represented by u_i. But nothing here supports this assertion a priori. Whether the treatment variable is endogenous or not must be argued case by case, with respect to the actual situation that we analyze and model. Certainly, if the argument is that the treatment is ignorable, then, if the controls are exogenous to the error term (which is the maintained assumption), so is the treatment variable. But if it is non-ignorable, it does not follow automatically that it is endogenous.
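
The distinction between “non-ignorable” and “endogenous” can be checked numerically on simulated data. The sketch below is an illustration added here in Python (standard library only), not part of the original analysis. It borrows the data-generating process of the Gretl script given in the addendum, and adds a hypothetical second assignment rule, `t_endog`, in which the latent error also drives treatment. In both designs the treatment can enter (1) with \beta_1 \neq 0; what differs is whether it is associated with u_i. All function and variable names are made up for the example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def treatment_error_correlations(n=20000, seed=2019):
    """Return (corr(T_exog, u), corr(T_endog, u)) for one simulated sample."""
    rng = random.Random(seed)
    u, t_exog, t_endog = [], [], []
    for _ in range(n):
        p = rng.random()
        while p == 0.0:                   # guard against log(0)
            p = rng.random()
        err = math.log(p / (1.0 - p))     # standard logistic draw
        x1 = rng.gammavariate(1.0, 2.0)   # assumed: shape 1, scale 2 (exponential)
        n1 = rng.gauss(0.0, 1.0)          # assignment noise, independent of err
        u.append(err)
        # Assignment rule of the Gretl script: depends on x1 and n1 only.
        t_exog.append(1.0 if x1 + n1 > 0 else 0.0)
        # Hypothetical endogenous rule: the latent error also drives treatment.
        t_endog.append(1.0 if x1 + n1 + err > 0 else 0.0)
    return pearson(t_exog, u), pearson(t_endog, u)


if __name__ == "__main__":
    r_exog, r_endog = treatment_error_correlations()
    print(f"corr(T, u), exogenous assignment : {r_exog: .3f}")
    print(f"corr(T, u), endogenous assignment: {r_endog: .3f}")
```

The first correlation should be close to zero and the second clearly positive. A near-zero correlation is only a symptom of independence, which in the first design holds by construction.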

Therefore, depending on the real-world phenomenon under study and the available sample, we may very well have a consistent MLE in the BLR model, and so

a) be able to test validly the ignorability assumption, and

b) estimate treatment effects reliably even if the treatment is non-ignorable.


At a commenter's request, here is a quick Gretl script that simulates a situation where the Treatment is not ignorable but is independent from the error term, so that its effect can be estimated consistently. Play around with the sample size (now n=5000), or embed the script into a simple index loop (with a matrix to hold the estimates from each run, then fill a series with the estimates from the matrix, then take basic statistics to see that the estimator is consistent).


nulldata 5000

set hc_version 2 # uses HC2 robust standard errors

# Data generation

genr U1 = randgen(U,0,1) # auxiliary variable
genr Er = -log((1-U1)/U1) # Logistic error term Λ(0,1)
genr X1 = randgen(G,1,2) # continuous regressor following an Exponential distribution
genr N1 = randgen(N,0,1) # codetermines the assignment of treatment
genr T = (X1+N1 > 0) # Bernoulli treatment
genr yL = -0.5 + 0.5*T + X1 + Er # latent dependent variable

# The Treatment is not ignorable because it influences directly the latent dependent variable.
genr Depvar = (yL > 0) # observable dependent variable


# OLS estimation for starting values
list Reglist = const T X1
ols Depvar Reglist --quiet

matrix bcoeff = $coeff  # starting values for the coefficients

# This is so that the names of the variables appear in the estimation output
string varblnames = varname(Reglist)
string allcoefnames = varblnames

# command for maximum likelihood estimation
catch mle logl = Depvar*log(CDF) + (1-Depvar)*log(1-CDF)
series g = lincomb(Reglist,bcoeff)

series CDF = 1/(1+ exp(-g)) # correct specification of the distribution of the error term

params bcoeff
param_names allcoefnames
end mle --hessian
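
The index loop suggested above (repeat the simulation, store the estimates, summarize them) can be sketched as follows. This is a Python re-implementation under stated assumptions rather than a line-by-line translation of the Gretl script: it uses only the standard library, reads `randgen(G,1,2)` as a gamma with shape 1 and scale 2 (an exponential with mean 2), and replaces Gretl's `mle` block with a hand-written Newton-Raphson routine. The function names (`simulate`, `fit_logit`, `monte_carlo`) are hypothetical.

```python
import math
import random
import statistics

TRUE_BETA = (-0.5, 0.5, 1.0)  # const, T, X1: coefficients of yL in the Gretl script


def logistic_cdf(g):
    """Standard logistic CDF, written to avoid overflow in exp()."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


def simulate(n, rng):
    """Draw one sample; returns (y, X) where each row of X is [1, T, X1]."""
    y, X = [], []
    for _ in range(n):
        p = rng.random()
        while p == 0.0:                   # guard against log(0)
            p = rng.random()
        err = math.log(p / (1.0 - p))     # standard logistic error
        x1 = rng.gammavariate(1.0, 2.0)   # assumed: shape 1, scale 2
        t = 1.0 if x1 + rng.gauss(0.0, 1.0) > 0 else 0.0
        latent = TRUE_BETA[0] + TRUE_BETA[1] * t + TRUE_BETA[2] * x1 + err
        y.append(1.0 if latent > 0 else 0.0)
        X.append([1.0, t, x1])
    return y, X


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x


def fit_logit(y, X, max_iter=50, tol=1e-8):
    """Logit MLE via Newton-Raphson, started at zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]   # minus the Hessian
        for yi, xi in zip(y, X):
            p = logistic_cdf(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                score[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def monte_carlo(sizes=(500, 2000, 8000), reps=10, seed=2019):
    """Mean and standard deviation of the estimated T coefficient, by sample size."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        b_t = [fit_logit(*simulate(n, rng))[1] for _ in range(reps)]
        results[n] = (statistics.mean(b_t), statistics.stdev(b_t))
    return results


if __name__ == "__main__":
    for n, (mean_bt, sd_bt) in monte_carlo().items():
        print(f"n={n:5d}  mean={mean_bt:.3f}  sd={sd_bt:.3f}")
```

If the estimator is consistent, the mean of the Treatment coefficient estimates should settle near the true value 0.5 and their standard deviation should shrink as n grows. The default of ten replications only gives a rough picture; raise `reps` for anything beyond a sanity check.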


  1. Clive Nicholas says:

    Just been reading about PS models and treatment effects, so this is a very absorbing piece. Do you have a working example in R or gretl motivating this example?

    • Alecos Papadopoulos says:

      I just posted some Gretl code as an addendum to the post

      • Clive Nicholas says:

        Hello Alecos,

        Sorry for the delay in replying (very rude), but thanks very much for your response. Your code works very nicely in -gretl-.

        Do you think you will be developing this with a view to incorporating a bootstrapped PS model, or do you think that might not make conceptual/methodological sense?

        Cheers, Clive
