Statistical Inference for Generalized Semiparametric Nonlinear Models with Nonignorable Missing Data

Author: Tang Lin

Supervisor: Yu Dan, Tang Niansheng


Degree Year: 2017





Missing data occur in many fields and have received considerable attention in statistical analysis. Existing methods for analyzing semiparametric models with missing data mainly rely on the assumption that responses or covariates are missing at random (MAR). However, in practical applications, data may be nonignorably missing (NMAR). For example, nonignorable missing data arise when subjects refuse medical treatment because of the side effects of the medicine; in such cases, the MAR assumption and statistical analyses based on it are unreasonable. Therefore, in this dissertation, we aim to develop methods for estimating the parametric and nonparametric components of generalized semiparametric nonlinear models (GSNMs) with nonignorable missing responses, and to investigate variable selection procedures for both the GSNM and the missing data mechanism. We also aim to develop a Bayesian analysis and a Bayesian local influence approach for generalized semiparametric nonlinear mixed effects models with nonignorable missing responses. The main contributions of this dissertation are as follows.

1. We develop methods for estimating the nonparametric component and the parameters of interest in GSNMs with nonignorable missing responses. The missing data mechanism is specified by a logistic regression model. First, we estimate the nonparametric function by combining local kernel estimation with a propensity score adjustment method. Then we obtain the maximum likelihood estimates of the parameters of interest using the EM algorithm. We also prove the consistency and establish the asymptotic properties of the estimators of the nonparametric function and the parameters.

2. Based on the SCAD and ALASSO penalty functions, we investigate variable selection for both the GSNM and the missing data mechanism model. The penalization-based method simultaneously estimates the parameters and the nonparametric function and selects important covariates in the GSNM and the missing data mechanism model; the parameter estimates are obtained by maximizing the penalized likelihood. Covariates whose estimated coefficients are nonzero are selected as significant variables. To ensure that the maximum penalized likelihood estimator has the well-known oracle property, the penalty parameter must be selected appropriately. Commonly used criteria such as GCV and BIC are not easily implemented in the presence of missing data; thus, the ICQ criterion is used to select the penalty parameter. We prove that the ICQ criterion consistently selects the correct model, and we also prove the oracle property of the maximum penalized likelihood estimator.

3. Many existing results for semiparametric mixed effects models are obtained under the assumption that the random effects follow a parametric distribution. However, this assumption is sometimes unreasonable. Therefore, for the random effects in the generalized semiparametric nonlinear mixed effects model (GSNMM) with nonignorable missing responses, we adopt a Dirichlet process (DP) prior. We also account for measurement errors in the covariates and assume that they follow a skew-normal distribution, which is more reasonable when the data exhibit skewness and heavy tails. The Gibbs sampler is used to draw random samples from the conditional probability density functions in order to conduct the Bayesian analysis.

4. Following Zhu et al. (2011), to assess the sensitivity of the model to minor perturbations of the individual observations, the priors, the DP prior, and the missing data mechanism in GSNMMs with nonignorable missing responses, this dissertation develops a Bayesian local influence approach. A perturbation model is introduced to characterize the various perturbations simultaneously, and a Bayesian perturbation manifold is constructed to characterize their intrinsic structure. First-order and second-order adjusted local influence measures are developed to quantify the effects of the perturbations. Random samples drawn from the posterior probability density by an MCMC algorithm are used for the Bayesian influence analysis.
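
As an illustration of the penalty functions named in the second contribution above, the following minimal sketch writes out the SCAD and ALASSO penalties in their standard textbook forms and shows how they would enter a penalized log-likelihood. The function names, the default a = 3.7 (a conventional choice in the SCAD literature), the ALASSO exponent gamma, and the scaling of the penalty by the sample size n are assumptions made for illustration; the abstract does not specify them, and this is not the dissertation's implementation.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at |theta|, for lam > 0 and a > 2.

    a = 3.7 is a conventional default, assumed here for illustration.
    """
    t = abs(theta)
    if t <= lam:
        # Linear (LASSO-like) region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no additional penalty.
    return lam * lam * (a + 1) / 2


def alasso_penalty(theta, lam, theta_init, gamma=1.0):
    """Adaptive LASSO penalty with weight 1 / |theta_init|**gamma.

    theta_init is a nonzero initial estimate (hypothetical input).
    """
    return lam * abs(theta) / abs(theta_init) ** gamma


def penalized_loglik(loglik, coefs, lam, n):
    """Log-likelihood minus n times the summed SCAD penalties (assumed scaling)."""
    return loglik - n * sum(scad_penalty(c, lam) for c in coefs)
```

The SCAD penalty is continuous at both breakpoints and flat beyond a * lam, so large coefficients are not shrunk further. This behavior is what makes an oracle property attainable when the penalty parameter is chosen appropriately.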