Regression School (Escola de Regressão)

DATES AND TIMES - Oral Presentations


ORAL PRESENTATIONS – Session 1

 

Date: 25/03/2019
Time: 08h30 to 10h30

CO 1.1 - SINH-SKEW-NORMAL/INDEPENDENT REGRESSION MODELS

Authors: Rocio Maehara (Universidad del Pacífico, Lima, Peru), Heleno Bolfarine (Universidade Estadual de São Paulo, São Paulo, SP, Brazil), Filidor Vilca (Universidade Estadual de Campinas, São Paulo, SP, Brazil) and Narayanaswamy Balakrishnan (McMaster University, Canada).

Abstract: Skew-normal/independent (SNI) distributions form an attractive class of heavy-tailed distributions that also accommodate skewness. We use this class of distributions to derive a generalization of the sinh-normal distribution (Rieck, 1989), called the sinh-skew-normal/independent (sinh-SNI) distribution. Based on this distribution, we propose a general class of nonlinear regression models that generalizes the regression models of Rieck and Nedelman (1991), which have been used extensively in Birnbaum-Saunders regression. The proposed regression models have a convenient hierarchical representation that makes an EM algorithm for maximum likelihood estimation of the model parameters easy to implement, and they provide a robust alternative for parameter estimation. Simulation studies and applications to a real dataset illustrate the usefulness of the proposed model and of the inferential methods developed here.
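The following minimal Python sketch makes the sinh transformation concrete by sampling from a sinh-skew-normal law, the simplest member of the sinh-SNI class. It assumes Rieck's definition Y = γ + σ·asinh(αZ/2), where Z is standard normal in the sinh-normal case, and replaces Z with a skew-normal variate of shape λ. The heavy-tailed SNI members, which also divide Z by the square root of a random mixing factor, are not covered. Neither is the authors' EM algorithm. All function names are hypothetical.

```python
import math
import random


def rskew_normal(lam: float, rng: random.Random) -> float:
    """Draw one standard skew-normal variate with shape parameter lam.

    Uses the representation Z = delta*|U0| + sqrt(1 - delta^2)*U1, where
    U0, U1 are independent N(0, 1) and delta = lam / sqrt(1 + lam^2).
    """
    delta = lam / math.sqrt(1.0 + lam * lam)
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def sinh_transform(z: float, alpha: float, gamma: float, sigma: float) -> float:
    """Rieck's sinh map (assumed form): y = gamma + sigma * asinh(alpha * z / 2)."""
    return gamma + sigma * math.asinh(alpha * z / 2.0)


def inverse_sinh_transform(y: float, alpha: float, gamma: float, sigma: float) -> float:
    """Inverse map: z = (2 / alpha) * sinh((y - gamma) / sigma)."""
    return (2.0 / alpha) * math.sinh((y - gamma) / sigma)


def rsinh_skew_normal(n: int, alpha: float, gamma: float, sigma: float,
                      lam: float, seed: int = 0) -> list:
    """Draw n values of Y = gamma + sigma*asinh(alpha*Z/2) with Z skew-normal(lam)."""
    rng = random.Random(seed)
    return [sinh_transform(rskew_normal(lam, rng), alpha, gamma, sigma)
            for _ in range(n)]
```

With λ = 0 and σ = 2, this reduces to the sinh-normal distribution of log T, where T is a Birnbaum-Saunders variable with shape α and scale exp(γ). That is the connection exploited by Rieck and Nedelman (1991).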

CO 1.2 - THE ZERO- OR ONE-INFLATED GJS REGRESSION MODEL

Authors: Francisco Felipe Queiroz (Universidade de São Paulo, São Paulo, SP, Brazil), Artur José Lemonte (Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil)

Abstract: In a wide variety of problems involving rates, fractions and proportions, the variable of interest may take values in the interval (0,1) and also the values zero or one. In these situations, the beta regression model is not suitable, even though it is an alternative for modeling data in (0,1). The reason is that the response variable is discrete at zero and/or one and continuous on (0,1). The zero- or one-inflated beta regression model can be used in these cases. This work aims to develop an alternative to the inflated beta regression model for analyzing rates and proportions in the presence of zeros or ones. The proposed regression model is based on the GJS distribution (Lemonte and Bazan, 2016). We present the zero- or one-inflated GJS distribution and its regression model, and address inferential aspects of parameter estimation. We also evaluate the performance of the estimators through Monte Carlo simulations. In addition, we propose residuals for the inflated GJS regression model and apply local influence based on normal curvature to identify potentially influential points. We illustrate the methodology with an application to a real dataset.

CO 1.3 - INTERVAL-CENSORED DATA WITH MISCLASSIFICATION: A BAYESIAN APPROACH

Authors: Guilherme Augusto Veloso (UFMG, Belo Horizonte, MG, Brazil), Magda Carvalho Pires (UFMG, Belo Horizonte, MG, Brazil), Enrico Antonio Colosimo (UFMG, Belo Horizonte, MG, Brazil), Raquel de Souza Borges Ferreira (UFMG, Belo Horizonte, MG, Brazil)

Abstract: Survival data involving silent events are often subject to interval censoring, where the event is only known to occur within a time interval. They are also subject to classification errors when the diagnostic test lacks perfect sensitivity and specificity. Accounting for the nature of these data plays an important role in estimating the distribution of the time until the event occurs. In this context, we incorporate validation subsets into the parametric proportional hazards model. We show that this additional data, combined with Bayesian inference, compensates for the lack of knowledge about test sensitivity and specificity and improves the parameter estimates. The proposed model is evaluated through simulation studies, and the Bayesian analysis is carried out with a Gibbs sampler. The posterior estimates obtained under the validation subset models show lower bias and standard deviation. Finally, we illustrate the usefulness of the new methodology with an analysis of real data on HIV acquisition in female sex workers that has been discussed in the literature.

CO 1.4 - A FLEXIBLE PROCEDURE FOR FORMULATING PROBABILITY DISTRIBUTIONS ON THE UNIT INTERVAL WITH APPLICATIONS

Author: Josemar Rodrigues (ICMC-USP, São Carlos, SP, Brazil)

Abstract: In this paper, we present a flexible mechanism for constructing probability distributions on bounded intervals. It works by composing a baseline cumulative distribution function with the quantile function of another cumulative distribution. We are particularly interested in the unit interval (0, 1). The resulting composite quantile family contains many models proposed in the recent literature, and new probability distributions on the unit interval are introduced. The methodology is illustrated with two examples analyzing a poverty dataset from Peru under both the Bayesian and the likelihood paradigms.
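The composition idea takes only a few lines of Python. For illustration only, assume a logistic baseline CDF F and let the standard normal distribution G supply the quantile transformation. Both choices are ours and are not necessarily among the paper's examples. The composite CDF on (0, 1) is then H(u) = F(G^{-1}(u)), with density h(u) = f(G^{-1}(u)) / g(G^{-1}(u)). Python's `statistics.NormalDist` provides G^{-1} and g; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # G: distribution supplying the quantile transformation


def logistic_cdf(x: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """Baseline CDF F (logistic), chosen here only for illustration."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def logistic_pdf(x: float, loc: float = 0.0, scale: float = 1.0) -> float:
    e = math.exp(-(x - loc) / scale)
    return e / (scale * (1.0 + e) ** 2)


def composite_cdf(u: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """H(u) = F(G^{-1}(u)) for 0 < u < 1."""
    return logistic_cdf(_STD_NORMAL.inv_cdf(u), loc, scale)


def composite_pdf(u: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """h(u) = f(G^{-1}(u)) / g(G^{-1}(u)), by the chain rule."""
    x = _STD_NORMAL.inv_cdf(u)
    return logistic_pdf(x, loc, scale) / _STD_NORMAL.pdf(x)
```

F and G share the support (-inf, inf), so H is a valid CDF on the unit interval. The location and scale of F become the shape parameters of the new distribution.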

 

ORAL PRESENTATIONS – Session 2

 

Date: 25/03/2019
Time: 17h to 18h30

CO 2.1 - KUMARASWAMY REGRESSION MODEL WITH ARANDA-ORDAZ LINK FUNCTION

Authors: Guilherme Pumi (Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil), Cristine Rauber Oliveira (Universidade Federal de Pernambuco, Recife, PE, Brazil), Fábio Mariano Bayer (Universidade Federal de Santa Maria, Santa Maria, RS, Brazil)

Abstract: In this work we introduce a regression model for doubly bounded variables in the interval (0, 1) that follow a Kumaraswamy distribution. The model resembles a generalized linear model in which the median of the response is modeled by a regression structure through the asymmetric parametric Aranda-Ordaz link function. A maximum likelihood approach is used to estimate the regression and link function parameters jointly. We study the large-sample properties of the proposed maximum likelihood approach and present closed forms for the score vector and for the observed and Fisher information matrices. Some diagnostic tools are presented and discussed. The finite-sample performance of the inferences is evaluated numerically through Monte Carlo simulation.
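The model's two building blocks can be sketched in Python under two assumptions. The first is the usual form of the Aranda-Ordaz asymmetric link, g(mu) = log{[(1 - mu)^(-lambda) - 1]/lambda} with lambda > 0, whose inverse is mu = 1 - (1 + lambda*exp(eta))^(-1/lambda). The second is the common median parameterization of the Kumaraswamy CDF, F(y) = 1 - (1 - y^a)^b with b = log(1/2)/log(1 - omega^a) for median omega. The function names are hypothetical, and the sketch does not estimate lambda jointly with beta.

```python
import math


def aranda_ordaz_link(mu: float, lam: float) -> float:
    """g(mu) = log(((1 - mu)^(-lam) - 1) / lam), for 0 < mu < 1 and lam > 0."""
    return math.log(((1.0 - mu) ** (-lam) - 1.0) / lam)


def aranda_ordaz_inverse(eta: float, lam: float) -> float:
    """mu = 1 - (1 + lam * exp(eta))^(-1/lam), mapping the predictor back to (0, 1)."""
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def kumaraswamy_cdf_median(y: float, median: float, a: float) -> float:
    """Kumaraswamy CDF 1 - (1 - y^a)^b, with b chosen so that F(median) = 1/2."""
    b = math.log(0.5) / math.log(1.0 - median ** a)
    return 1.0 - (1.0 - y ** a) ** b


def fitted_median(x_row, beta, lam: float) -> float:
    """Median of the response for one observation: g^{-1}(x' beta)."""
    eta = sum(xj * bj for xj, bj in zip(x_row, beta))
    return aranda_ordaz_inverse(eta, lam)
```

For lambda = 1 the link reduces to the logit. As lambda approaches 0 it tends to the complementary log-log link, so lambda indexes the asymmetry of the link.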

CO 2.2 - INFLUENCE DIAGNOSTICS FOR CENSORED REGRESSION MODELS WITH AUTOREGRESSIVE ERRORS

Authors: Fernanda Lang Schumacher (UNICAMP, Campinas, SP, Brazil), Victor Hugo Lachos (UCONN, United States), Filidor E Vilca-Labra (UNICAMP, Campinas, SP, Brazil), Luis M Castro (Pontificia Universidad Católica de Chile, Chile)

Abstract: Observations collected over time are often autocorrelated rather than independent. They sometimes include values below or above detection limits (i.e., censored values reported as less than or greater than a limit of detection) and/or missing data. Practitioners commonly discard censored cases or replace them with some function of the limit of detection, which often results in biased estimates. In addition, parameter estimates can be greatly affected by influential observations in the data. In this paper, we derive local influence diagnostic measures for censored regression models with autoregressive errors of order p (AR(p)-CR models). The measures are based on the Q-function under three useful perturbation schemes. To account for censoring in a likelihood-based estimation procedure for AR(p)-CR models, we use a stochastic approximation version of the expectation-maximization (SAEM) algorithm. The accuracy of the local influence diagnostic measures in detecting influential observations is explored through empirical studies. The proposed methods are illustrated with total phosphorus concentration data containing left-censored observations. They are implemented in the R package ARCensReg.

CO 2.3 - HIERARCHICAL QUANTILE REGRESSION IN EDUCATIONAL ASSESSMENT: THEORETICAL AND COMPUTATIONAL ASPECTS

Authors: Heliton Ribeiro Tavares (Universidade Federal do Pará, Belém, PA, Brazil), Dalton Francisco Andrade (Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil), Pedro Alberto Barbetta (Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil)

Abstract: Large-scale assessment and the study of factors associated with school performance are fundamental tools for improving the implementation of public policies in education. They can even guide administrators and teachers in schools. Hierarchical regression models are essential in these studies because they accommodate the hierarchical structure of students grouped within schools, among other hierarchical levels. This work discusses the use of quantile regression in these studies. Besides allowing one to assess the significance and effect of a given factor, it also makes it possible to check whether that factor has a stronger effect on low- or high-performing students.

 

ORAL PRESENTATIONS – Session 3

 

Date: 26/03/2019
Time: 08h30 to 10h30

CO 3.1 - ESTIMATION AND DIAGNOSTIC ANALYSIS IN SKEW-GENERALIZED-NORMAL REGRESSION MODELS

Authors: Clécio da Silva Ferreira (UFJF, Juiz de Fora, MG, Brazil) and Reinaldo Boris Arellano-Valle (PUC-Chile, Chile).

Abstract: The skew-generalized-normal distribution [Arellano-Valle, RB, Gómez, HW, Quintana, FA. A new class of skew-normal distributions. Comm Statist Theory Methods 2004;33(7):1465–1480] is a class of asymmetric distributions that contains the normal and skew-normal distributions as special cases. Its main virtues are that it is easy to simulate from and that it admits a genuine expectation–maximization (EM) algorithm for maximum likelihood estimation. In this paper, we extend the EM algorithm to linear regression models with skew-generalized-normal random errors. We also develop diagnostic analyses via local influence and generalized leverage, following Zhu and Lee’s approach, because Cook’s well-known approach would be more complicated to use for obtaining local influence measures here. Finally, results for a real dataset are reported, illustrating the usefulness of the proposed method.

CO 3.2 - PRIOR SPECIFICATIONS TO HANDLE THE MONOTONE LIKELIHOOD PROBLEM IN THE COX REGRESSION MODEL

Authors: Frederico Machado Almeida (UFMG, Belo Horizonte, MG, Brazil), Enrico Antonio Colosimo (UFMG, Belo Horizonte, MG, Brazil), Vinícius Diniz Mayrink (UFMG, Belo Horizonte, MG, Brazil)

Abstract: Monotone likelihood is observed when fitting a Cox model if the likelihood converges to a finite value while at least one parameter estimate diverges to ±∞. It occurs primarily in samples with substantial censoring of survival times and with categorical covariates. It is most frequent when one level of a categorical covariate has not experienced any failure. A solution suggested by Heinze and Schemper (2001) is an adaptation of a procedure by Firth (1993), originally developed to reduce the bias of maximum likelihood estimates. The method yields finite parameter estimates through penalized maximum likelihood estimation, and the penalty can be interpreted as a Jeffreys-type prior, well known in Bayesian inference. However, this approach has some drawbacks, notably biased estimators and large standard errors. In this paper, we explore other penalties for the partial likelihood function in the spirit of Bayesian prior distributions. An empirical study of the suggested procedures shows satisfactory performance in both estimation and inference. We also analyze a real skin melanoma dataset to evaluate the impact of the different prior distributions used as penalties.

CO 3.3 - THE BETA PRIME REGRESSION MODEL WITH LONG-TERM SURVIVAL

Authors: Jeremias Leao (UFAM, Manaus, AM, Brazil), Marcelo Bourguignon (UFRN, Natal, RN, Brazil), Helton Saulo (UnB, Brasilia, DF, Brazil), Manoel Santos-Neto (UFCG, Campina Grande, PB, Brazil)

Abstract: This paper introduces a cure rate survival model. It assumes that the time to the event of interest follows a beta prime distribution and that the number of competing causes of the event follows a negative binomial distribution. The model offers a novel, flexible alternative to existing cure rate regression models, since the beta prime distribution can exhibit greater skewness and kurtosis than the gamma and inverse Gaussian distributions. Moreover, its hazard rate can have an upside-down bathtub or an increasing shape. We address both parameter estimation and local influence using likelihood methods; in particular, three perturbation schemes are considered for local influence. The proposed model is evaluated numerically through Monte Carlo simulations. To illustrate its practical potential, we apply the model to a real dataset.
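The following Python sketch gives some intuition for the cure rate structure. It assumes the usual negative binomial formulation of competing causes, in which the population survival is S_pop(t) = (1 + eta*theta*F(t))^(-1/eta) and the cure fraction is p0 = (1 + eta*theta)^(-1/eta). F is taken as a beta prime CDF in SciPy's (a, b) parameterization. The paper's own beta prime parameterization and regression structure may differ; a common regression choice is theta = exp(x'beta). Function names are hypothetical.

```python
import numpy as np
from scipy.stats import betaprime


def nb_cure_fraction(theta, eta):
    """Cure fraction p0 = (1 + eta*theta)^(-1/eta), computed on the log scale."""
    return np.exp(-np.log1p(eta * theta) / eta)


def nb_cure_population_survival(t, theta, eta, a, b):
    """Population survival S_pop(t) = (1 + eta*theta*F(t))^(-1/eta).

    F is a beta prime CDF with shapes (a, b) in SciPy's parameterization
    (an assumption for illustration); log1p keeps small eta numerically stable.
    """
    F = betaprime.cdf(t, a, b)
    return np.exp(-np.log1p(eta * theta * F) / eta)
```

As eta approaches 0, the expression tends to exp(-theta*F(t)), the promotion time cure model. As t grows, S_pop(t) levels off at the cure fraction p0 instead of dropping to zero.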

CO 3.4 - ESTIMATION AND DIAGNOSTICS FOR PARTIALLY LINEAR CENSORED REGRESSION MODELS BASED ON HEAVY-TAILED DISTRIBUTIONS

Authors: Marcela Nuñez Lemus (UNICAMP, Colombia), Victor Hugo Lachos (UCONN, United States), Larissa Avila Matos (UNICAMP, Campinas, SP, Brazil), Christian E. Galarza (UNICAMP, Ecuador)

Abstract: Limited or censored data are collected in many studies. In practice this arises, for example, from limitations of measuring instruments or from the experimental design, so responses can be left-, interval- or right-censored. Partially linear models, for their part, are flexible generalizations of linear regression models that include a nonparametric component of some covariates in the linear predictor. In this work, we discuss estimation and diagnostic procedures for partially linear censored regression models with errors following a scale mixture of normal (SMN) distributions. This family contains well-known heavy-tailed distributions often used for robust inference with symmetric data, such as the Student-t, slash and contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum penalized likelihood (MPL) estimates of the parameters is presented. To examine the performance of the proposed model, case-deletion and local influence techniques are developed to show its robustness against outlying and influential observations. This is done through sensitivity analysis of the MPL estimates under some usual perturbation schemes, in either the model or the data, and by inspecting some proposed diagnostic graphs. We evaluate the finite-sample performance of the algorithm and the asymptotic properties of the MPL estimates through empirical experiments. An application to a real dataset illustrates the effectiveness of the proposed methods.

 

ORAL PRESENTATIONS – Session 4

 

Date: 26/03/2019
Time: 17h to 18h30

CO 4.1 - DYNAMIC GENERALIZED LINEAR MODELS VIA INFORMATION GEOMETRY

Authors: Raíra Marotta (UFRJ, Rio de Janeiro, RJ, Brazil), Mariane Branco Alves (UFRJ, Rio de Janeiro, RJ, Brazil), Hélio Migon (UFRJ, Rio de Janeiro, RJ, Brazil)

Abstract: Dynamic generalized linear models extend both dynamic linear models, by allowing non-Gaussian responses, and generalized linear models. The latter consider responses in the exponential family but assume effects that are fixed over time. One Bayesian inference method for this class of models was proposed by West et al. (1985). It applies linear Bayes estimation, exploiting the closed analytic forms of the canonical parameter's posterior and of the predictive distribution; the state parameters, which control the structural effects in the predictor, have no such closed form. The predictor is deterministically related to the canonical parameter. A prior distribution assigned to the states therefore implies a prior on the canonical parameter, which must be compatible with the conjugate prior. There are thus two prior distributions for the natural parameter of the exponential family: one induced by the state vector, and the conjugate prior of the exponential family. West et al. suggested equating the first and second moments of these two priors. This preserves the convenience of closed analytic forms for the posterior distribution of the canonical parameter and for the predictive distribution. We propose a new way of pooling these prior distributions, using concepts from information geometry such as the projection theorem and Bregman divergence. The idea is to project the prior induced by the state vector onto the space of conjugate prior distributions and then combine the two. Once the priors are combined, the updating structure follows the one proposed by West, Harrison and Migon (1985).

CO 4.2 - ALLEVIATING SPATIAL CONFOUNDING FOR AREAL DATA PROBLEMS BY DISPLACING THE GEOGRAPHICAL CENTROIDS

Authors: Marcos Oliveira Prates (UFMG, Belo Horizonte, MG, Brazil), Renato Martins Assunção (UFMG, Belo Horizonte, MG, Brazil), Erica Castilho Rodrigues (UFOP, Ouro Preto, MG, Brazil)

Abstract: Spatial confounding between spatial random effects and fixed-effect covariates has recently been identified, and it has been shown to possibly lead to misleading interpretations of model results. Existing techniques to alleviate this problem decompose the spatial random effect and fit a restricted spatial regression. In this paper, we propose a different approach: transforming the geographic space so that the unobserved spatial random effect added to the regression is orthogonal to the fixed-effect covariates. Our approach, named SPOCK, has the additional benefit of providing a fast and simple computational method for parameter estimation. It also does not constrain the class of distributions assumed for the spatial error term. A simulation study and real data analyses are presented to show the advantages of the new method over existing ones.

CO 4.3 - PARTIALLY LINEAR MODELS AND THEIR APPLICATIONS TO CHANGE POINT DETECTION OF CHEMICAL PROCESS DATA

Authors: Clécio S Ferreira (UFJF, Juiz de Fora, MG, Brazil), Camila Borelli Zeller (UFJF, Juiz de Fora, MG, Brazil), Aparecida M S Mimura (UFJF, Juiz de Fora, MG, Brazil), Júlio C J Silva (UFJF, Juiz de Fora, MG, Brazil)

Abstract: In many chemical datasets, the amount of radiation absorbed (absorbance) is related to the concentration of the element in the sample by the Lambert–Beer law. However, this relation changes abruptly when the concentration reaches an unknown threshold level, the so-called change point. In analytical chemistry, many methods describe the relationship between absorbance and concentration, but none of them provides inferential procedures to detect change points. In this work, we propose partially linear models with a change point separating the parametric and nonparametric components. The Schwarz information criterion is used to locate the change point. A backfitting algorithm is presented to obtain parameter estimates, and the penalized Fisher information matrix is derived to compute their standard errors. The proposed method is examined through a simulation study and then applied to datasets from chemistry. The partially linear models with a change point developed in this paper are useful complements to other methods of absorbance–concentration analysis in chemical studies. They are also useful in many other practical applications.

 

ORAL PRESENTATIONS – Session 5

 

Date: 27/03/2019
Time: 08h30 to 10h

CO 5.1 - GENERALIZED LINEAR MODELS USING TENSORFLOW

Author: Julio Adolfo Zucon Trecenti (IME-USP, São Paulo, SP, Brazil)

Abstract: Generalized linear models (GLMs) are among the most important regression models in statistics. GLMs are usually fitted with an iteratively weighted least squares (IWLS) algorithm derived from the score function and the Fisher information. However, the current standard R implementation does not make adequate use of the resources available on the machine. In this work, we address this problem with the `tensorglm` package, which fits GLMs using TensorFlow. TensorFlow is a computational library that performs computations in parallel and takes advantage of Graphics Processing Units (GPUs), significantly increasing computing capacity. We present performance analyses of `tensorglm`, comparing it with the base R solution and with the `parglm` package. Preliminary results indicate that the new implementation outperforms its alternatives when the number of predictor variables is large relative to the number of observations.
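For reference, here is a minimal CPU-only NumPy sketch of the IWLS iteration mentioned above, for a Poisson GLM with the canonical log link. It is neither `tensorglm` nor R's `glm`. It shows only the weighted least squares solve repeated at each step, the kind of dense linear algebra that libraries such as TensorFlow can run on GPUs. The function name is hypothetical.

```python
import numpy as np


def iwls_poisson(X, y, tol=1e-10, max_iter=100):
    """Fit a Poisson GLM with log link by iteratively weighted least squares.

    X: (n, p) design matrix; y: (n,) vector of counts.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # For the canonical log link the working weights are w = mu and the
        # working response is z = eta + (y - mu) / mu.
        w = mu
        z = eta + (y - mu) / mu
        XtW = X.T * w  # scales each observation's column of X' by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At convergence the score X'(y - mu) is numerically zero, which is a convenient check on any IWLS implementation.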

CO 5.2 - MACHINE LEARNING AND CHORD BASED FEATURE ENGINEERING FOR GENRE PREDICTION IN POPULAR BRAZILIAN MUSIC

Authors: Bruna Davies Wundervald (Maynooth University, Ireland), Walmes Marques Zeviani (Universidade Federal do Paraná, Curitiba, PR, Brazil)

Abstract: Music genre can be hard to describe. Many factors are involved, such as style, musical technique and historical context, and some genres even have overlapping characteristics. Seeking a better understanding of how music genres relate to harmonic structure, we gathered chord data for thousands of popular Brazilian songs. Here, the term 'popular' does not refer only to the genre named MPB (Brazilian Popular Music) but to nine different genres considered familiar to the Brazilian population. The main goals of this work are therefore to extract and engineer harmony-related features from chord data and to use them to classify popular Brazilian music genres. In doing so, we aim to establish a connection between harmonic relationships and music genres. We also emphasize the generality of the data collection method, which allows this work to be replicated and directly extended. Our final model is a combination of many classification trees, known in the literature as the random forest. We observed that harmonic elements can satisfactorily predict music genre in the Brazilian case, and that they describe well how harmony relates to each genre.

CO 5.3 - LADR: AN R PACKAGE FOR FIT, INFERENCE AND DIAGNOSTICS IN LAD MODELS

Authors: Kévin Allan Sales Rodrigues (IME-USP, São Paulo, SP, Brazil), Silvia Nagib Elian (IME-USP, São Paulo, SP, Brazil)

Abstract: The main objective of this text is to present the LadR package. The package implements an alternative regression method called L1 regression, also known as LAD (least absolute deviations) regression. LAD is robust to outliers in the response variable Y, whereas traditional least squares is not. We present diagnostic measures for highlighting influential observations in L1 regression. Among the measures of influence, we use the likelihood displacement and the conditional likelihood displacement. LadR is implemented in R and is freely available from CRAN (https://cran.r-project.org/package=LadR) for Windows, Linux and Mac OS X.
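LAD estimation can be written as a linear program: minimize the sum of (u_plus_i + u_minus_i) subject to X*beta + u_plus - u_minus = y, with u_plus, u_minus >= 0. The Python sketch below solves this program with SciPy's `linprog` and the HiGHS solver. It illustrates the L1 criterion only and does not reproduce the LadR API; `lad_fit` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def lad_fit(X, y):
    """Least absolute deviations fit: minimize sum_i |y_i - x_i' beta|.

    LP variables: beta (free), u_plus >= 0, u_minus >= 0 with
    X beta + u_plus - u_minus = y; objective sum(u_plus + u_minus).
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]
```

With an intercept-only design the LAD estimate reduces to a sample median, which makes its robustness to outliers in y easy to see.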

 

ORAL PRESENTATIONS – Session 6

 

Date: 27/03/2019
Time: 08h30 to 10h

CO 6.1 - FITTING LINEAR MIXED MODELS UNDER NONSTANDARD ASSUMPTIONS: A BAYESIAN APPROACH

Authors: Aline S. Damascena (Universidade de São Paulo, São Paulo, SP, Brazil), Francisco Marcelo M. Rocha (Universidade Federal de São Paulo, São Paulo, SP, Brazil), Julio M. Singer (Universidade de São Paulo, São Paulo, SP, Brazil)

Abstract: The standard assumptions for fitting linear mixed models (LMM) to longitudinal data include Gaussian distributions and homoskedastic conditional independence. In many situations, however, these assumptions may not be adequate. From a frequentist point of view, adopting other distributions together with more general within-subject covariance structures can be nontrivial, because the integrals involved in the estimation process have no analytical expressions. Under a Bayesian approach, however, estimation can be facilitated by using conditional posterior distributions, which are generally more tractable than marginal posterior distributions. We use diagnostic tools to show that a Gaussian distribution for the random effects is not acceptable when fitting an LMM with an AR(1) within-subject covariance structure to a dataset on lactation of dairy cows. We consider alternative Bayesian models adopting t distributions with different degrees of freedom for the random effects. The results indicate that the fixed effects are not considerably affected by the choice of model, but the corresponding standard errors are smaller when heavier-tailed distributions are adopted.
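The covariance structure in this abstract can be written out explicitly in Python. The sketch assumes a random-intercept LMM, although the study's actual random-effects design may be richer. Under that assumption, the marginal covariance of one subject's n measurements is V = var(b)*11' + R, where R has the AR(1) form R[i, j] = sigma2 * phi^|i - j|. For a Student-t random intercept with scale tau2 and nu > 2 degrees of freedom, var(b) = tau2 * nu / (nu - 2). Function names are hypothetical.

```python
import numpy as np


def ar1_cov(n_times, sigma2, phi):
    """AR(1) within-subject covariance: R[i, j] = sigma2 * phi**|i - j|."""
    idx = np.arange(n_times)
    return sigma2 * phi ** np.abs(idx[:, None] - idx[None, :])


def marginal_cov_random_intercept(n_times, tau2, sigma2, phi, nu=None):
    """Marginal covariance var(b) * 11' + R for a random-intercept LMM with AR(1) errors.

    nu=None: Gaussian random intercept with variance tau2.
    nu > 2: Student-t random intercept with scale tau2, variance tau2 * nu / (nu - 2).
    """
    var_b = tau2 if nu is None else tau2 * nu / (nu - 2.0)
    ones = np.ones((n_times, 1))
    return var_b * (ones @ ones.T) + ar1_cov(n_times, sigma2, phi)
```

With a t random intercept, V is still the marginal covariance, but the marginal distribution is no longer Gaussian.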

CO 6.2 - THE GENERALIZED GAMMA ZERO-INFLATED CURE-RATE REGRESSION MODEL: AN APPLICATION TO A LABOR DATASET

Authors: Hayala Cristina Cavenague Souza (Faculdade de Medicina de Ribeirão Preto - USP, Ribeirão Preto, SP, Brazil), Gleici Silva Castro Perdoná (Faculdade de Medicina de Ribeirão Preto - USP, Ribeirão Preto, SP, Brazil), Francisco Louzada (Instituto de Ciências Matemáticas e de Computação - USP, São Carlos, SP, Brazil)

Abstract: When studying the time between the admission of pregnant women and vaginal delivery, some women may have times equal to zero due to fetal death at admission. Standard survival models usually do not allow times equal to zero, so these times are generally excluded from the modeling. This paper considers the generalized gamma zero-inflated cure-rate model for labor time, including times equal to zero. We evaluate likelihood-based estimation procedures for its parameters through a simulation study and then apply the model to a real dataset. In general, the inference procedures performed better for larger samples and for low proportions of zero inflation and cure. To show how this model can be an important tool for investigating the course of the childbirth process, we considered the World Health Organization (WHO) dataset. We thank the WHO for granting us permission to use the dataset.

CO 6.3 - A FLEXIBLE MODEL FOR THE ANALYSIS OF COUNT DATA

Authors: Bernardo Borba de Andrade (UnB, Brasilia, DF, Brazil), Raul Matsushita (UnB, Brasilia, DF, Brazil), Sandro Oliveira (Rede SARAH, Brasilia, DF, Brazil)

Abstract: This article presents inferential tools for analyzing count data with the Touchard model. The Touchard distribution is a simple two-parameter model, proposed as an alternative to the Poisson, that can model both over- and under-dispersion. Here we develop maximum likelihood and method-of-moments estimation, random number generation and visual tools for univariate analysis. Most importantly, we develop regression modeling, diagnostics and prediction. Graphical goodness-of-fit tools are also discussed. Results on datasets with different characteristics in terms of size, dispersion and excess zeros indicate that the model is competitive with popular distributions such as the negative binomial and COM-Poisson.
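The model's probability function can be sketched in Python under an assumed form of the Touchard pmf: P(Y = k) = lam^k (k + 1)^delta / (k! tau(lam, delta)). Here tau(lam, delta) = sum_j lam^j (j + 1)^delta / j! is the normalizing series, and delta = 0 recovers the Poisson. The code evaluates tau on the log scale with a truncation rule whose neglected tail is bounded, as explained in the docstring. Function names are hypothetical, and this is not the authors' implementation.

```python
import math


def _log_term(k: int, lam: float, delta: float) -> float:
    """log of lam^k * (k + 1)^delta / k!  (requires lam > 0)."""
    return k * math.log(lam) + delta * math.log(k + 1) - math.lgamma(k + 1)


def _logaddexp(a: float, b: float) -> float:
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def touchard_log_tau(lam: float, delta: float, rel_tol: float = 1e-17) -> float:
    """log of the normalizing series tau(lam, delta), truncated.

    Consecutive terms have ratio lam/(k+1) * (1 + 1/(k+1))^delta. Once
    k + 1 > 2 * lam * max(1, (1 + 1/(k+1))^delta), every later ratio is below
    1/2, so the neglected tail is at most the last term added; we stop when
    that term is also below rel_tol times the running sum.
    """
    log_sum = _log_term(0, lam, delta)  # the k = 0 term equals 1
    k = 0
    while True:
        k += 1
        lt = _log_term(k, lam, delta)
        log_sum = _logaddexp(log_sum, lt)
        factor = max(1.0, (1.0 + 1.0 / (k + 1)) ** delta)
        if k + 1 > 2.0 * lam * factor and lt < log_sum + math.log(rel_tol):
            return log_sum


def touchard_pmf(k: int, lam: float, delta: float) -> float:
    """Assumed Touchard pmf: lam^k (k + 1)^delta / (k! * tau(lam, delta))."""
    return math.exp(_log_term(k, lam, delta) - touchard_log_tau(lam, delta))
```

The extra parameter delta reweights the Poisson probabilities by (k + 1)^delta, which gives the model its flexibility in dispersion.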