Regressão Logística

class: center, middle, inverse, title-slide

# Regressão Logística
## Visão Geral
### Adriana Andrade
### UFRRJ
### 2021/07/02

---

layout: true
<div style="position: absolute;right:50px;top:11px;color:gray;">Adriana Andrade - UFRRJ</div>

---
background-image: url(https://gordoncstewart.files.wordpress.com/2016/01/to-be-or-not-to-be.jpg)
background-size: cover

---

#Regressão Logística

## Roteiro

- Propriedades do Modelo de Regressão Logística
- Testes utilizados nessa modelagem
- Interpretação dos resultados
- Aplicação
- Prática

---
# Regressão Logística

## Que tal ajustarmos um modelo?
 
<img src="https://media3.giphy.com/media/CjmvTCZf2U3p09Cn0h/giphy-downsized.gif" width="70%" style="display: block; margin: auto;" />

---

class: inverse, center, middle

# Regressão Logística - Exercício

---
# MRLog - Exercício

## Ajuste MRLog no R

**Função glm** – utilizada para ajustar modelos lineares generalizados e obter estimativas pontuais para os parâmetros e algumas medidas de qualidade de ajuste.

Especificar:

- Variável dependente
- Variáveis explicativas
- Family - a distribuição de probabilidade da variável resposta do modelo
- Link – função de ligação

As estatísticas de adequação e diagnóstico serão obtidas pelo pacote **blorr**.

---
# MRLog - Exercício

## Data Titanic

Data sobre o Titanic, navio construído na Irlanda que naufragou quatro dias após sua viagem inaugural, em 1912. Quando construído, o navio prometia ser o mais luxuoso e seguro de sua época. Entretanto, estudos posteriores indicaram falhas no sistema de segurança e evacuação. A estimativa é de 1514 mortes entre os 2224 passageiros, ou seja, aproximadamente 68% da tripulação.

---

# MRLog - Exercício

.panelset[
.panel[.panel-name[R Code]

```r
#install.packages("titanic")
library(titanic)
data<-titanic_train
```
]

.panel[.panel-name[Data]

```
'data.frame':	891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : chr  "male" "female" "female" "female" ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : chr  "" "C85" "" "C123" ...
 $ Embarked   : chr  "S" "C" "S" "S" ...
```
]
]

---
# MRLog - Exercício

**Parametrização do modelo**

|Variável | Nome | Definição | Categoria de base|
|---------|:----:|:----------|:----------------:|
| Sexo    | Sex  | Fem x Masc|  Masculino |
| Classe de embarque| Pclass |1ª,2ª,3ª | 3ª|
| Idade   | Age  | anos - contínua| -|

---
# MRLog - Exercício

```r
#Remoção de missing values
data<-data[complete.cases(data),]

#Recodificação da variável Survived
data$Survived<-factor(data$Survived,
                      labels = c("Not survived","Survived"),
                      levels = c(0,1))

#Recodificação da Variável Sexo para factor
#Categoria de base: male
data$Sex<-factor(data$Sex,
                 ordered = TRUE,
                  levels = c("male","female"))

#Recodificação do nível 3ª classe para valor zero - categoria de base
data$Pclass[data$Pclass==3]<-0
data$Pclass<-factor(data$Pclass)
```

---
# MRLog - Exercício

## Análise Exploratória

.pull-left[

```r
data %>% 
 ggplot(aes(x = Survived)) +
  geom_bar()
```
]
.right-graph[

]

---
# MRLog - Exercício

## Análise Exploratória

.pull-left[

```r
data %>% 
  ggplot(aes(x=Survived,fill=Sex))+
  geom_bar(position="fill")+
  scale_fill_manual(values=c("blue","red"))
```
]
.right-graph[

]

---
# MRLog - Exercício

## Análise Exploratória

.pull-left[

```r
data %>% 
  ggplot(aes(x=Survived,fill=Pclass))+
  geom_bar(position="fill")
```
]
.right-graph[

]

---
# MRLog - Exercício

## Análise Exploratória

.pull-left[

```r
data %>% 
 ggplot(aes(x=factor(Survived),y=Age,fill=factor(Survived)))+
  geom_boxplot()
```
]
.right-graph[

]

---
# MRLog - Exercício

.panelset[

.panel[.panel-name[R Code]

```r
#Ajuste do Modelo

m1<-glm(Survived~Sex+Pclass+Age, family = binomial(link="logit"), data=data)
summary(m1)
```

]

.panel[.panel-name[Output]

```

Call:
glm(formula = Survived ~ Sex + Pclass + Age, family = binomial(link = "logit"), 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7303  -0.6780  -0.3953   0.6485   2.4657

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.065003   0.227790  -0.285    0.775    
Sex.L        1.783875   0.146648  12.164  < 2e-16 ***
Pclass1      2.580625   0.281442   9.169  < 2e-16 ***
Pclass2      1.270826   0.244048   5.207 1.92e-07 ***
Age         -0.036985   0.007656  -4.831 1.36e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 964.52  on 713  degrees of freedom
Residual deviance: 647.28  on 709  degrees of freedom
AIC: 657.28

Number of Fisher Scoring iterations: 5
```
]

]

---
# Modelo ajustado 
O modelo ajustado é dado por

`$logito = -0.065 + 1.784Sex_F - 2.581PClass_1 + 1.271 PClass_1+ - 0.037Age$`

--
- Todos os coeficientes estimados das variáveis do modelo de pesquisa são estatisticamente significativos e, portanto,diferentes de zero.

Efeito das Variáveis na ocorrência do desfecho: Sobrevivência do Passageiro

Lembrando: o valor de `$B_j$` representa o efeito no logit `$p/(1-p)$`, ou seja, no log da chance de sobreviver em relação a não sobreviver.
---

# MRLog - Exercício

## OR e interpretação
.panelset[

.panel[.panel-name[R Code]

```r
exp(cbind(OR=coef(m1),confint(m1)))
```
]

.panel[.panel-name[Output]

```
                    OR     2.5 %     97.5 %
(Intercept)  0.9370645 0.6000403  1.4674914
Sex.L        5.9528821 4.4919878  7.9880676
Pclass1     13.2053931 7.7041808 23.2598689
Pclass2      3.5637952 2.2170845  5.7801843
Age          0.9636903 0.9490535  0.9780124
```
]
.panel[.panel-name[Interpretation]

|Variável | OR | Interpretação|
|-------|----|--------------|
|Sex_F   | 5.95 | Mulheres possuem chance de sobrevivência 5,95 vezes maior que homens.
|Pclass1|13,21 | Passageiros da 1ª classe possuem chance de sobrevivência 13,21 vezes maior que chance de sobrevivência dos passageiros da 3ª classe.|
|Pclass2|3,56 | Passageiros da 2ª classe possuem chance de sobrevivência 3,56 vezes maior que chance de sobrevivência dos passageiros da 3ª classe.|
|Age  |0,96 | A cada ano de vida tem-se uma diminuição na chance de sobrevivência na ordem de 0,96.

]
]

---
# Exemplo - Código R

## Valores preditos

Pode-se ainda obter os valores de probabilidade predita pelo modelo para valores específicos da variável explicativa:

#Probabilidades Preditas

```r
prob<-function(sex,pclass1,pclass2,age)
  { 1 /(1+exp(-(-0.06500312 + (1.78387550*sex) + (2.58062532*pclass1) + 
       (1.27082605*pclass2) + (-0.03698527*age))))
}
```

|Perfil |Probabilidae|
|---------------------------|------------------|
|Mulher – 40 anos – 1 classe|0.9437517|
|Mulher – 40 anos – 3 classe|0.559581|
|Homem  – 40 anos – 1 classe|0.7381183|
|Homem  – 40 anos – 3 classe|0.1758945|

---
# Exemplo - Código R

.panelset[
.panel[.panel-name[R Code]

```r
predito<-predict(m1,type="response")
datap<-cbind(data,predito)

datap %>% 
ggplot(aes(x=Age,y=predito,group=Sex,colour=Sex)) +
  geom_line(size = 1) +
  geom_point(size=1) + 
  labs(title = "Probabilidades Preditas",
       y = "Probabilidade", x = "Idade") + 
  scale_color_manual(values=c("cadetblue", "rosybrown1"))+
  facet_wrap(~Pclass,ncol=1 )
```
]

.panel[.panel-name[Plot]

]

]
---
# MRLog - Exercício

.pull-left[

```r
#ANOVA
anova(m1)
```
]

.right-graph[

```
Analysis of Deviance Table

Model: binomial, link: logit

Response: Survived

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev
NULL                     713     964.52
Sex     1  213.816       712     750.70
Pclass  2   78.269       710     672.43
Age     1   25.148       709     647.28
```

]
---

# MRLog - Exercício

##### Medidas de Adequação do Modelo

.panelset[

.panel[.panel-name[R Code]

```r
library(blorr)
blorr::blr_model_fit_stats(m1)
```

]

.panel[.panel-name[Output]

```
                              Model Fit Statistics                                
---------------------------------------------------------------------------------
Log-Lik Intercept Only:      -482.258    Log-Lik Full Model:             -323.642 
Deviance(709):                647.283    LR(4):                           317.233 
                                         Prob > LR:                         0.000 
MCFadden's R2                   0.329    McFadden's Adj R2:                 0.319 
ML (Cox-Snell) R2:              0.359    Cragg-Uhler(Nagelkerke) R2:        0.484 
McKelvey & Zavoina's R2:        0.479    Efron's R2:                        0.407 
Count R2:                       0.789    Adj Count R2:                      0.479 
BIC:                          680.138    AIC:                             657.283 
---------------------------------------------------------------------------------
```
]

]

---
# MRLog - Exercício

.panelset[

.panel[.panel-name[R Code]

```r
##Estatística Hosmer and Lemeshow
blorr::blr_test_hosmer_lemeshow(m1)
```

]

.panel[.panel-name[Output]

```
           Partition for the Hosmer & Lemeshow Test            
--------------------------------------------------------------
                        def = 1                 def = 0        
Group    Total    Observed    Expected    Observed    Expected 
--------------------------------------------------------------
  1       77         8          4.49         69        72.51   
  2       73         14         6.50         59        66.50   
  3       69         7          7.66         62        61.34   
  4       68         11        11.01         57        56.99   
  5       71         8         18.46         63        52.54   
  6       70         26        28.32         44        41.68   
  7       72         38        39.38         34        32.62   
  8       72         42        49.19         30        22.81   
  9       70         66        57.86         4         12.14   
 10       72         70        67.13         2          4.87   
--------------------------------------------------------------

Goodness of Fit Test      
------------------------------
Chi-Square    DF    Pr > ChiSq 
------------------------------
 32.6770      8       1e-04    
------------------------------
```
]

]

---
# MRLog - Exercício

.panelset[

.panel[.panel-name[R Code]

```r
## Matriz de Confusão

blorr::blr_confusion_matrix(m1)
```

]

.panel[.panel-name[Output]

```
Confusion Matrix and Statistics

Reference
Prediction Not survived Survived
         0          356       83
         1           68      207

Accuracy : 0.7885 
     No Information Rate : 0.5938

Kappa : 0.5580

McNemars's Test P-Value  : 0.2546

Sensitivity : 0.7138 
             Specificity : 0.8396 
          Pos Pred Value : 0.7527 
          Neg Pred Value : 0.8109 
              Prevalence : 0.4062 
          Detection Rate : 0.2899 
    Detection Prevalence : 0.3852 
       Balanced Accuracy : 0.7767 
               Precision : 0.7527 
                  Recall : 0.7138

'Positive' Class : 1
```
]

]

---
# MRLog - Exercício

#### Curva ROC

.panelset[
  
.panel[.panel-name[R code]

```r
library(pROC)
roc1<- plot.roc(data$Survived,fitted(m1))

plot(roc1,
     print.auc=TRUE,
     auc.polygon=TRUE, 
     auc.polygon.col="lightblue",
     print.thres=TRUE
     )
```
]

.panel[.panel-name[Plot]
<img src="index_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /><img src="index_files/figure-html/unnamed-chunk-11-2.png" style="display: block; margin: auto;" />

]

]
---
# MRLog - Exercício

.panelset[
  
.panel[.panel-name[R code]

```r
#Validação por Treinamento e Teste
library(caret)
set.seed(123)
indice <- createDataPartition(data$Survived, p = .80, list = FALSE)

#Aplicação do Índice na base
treinamento <- data[indice, ]
dim(treinamento)
teste <- data[-indice, ]
dim(teste)

#Ajuste do modelo de treinamento

m_train<- glm(Survived~Sex+Pclass+Age, family = binomial(link="logit"), data=treinamento)
summary(m_train)

#Predição da variável resposta no conjunto de teste
teste$Survived_pred <- predict(m_train, teste, type = "response")
teste$classifica<-ifelse(teste$Survived_pred>0.5,"Survived","Not Survived")

# Visualização dos valores observados, preditos e classificados
valida<-teste[,c("Survived","Survived_pred","classifica")]
DT::datatable(valida, editable = list(target = 'row'))
```
]

.panel[.panel-name[Output]
<div id="htmlwidget-c43c96a8d8bef7305444" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-c43c96a8d8bef7305444">{"x":{"filter":"none","editable":{"target":"row"},"data":[["1","4","7","14","19","24","35","38","39","53","57","81","82","84","85","94","97","98","99","100","113","115","116","117","119","123","126","131","133","135","140","148","157","163","166","174","176","179","180","190","194","209","212","214","221","226","228","243","249","263","266","274","283","284","293","306","309","311","312","324","328","341","342","343","345","354","370","378","380","383","386","391","393","400","406","408","409","438","442","446","447","448","453","461","462","463","468","477","488","494","509","514","516","538","545","551","572","582","583","584","586","600","601","617","619","632","639","646","652","653","662","678","682","690","692","694","711","727","731","732","746","762","768","771","781","788","797","800","805","815","818","819","828","834","836","841","849","851","861","873","880","888"],["Not survived","Survived","Not survived","Not survived","Not survived","Survived","Not survived","Not survived","Not survived","Survived","Survived","Not survived","Survived","Not survived","Survived","Not survived","Not survived","Survived","Survived","Not survived","Not survived","Not survived","Not survived","Not survived","Not survived","Not survived","Survived","Not survived","Not survived","Not survived","Not survived","Not survived","Survived","Not survived","Survived","Not survived","Not survived","Not survived","Not survived","Not survived","Survived","Survived","Survived","Not survived","Survived","Not survived","Not survived","Not survived","Survived","Not survived","Not survived","Not survived","Not survived","Survived","Not survived","Survived","Not survived","Survived","Survived","Survived","Survived","Survived","Survived","Not survived","Not survived","Not survived","Survived","Not survived","Not survived","Not survived","Not survived","Survived","Not survived","Survived","Not survived","Survived","Not survived","Survived","Not survived","Survived","Survived","Survived","Not survived","Survived","Not survived","Not survived","Not survived","Not survived","Not survived","Not survived","Not survived","Survived","Not survived","Survived","Not survived","Survived","Survived","Survived","Not survived","Not survived","Survived","Survived","Survived","Not survived","Survived","Not survived","Not survived","Survived","Survived","Not survived","Not survived","Survived","Survived","Survived","Survived","Not survived","Survived","Survived","Survived","Not survived","Not survived","Not survived","Not survived","Not survived","Survived","Not survived","Survived","Not survived","Survived","Not survived","Not survived","Not survived","Survived","Not survived","Survived","Not survived","Not survived","Not survived","Not survived","Not survived","Survived","Survived"],[0.111775265608433,0.915757618044961,0.36787420418725,0.0748799837587899,0.53191044378364,0.53334683992476,0.53334683992476,0.114378601828445,0.614266466943264,0.883152486600675,0.819730564025734,0.111775265608433,0.0949665455200823,0.53334683992476,0.83456868586376,0.101874299306781,0.272366332978187,0.56546710505245,0.764418158134539,0.221469365259438,0.111775265608433,0.620398731899085,0.114378601828445,0.0344980875864335,0.559078123900544,0.228255904818984,0.140258469994808,0.0864096368263455,0.428609183793071,0.264347487108046,0.559078123900544,0.66794665594925,0.626492786540831,0.101874299306781,0.149914987701371,0.114378601828445,0.122507232911281,0.23988898962347,0.0804565527595327,0.0804565527595327,0.388791195131404,0.626492786540831,0.759711345894003,0.23988898962347,0.128198634336589,0.111775265608433,0.115699968270626,0.244654314532186,0.475009589164771,0.380027984035844,0.212647197483706,0.475009589164771,0.128198634336589,0.119743898858439,0.212647197483706,0.697739031795965,0.23988898962347,0.935330010671904,0.944136310579374,0.815862660741159,0.754940641366303,0.394977402382874,0.935330010671904,0.24948323138947,0.212647197483706,0.104274105737748,0.935330010671904,0.539801763800694,0.119743898858439,0.0884810385627956,0.301159888791525,0.481486956275553,0.0972212437313369,0.791305122526224,0.221469365259438,0.388791195131404,0.114378601828445,0.807934761889959,0.117034582599097,0.680615269602573,0.84841132586915,0.494458197669333,0.520405593420107,0.40477850086371,0.0843822388770947,0.411048013434342,0.355885993893363,0.221469365259438,0.344075805617989,0.272366332978187,0.0972212437313369,0.869078242203417,0.411048013434342,0.925245409752154,0.392334082874588,0.603276065460158,0.872003692785762,0.907393571879156,0.144760198986219,0.481486956275553,0.944136310579374,0.398539904742484,0.807934761889959,0.0843822388770947,0.876080462696764,0.05595931177818,0.467104055414228,0.40477850086371,0.830953497846562,0.114378601828445,0.073101456377156,0.614266466943264,0.539801763800694,0.948104509260459,0.696081567652232,0.104274105737748,0.935330010671904,0.782601651417608,0.927021193770214,0.143418087023928,0.277541244528421,0.0713619135101058,0.535140714179914,0.106723725753493,0.644529156875674,0.153253361656219,0.883152486600675,0.538368039254982,0.0995235864051246,0.0916722399168089,0.235187582507667,0.0679969831620955,0.401197430578353,0.109223875348543,0.907393571879156,0.117034582599097,0.24948323138947,0.167218054753944,0.0713619135101058,0.500947710312181,0.86305692905381,0.942751267796851],["Not Survived","Survived","Not Survived","Not Survived","Survived","Survived","Survived","Not Survived","Survived","Survived","Survived","Not Survived","Not Survived","Survived","Survived","Not Survived","Not Survived","Survived","Survived","Not Survived","Not Survived","Survived","Not Survived","Not Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Survived","Survived","Survived","Survived","Not Survived","Survived","Not Survived","Not Survived","Not Survived","Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Survived","Survived","Not Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Survived","Not Survived","Survived","Survived","Survived","Not Survived","Not Survived","Survived","Not Survived","Survived","Not Survived","Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Not Survived","Survived","Survived","Survived","Survived","Not Survived","Survived","Survived","Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Survived","Not Survived","Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Not Survived","Not Survived","Not Survived","Not Survived","Survived","Survived","Survived"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Survived<\/th>\n      <th>Survived_pred<\/th>\n      <th>classifica<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

---
# MRLog - Exercício

.panelset[

.panel[.panel-name[R Code]

```r
blorr:: blr_step_aic_both(m1)
```

]

.panel[.panel-name[Output]

```
Stepwise Selection Method 
-------------------------

Candidate Terms:

1 . Sex 
2 . Pclass 
3 . Age

Variables Entered/Removed:

- Sex added 
- Pclass added 
- Age added 
```

```

Stepwise Summary                    
------------------------------------------------------
Variable     Method       AIC        BIC      Deviance 
------------------------------------------------------
Sex         addition    754.700    763.842     750.700 
Pclass      addition    680.431    698.714     672.431 
Age         addition    657.283    680.138     647.283 
------------------------------------------------------
```
]

.panel[.panel-name[R Code Plot]

```r
m1 %>%
  blr_step_aic_both() %>%
  plot()
```

]

.panel[.panel-name[Plot]

```
FALSE Stepwise Selection Method 
FALSE -------------------------
FALSE 
FALSE Candidate Terms: 
FALSE 
FALSE 1 . Sex 
FALSE 2 . Pclass 
FALSE 3 . Age 
FALSE 
FALSE 
FALSE Variables Entered/Removed: 
FALSE 
FALSE - Sex added 
FALSE - Pclass added 
FALSE - Age added
```

<img src="index_files/figure-html/step_plot_out-1.png" style="display: block; margin: auto;" />
]

]

---
# Regressão Logística

## Bom trabalho pessoal!
 
<img src="https://media2.giphy.com/media/QLtO7Hs5FXtJe/giphy-downsized.gif" width="70%" style="display: block; margin: auto;" />