Fundamentals of Data Analysis B (データ解析基礎論B) W06: GLM


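Theory of regression analysis
Regression model: $\hat{y}_i = b_0 + b_1 x_i$
- $\hat{y}_i$: predicted value of the response variable for observation (participant) $i$
- $b_0$: intercept of the regression line
- $b_1$: slope of the regression line
- $x_i$: explanatory (independent) variable for observation $i$
- Regression line: a linear approximation of the relationship between the explanatory variable and the response variable

Relationship in the population: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
- $\varepsilon_i$: error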


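Theory of regression analysis: hypotheses and assumptions
Zero-mean assumption: $E[\varepsilon_i] = 0,\ \forall i$
- The error (epsilon) is a random variable whose mean is zero.

Constant-variance assumption: $\sigma^2_{\varepsilon_i} = \sigma^2_{\varepsilon},\ \forall i$
- The variance of the error is constant, regardless of the value of the independent variable.

Independence assumption: $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0,\ \forall i \neq j$
- Each error is independent of the other errors.

Normality assumption: $\varepsilon \sim N(0, \sigma^2_{\varepsilon})$
- The errors follow a normal distribution.

Under these assumptions:
$$E[Y|X] = E[\beta_0 + \beta_1 X + \varepsilon \mid X] = E[\beta_0|X] + E[\beta_1 X|X] + E[\varepsilon|X] = \beta_0 + \beta_1 X$$
$$\mathrm{VAR}[Y|X] = \mathrm{VAR}[\beta_0 + \beta_1 X + \varepsilon \mid X] = \mathrm{VAR}[\varepsilon|X] = \sigma^2_{\varepsilon}$$
$$Y \mid X \sim N(\beta_0 + \beta_1 X,\ \sigma^2_{\varepsilon})$$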
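The assumptions above can be illustrated with a small simulation. This is only a sketch and not part of the original slides: the seed, the sample size, and the parameter values (beta0 = 1, beta1 = 2, sigma = 0.5) are arbitrary assumptions chosen for illustration.

```r
# Simulate data that satisfy the four error assumptions.
# All numeric values below are illustrative assumptions.
set.seed(1)
n     <- 200
beta0 <- 1
beta1 <- 2
sigma <- 0.5

x   <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = sigma)  # zero mean, constant variance, independent, normal
y   <- beta0 + beta1 * x + eps         # population relationship

fit <- lm(y ~ x)        # sample regression line
b   <- coef(fit)        # b[1] estimates beta0, b[2] estimates beta1
res <- residuals(fit)

print(b)
print(mean(res))        # residuals of a model with an intercept average to zero
print(sd(res))          # should be close to sigma
```

With data generated this way, the estimated intercept and slope should land close to the assumed beta0 and beta1, and the residual spread close to the assumed sigma.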


Regression diagnostics
> par(mfrow=c(2,2))
> plot(dat.lm01)

[Figure: the four standard diagnostic plots for dat.lm01. Panels: Residuals vs Fitted (x: Fitted values, y: Residuals); Normal Q-Q (x: Theoretical Quantiles, y: Standardized residuals); Scale-Location (x: Fitted values, y: square root of |Standardized residuals|); Residuals vs Leverage (x: Leverage, y: Standardized residuals, with Cook's distance contours at 0.5 and 1). Observations 16, 27 and 43 are labelled in every panel.]


When the response variable follows a binomial distribution


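Logistic regression analysis
Logistic regression analysis
- Dependent variable: binary (binomial, Bernoulli)
- Independent variables: quantitative and qualitative variables
- Purpose:
  - explain or predict membership in one of two groups
  - explain or predict success versus failure

Regression analysis
- Dependent variable: quantitative variable
- Independent variables: quantitative and qualitative variables
- Purpose: predict a quantitative variable (test differences between groups)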


Study time and earning course credit


Candidate regression models
Models for predicting a probability
- P = b0+b1X1
- link(P) = b0+b1X1, that is, P = link⁻¹(b0+b1X1)
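The difference between the two candidates can be seen numerically. The sketch below is not from the slides: the coefficients b0 = -3 and b1 = 0.3 and the grid of X1 values are illustrative assumptions, and the logistic function (`plogis`) is used as one possible choice for mapping the linear predictor to a probability.

```r
# Illustrative coefficients and predictor values (assumptions, not estimates)
b0 <- -3
b1 <- 0.3
x1 <- seq(0, 30, by = 5)

eta <- b0 + b1 * x1          # linear predictor

p.linear   <- eta            # candidate 1: P = b0 + b1*X1
p.logistic <- plogis(eta)    # candidate 2: inverse logit of the linear predictor

print(range(p.linear))       # leaves the interval [0, 1]
print(range(p.logistic))     # stays strictly between 0 and 1
```

The first candidate produces "probabilities" below 0 and above 1, which is why a transformation of the linear predictor is needed.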


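LOGISTIC REGRESSION
$$E[Y|x] = \pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} \;\leftrightarrow\; \log\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x$$

$$\frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x)\right)}$$

$$\log\frac{\pi(x)}{1 - \pi(x)} = \mathrm{logit}(\pi(x))$$

$\pi(x)$ is bounded below by 0 and above by 1, so $\pi(x)$ can be interpreted as a probability:

$$\lim_{\beta_0 + \beta_1 x \to -\infty} \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} = 0$$

$$\lim_{\beta_0 + \beta_1 x \to +\infty} \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} = 1$$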
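The equivalence between the probability form and the log-odds form can be worked out in a few steps. The shorthand $\eta = \beta_0 + \beta_1 x$ is introduced here only for brevity and does not appear in the slides.

```latex
\begin{align*}
\pi(x) &= \frac{e^{\eta}}{1 + e^{\eta}}, \qquad \eta = \beta_0 + \beta_1 x \\
1 - \pi(x) &= \frac{1}{1 + e^{\eta}} \\
\frac{\pi(x)}{1 - \pi(x)} &= e^{\eta} \\
\log\frac{\pi(x)}{1 - \pi(x)} &= \eta = \beta_0 + \beta_1 x
\end{align*}
```

So the model is linear on the log-odds (logit) scale even though it is nonlinear on the probability scale.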


Logistic regression analysis


Example: logistic regression analysis
> dat.lr <- glm(pass ~ study, family = binomial, data = dat)
> summary(dat.lr)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3104  -0.4319   0.1478   0.4921   2.4196

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.14560    0.38588  -8.152 3.59e-16 ***
study        0.27346    0.02922   9.359  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 406.83 on 299 degrees of freedom
Residual deviance: 212.73 on 298 degrees of freedom
AIC: 216.73

Note: family = binomial specifies the distribution of the response variable (binomial distribution).


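LOGISTIC regression model
With P(pass) as the dependent variable:
$$P(\text{pass}) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1\,\text{study})\right)}$$

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.14560    0.38588  -8.152 3.59e-16 ***
study        0.27346    0.02922   9.359  < 2e-16 ***

Predicted probability of passing after 5, 15 and 25 hours of study (computed with the unrounded estimates):
P(pass|5)  = 1/(1+exp(-1*(-3.14560+0.27346*5)))  = 0.145
P(pass|15) = 1/(1+exp(-1*(-3.14560+0.27346*15))) = 0.722
P(pass|25) = 1/(1+exp(-1*(-3.14560+0.27346*25))) = 0.976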
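The three hand calculations can be reproduced in one vectorised call. This sketch copies the two estimates from the coefficient table; the variable names (`b0`, `b1`, `study.hours`, `p.pass`) are chosen here for illustration and are not from the slides.

```r
# Estimates copied from the summary(dat.lr) coefficient table
b0 <- -3.14560   # (Intercept)
b1 <-  0.27346   # study

study.hours <- c(5, 15, 25)

# plogis(z) is 1 / (1 + exp(-z)), the inverse logit
p.pass <- plogis(b0 + b1 * study.hours)
print(round(p.pass, 3))
```

If the fitted object `dat.lr` is available, `predict(dat.lr, newdata = data.frame(study = study.hours), type = "response")` should give the same probabilities without copying coefficients by hand.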


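Interpreting LOGISTIC REGRESSION
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.14560    0.38588  -8.152 3.59e-16 ***
study        0.27346    0.02922   9.359  < 2e-16 ***

- Change in the log odds for each additional hour of study = 0.273
- Multiplicative change in the odds for each additional hour of study = exp(0.27346) = 1.3145
- For each additional hour of study, the odds of passing (not the probability) are multiplied by 1.31; in other words, the odds ratio per hour is 1.31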


Interpreting LOGISTIC REGRESSION
Probability of passing (10 to 15 hours of study)
> b = coef(dat.lr)   # fitted coefficients: b[1] = intercept, b[2] = study
> pred.pass.p = 1/(1+exp(-(b[1]+b[2]*c(10:15))))
> pred.pass.p
[1] 0.3986719 0.4656686 0.5339273 0.6009388 0.6643720 0.7223802

Odds of passing versus not passing
> odds = pred.pass.p/(1-pred.pass.p)
> odds
[1] 0.6629855 0.8714978 1.1455882 1.5058815 1.9794888 2.6020479

Change in the odds for each additional hour
> odds[2:6]/odds[1:5]
[1] 1.314505 1.314505 1.314505 1.314505 1.314505
> exp(b[2])
   study
1.314505
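The constant ratio in the output is not a coincidence. A short derivation, using $\pi(x)$ for the probability of passing at $x$ hours of study and $\beta_1$ for the coefficient of study (0.27346 in the fitted model):

```latex
\begin{align*}
\mathrm{odds}(x) &= \frac{\pi(x)}{1 - \pi(x)} = \exp(\beta_0 + \beta_1 x) \\
\frac{\mathrm{odds}(x + 1)}{\mathrm{odds}(x)}
  &= \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)}
   = \exp(\beta_1) \\
\exp(0.27346) &\approx 1.3145
\end{align*}
```

The ratio does not depend on $x$, which is why every entry of `odds[2:6]/odds[1:5]` is the same number.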


Evaluating LOGISTIC REGRESSION
> dat <- read.csv("http://www.matsuka.info/data_folder/datWA01.txt")
> dat.lr <- glm(gender~shoesize, family=binomial, data=dat)
> anova(dat.lr, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: gender

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                        69     96.983
shoesize  1   42.247        68     54.737 8.045e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Null hypothesis H0: the null model and the model being tested (with shoesize) fit the data equally well.
Alternative hypothesis H1: the two models do not fit the data equally well.
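The p-value in the table can be recomputed from the two residual deviances. This is a sketch using only numbers copied from the anova output; the variable names are illustrative.

```r
# Values copied from the Analysis of Deviance Table
dev.null  <- 96.983   # intercept-only model, 69 residual df
dev.model <- 54.737   # model with shoesize, 68 residual df

G2 <- dev.null - dev.model   # reduction in deviance from adding shoesize
df <- 69 - 68                # one extra parameter

# Under H0 the reduction in deviance is approximately chi-squared with df degrees of freedom
p.value <- pchisq(G2, df = df, lower.tail = FALSE)
print(c(G2 = G2, df = df, p.value = p.value))
```

The result should agree with the table (Deviance 42.247, Pr(>Chi) about 8e-11) up to the rounding of the printed deviances.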


Model comparison
> dat.lr0 <- glm(gender~1, family="binomial", data=dat)
> summary(dat.lr0)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.153  -1.153  -1.153   1.202   1.202

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.05716    0.23914  -0.239    0.811

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 96.983 on 69 degrees of freedom
Residual deviance: 96.983 on 69 degrees of freedom
AIC: 98.983

Number of Fisher Scoring iterations: 3

$$AIC = 2k - 2\ln L$$
- k: number of parameters ("complexity")
- L: likelihood ("error", i.e. lack of fit)
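The AIC printed for the intercept-only model can be checked by hand. The sketch relies on one fact not stated in the slides: for ungrouped binary (0/1) responses the saturated model has log-likelihood 0, so -2 ln L equals the residual deviance. The helper name `aic.from.deviance` is hypothetical.

```r
# For ungrouped binary data: -2 * logLik == residual deviance,
# so AIC = 2k - 2 ln L = 2k + residual deviance.
aic.from.deviance <- function(deviance, k) {
  2 * k + deviance
}

# Intercept-only model: residual deviance 96.983, k = 1 parameter
aic.null <- aic.from.deviance(96.983, k = 1)
print(aic.null)
```

With the fitted object at hand, `AIC(dat.lr0)` and `-2 * as.numeric(logLik(dat.lr0)) + 2` should return the same value.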


Example: logistic regression analysis
> dat.lrS <- glm(gender~shoesize, family=binomial, data=dat)
> summary(dat.lrS)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -34.3214     7.3489  -4.670 3.01e-06 ***
shoesize      1.3817     0.2969   4.654 3.25e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 96.983 on 69 degrees of freedom
Residual deviance: 54.737 on 68 degrees of freedom
AIC: 58.737

Number of Fisher Scoring iterations: 5


Model comparison
> dat.lrh <- glm(gender~h, family="binomial", data=dat)
> summary(dat.lrh)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -41.01836    9.19792  -4.460 8.21e-06 ***
h             0.24958    0.05603   4.455 8.41e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 96.983 on 69 degrees of freedom
Residual deviance: 56.410 on 68 degrees of freedom
AIC: 60.41

Number of Fisher Scoring iterations: 5

χ² test and LOGISTIC REGRESSION

Testing the association between smoking and health (expected counts in parentheses)

                      Smokers  Non-smokers  Total  Smoking rate
Lung cancer patients  52 (40)   8 (20)         60  0.87
Healthy               48 (60)  42 (30)         90  0.53
Total                 100      50             150  0.67

> M = matrix(c(52,48,8,42), nrow=2)
> rownames(M) <- c("present", "absent")
> colnames(M) <- c("smoker", "non-smoker")
> M
        smoker non-smoker
present     52          8
absent      48         42

> chisq.test(M)

Pearson's Chi-squared test with Yates' continuity correction

data:  M
X-squared = 16.5312, df = 1, p-value = 4.785e-05
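The expected counts in parentheses and the test statistic can be rebuilt from the margins of the table. This is a sketch of the textbook calculation, not code from the slides; `expected`, `X2.yates` and `X2.plain` are illustrative names.

```r
# Same table as on the slide
M <- matrix(c(52, 48, 8, 42), nrow = 2,
            dimnames = list(c("present", "absent"),
                            c("smoker", "non-smoker")))

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(M), colSums(M)) / sum(M)
print(expected)

# Pearson statistic with Yates' continuity correction (what chisq.test reports for 2x2 tables)
X2.yates <- sum((abs(M - expected) - 0.5)^2 / expected)

# Pearson statistic without the correction, for comparison
X2.plain <- sum((M - expected)^2 / expected)

print(c(yates = X2.yates, uncorrected = X2.plain))
print(pchisq(X2.yates, df = 1, lower.tail = FALSE))
```

The corrected statistic should match the `X-squared` value printed by `chisq.test(M)`; `chisq.test(M, correct = FALSE)` should match the uncorrected one.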

χ² test and LOGISTIC REGRESSION

Testing the association between smoking and health: the same table, expanded to one row per person and fitted as a logistic regression

> dat <- as.data.frame(as.table(M))
> colnames(dat) <- c("cancer","smoking","freq")
> dat = dat[rep(1:nrow(dat), dat$freq), 1:2]
> rownames(dat) <- c()
> dat.glm <- glm(cancer~smoking, family=binomial, data=dat)
> summary(dat.glm)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -0.08004    0.20016    -0.4    0.689
smokingnon-smoker  1.73827    0.43460     4.0 6.34e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Null deviance: 201.90 on 149 degrees of freedom
Residual deviance: 182.44 on 148 degrees of freedom
AIC: 186.44
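Both coefficients can be recovered directly from the 2x2 table. The sketch assumes R's usual convention for a factor response in a binomial `glm`: the first level ("present") is treated as failure and the remaining level ("absent") as success, so the model describes the odds of being healthy. Variable names are illustrative.

```r
M <- matrix(c(52, 48, 8, 42), nrow = 2,
            dimnames = list(c("present", "absent"),
                            c("smoker", "non-smoker")))

# Odds of "absent" (no lung cancer) within each smoking group
odds.smoker    <- M["absent", "smoker"]     / M["present", "smoker"]      # 48 / 52
odds.nonsmoker <- M["absent", "non-smoker"] / M["present", "non-smoker"]  # 42 / 8

intercept  <- log(odds.smoker)                    # log odds in the reference group (smokers)
odds.ratio <- odds.nonsmoker / odds.smoker        # non-smokers relative to smokers
slope      <- log(odds.ratio)                     # coefficient of smokingnon-smoker

print(c(intercept = intercept, slope = slope, odds.ratio = odds.ratio))
```

Read this way, the estimate 1.73827 says that the odds of being free of lung cancer are roughly 5.7 times higher for non-smokers than for smokers in this table.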

χ² test and LOGISTIC REGRESSION

Testing the association between smoking and health: deviance test from the logistic regression, followed by the χ² test on the same table

> anova(dat.glm, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cancer

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                      149     201.90
smoking  1   19.467       148     182.44 1.023e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> chisq.test(M)

data:  M
X-squared = 16.5312, df = 1, p-value = 4.785e-05


Example: newborn survival, cross-tabulation
> dat <- read.csv("http://www.matsuka.info/data_folder/cda7-16.csv")
> table(dat)
, , NdaysGESTATION = <=260, survival = no

     Ncigarettes
age     <5   5+
  <30   50    9
  30+   41    4

, , NdaysGESTATION = >260, survival = no

     Ncigarettes
age     <5   5+
  <30   24    6
  30+   14    1

, , NdaysGESTATION = <=260, survival = yes

     Ncigarettes
age     <5   5+
  <30  315   40
  30+  147   11

, , NdaysGESTATION = >260, survival = yes

     Ncigarettes
age     <5   5+
  <30 4012  459
  30+ 1594  124


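Example: newborn survival
Variables
- age: age (under 30, or 30 and over)
- Ncigarettes: number of cigarettes smoked (fewer than 5, or 5 or more)
- NdaysGESTATION: length of gestation (260 days or fewer, or 261 days or more)
- survival: survival (yes or no)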


LOGISTIC regression model
> dat.glm <- glm(survival~age, family=binomial, data=dat)
> summary(dat.glm)

Call:
glm(formula = survival ~ age, family = binomial, data = dat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8325   0.1912   0.1912   0.2509   0.2509

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9931     0.1070  37.333  < 2e-16 ***
age30+       -0.5506     0.1692  -3.253  0.00114 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1435.5 on 6850 degrees of freedom
Residual deviance: 1425.4 on 6849 degrees of freedom
AIC: 1429.4

$$P(\text{survival}) = \frac{\exp(\beta_0 + \beta_1\,\text{age})}{1 + \exp(\beta_0 + \beta_1\,\text{age})}$$
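Because age is the only predictor and has two levels, the fitted probabilities should reproduce the raw survival proportions in the cross-tabulation. The sketch below checks this with the estimates from the summary and the counts from `table(dat)`; the variable names are illustrative.

```r
# Estimates copied from summary(dat.glm)
b0 <-  3.9931   # (Intercept): log odds of survival for age < 30
b1 <- -0.5506   # age30+: change in log odds for age 30 and over

p.under30 <- plogis(b0)        # predicted survival probability, age < 30
p.30plus  <- plogis(b0 + b1)   # predicted survival probability, age 30+

# Raw proportions from table(dat), summed over Ncigarettes and NdaysGESTATION
yes.under30 <- 315 + 40 + 4012 + 459
no.under30  <- 50 + 9 + 24 + 6
yes.30plus  <- 147 + 11 + 1594 + 124
no.30plus   <- 41 + 4 + 14 + 1

raw.under30 <- yes.under30 / (yes.under30 + no.under30)
raw.30plus  <- yes.30plus  / (yes.30plus  + no.30plus)

print(rbind(model = c(p.under30, p.30plus),
            raw   = c(raw.under30, raw.30plus)))
```

The two rows should agree to about four decimal places, the precision of the printed coefficients.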


LOGISTIC regression model
> dat.glm2 <- glm(survival~Ncigarettes, family=binomial, data=dat)
> summary(dat.glm2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.85097    0.08897  43.283   <2e-16 ***
Ncigarettes5+ -0.39466    0.24391  -1.618    0.106
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1435.5 on 6850 degrees of freedom
Residual deviance: 1433.2 on 6849 degrees of freedom
AIC: 1437.2

$$P(\text{survival}) = \frac{\exp(\beta_0 + \beta_1\,\text{Ncigar})}{1 + \exp(\beta_0 + \beta_1\,\text{Ncigar})}$$


LOGISTIC regression model
> dat.glm3 <- glm(survival~NdaysGESTATION, family=binomial, data=dat)
> summary(dat.glm3)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1404   0.1204   0.1204   0.1204   0.6076

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)          1.5959     0.1075   14.84   <2e-16 ***
NdaysGESTATION>260   3.3280     0.1842   18.06   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1435.5 on 6850 degrees of freedom
Residual deviance: 1093.2 on 6849 degrees of freedom
AIC: 1097.2

$$P(\text{survival}) = \frac{\exp(\beta_0 + \beta_1\,\text{Ndays})}{1 + \exp(\beta_0 + \beta_1\,\text{Ndays})}$$


LOGISTIC regression model
> dat.glmAllAdd = glm(survival~age+Ncigarettes+NdaysGESTATION, family=binomial, data=dat)
> summary(dat.glmAllAdd)

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)          1.8139     0.1351  13.430  < 2e-16 ***
age30+              -0.4675     0.1803  -2.592  0.00954 **
Ncigarettes5+       -0.4228     0.2624  -1.611  0.10710
NdaysGESTATION>260   3.3098     0.1846  17.929  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1435.5 on 6850 degrees of freedom
Residual deviance: 1084.8 on 6847 degrees of freedom
AIC: 1092.8

Number of Fisher Scoring iterations: 7

$$P(\text{survival}) = \frac{\exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays})}{1 + \exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays})}$$


LOGISTIC regression model
> dat.glmAllMult = glm(survival~age*Ncigarettes*NdaysGESTATION, family=binomial, data=dat)
> summary(dat.glmAllMult)

Coefficients:
                                        Estimate Std. Error z value Pr(>|z|)
(Intercept)                              1.84055    0.15223  12.090   <2e-16 ***
age30+                                  -0.56369    0.23317  -2.418   0.0156 *
Ncigarettes5+                           -0.34889    0.39911  -0.874   0.3820
NdaysGESTATION>260                       3.27844    0.25508  12.853   <2e-16 ***
age30+:Ncigarettes5+                     0.08364    0.72896   0.115   0.9087
age30+:NdaysGESTATION>260                0.17964    0.41026   0.438   0.6615
Ncigarettes5+:NdaysGESTATION>260        -0.43281    0.60829  -0.712   0.4768
age30+:Ncigarettes5+:NdaysGESTATION>260  0.78340    1.34987   0.580   0.5617
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1435.5 on 6850 degrees of freedom
Residual deviance: 1083.3 on 6843 degrees of freedom
AIC: 1099.3

$$P(\text{survival}) = \frac{\exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays} + \beta_4\,\text{age}\times\text{Ncigar} + \dots)}{1 + \exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays} + \beta_4\,\text{age}\times\text{Ncigar} + \dots)}$$


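Model selection
In logistic regression analysis, the candidate models and their AIC values are:

Model01 (AIC=1429.4): $P(\text{survival}) = \dfrac{\exp(\beta_0 + \beta_1\,\text{age})}{1 + \exp(\beta_0 + \beta_1\,\text{age})}$

Model02 (AIC=1437.2): $P(\text{survival}) = \dfrac{\exp(\beta_0 + \beta_1\,\text{Ncigar})}{1 + \exp(\beta_0 + \beta_1\,\text{Ncigar})}$

Model03 (AIC=1097.2): $P(\text{survival}) = \dfrac{\exp(\beta_0 + \beta_1\,\text{Ndays})}{1 + \exp(\beta_0 + \beta_1\,\text{Ndays})}$

ModelAllAdd (AIC=1092.8): $P(\text{survival}) = \dfrac{\exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays})}{1 + \exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays})}$

ModelAllMult (AIC=1099.3): $P(\text{survival}) = \dfrac{\exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays} + \beta_4\,\text{age}\times\text{Ncigar} + \dots)}{1 + \exp(\beta_0 + \beta_1\,\text{age} + \beta_2\,\text{Ncigar} + \beta_3\,\text{Ndays} + \beta_4\,\text{age}\times\text{Ncigar} + \dots)}$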
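Choosing among these candidates by AIC amounts to picking the smallest value. The sketch below only rearranges the five AIC values listed above; the vector and variable names are illustrative.

```r
# AIC values copied from the five fitted models
aic <- c(Model01      = 1429.4,
         Model02      = 1437.2,
         Model03      = 1097.2,
         ModelAllAdd  = 1092.8,
         ModelAllMult = 1099.3)

best  <- names(which.min(aic))     # model with the smallest AIC
delta <- sort(aic - min(aic))      # AIC difference from the best model

print(best)
print(delta)
```

With the fitted objects available, `AIC(dat.glm, dat.glm2, dat.glm3, dat.glmAllAdd, dat.glmAllMult)` should give the same comparison in one call.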


LOGISTIC regression model
> library(MASS)
> stepAIC(dat.glmAllMult)
Start:  AIC=1099.27
survival ~ age * Ncigarettes * NdaysGESTATION

                                 Df Deviance    AIC
- age:Ncigarettes:NdaysGESTATION  1   1083.6 1097.6
<none>                                1083.3 1099.3

Step:  AIC=1097.63
survival ~ age + Ncigarettes + NdaysGESTATION + age:Ncigarettes + age:NdaysGESTATION + Ncigarettes:NdaysGESTATION

                             Df Deviance    AIC
- Ncigarettes:NdaysGESTATION  1   1083.9 1095.9
- age:Ncigarettes             1   1084.0 1096.0
- age:NdaysGESTATION          1   1084.1 1096.1
<none>                            1083.6 1097.6

Step:  AIC=1095.86
survival ~ age + Ncigarettes + NdaysGESTATION + age:Ncigarettes + age:NdaysGESTATION

                     Df Deviance    AIC
- age:Ncigarettes     1   1084.2 1094.2
- age:NdaysGESTATION  1   1084.4 1094.4
<none>                    1083.9 1095.9


LOGISTIC regression model
> stepAIC(dat.glmAllMult)
....

Step:  AIC=1094.22
survival ~ age + Ncigarettes + NdaysGESTATION + age:NdaysGESTATION

                     Df Deviance    AIC
- age:NdaysGESTATION  1   1084.8 1092.8
<none>                    1084.2 1094.2
- Ncigarettes         1   1086.7 1094.7

Step:  AIC=1092.76
survival ~ age + Ncigarettes + NdaysGESTATION

                 Df Deviance    AIC
<none>                1084.8 1092.8
- Ncigarettes     1   1087.2 1093.2
- age             1   1091.3 1097.3
- NdaysGESTATION  1   1422.4 1428.4

Call:  glm(formula = survival ~ age + Ncigarettes + NdaysGESTATION, family = binomial, data = dat)

Coefficients:
       (Intercept)              age30+       Ncigarettes5+  NdaysGESTATION>260
            1.8139             -0.4675             -0.4228              3.3098

Degrees of Freedom: 6850 Total (i.e. Null);  6847 Residual
Null Deviance:      1436
Residual Deviance: 1085     AIC: 1093
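The last table of the stepAIC trace shows why the search stops. The sketch below recomputes each AIC from the printed deviance, using the fact that for ungrouped binary data AIC = deviance + 2k (k = number of estimated coefficients), and confirms that no single deletion improves on the current model. The vectors are typed in from the trace; names are illustrative.

```r
# Deviances from the final stepAIC table
dev <- c("<none>"           = 1084.8,
         "- Ncigarettes"    = 1087.2,
         "- age"            = 1091.3,
         "- NdaysGESTATION" = 1422.4)

# Number of coefficients: the current model has 4 (intercept + 3 main effects);
# dropping one two-level factor leaves 3
k <- c(4, 3, 3, 3)

aic <- dev + 2 * k
print(aic)

# Backward elimination stops when keeping the current model ("<none>") has the lowest AIC
decision <- names(which.min(aic))
print(decision)
```

Dropping NdaysGESTATION would raise the AIC by far the most, which matches its large z value in the summary output.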


When the data are non-negative


Candidate regression models
Models for predicting a count
- count = b0+b1X1
- link(count) = b0+b1X1, that is, count = link⁻¹(b0+b1X1)


POISSON REGRESSION
$$E[Y|x] = \lambda(x) = \exp(\beta_0 + \beta_1 x) \;\leftrightarrow\; \log\lambda = \beta_0 + \beta_1 x$$

$$P(y) = \frac{\lambda^{y}\exp(-\lambda)}{y!}$$
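The probability mass function can be checked against R's built-in `dpois`. This is a small sketch; the rate lambda = 2.5 and the range of counts are illustrative assumptions.

```r
# Illustrative rate and counts (assumptions)
lambda <- 2.5
y      <- 0:5

# Poisson probability mass function written out from the formula
p.formula <- lambda^y * exp(-lambda) / factorial(y)

# Built-in equivalent
p.dpois <- dpois(y, lambda)

print(cbind(y, p.formula, p.dpois))
```

Because lambda is the exponential of the linear predictor, it is positive for any values of the coefficients and x, which is what a count model requires.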


Example: Poisson regression analysis
> dat.pr <- glm(eye.count ~ attr, family = poisson, data = dat)
> summary(dat.pr)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.041779   0.065525   0.638    0.524
attr        0.207579   0.009156  22.672   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 652.310 on 499 degrees of freedom
Residual deviance:  84.548 on 498 degrees of freedom
AIC: 1565.4

Number of Fisher Scoring iterations: 4

Note: family = poisson specifies the distribution of the response variable (Poisson distribution).


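Interpreting POISSON REGRESSION
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.041779   0.065525   0.638    0.524
attr        0.207579   0.009156  22.672   <2e-16 ***

$$\lambda = \exp(0.042 + 0.208\,x)$$

For each one-unit increase in attractiveness (attr), the expected number of looks at the eyes (eye.count) is multiplied by exp(0.208) = 1.231.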
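The multiplicative interpretation can be verified numerically. The sketch uses the two estimates from the coefficient table; the grid of attractiveness scores 0 to 10 is an illustrative assumption (the slides do not state the range of attr), and the variable names are chosen here.

```r
# Estimates copied from summary(dat.pr)
b0 <- 0.041779   # (Intercept)
b1 <- 0.207579   # attr

attr.score <- 0:10                      # illustrative grid (assumption)
lambda     <- exp(b0 + b1 * attr.score) # expected eye.count at each score

# Ratio of expected counts between consecutive scores
ratio <- lambda[-1] / lambda[-length(lambda)]

print(round(lambda, 2))
print(ratio)        # constant
print(exp(b1))      # the same constant
```

As with the odds in logistic regression, the effect of a one-unit increase is a constant multiplier, here on the expected count itself rather than on the odds.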


POISSON REGRESSION
$$E[Y|x] = \lambda(x) = \exp(\beta_0 + \beta_1 x)$$
$$\log\lambda = \beta_0 + \beta_1 x$$