The Blog of a Grad Student Who Struggles with Mornings

An economics student who lives by two sayings: mastery comes from the fundamentals, and fortune is unpredictable.

Quantitative Social Science: An Introduction (社会科学のためのデータ分析入門), End-of-Chapter Exercise Solutions (Chapter 2-1), R Code

Solutions for the previous chapters

The R code for the Chapter 1 exercise solutions is available in these posts: (Chapter 1-1) and (Chapter 1-2).
 

Introduction (Textbook Solution: Quantitative Social Science: An Introduction)

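These are my solutions (R code) to the end-of-chapter exercises in 社会科学のためのデータ分析入門, the Japanese edition of Quantitative Social Science: An Introduction and a well-regarded Japanese-language textbook on statistics with R.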

 
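I am not sure whether it counts as a drawback, but this textbook does not come with solutions to the end-of-chapter exercises, and as of around winter 2018 none had been published online in either Japanese or English. I worked through the exercises in the first volume in winter 2018 and decided to publish my answers, so I will post them across several articles. I hope they are useful to someone, and I would highly appreciate it if you could point out any mistakes in my code.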
 
In some places I also use more than one method to plot the same variable.
 

Chapter 2-1 (Chapter 2 - Section 1)

 
The script is pasted below as is.

## Chapter 2 Causality 
## Exercise Solution

setwd("~/qss/CAUSALITY") # set this to your own working directory

## -----------------------------------------------------
## Taka (the author of this script) uses the Japanese edition of QSS.
## -----------------------------------------------------
## Section 1
## Q1

star <- read.csv("STAR.csv")
head(star)

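## Create the kinder variable from classtype.
## Note: per the QSS codebook, classtype 2 is a regular class and 3 is a
## regular class with an aide; "middle" and "large" are this script's own labels.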
unique(star$classtype)
star$kinder <- NA 
star$kinder[star$classtype == 1] <- "small" 
star$kinder[star$classtype == 2] <- "middle"
star$kinder[star$classtype == 3] <- "large" 
class(star$kinder)
## kinder has to be a factor for the following questions.
star$kinder <- as.factor(star$kinder)
unique(star$kinder)
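
## Side note (a sketch, not part of the textbook answer):
## the NA-then-assign recoding above can also be written in one step
## with factor(). toy_type and toy_kinder are made-up names used only
## for this illustration; they do not touch the star data frame.
toy_type <- c(1, 3, 2, 1, NA)
toy_kinder <- factor(toy_type, levels = 1:3,
                     labels = c("small", "middle", "large"))
toy_kinder                          # already a factor, NA is preserved
table(toy_kinder, useNA = "ifany")  # counts per label, including NA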

## Recode the race variable (the first assignment turns the column into character).
unique(star$race)
star$race[star$race == 1] <- "white"
star$race[star$race == 2] <- "black"
star$race[star$race == 4] <- "hispanic"
star$race[star$race == 3 | star$race == 5 | star$race == 6] <- "others"


## Q2

small <- subset(star, star$kinder == "small") 
## small <- subset(star, subset = (kinder == "small")) ## same as above
middle <- subset(star, star$kinder == "middle")
large <- subset(star, star$kinder == "large")

## Drop missing values with the argument na.rm = TRUE.
## Reading score in 4th grade
sr <- mean(small$g4reading, na.rm = TRUE)
mr <- mean(middle$g4reading, na.rm = TRUE)
lr <- mean(large$g4reading, na.rm = TRUE)

## Math score in 4th grade
sm <- mean(small$g4math, na.rm = TRUE)
mm <- mean(middle$g4math, na.rm = TRUE)
lm <- mean(large$g4math, na.rm = TRUE)

## Display the mean of each score (small, middle, large).
## Replacing mean with sd above gives the standard deviation of each.
c(sr, mr, lr); c(sm, mm, lm)
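
## Sketch of the sd() variant mentioned above, on made-up scores
## (toy_scores is a hypothetical vector, not taken from STAR.csv).
toy_scores <- c(700, 720, NA, 740)
toy_mean <- mean(toy_scores, na.rm = TRUE)  # NA is dropped before averaging
toy_sd <- sd(toy_scores, na.rm = TRUE)      # sample standard deviation (n - 1)
c(mean = toy_mean, sd = toy_sd)             # mean 720, sd 20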


## Q3 

srq <- quantile(small$g4reading, 
                probs = seq(0.33, 0.66, 0.33), na.rm = TRUE)
mrq <- quantile(middle$g4reading,
                probs = seq(0.33, 0.66, 0.33), na.rm = TRUE)

smq <- quantile(small$g4math, 
                probs = seq(0.33, 0.66, 0.33), na.rm = TRUE)
mmq <- quantile(middle$g4math,
                probs = seq(0.33, 0.66, 0.33), na.rm = TRUE)

## The same values can also be computed one at a time.
## sr33 <- quantile(small$g4reading, probs = 0.33, na.rm = TRUE)
## sr66 <- quantile(small$g4reading, probs = 0.66, na.rm = TRUE)
## If exact terciles are wanted, probs = c(1/3, 2/3) is more accurate than 0.33 and 0.66.
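
## Sketch of the difference between 0.33/0.66 and exact thirds,
## using a made-up vector (toy_x is hypothetical, not STAR data).
## quantile() uses its default interpolation (type = 7) here.
toy_x <- 1:100
q_round <- quantile(toy_x, probs = c(0.33, 0.66))
q_exact <- quantile(toy_x, probs = c(1/3, 2/3))
q_round   # slightly below the exact terciles
q_exact   # 34 and 67 for this vector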


## Q4

## Make a contingency table.
table(class_size = star$kinder, year = star$yearssmall)

## tapply() applies a function separately to each group
## defined by the levels of a factor variable.
## tapply(x2, x1, mean) computes the mean of x2 within each level of x1.

tapply(star$g4reading, star$yearssmall, mean, na.rm = TRUE)
tapply(star$g4reading, star$yearssmall, median, na.rm = TRUE)

tapply(star$g4math, star$yearssmall, mean, na.rm = TRUE)
tapply(star$g4math, star$yearssmall, median, na.rm = TRUE)
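
## Toy illustration of the tapply() pattern used above.
## The data frame toy and its columns are made up for this sketch.
toy <- data.frame(score = c(10, 20, 30, NA, 50),
                  group = c("a", "a", "b", "b", "b"))
## Extra arguments such as na.rm = TRUE are passed on to mean().
toy_means <- tapply(toy$score, toy$group, mean, na.rm = TRUE)
toy_means   # a = 15, b = 40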


## Q5

## White students have higher mean scores in both class types compared here.
tapply(middle$g4reading, middle$race, mean, na.rm = TRUE)
tapply(middle$g4math, middle$race, mean, na.rm = TRUE)

tapply(small$g4reading, small$race, mean, na.rm = TRUE)
tapply(small$g4math, small$race, mean, na.rm = TRUE)


## Q6

tapply(star$hsgrad, star$kinder, mean, na.rm = TRUE)
tapply(star$hsgrad, star$yearssmall, mean, na.rm = TRUE)
tapply(star$hsgrad, star$race, mean, na.rm = TRUE)

 
End-of-Chapter Exercise Solutions (Chapter 1-1), R Code
www.econ-stat-grad.com
End-of-Chapter Exercise Solutions (Chapter 1-2), R Code
www.econ-stat-grad.com