R Markdown

Prepare data.

pheno <- read.csv("RiceDiversityPheno.csv")
line <- read.csv("RiceDiversityLine.csv")
line.pheno <- merge(line, pheno, by.x = "NSFTV.ID", by.y = "NSFTVID")
data <- data.frame(
    height = line.pheno$Plant.height,
    flower = line.pheno$Flowering.time.at.Arkansas)
data <- na.omit(data)
x <- data$flower
y <- data$height

Calculate regression coefficient.

n <- length(x)
ssx <- sum(x^2) - n * mean(x)^2
ssxy <- sum(x * y) - n * mean(x) * mean(y)
b <- ssxy / ssx
b
## [1] 0.6728746
m <- mean(y) - b * mean(x)
m
## [1] 58.05464
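
As a cross-check, the closed-form estimates should agree with R's built-in `lm()`. The sketch below uses simulated toy data in place of the rice data set; the names `x_demo`, `y_demo`, and `fit_demo` and the simulation parameters are illustrative assumptions, not part of the original analysis.

```r
# Toy data standing in for flowering time (x) and plant height (y).
# The parameters are arbitrary assumptions, not estimates from the rice data.
set.seed(1)
x_demo <- rnorm(50, mean = 90, sd = 10)
y_demo <- 60 + 0.7 * x_demo + rnorm(50, sd = 15)

# Closed-form least-squares estimates, computed as in the text
n_demo <- length(x_demo)
ssx_demo <- sum(x_demo^2) - n_demo * mean(x_demo)^2
ssxy_demo <- sum(x_demo * y_demo) - n_demo * mean(x_demo) * mean(y_demo)
b_demo <- ssxy_demo / ssx_demo                  # slope
m_demo <- mean(y_demo) - b_demo * mean(x_demo)  # intercept

# Same model fitted with lm(); coef() returns (intercept, slope)
fit_demo <- lm(y_demo ~ x_demo)
all.equal(unname(coef(fit_demo)), c(m_demo, b_demo))
```

If the hand calculation matches the fitted model, `all.equal()` returns `TRUE`.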

Calculate SSR, SSE, and MSE.

ssr <- b * ssxy
ssr
## [1] 26881.49
ssy <- sum(y^2) - n * mean(y)^2
sse <- ssy - ssr
sse
## [1] 133903.2
mse <- sse / (n - 2)
mse
## [1] 360.9251
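
The shortcut `sse <- ssy - ssr` relies on the decomposition SSY = SSR + SSE. A minimal sketch that checks this decomposition directly from fitted values and residuals, again on simulated toy data (all `*_demo` names and the simulation parameters are assumptions made for illustration):

```r
# Toy data; parameters are arbitrary assumptions
set.seed(2)
x_demo <- rnorm(40, mean = 90, sd = 10)
y_demo <- 60 + 0.7 * x_demo + rnorm(40, sd = 15)
fit_demo <- lm(y_demo ~ x_demo)

n_demo <- length(y_demo)
ssy_demo <- sum((y_demo - mean(y_demo))^2)            # total sum of squares
ssr_demo <- sum((fitted(fit_demo) - mean(y_demo))^2)  # explained by the regression
sse_demo <- sum(residuals(fit_demo)^2)                # residual sum of squares
mse_demo <- sse_demo / (n_demo - 2)                   # residual mean square

# The total splits into regression and residual parts
all.equal(ssy_demo, ssr_demo + sse_demo)
# MSE is the square of the residual standard error reported by summary()
all.equal(mse_demo, summary(fit_demo)$sigma^2)
```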

Test \(H_0: \beta_0 = 0.5\), where \(\beta_0\) denotes the true regression slope estimated by \(b\).

t.value <- (b - 0.5) / sqrt(mse/ssx)
t.value
## [1] 2.217253
# two-sided p-value; abs() keeps it valid when t.value is negative
2 * pt(-abs(t.value), n - 2)
## [1] 0.02721132
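
The same test can be assembled from the standard error that `summary()` reports for a fitted `lm` object. Note that the t value printed by `summary()` itself tests a slope of 0, so the statistic for a slope of 0.5 has to be computed by hand. The sketch below runs on simulated toy data; the `*_demo` names, `beta_null`, and the simulation parameters are illustrative assumptions.

```r
# Toy data; parameters are arbitrary assumptions
set.seed(3)
x_demo <- rnorm(60, mean = 90, sd = 10)
y_demo <- 60 + 0.7 * x_demo + rnorm(60, sd = 15)
fit_demo <- lm(y_demo ~ x_demo)

# Slope estimate and its standard error, sqrt(MSE / SSX)
coef_table <- coef(summary(fit_demo))
est <- coef_table["x_demo", "Estimate"]
se <- coef_table["x_demo", "Std. Error"]
df_demo <- df.residual(fit_demo)  # n - 2 for simple regression

beta_null <- 0.5                  # hypothesized slope
t_demo <- (est - beta_null) / se
p_demo <- 2 * pt(-abs(t_demo), df_demo)  # two-sided p-value
c(t = t_demo, p = p_demo)
```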

Draw the \(t\)-distribution with \(n - 2\) degrees of freedom, which the statistic \(\frac {b - 0.5} {\sqrt{MSE/SSX}}\) follows under \(H_0\).

xx <- seq(-5, 5, 0.01)
tt <- dt(xx, n - 2)
plot(xx, tt, type = "l")
# calculate the 100 * (1 - alpha/2) percentile 
t.975 <- qt(1 - 0.025, n - 2)
t.975
## [1] 1.966379
# calculate the 100 * alpha / 2 percentile
t.025 <- qt(0.025, n - 2)
t.025
## [1] -1.966379
# the above value can be calculated also as
- qt(1 - 0.025, n - 2)
## [1] -1.966379
# this is because the t-distribution is symmetric about zero
# draw lines of the bounds
abline(v = t.025, col = "green", lty = "dotted")
abline(v = t.975, col = "blue", lty = "dotted")
# draw the line of t.value
abline(v = t.value, col = "red", lty = "dotted")

A statistic that follows this \(t\)-distribution falls between the green line and the blue line with 95% probability. The red line lies outside this range, so \(H_0\) is rejected at the 5% significance level.
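
The 95% claim can be checked numerically with `pt()`. In this small sketch, `df_demo` is an assumed, illustrative number of degrees of freedom; any positive value behaves the same way.

```r
df_demo <- 371  # assumed degrees of freedom, for illustration only

lower <- qt(0.025, df_demo)  # 2.5 percentile
upper <- qt(0.975, df_demo)  # 97.5 percentile

# Probability mass between the two bounds
central_prob <- pt(upper, df_demo) - pt(lower, df_demo)
central_prob

# Symmetry about zero: the lower bound is the negated upper bound
all.equal(lower, -upper)
```

The remaining 5% is split equally between the two tails, which is why the two-sided p-value doubles a single tail probability.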

Next, calculate the range that contains \(\beta_0\) with 95% confidence, given the 2.5 percentile and the 97.5 percentile of the \(t\)-distribution. When \(\beta_0\) is included in the range, the following inequality holds.

\[ t_{n-2, 0.025} \le \frac {b - \beta_0} {\sqrt{MSE/SSX}} \le t_{n-2, 0.975} \]

Here, the 2.5 percentile is equal to the 97.5 percentile multiplied by \(-1\), so

\[ - t_{n-2, 0.975} \le \frac {b - \beta_0} {\sqrt{MSE/SSX}} \le t_{n-2, 0.975} \]

Multiplying through by \(\sqrt{MSE/SSX}\), subtracting \(b\), and finally multiplying by \(-1\) (which reverses the inequalities) gives

\[ - t_{n-2, 0.975} {\sqrt{MSE/SSX}} \le b - \beta_0 \le t_{n-2, 0.975} {\sqrt{MSE/SSX}} \]

\[ -b - t_{n-2, 0.975} {\sqrt{MSE/SSX}} \le - \beta_0 \le -b + t_{n-2, 0.975} {\sqrt{MSE/SSX}} \]

\[ b - t_{n-2, 0.975} {\sqrt{MSE/SSX}} \le \beta_0 \le b + t_{n-2, 0.975} {\sqrt{MSE/SSX}} \]

The last inequality gives the 95% confidence interval of \(\beta_0\).
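
The interval \(b \pm t_{n-2, 0.975} \sqrt{MSE/SSX}\) can be computed directly and compared with R's `confint()`. The sketch below uses simulated toy data rather than the rice data; the `*_demo` names, `half_width`, `ci_manual`, `ci_builtin`, and the simulation parameters are illustrative assumptions.

```r
# Toy data; parameters are arbitrary assumptions
set.seed(4)
x_demo <- rnorm(50, mean = 90, sd = 10)
y_demo <- 60 + 0.7 * x_demo + rnorm(50, sd = 15)
fit_demo <- lm(y_demo ~ x_demo)

n_demo <- length(x_demo)
ssx_demo <- sum((x_demo - mean(x_demo))^2)
mse_demo <- sum(residuals(fit_demo)^2) / (n_demo - 2)
b_demo <- unname(coef(fit_demo)["x_demo"])  # slope estimate

# b -/+ t_{n-2, 0.975} * sqrt(MSE / SSX)
half_width <- qt(0.975, n_demo - 2) * sqrt(mse_demo / ssx_demo)
ci_manual <- c(b_demo - half_width, b_demo + half_width)
ci_manual

# Built-in interval for the slope, for comparison
ci_builtin <- confint(fit_demo, "x_demo", level = 0.95)
all.equal(ci_manual, as.numeric(ci_builtin))
```

With the variables from the analysis above, the same interval would be `b + c(-1, 1) * t.975 * sqrt(mse / ssx)`. Because the test rejected \(H_0\) at the 5% level, the value 0.5 should fall outside that interval.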