# If necessary, please list the packages you need here

Load data.

# this data set was analyzed in Zhao 2011 (Nature Communications 2:467)
pheno <- read.csv("RiceDiversityPheno.csv", stringsAsFactors = TRUE)
line <- read.csv("RiceDiversityLine.csv", stringsAsFactors = TRUE)
line.pheno <- merge(line, pheno, by.x = "NSFTV.ID", by.y = "NSFTVID")

Prepare the variables targeted in the analysis.

mydata <- data.frame(
    Panicle.number.per.plant = line.pheno$Panicle.number.per.plant,
    Panicle.length = line.pheno$Panicle.length,
    Primary.panicle.branch.number = line.pheno$Primary.panicle.branch.number,
    Seed.number.per.panicle = line.pheno$Seed.number.per.panicle,
    Florets.per.panicle = line.pheno$Florets.per.panicle)
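# flag rows that have a missing value in any of the five traits
missing <- !complete.cases(mydata)
# drop those rows and keep the subpopulation labels aligned with mydata
mydata <- mydata[!missing, ]
subpop <- line.pheno$Sub.population[!missing]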

Answer the following questions.

  1. Consider a principal component analysis based on Panicle.number.per.plant, Panicle.length, Primary.panicle.branch.number, Seed.number.per.panicle, and Florets.per.panicle. Answer whether this principal component analysis should be based on the covariance matrix or on the correlation matrix.

Write your answer and comment here
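
As a rough illustration of what the choice in question 1 changes, the sketch below runs `prcomp()` with and without standardization. It uses R's built-in `iris` measurements purely as a stand-in (an assumption for illustration, not the rice data), and the object names `pca_cov` and `pca_cor` are hypothetical.

```r
# Illustrative stand-in data: the four numeric columns of the built-in iris set
x <- iris[, 1:4]

# Variances differ across variables, so the covariance matrix depends on scale
vars <- apply(x, 2, var)

# PCA on the covariance matrix (variables centered but not scaled)
pca_cov <- prcomp(x, scale. = FALSE)

# PCA on the correlation matrix (variables centered and scaled to unit variance)
pca_cor <- prcomp(x, scale. = TRUE)

# Compare the first-component weights under the two choices
print(round(vars, 2))
print(round(cbind(covariance  = pca_cov$rotation[, 1],
                  correlation = pca_cor$rotation[, 1]), 2))
```

When variables are measured in different units or have very different variances, the unscaled analysis tends to be driven by the highest-variance variable. Whether that is desirable depends on the traits being analyzed.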

  1. Perform the principal component analysis in 1 and draw a graph showing the magnitude of the variance of each principal component.
# write R code required for the answer
  1. How many principal components are selected if we keep those whose contribution proportion exceeds the average explanatory power per original variable (i.e., with q variables, a contribution proportion greater than 1/q)?
# write R code required for the answer

Write your answer and comment here
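
The 1/q rule in question 3 can be sketched as follows. This is a minimal example on the built-in `iris` data (a stand-in assumed here only for illustration), and the names `eig`, `prop`, and `n_keep` are hypothetical.

```r
# Illustrative stand-in data, analyzed on the correlation matrix
x <- iris[, 1:4]
pca <- prcomp(x, scale. = TRUE)

# Variance of each principal component (eigenvalues)
eig <- pca$sdev^2

# Contribution proportion of each component
prop <- eig / sum(eig)

# Number of original variables
q <- length(eig)

# Count the components whose contribution proportion exceeds 1/q
n_keep <- sum(prop > 1 / q)

print(round(prop, 3))
print(n_keep)
```

For a PCA on the correlation matrix the eigenvalues sum to q, so this rule is the same as keeping components with an eigenvalue greater than 1. The base functions `screeplot(pca)` and `summary(pca)` show the same variances as a plot and as a table.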

  1. For the principal component analysis in 1, draw a scatter plot of the first and second principal component scores. Color the points by subpopulation using the variable subpop.
# write R code required for the answer
  1. Based on the figure in 4, describe what kind of values (large or small, positive or negative, etc.) the first and second principal component scores take for TEJ.

Write your answer and comment here
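
One way to back up a visual reading of a score plot is to summarize the scores per group. The sketch below assumes the built-in `iris` data with `Species` as the grouping variable (a stand-in for the rice data and `subpop`), and the names `scores` and `group_means` are hypothetical.

```r
# Illustrative stand-in data, analyzed on the correlation matrix
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# First and second principal component scores with the group label attached
scores <- data.frame(pca$x[, 1:2], group = iris$Species)

# Mean PC1 and PC2 score within each group
group_means <- aggregate(cbind(PC1, PC2) ~ group, data = scores, FUN = mean)

print(group_means)
```

Because the scores are centered at zero, the sign of a group mean indicates on which side of the origin that group tends to fall. The same `scores` data frame can be plotted with `plot(scores$PC1, scores$PC2, col = as.integer(scores$group))`.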

  1. Draw a biplot for the first and second principal components.
# write R code required for the answer
  1. Calculate the factor loadings of the first and second principal components and draw a figure of the factor loadings.
# write R code required for the answer
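
As a hedged sketch of the factor loading calculation in question 7, the example below assumes that "factor loadings" means the correlations between the original variables and the principal component scores. It again uses the built-in `iris` data as a stand-in, and the names `loadings` and `check` are hypothetical.

```r
# Illustrative stand-in data, analyzed on the correlation matrix
x <- iris[, 1:4]
pca <- prcomp(x, scale. = TRUE)

# Factor loadings: each eigenvector multiplied by the standard deviation
# of its principal component
loadings <- sweep(pca$rotation, 2, pca$sdev, "*")

# Cross-check: for a correlation-matrix PCA these equal the correlations
# between the original variables and the component scores
check <- cor(x, pca$x)

print(round(loadings[, 1:2], 3))
```

A loading close to 1 or -1 indicates a strong positive or negative association between a variable and a component, and a loading near 0 indicates little association. The first two columns can be drawn with `plot(loadings[, 1:2], xlim = c(-1, 1), ylim = c(-1, 1))`, and `biplot(pca)` gives the related biplot.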
  1. Based on figures 6 and 7, state for each trait whether it takes a large or small value when the first principal component score is large. Do the same for the second principal component. Note that a trait may have little or no association with a given principal component.

Write your answer and comment here