Learning objectives

Background

This week, we will continue exploring formal statistical models, moving on to analysis of variance (ANOVA). ANOVA is a statistical modeling technique that is used to test whether a continuous response variable differs by a categorical grouping factor. We will use this approach to test whether nutrient density differs among crop varieties in the combined Pasa and Bionutrient Institute dataset.

Resources

Part 1: Create an R Markdown file for your problem set

Part 2: Set expectations

Take a moment to record your expectations before you begin.

You may find it helpful to review the exploratory graphs created by the variety group before spring break. (link here)

Your notes should include the following (2-4 sentences):

Based on what you have learned so far and your exploration of the data, do you expect to see differences among varieties in your crop and nutrient? Why or why not?

Assume that variety is the independent variables and nutrient density is the dependent variable. Your expectations should include any specific patterns you expect to see among groups (e.g. do you have any expectations about which varieties will be more nutritious?, etc.).

Part 3: ANOVA for differences among varieties

A: Prepare your dataset

  • Data loading + checking
    • Load the tidyverse using library()
    • Load the necessary dataset (varieties_clean.csv) using read.csv()
      • FYI - this dataset has been filtered to remove Pasa data from BI, so there is no duplication of values in the dataset (what does this have to do with independence?). This dataset has also been cleaned to only include those varieties with at least four observations across the Pasa and BI dataset combined.
  • Data transformation
    • Filter the dataset using filter()
      • Include only your crop
    • Use select to narrow the dataset to the variables of interest
      • See the data dictionary to decide what variables you need: Pasa and BI
      • I recommend using the variable var_title for variety, because it has already been reformatted to be cleaner for graphing.
    • Filter the dataset to remove NA values
      • After selecting your variables, use drop_na() to remove NA values
    • Store the filtered and narrowed dataset as a new dataframe
  • Check your data and make sure it loaded correctly
    • Check the structure of the data using str()
    • Check the sample size using table(df$variable) on your dataframe (df) and independent variable
      • If there are varieties with fewer than 4 observations, remove them using filter()
      • Remember that the != means not equal
      • The form of the code would be filter(var_title != c("variety_name1","variety_name2"))
    • Generate an initial summary of the data using summary()
    • Adjust chunk options so that the code shows, but the output is hidden in your knitted file

B: Fit the model

  • Now, use aov to fit the model that you are interested in.

Q: What null hypothesis is this model testing? What is the alternative hypothesis? (2-3 sentences)

  • A:

C: Assess assumptions

Create the appropriate diagnostic plots. Based on what you know about the assumptions of ANOVA and your understanding of the experimental design, assess how well your data fits the assumptions of the model.

Record your conclusions about the fit of the data to the assumptions. Be sure to explain the thinking/evidence behind your conclusions. (1-2 sentences per assumption)

  • Independence:

  • Normality:

  • Equal variance:

Adjust the model if necessary

The next step will depend upon whether or not your model is meeting assumptions. If it is, you can skip ahead to the next section. If it is not, you should consider either transforming the data or using Welch’s ANOVA.

Note: If the Welch’s ANOVA is throwing error messages or strange results (e.g. NA or NaN values in the ANOVA table), it is likely that you have varieties with too-small sample size and/or varieties without any variation in the measurement variable. In that case, use filter() to remove the problematic varieties and re-fit the model. (see more code tips under Prepare your dataset above)

D: Summarize results

Now, you will need to examine the outcomes of your final model (using summary() for standard ANOVA or simply printing the model object for Welch’s ANOVA). Remember to examine the output from the model that best fits your assumptions and the data.

IF you found significant differences, you should perform a post-hoc test to determine which varieties are different from each other (standard ANOVA: HSD.test, Welch’s ANOVA games_howell_test().

Write a brief paragraph summarizing the results of your analysis in full sentences. If you did not find significant differences, this will likely be 1-2 sentences. If you did find significant differences, this will likely be 3-4 sentences. Your paragraph should include:

  • Whether or not you found significant differences among groups (including the name of your statistical test, F value, degrees of freedom, and P value)
    • To make a subscript in R Markdown, you can use the tilde: F~4,30~ becomes F4,30
  • If there were significant differences, where did they occur? (i.e. which groups were different from each other?)
  • What was the direction and magnitude of the observed differences?

E: Final graph

Create a final graph to summarize your data + model.

This should be a bar graph showing the means of each group, with error bars representing +/- 1 standard error. Order the bars from high to low mean using fct_reorder(). (to see how this function works, see the example code from the variety group linked below)

Other specifications of graphs should be similar to our prior graphs:

For your reference, here is a link to the scripts created by the variety group before spring break: link here

Part 4: Final interpretation

Interpret the results of your analysis (4-6 sentences):

Submit your problem set!

Knit your R Markdown file using the Knit button at the top of the code editor. This is a good check on whether your analysis is reproducible!

To access your file, navigate to the Files tab in the lower right window. Find the .html file for your problem set and click the box next to it. Navigate to More –> Export to download the file. It will likely go to your downloads folder.

Examine the file closely to make sure that it knitted correctly and contains all parts of your problem set. If you need to make revisions, you can simply revise your code and then knit it again. Submit the .html file in the appropriate Moodle dropbox.


sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         knitr_1.45        rlang_1.1.3      
 [5] xfun_0.41         stringi_1.8.3     promises_1.2.1    jsonlite_1.8.8   
 [9] workflowr_1.7.1   glue_1.7.0        rprojroot_2.0.4   git2r_0.33.0     
[13] htmltools_0.5.7   httpuv_1.6.13     sass_0.4.8        fansi_1.0.6      
[17] rmarkdown_2.25    jquerylib_0.1.4   evaluate_0.23     tibble_3.2.1     
[21] fastmap_1.1.1     yaml_2.3.8        lifecycle_1.0.4   stringr_1.5.1    
[25] compiler_4.3.2    fs_1.6.3          Rcpp_1.0.12       pkgconfig_2.0.3  
[29] rstudioapi_0.15.0 later_1.3.2       digest_0.6.34     R6_2.5.1         
[33] utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3    bslib_0.6.1      
[37] tools_4.3.2       cachem_1.0.8