This week, we will continue exploring formal statistical models, moving on to analysis of variance (ANOVA). ANOVA is a statistical modeling technique that is used to test whether a continuous response variable differs by a categorical grouping factor. We will use this approach to test whether nutrient density differs among crop varieties in the combined Pasa and Bionutrient Institute dataset.
results = "hide"
to show the code but not the output in the
final documentLab_10_ANOVA
)
PS 10: ANOVA
File
-> Save As
10_ps_Crop_Nutrient_ANOVA.Rmd
- replacing Crop
and Nutrient
with the name of
your crop (e.g. Beet
) and nutrient
(e.g. Calcium
)##
for main headings###
for subheadingsTake a moment to record your expectations before you begin.
You may find it helpful to review the exploratory graphs created by the variety group before spring break. (link here)
Your notes should include the following (2-4 sentences):
Based on what you have learned so far and your exploration of the data, do you expect to see differences among varieties in your crop and nutrient? Why or why not?
Assume that variety is the independent variables and nutrient density is the dependent variable. Your expectations should include any specific patterns you expect to see among groups (e.g. do you have any expectations about which varieties will be more nutritious?, etc.).
tidyverse
using library()
varieties_clean.csv
) using
read.csv()
filter()
select
to narrow the dataset to the variables of
interest
var_title
for variety,
because it has already been reformatted to be cleaner for graphing.drop_na()
to remove
NA
valuesstr()
table(df$variable)
on your
dataframe (df) and independent variable
filter()
!=
means not equalfilter(var_title != c("variety_name1","variety_name2"))
summary()
aov
to fit the model that you are interested
in.Q: What null hypothesis is this model testing? What is the alternative hypothesis? (2-3 sentences)
Create the appropriate diagnostic plots. Based on what you know about the assumptions of ANOVA and your understanding of the experimental design, assess how well your data fits the assumptions of the model.
Record your conclusions about the fit of the data to the assumptions. Be sure to explain the thinking/evidence behind your conclusions. (1-2 sentences per assumption)
Independence:
Normality:
Equal variance:
The next step will depend upon whether or not your model is meeting assumptions. If it is, you can skip ahead to the next section. If it is not, you should consider either transforming the data or using Welch’s ANOVA.
Note: If the Welch’s ANOVA is throwing error
messages or strange results (e.g. NA or NaN values in the ANOVA table),
it is likely that you have varieties with too-small sample size and/or
varieties without any variation in the measurement variable. In that
case, use filter()
to remove the problematic varieties and
re-fit the model. (see more code tips under Prepare your
dataset above)
Now, you will need to examine the outcomes of your final model (using
summary()
for standard ANOVA or simply printing the model
object for Welch’s ANOVA). Remember to examine the output from the model
that best fits your assumptions and the data.
IF you found significant differences, you should perform a post-hoc
test to determine which varieties are different from each other
(standard ANOVA: HSD.test
, Welch’s ANOVA
games_howell_test()
.
Write a brief paragraph summarizing the results of your analysis in full sentences. If you did not find significant differences, this will likely be 1-2 sentences. If you did find significant differences, this will likely be 3-4 sentences. Your paragraph should include:
F~4,30~
becomes F4,30Create a final graph to summarize your data + model.
This should be a bar graph showing the means of each group, with
error bars representing +/- 1 standard error. Order the bars from high
to low mean using fct_reorder()
. (to see how this function
works, see the example code from the variety group linked below)
Other specifications of graphs should be similar to our prior graphs:
coord_flip
)theme_light
)For your reference, here is a link to the scripts created by the variety group before spring break: link here
Interpret the results of your analysis (4-6 sentences):
If a farmer wants to increase nutrient density of your nutrient, could they do so through variety selection? Why or why not?
What do you recommend in terms of specific varieties?
Do you think it would be worthwhile to explore for your crop and nutrient whether there is variation based in color or type (hybrid vs. open pollinated)? Your response should be informed by your results in combination with the information we compiled on varieties.
What are your overall conclusions? Were your expectations supported or challenged?
Knit your R Markdown file using the Knit
button at the
top of the code editor. This is a good check on whether your analysis is
reproducible!
To access your file, navigate to the Files
tab in the
lower right window. Find the .html file for your problem set and click
the box next to it. Navigate to More
–>
Export
to download the file. It will likely go to your
downloads folder.
Examine the file closely to make sure that it knitted correctly and
contains all parts of your problem set. If you need to make revisions,
you can simply revise your code and then knit it again. Submit the
.html
file in the appropriate Moodle dropbox.
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.2 knitr_1.45 rlang_1.1.3
[5] xfun_0.41 stringi_1.8.3 promises_1.2.1 jsonlite_1.8.8
[9] workflowr_1.7.1 glue_1.7.0 rprojroot_2.0.4 git2r_0.33.0
[13] htmltools_0.5.7 httpuv_1.6.13 sass_0.4.8 fansi_1.0.6
[17] rmarkdown_2.25 jquerylib_0.1.4 evaluate_0.23 tibble_3.2.1
[21] fastmap_1.1.1 yaml_2.3.8 lifecycle_1.0.4 stringr_1.5.1
[25] compiler_4.3.2 fs_1.6.3 Rcpp_1.0.12 pkgconfig_2.0.3
[29] rstudioapi_0.15.0 later_1.3.2 digest_0.6.34 R6_2.5.1
[33] utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3 bslib_0.6.1
[37] tools_4.3.2 cachem_1.0.8