PS 5: Historical data

Learning objectives

Practice using R Markdown to implement reproducible data analysis in R
Practice basic steps in loading, checking, and preparing a dataset for analysis
Correctly apply and interpret a one-sample t-test and associated confidence intervals
Explore patterns in historical datasets to gain insight into nutrient decline

Background

This week, we will shift gears to look more closely at the historical evidence for nutrient decline. You will re-analyze historical data (from either Mayer 1997 or Davis et al. 2004) to test whether the concentration of a particular nutrient in vegetables has declined over time. In the process we will learn how to conduct and interpret a t-test and its associated confidence interval in R.

Resources

Class spreadsheet for results
- This includes student assignments to a particular nutrient-study combination
Data dictionaries
- Mayer 1997
- Davis et al. 2004
Chunk options
- In addition to the options listed, you can also use results = "hide" to show the code but not the output in the final document
Graphing resources
- See Cheat Sheets -> Graph
- See the ggplot2 Cheat Sheet
- See the R Graph Gallery
t tests
- See Cheat Sheets -> t-test for guidance on running t-tests and non-parametric alternatives
- See lecture/lab slides

Part 1: Create an R Markdown file for your problem set

Make sure you are working in the R Project for this week in Posit Cloud (called Lab_05_Historical-data)
- This is important because if I have a question about what you did or how your code is working, I need to be able to find/access your code on Posit Cloud
Create a new R Markdown document using the green plus sign in upper left
- Title your document PS 5: Historical data
Navigate to File -> Save As
- Save the file as 05_ps_Study_Nutrient.Rmd, replacing Study and Nutrient with the name of your study (Davis or Mayer) and your nutrient (e.g. Calcium)
- See the class spreadsheet for your nutrient assignment
Adjust the R Markdown header
- Write your student ID number (not your name) as the author of the script
Use subheadings to organize your document into the sections shown below
- Use ## for main headings
- Use ### for subheadings
Use code chunks to organize your code within each section (for those sections needing code)
- Make sure that every one of your code chunks is named in the chunk header
- Use R Markdown chunk options to make your output more readable. You are welcome to suppress output for initial data processing steps, but please show both code and output for all ‘deliverables’ outlined below.
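
As a rough illustration of chunk options: a chunk's name goes right after the r in its header, followed by any options, for example {r load-data, results = "hide"} (here load-data is just a hypothetical name). If you want the same options applied to every chunk, one possible approach is to set defaults once in a setup chunk near the top of your document. The sketch below assumes you want to hide package messages and warnings throughout; adjust to taste.

```r
# A minimal sketch of document-wide chunk defaults (the option choices are assumptions).
library(knitr)

# Hide package start-up messages and warnings in every chunk.
# Individual chunks can still override these in their own headers.
opts_chunk$set(message = FALSE, warning = FALSE)

# Inspect the current default for one option
opts_chunk$get("message")
```

Options set in an individual chunk header take precedence over these defaults, so you can still show everything for your deliverables.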

Part 2: Set expectations

Take a moment to record your expectations before you begin. Your notes should include the following (3-4 sentences):

Do you think nutrient density for your assigned nutrient has declined in vegetables over time? Why or why not?
Based on your previous answer, what do you expect the mean or median value of the response ratio to be? (Response ratio = new nutrient value / old nutrient value)
Based on your previous knowledge, do you have any expectations about which types of vegetable(s) will be the richest sources of your assigned nutrient?

Part 3: Analyze historical data

Prepare your dataset

Load libraries you will need for this dataset: tidyverse and DT
Prepare your assigned historical dataset for analysis (either Davis_et_al_2004_clean.csv or Mayer_1997_clean.csv)
- Load the dataset
- Use filter() to include only your assigned nutrient and store it as a new data frame.
- Use mutate() to create a new variable for the response ratio. This should be calculated as the newer value divided by the older value for the same crop.
- Use arrange() to order your data frame from high to low by nutrient concentration. Remember that the minus sign (-) can be used to adjust the order of sorting.
  - You should arrange by the variable that represents the most recent value for nutrient concentration
- Check your data and make sure it loaded correctly and that the range of values for the response ratio appears reasonable.
  - Data dictionary for Mayer 1997
  - Data dictionary for Davis et al. 2004

Note: If you are working with the Mayer 1997 data, please also filter to only include vegetables! (not fruits)
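
The preparation steps above might look something like the sketch below. It uses a small made-up data frame in place of the real CSV file, and the column names (crop, nutrient, value_old, value_new) are hypothetical; check the data dictionary for the actual variable names in your dataset. In your problem set you would load the data with read_csv() instead of typing it in.

```r
# dplyr is attached automatically when you run library(tidyverse)
library(dplyr)

# Made-up stand-in for the real dataset: column names and values are invented
toy <- tibble(
  crop      = c("Spinach", "Carrot", "Kale", "Carrot"),
  nutrient  = c("Calcium", "Calcium", "Calcium", "Iron"),
  value_old = c(100, 40, 200, 1.0),
  value_new = c(90, 30, 150, 0.5)
)

calcium <- toy %>%
  # keep only the assigned nutrient
  filter(nutrient == "Calcium") %>%
  # response ratio = newer value / older value for the same crop
  mutate(response_ratio = value_new / value_old) %>%
  # sort from high to low by the most recent nutrient value
  arrange(-value_new)

# Check the result and the range of the response ratio
calcium
summary(calcium$response_ratio)
```

The same pattern extends to the Mayer 1997 data by adding another condition to filter() that keeps only vegetables, using whichever variable the data dictionary lists for food type.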

Create a table

Use datatable() to display your filtered dataset, arranged from high to low nutrient density/concentration by crop
- The table should be arranged by the variable that represents the most recent value for nutrient concentration
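
As a minimal sketch of the table step, assuming a data frame that has already been filtered and sorted (the data and column names here are invented for illustration):

```r
library(DT)

# Made-up, already-sorted data frame standing in for your prepared dataset
toy_sorted <- data.frame(
  crop           = c("Kale", "Spinach", "Carrot"),
  value_old      = c(200, 100, 40),
  value_new      = c(150, 90, 30),
  response_ratio = c(0.75, 0.90, 0.75)
)

# Build an interactive table; rownames = FALSE drops the row-number column
dt <- datatable(toy_sorted, rownames = FALSE)

# In an R Markdown chunk, writing dt on its own line displays the table
```

Because datatable() keeps the row order of the data frame it is given, sorting with arrange() beforehand controls the initial order of the table.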

Test whether nutrients have declined

Your goal now is to conduct a test to determine whether nutrient content of vegetables has declined for your nutrient.

See Cheat Sheets -> t-tests for more guidance on how to implement this in R

Your problem set should include the following steps:

Check the normality assumption for the distribution of the response ratio
- This should include a visual inspection and a formal test
If the data are not normal, try a transformation and check again
Conduct a standard t-test and a Wilcoxon signed rank test to find out whether there is evidence for nutrient decline in this nutrient and dataset
- Hint: What should the response ratio be if nutrients have not declined?
- Please use a 95% confidence level, as is standard for the field
- Please use a two-sided test, even though our question is mostly one-sided (explanation here)
- Please use the raw data for both tests, not the transformed data
- Think about which of the two tests is most appropriate, given the outcome of the normality assessment
- Don’t worry if you see the warning: “cannot compute exact p-value”. R can compute p-values with an ‘exact’ method that is more computationally intensive or an ‘approximate’ method that is slightly less accurate but easier to compute. In practice, it rarely matters which one is used so long as your dataset is sufficiently large (and ours should be okay). If you like, you can turn this warning off by setting exact = FALSE in wilcox.test().
Interpret your results (2-3 sentences)
- Did you find evidence for nutrient decline in your nutrient?
- Write two sentences summarizing the results of your test, as you would in the Results section of a scientific paper. Your summary should incorporate the results of the test and the confidence interval. Report whichever test you believe is more appropriate for your data (t-test or Wilcoxon signed rank test).
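
The steps above could be sketched roughly as follows. The response ratios here are invented purely for illustration and do not come from either study. The sketch uses base R plots for the visual check, shapiro.test() as one possible formal normality test, and a log transformation as one possible transformation; the cheat sheet or lecture slides may suggest other choices. It also assumes a null value of 1, since a response ratio of 1 means the new and old values are equal.

```r
# Invented response ratios, for illustration only
response_ratio <- c(0.62, 0.71, 0.78, 0.80, 0.85, 0.88, 0.91, 0.95, 1.02, 1.10)

# Visual inspection of normality: histogram and normal Q-Q plot
hist(response_ratio, main = "Toy response ratios", xlab = "Response ratio")
qqnorm(response_ratio)
qqline(response_ratio)

# Formal test of normality (Shapiro-Wilk)
shapiro.test(response_ratio)

# One possible transformation, checked again
shapiro.test(log(response_ratio))

# One-sample t-test on the raw data: two-sided, 95% confidence level
t_res <- t.test(response_ratio, mu = 1,
                alternative = "two.sided", conf.level = 0.95)

# Wilcoxon signed rank test on the raw data, with a confidence interval
# for the (pseudo)median; exact = FALSE requests the approximate p-value
w_res <- wilcox.test(response_ratio, mu = 1,
                     alternative = "two.sided",
                     conf.int = TRUE, conf.level = 0.95,
                     exact = FALSE)

t_res
w_res
```

The printed t-test output includes the sample mean, the test statistic, the degrees of freedom, the p-value, and the confidence interval, which are the pieces you would typically report in a results sentence.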

Enter your results into the class spreadsheet

See comments on column headers for tips on where to find the requested information.

Link to class spreadsheet

Part 4: Compare expectations to data

Revisit the expectations you recorded at the beginning. Examine your results and consider them in light of your expectations (4-6 sentences).

Were your expectations met? Why or why not?
What have you learned about nutrient density across crops and across time?
Do your results agree with the results of the original paper?
- Mayer 1997 (see Table 2)
- Davis et al. 2004 (see Table 3 + Figure 2)
What is the relevance of your results for our project with Pasa?

Submit your problem set!

Knit your R Markdown file using the Knit button at the top of the code editor. This is a good check on whether your analysis is reproducible!

To access your file, navigate to the Files tab in the lower right window. Find the .html file for your problem set and click the box next to it. Navigate to More -> Export to download the file. It will likely go to your downloads folder.

Examine the file closely to make sure that it knitted correctly and contains all parts of your problem set. If you need to make revisions, you can simply revise your code and then knit it again. Submit the .html file in the appropriate Moodle dropbox.
