Learning objectives

Background

This week, we will begin exploring formal statistical models, starting with linear regression. Linear regression is a statistical modeling technique used to characterize the relationship between two continuous variables. We will use this approach to test whether measures of soil status (organic matter, mineral availability) are related to nutrient density in the combined Pasa and Bionutrient Institute dataset.
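For reference, here is the simple linear regression model in conventional notation. In this notation $x_i$ is a soil-status measure and $y_i$ is nutrient density for sample $i$, $\beta_0$ is the intercept, and $\beta_1$ is the slope. The symbols are standard ones chosen here and may differ from those used in lab.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
\qquad i = 1, \dots, n
```

The parts of this equation correspond to the assumptions you will revisit when assessing the model. These are a linear mean relationship between $x$ and $y$, and errors that are independent, normally distributed, and of constant variance $\sigma^2$.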

Resources

Part 1: Create an R Markdown file for your problem set

Part 2: Set expectations

Take a moment to record your expectations before you begin.

You may find it helpful to review the exploratory graphs created by the soil status group before spring break. (link here)

Your notes should include the following (3-5 sentences):

Based on what you have learned so far, what do you expect will be the relationships between nutrient density and each of the following measures of soil status?

Assume that soil status is the independent variable and nutrient density is the dependent variable. Your expectations should include whether you expect each relationship to be positive/negative/neutral, strong/weak, and linear or some other shape. Please explain your thinking.

Which relationship do you expect to be stronger, and why?

Part 3: Prepare your dataset

Part 4: Examine suitability of data for linear regression
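The sketch below shows this check in base R, assuming two numeric columns. The data frame `soil` and the column names `organic_matter` and `nutrient_density` are hypothetical placeholders. The values are simulated, not drawn from the Pasa and Bionutrient Institute dataset, so substitute your own object and column names.

```r
# Simulated stand-in data (hypothetical names; NOT the course dataset)
set.seed(1)
soil <- data.frame(organic_matter = runif(60, min = 1, max = 10))
soil$nutrient_density <- 20 + 3 * soil$organic_matter + rnorm(60, sd = 4)

# Linear regression needs a continuous (numeric) predictor and response
sapply(soil, is.numeric)

# Drop rows missing either variable
soil <- soil[complete.cases(soil$organic_matter, soil$nutrient_density), ]

# Scatterplot: is the trend roughly linear? Any obvious outliers?
plot(nutrient_density ~ organic_matter, data = soil,
     xlab = "Soil organic matter", ylab = "Nutrient density")

# Histograms: is either variable strongly skewed?
par(mfrow = c(1, 2))
hist(soil$organic_matter, main = "Organic matter", xlab = "")
hist(soil$nutrient_density, main = "Nutrient density", xlab = "")
par(mfrow = c(1, 1))
```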

Part 5: Fit the models
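In R, a simple linear regression is typically fit with `lm()`, with the response on the left of `~` and the predictor on the right. The sketch below fits one model per soil-status measure. The column names `organic_matter`, `mineral_availability`, and `nutrient_density` are hypothetical, and the data are simulated in place of the course dataset.

```r
# Simulated stand-in data (hypothetical names; NOT the course dataset)
set.seed(1)
soil <- data.frame(organic_matter       = runif(60, 1, 10),
                   mineral_availability = runif(60, 0, 5))
soil$nutrient_density <- 20 + 3 * soil$organic_matter + rnorm(60, sd = 4)

# One simple linear regression per soil-status measure:
# response (nutrient density) ~ predictor (soil status)
fit_om  <- lm(nutrient_density ~ organic_matter, data = soil)
fit_min <- lm(nutrient_density ~ mineral_availability, data = soil)

coef(fit_om)    # intercept and slope for organic matter
coef(fit_min)   # intercept and slope for mineral availability
```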

Q: What null hypothesis are these models testing? (2-3 sentences)

Part 6: Assess assumptions

Create the diagnostic plots you learned about in lab. Based on what you know about the assumptions of linear regression and your understanding of the experimental design, assess how well your data meet the assumptions of the model.

Record your conclusions about the fit of the data to the assumptions. Be sure to explain the thinking/evidence behind your conclusions. (1-2 sentences per assumption)
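If the lab used base R, the standard diagnostics for an `lm` fit come from calling `plot()` on the fitted model. The sketch below assumes that approach, with simulated data and hypothetical column names. Independence of observations cannot be read off these plots; it depends on how the samples were collected.

```r
# Simulated stand-in data (hypothetical names; NOT the course dataset)
set.seed(1)
soil <- data.frame(organic_matter = runif(60, 1, 10))
soil$nutrient_density <- 20 + 3 * soil$organic_matter + rnorm(60, sd = 4)
fit_om <- lm(nutrient_density ~ organic_matter, data = soil)

# plot() on an lm object draws four diagnostics by default:
#   residuals vs fitted   -> linearity, equal variance
#   normal Q-Q            -> normality of residuals
#   scale-location        -> equal variance
#   residuals vs leverage -> influential points (Cook's distance contours)
par(mfrow = c(2, 2))
plot(fit_om)
par(mfrow = c(1, 1))
```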

Part 6b: Adjust the model if necessary

The next step depends on whether your model meets its assumptions. If it does, you can skip ahead to the next section. If there is a mismatch between the model and its assumptions, fit a revised model in the chunk below. Some adjustments you may want to consider:

  • Transforming the X or Y variable (e.g. a log10 transform or square root transform) to improve normality/equal variance
  • Excluding influential data points from the model to see how they affect your conclusions
  • Fitting a non-parametric model (Theil-Sen regression)
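Theil-Sen regression estimates the slope as the median of the slopes between all pairs of points. Because it uses a median, it is resistant to outliers. The sketch below writes the estimator out in base R for illustration. `theil_sen()` is a hypothetical helper, not a library function, and taking the intercept as the median of y minus slope times x is one common convention. The lab may have used a packaged implementation (for example, `mblm()` from the mblm package), which you should prefer if it was covered.

```r
# Hypothetical helper: Theil-Sen estimator in base R
#   slope     = median of slopes over all pairs of points
#   intercept = median of (y - slope * x)
theil_sen <- function(x, y) {
  pairs <- combn(length(x), 2)     # all index pairs i < j (2 x choose(n, 2))
  dx <- x[pairs[2, ]] - x[pairs[1, ]]
  dy <- y[pairs[2, ]] - y[pairs[1, ]]
  keep <- dx != 0                  # skip pairs with identical x values
  slope <- median(dy[keep] / dx[keep])
  intercept <- median(y - slope * x)
  c(intercept = intercept, slope = slope)
}

# Simulated stand-in data with a few artificial outliers (NOT course data)
set.seed(1)
soil <- data.frame(organic_matter = runif(60, 1, 10))
soil$nutrient_density <- 20 + 3 * soil$organic_matter + rnorm(60, sd = 4)
soil$nutrient_density[1:3] <- soil$nutrient_density[1:3] + 60

ts_fit <- theil_sen(soil$organic_matter, soil$nutrient_density)
ts_fit
coef(lm(nutrient_density ~ organic_matter, data = soil))  # compare with OLS
```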

Remember to re-run your diagnostic checks to see if your efforts have improved the model fit.
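The sketch below applies the first two adjustments and then re-runs the diagnostics. It uses simulated right-skewed data and hypothetical column names. The Cook's distance cutoff of 4/n used to flag influential points is a common rule of thumb, not a fixed rule, and is an assumption here.

```r
# Simulated right-skewed response (hypothetical names; NOT the course dataset)
set.seed(1)
soil <- data.frame(organic_matter = runif(60, 1, 10))
soil$nutrient_density <- 10^(1 + 0.08 * soil$organic_matter + rnorm(60, sd = 0.1))

fit_raw <- lm(nutrient_density ~ organic_matter, data = soil)

# Option 1: log10-transform the response (all values must be > 0)
fit_log <- lm(log10(nutrient_density) ~ organic_matter, data = soil)

# Option 2: refit without influential points, flagged with the
# rule-of-thumb Cook's distance cutoff of 4 / n (an assumption)
cooks <- cooks.distance(fit_raw)
influential <- which(cooks > 4 / nrow(soil))
fit_trim <- if (length(influential) > 0) {
  lm(nutrient_density ~ organic_matter, data = soil[-influential, ])
} else {
  fit_raw
}

# Re-run the diagnostics on whichever model you adopt
par(mfrow = c(2, 2))
plot(fit_log)
par(mfrow = c(1, 1))
```

Compare the conclusions from `fit_raw` and `fit_trim` rather than silently dropping points. If the slope or its significance changes, that is worth reporting.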

Part 7: Interpret results

Now, use summary() on your final fitted model to examine its results. Be sure to examine the output from whichever model best meets the assumptions and fits the data.
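The sketch below extracts the key numbers from `summary()` for a standard `lm` fit, using simulated data and hypothetical names. A Theil-Sen model will produce different output, depending on the function used to fit it.

```r
# Simulated stand-in data (hypothetical names; NOT the course dataset)
set.seed(1)
soil <- data.frame(organic_matter = runif(60, 1, 10))
soil$nutrient_density <- 20 + 3 * soil$organic_matter + rnorm(60, sd = 4)
fit_om <- lm(nutrient_density ~ organic_matter, data = soil)

fit_sum <- summary(fit_om)
fit_sum    # full printout: coefficients, residual SE, R-squared, F-test

# Pieces you are likely to report
fit_sum$coefficients["organic_matter", ]  # slope estimate, SE, t value, p-value
fit_sum$r.squared                         # proportion of variance explained
ci <- confint(fit_om, "organic_matter", level = 0.95)  # 95% CI for the slope
ci
```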

Interpret the results of your analysis (4-8 sentences):

Part 8: Create final graphs

Create two final graphs, one for each of your analyses, to summarize your data and model. Each graph should include a linear trend line only IF you found a significant linear relationship. If you used a standard linear regression, the graph should also include a 95% confidence band around the fitted line. If you used Theil-Sen regression, you can include only the trend line (not the confidence band).
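In base R, you can draw the fitted line and its 95% confidence band by computing them with `predict(..., interval = "confidence")`. The sketch below assumes simulated data and hypothetical names. If your prior graphs used ggplot2, `geom_smooth(method = "lm", level = 0.95)` draws a comparable band. For a Theil-Sen fit you would draw only the line, for example with `abline(a = intercept, b = slope)`.

```r
# Simulated stand-in data (hypothetical names; NOT the course dataset)
set.seed(1)
soil <- data.frame(organic_matter = runif(60, 1, 10))
soil$nutrient_density <- 20 + 3 * soil$organic_matter + rnorm(60, sd = 4)
fit_om <- lm(nutrient_density ~ organic_matter, data = soil)

# Fitted line and 95% confidence band evaluated on a grid of x values
grid <- data.frame(organic_matter = seq(min(soil$organic_matter),
                                        max(soil$organic_matter),
                                        length.out = 100))
band <- predict(fit_om, newdata = grid, interval = "confidence", level = 0.95)

# Empty plot first, then band, line, and points so the points stay visible
plot(nutrient_density ~ organic_matter, data = soil, type = "n",
     xlab = "Soil organic matter", ylab = "Nutrient density")
polygon(c(grid$organic_matter, rev(grid$organic_matter)),
        c(band[, "lwr"], rev(band[, "upr"])),
        col = adjustcolor("grey", alpha.f = 0.4), border = NA)
lines(grid$organic_matter, band[, "fit"], lwd = 2)
points(nutrient_density ~ organic_matter, data = soil, pch = 19)
```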

Other specifications of graphs should be similar to our prior graphs:

For your reference, here is a link to the scripts created by the soil status group before spring break: link here

Submit your problem set!

Knit your R Markdown file using the Knit button at the top of the code editor. This is a good check on whether your analysis is reproducible!

To access your file, navigate to the Files tab in the lower-right pane. Find the .html file for your problem set and check the box next to it. Then choose More > Export to download the file; it will likely go to your Downloads folder.

Examine the file closely to make sure that it knitted correctly and contains all parts of your problem set. If you need to make revisions, you can simply revise your code and then knit it again. Submit the .html file in the appropriate Moodle dropbox.

