This week, we will begin exploring formal statistical models, starting with linear regression. Linear regression is a statistical modeling technique used to characterize the relationship between two continuous variables. We will use this approach to test whether measures of soil status (organic matter, mineral availability) are related to nutrient density in the combined Pasa and Bionutrient Institute dataset.
results = "hide" to show the code but not the output in the final document
Lab_09_Linear-regression
PS 9: Regression
File -> Save As 09_ps_Crop_Nutrient_Regression.Rmd - replacing Crop and Nutrient with the name of your crop (e.g. Beet) and nutrient (e.g. Calcium)
## for main headings
### for subheadings
Take a moment to record your expectations before you begin.
You may find it helpful to review the exploratory graphs created by the soil status group before spring break. (link here)
Your notes should include the following (3-5 sentences):
Based on what you have learned so far, what do you expect will be the relationships between nutrient density and each of the following measures of soil status?
organic_matter_percentage_10cm
Assume that soil status is the independent variable and nutrient density is the dependent variable. Your expectations should include whether you expect each relationship to be positive/negative/neutral, strong/weak, and linear or some other shape. Please explain your thinking.
Which relationship do you expect to be stronger, and why?
Load tidyverse using library()
Read in the data (combined_clean_indep.csv) using read.csv()
Use filter() and select to narrow the dataset to the variables of interest
str()
summary()
Use ggplot to…
Use lm to fit the two models that you are interested in.
Q: What null hypothesis are these models testing? (2-3 sentences)
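The steps above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated rather than read from combined_clean_indep.csv, the `species` and `calcium` column names are hypothetical, and only `organic_matter_percentage_10cm` is taken from this problem set. Your own script will differ in the crop, nutrient, and second soil measure you choose.

```r
library(dplyr)  # loaded automatically by library(tidyverse)

# Simulated stand-in for the real dataset (all values are made up).
set.seed(42)
n <- 60
sim <- data.frame(
  species = rep(c("beet", "carrot"), each = n / 2),   # hypothetical column
  organic_matter_percentage_10cm = runif(n, 1, 8)
)
# Hypothetical nutrient column with a known linear signal plus noise.
sim$calcium <- 20 + 1.5 * sim$organic_matter_percentage_10cm + rnorm(n, sd = 3)

# Round-trip through a temporary CSV so that read.csv() is exercised.
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)
dat <- read.csv(csv_path)

# Narrow the data to one crop and the variables of interest.
beet <- dat %>%
  filter(species == "beet") %>%
  select(organic_matter_percentage_10cm, calcium)

str(beet)      # column types and dimensions
summary(beet)  # ranges, quartiles, missing values

# Simple linear regression: response ~ predictor.
m_om <- lm(calcium ~ organic_matter_percentage_10cm, data = beet)
coef(m_om)
```

The second model would follow the same pattern, with the other soil measure on the right-hand side of the formula.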
Create the diagnostic plots you learned about in lab. Based on what you know about the assumptions of linear regression and your understanding of the experimental design, assess how well your data fits the assumptions of the model.
Record your conclusions about the fit of the data to the assumptions. Be sure to explain the thinking/evidence behind your conclusions. (1-2 sentences per assumption)
Linearity:
Independence:
Normality:
Equal variance:
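As a reminder of how the diagnostic plots can be produced, here is a minimal sketch on simulated data (the variable names `om` and `nutrient` are hypothetical). It uses the default `plot()` method for `lm` objects, which may or may not match the exact approach used in lab, and adds `shapiro.test()` on the residuals as an optional numerical complement to the Q-Q plot. Note that independence cannot be read from these plots; it has to be judged from the experimental design.

```r
# Simulated data that roughly meet the assumptions (values are made up).
set.seed(1)
om <- runif(50, 1, 8)
nutrient <- 10 + 2 * om + rnorm(50, sd = 2)
m <- lm(nutrient ~ om)

# Four default diagnostic plots in a 2 x 2 grid:
# residuals vs fitted (linearity), normal Q-Q (normality),
# scale-location (equal variance), residuals vs leverage (influence).
old_par <- par(mfrow = c(2, 2))
plot(m)
par(old_par)  # restore the previous plotting layout

# Optional: Shapiro-Wilk test of normality on the residuals.
sw <- shapiro.test(residuals(m))
sw$p.value
```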
The next step will depend upon whether or not your model is meeting assumptions. If it is, you can skip ahead to the next section. If there is a mismatch between the model and assumptions, you will need to fit a new model in the chunk below to hone it. Some adjustments you may want to consider:
Remember to re-run your diagnostic checks to see if your efforts have improved the model fit.
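One possible adjustment, shown here purely as an assumption since it may not be the one that suits your data, is to log-transform the response and refit. The sketch below simulates a right-skewed response (hypothetical names `soil` and `nutrient`), fits both versions, and re-runs the residuals-vs-fitted and Q-Q plots for each so they can be compared side by side.

```r
# Simulated response with multiplicative noise, so raw residuals fan out.
set.seed(5)
soil <- runif(60, 1, 8)
nutrient <- exp(1 + 0.3 * soil + rnorm(60, sd = 0.4))  # always positive

m_raw <- lm(nutrient ~ soil)       # original model
m_log <- lm(log(nutrient) ~ soil)  # refit with a log-transformed response

# Re-run two of the diagnostic plots for each model.
old_par <- par(mfrow = c(2, 2))
plot(m_raw, which = 1:2)  # residuals vs fitted, normal Q-Q
plot(m_log, which = 1:2)
par(old_par)
```

If you keep a transformed model, remember that its coefficients describe the relationship on the transformed scale.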
Now use summary on your fitted model to examine the results of your final model. Remember to examine the output from the model that best meets the assumptions and fits the data.
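The pieces of the summary output that matter most here can also be pulled out by name. The following is a sketch on simulated data with hypothetical variable names `x` and `y`; the component names (`coefficients`, `r.squared`) are those of the object returned by `summary()` for an `lm` fit.

```r
# Simulated data (values are made up).
set.seed(7)
x <- runif(40, 1, 8)
y <- 5 + 1.2 * x + rnorm(40, sd = 2)
m <- lm(y ~ x)

s <- summary(m)
s  # full printed output: coefficients, R-squared, F-test

# Coefficient table: estimate, standard error, t value, p-value.
s$coefficients

# p-value for the slope and proportion of variance explained.
p_slope <- s$coefficients["x", "Pr(>|t|)"]
r2 <- s$r.squared
```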
Interpret the results of your analysis (4-8 sentences):
Is the relationship between the variables statistically significant in each model? How do you know?
How much of the variation in the response can be explained by each of the predictors? How do you know?
What are the two equations you would use to summarize the relationships between your variables?
What are your overall conclusions? Were your expectations supported or challenged?
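For the equations asked for above, the intercept and slope come straight from `coef()`. This sketch (simulated data, hypothetical names `soil` and `nutrient`) builds the equation as text and checks it against `predict()` at one arbitrary value of the predictor.

```r
# Simulated data (values are made up).
set.seed(11)
soil <- runif(30, 1, 8)
nutrient <- 12 + 0.8 * soil + rnorm(30, sd = 1.5)
m <- lm(nutrient ~ soil)

b <- coef(m)  # named vector: "(Intercept)" and "soil"

# Equation in the form y = intercept + slope * x.
# A negative slope would print as "+ -0.80", so tidy the sign by hand if needed.
eq <- sprintf("nutrient = %.2f + %.2f * soil", b[["(Intercept)"]], b[["soil"]])
print(eq)

# Sanity check: the equation and predict() agree at soil = 5.
manual <- b[["(Intercept)"]] + b[["soil"]] * 5
pred <- predict(m, newdata = data.frame(soil = 5))
```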
Create two final graphs to summarize your data + model for each of your analyses. The graph should include a linear trend line only IF you found a significant linear relationship. If you use a standard linear regression, it should also include a 95% confidence band around the trend line. If you used the Theil-Sen regression, you can include only the trend line (not the confidence band).
Other specifications of graphs should be similar to our prior graphs:
mediumpurple and deeppink
theme_light
For your reference, here is a link to the scripts created by the soil status group before spring break: link here
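A minimal sketch of a final graph for the standard linear regression case is shown below. The data are simulated, the column names `om` and `nutrient` and the axis labels are hypothetical, and the assignment of mediumpurple to the points and deeppink to the trend line is an assumption about how the two colors are meant to be used.

```r
library(ggplot2)

# Simulated data (values are made up).
set.seed(3)
d <- data.frame(om = runif(50, 1, 8))
d$nutrient <- 15 + 1.1 * d$om + rnorm(50, sd = 2)

p <- ggplot(d, aes(x = om, y = nutrient)) +
  geom_point(color = "mediumpurple") +
  # Least-squares trend line with a 95% confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95,
              color = "deeppink") +
  labs(x = "Organic matter at 10 cm (%)",
       y = "Nutrient density (hypothetical units)") +
  theme_light()

print(p)
```

If the relationship is not significant, drop the `geom_smooth()` layer; for a trend line without a band, set `se = FALSE`.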
Knit your R Markdown file using the Knit button at the top of the code editor. This is a good check on whether your analysis is reproducible!
To access your file, navigate to the Files tab in the lower right window. Find the .html file for your problem set and click the box next to it. Navigate to More -> Export to download the file. It will likely go to your downloads folder.
Examine the file closely to make sure that it knitted correctly and contains all parts of your problem set. If you need to make revisions, you can simply revise your code and then knit it again. Submit the .html file in the appropriate Moodle dropbox.