PS 6: Draft report section I

Learning objectives

Practice using R Markdown to implement reproducible data analysis in R
- Use chunk options to adjust appearance in the knitted document
Practice basic steps in loading, checking, and preparing a dataset for analysis
Hone understanding of ggplot to create more polished exploratory graphs
Explore patterns in the Pasa and Bionutrient Institute data

Background

This week, we will polish and expand our existing graphs into a draft of preliminary results to share with Pasa and the Bionutrient Institute. We will focus especially on adjusting graphs in ggplot so that they communicate effectively to an external audience. You will continue working with the combined dataset, which includes both the Pasa and Bionutrient Institute data.

Resources

Data dictionary (for combined dataset)
Chunk options
- In addition to the options listed, you can also use results = "hide" to show the code but not the output in the final document
Nutrient reference values
- Davis et al. 2004 (the data table is also file Davis_et_al_2004_clean.csv in Posit Cloud)
- USDA Nutrient Database
Graphing resources
- General ggplot2 reference (see Articles -> FAQ as well)
- R Graph Gallery
- Data to Viz
- Preliminary code + results from Problem Set 4
- How to create a log scale in ggplot2
- How to reorder a boxplot (Note: Use argument .fun = median to sort by the median value)

Part 1: Distribution graphs with your group

Set up your R Markdown document

Your group will turn in only one .Rmd file. That said, please work in parallel so that everyone stays engaged. You can ‘divide and conquer’ by splitting up the nutrients and working out which reference values to annotate for each. Decide with your group who will turn in the ‘final’ version.

Make sure you are working in the R Project for this week in Posit Cloud (called Lab_06_Draft_report_I)
- This is important because if I have a question about what you did or how your code is working, I need to be able to find/access your code on Posit Cloud
Create a new R Markdown document using the green plus sign in upper left
- Title your document PS 6: Distributions
Navigate to File -> Save As
- Save the file as 06_ps_Crop.Rmd (replacing Crop with the name of your crop)
Adjust the R Markdown header
- List all group member names as the author of the script
Use subheadings to organize your document into the sections shown below
- Use ## for main headings
- Use ### for subheadings
Use code chunks to organize your code within each section (for those sections needing code)
- Make sure that every one of your code chunks is named in the chunk header
- Both the code and output should be included in your final problem set, unless otherwise noted (this is the default)

Data Preparation

Data loading + checking (code)
- Load the tidyverse using library()
- Load RColorBrewer using library()
- Load the necessary dataset (combined_clean.csv) using read.csv()
- Check the structure of the data using str()
- Generate an initial summary of the data using summary()
- Adjust chunk options so that the code shows, but the output is hidden in your knitted file
Data transformation (code)
- Filter the dataset using filter() to include only your crop and store it as a new dataframe
Check your data and make sure it loaded correctly
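
The loading, checking, and filtering steps above can be sketched roughly as follows. This is only an illustration: so that it runs on its own, it first writes a tiny made-up file that stands in for combined_clean.csv, and the column names (organization, crop, calcium) and the crop carrot are assumptions, not the real names in the dataset. Check the data dictionary for the actual columns.

```r
library(dplyr)  # also attached when you run library(tidyverse)

# Stand-in for combined_clean.csv: a tiny made-up file so the sketch runs anywhere.
# Column names here are assumptions; use the ones in the data dictionary.
toy_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    organization = c("Pasa", "Pasa", "Bionutrient Institute", "Bionutrient Institute"),
    crop         = c("carrot", "kale", "carrot", "carrot"),
    calcium      = c(31.2, 150.4, 28.9, 35.0)
  ),
  toy_path,
  row.names = FALSE
)

# In the problem set, this line would be read.csv("combined_clean.csv")
combined <- read.csv(toy_path)

# Check structure and summary
# (adding results = "hide" to the chunk header shows this code but hides its output)
str(combined)
summary(combined)

# Keep only one crop and store it as a new data frame
carrot <- combined %>%
  filter(crop == "carrot")

# Quick checks that the filter did what was intended
nrow(carrot)
unique(carrot$crop)
```

Checking the row count and the unique crop names after filtering is a quick way to catch a misspelled crop name, which would otherwise silently return an empty data frame.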

Distribution graphs

This section should contain six total graphs - one for each nutrient. They should each have their own chunk and be listed in a particular order (for consistency across the report):

Antioxidants
Polyphenols
Calcium
Magnesium
Phosphorus
Potassium

The goal of the graph is to show the distribution of nutrient values for your crop in the Pasa and Bionutrient Institute datasets and compare them to reference values. Adapt the example code to create each graph. Each one should meet these criteria:

Horizontal boxplot with nutrient concentration on the X axis and organization on the Y axis
Show individual data points using geom_jitter, colored by organization, using the Dark2 color scale from R Color Brewer. Make points semi-transparent using alpha = 0.6.
Boxplots are outlined in black with no fill. Boxplots are layered over the raw data.
Y axis does not need a label
X axis is labeled in format Calcium (mg per 100 g) with the correct units (see data dictionary linked above)
Consider using a log-transform on the X axis if your data are highly skewed
Hide legend (redundant with axes labeled)
Where possible, annotate with labeled vertical lines showing reference values for comparison:
- Antioxidants + polyphenols: No annotations
- Calcium + phosphorus: Use historical values from 1950 and 1999 (from Davis et al. 2004). Check and see if there is an updated value more recent than 1999 in the USDA database, using the crop that most closely matches the crop name in Davis et al. 2004. If so, add that too.
- Magnesium + potassium: Check USDA database and use value(s) from there if available. Annotate with the year of the data. Choose the crop that most closely matches the crop name in Davis et al. 2004.
- Use color #8C8C8C (dark gray) for annotations + adjust year labels for clarity if lines are close.
- Link to USDA nutrient density database
Apply class theme (theme_light)
No title (unnecessary)
Set fig.height = 2.5 in the chunk options to make the graphs the same size.
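
As a rough sketch of how these criteria fit together in one graph, here is a version built on made-up data. The column names (organization, calcium) are assumptions, and the two reference values are placeholders, NOT the real numbers from Davis et al. 2004. Hiding the boxplot outliers is a choice made here because every point is already drawn by geom_jitter.

```r
library(ggplot2)  # also attached when you run library(tidyverse)

# Made-up calcium values for one crop; column names are assumptions
set.seed(1)
crop_data <- data.frame(
  organization = rep(c("Pasa", "Bionutrient Institute"), each = 20),
  calcium      = c(rnorm(20, mean = 35, sd = 6), rnorm(20, mean = 30, sd = 8))
)

# Placeholder reference values (NOT the real Davis et al. 2004 numbers)
ref_values <- data.frame(year = c("1950", "1999"), value = c(40, 33))

p_calcium <- ggplot(crop_data, aes(x = calcium, y = organization)) +
  # Raw points go first so the boxplots are layered over them;
  # width = 0 keeps the nutrient values (X axis) exactly where they are
  geom_jitter(aes(color = organization), alpha = 0.6, height = 0.2, width = 0) +
  # Black outline, no fill; outliers hidden because the jittered points show them
  geom_boxplot(color = "black", fill = NA, outlier.shape = NA) +
  # Vertical reference lines with year labels at the top of the panel
  geom_vline(data = ref_values, aes(xintercept = value), color = "#8C8C8C") +
  geom_text(data = ref_values, aes(x = value, y = Inf, label = year),
            color = "#8C8C8C", vjust = 1.5, hjust = -0.1, inherit.aes = FALSE) +
  scale_color_brewer(palette = "Dark2") +
  labs(x = "Calcium (mg per 100 g)", y = NULL) +
  theme_light() +
  theme(legend.position = "none")

p_calcium
```

If the values are highly skewed, adding scale_x_log10() to the plot is one way to log-transform the X axis. When two reference lines sit close together, adjusting hjust or vjust for the labels can keep the years readable.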

Interpretation

Look at your final graphs together. What patterns do you see? Summarize the important take-home messages in a short paragraph. Some questions to consider:

How much data is available for your crop in the Pasa and Bionutrient Institute datasets?
Which nutrients have meaningful variability in your crop?
How do Pasa and Bionutrient Institute data compare to historical reference values? Are there any consistent patterns?
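
For the first question, a quick count can back up what you see in the graphs. The sketch below uses made-up data and assumed column names (organization, calcium); it counts rows per organization and how many of those rows actually have a calcium value.

```r
library(dplyr)  # also attached when you run library(tidyverse)

# Made-up data for one crop; column names are assumptions
crop_data <- data.frame(
  organization = c("Pasa", "Pasa", "Bionutrient Institute", "Pasa", "Bionutrient Institute"),
  calcium      = c(31.2, NA, 28.9, 35.0, 30.1)
)

# Rows per organization, plus how many have a non-missing calcium value
data_available <- crop_data %>%
  group_by(organization) %>%
  summarize(
    n_rows    = n(),
    n_calcium = sum(!is.na(calcium))
  )

data_available
```

The two counts can differ because a sample may be in the dataset without having been measured for every nutrient.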

Part 2: Individual graphs

Set up your R Markdown document

This next section will be completed individually. You will work on one type of graph - to be determined with your group. The three types of graphs that are needed are:

Relationship of nutrient density to crop variety (variety)

Relationship of nutrient density to soil status (soil status)

Variability in management practices (management)

Make sure you are working in the R Project for this week in Posit Cloud (called Lab_06_Draft_report_I)
- This is important because if I have a question about what you did or how your code is working, I need to be able to find/access your code on Posit Cloud
Create a new R Markdown document using the green plus sign in upper left
- Title your document PS 6: followed by either Variety, Soil status, or Management
Navigate to File -> Save As
- Save the file as 06_ps_Crop_Graph-type.Rmd (replacing Crop with the name of your crop and Graph-type with either Variety, Soil status, or Management)
Adjust the R Markdown header
- Write your name as the author of the script
Use subheadings to organize your document into the sections shown below
- Use ## for main headings
- Use ### for subheadings
Use code chunks to organize your code within each section (for those sections needing code)
- Make sure that every one of your code chunks is named in the chunk header
- Both the code and output should be included in your final problem set, unless otherwise noted (this is the default)

Data Preparation

Data loading + checking (code)
- Load the tidyverse using library()
- Load RColorBrewer using library()
- Load the necessary dataset (combined_clean.csv) using read.csv()
- Check the structure of the data using str()
- Generate an initial summary of the data using summary()
- Adjust chunk options so that the code shows, but the output is hidden in your knitted file
Data transformation (code)
- Filter the dataset using filter() to include only your crop and store it as a new dataframe
Check your data and make sure it loaded correctly

Option 1: Relationship of nutrient density to crop variety

Work with the other folks creating the same type of graph to use a consistent format. This will allow our report and presentation to be visually cohesive and easier to understand.

This section should contain six total graphs - one for each nutrient. They should each have their own chunk and be listed in a particular order (for consistency across the report):

Antioxidants
Polyphenols
Calcium
Magnesium
Phosphorus
Potassium

The goal of the graph is to show the distribution of nutrient values across varieties for your crop in the Pasa and Bionutrient Institute datasets. Adapt the example code from Problem Set 4 to meet the following criteria:

Horizontal boxplot with nutrient concentration on the X axis and variety on the Y axis
Boxplots are filled according to variety name (different colors for each variety)
Y axis does not need a label
X axis is labeled in format Calcium (mg per 100 g) with the correct units (see data dictionary linked above)
Hide legend (redundant with axes labeled)
Facet by group and use faceting options to drop ‘empty’ varieties
Order varieties on the Y axis by median nutrient content
Clean and capitalize variety names as shown in example code
Apply class theme (theme_light)
No title (unnecessary)
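
Here is one possible sketch of these criteria, using made-up data. It is not the Problem Set 4 example code: the column names (organization, variety, calcium) are assumptions, the cleaning step simply trims whitespace and capitalizes names, and it assumes that "group" means the source organization. Adapt it to match the example code and your group's format.

```r
library(dplyr)    # these four packages are all attached by library(tidyverse)
library(forcats)
library(ggplot2)
library(stringr)

# Made-up data: each variety appears in only one organization
set.seed(2)
crop_data <- data.frame(
  organization = rep(c("Pasa", "Bionutrient Institute"), each = 30),
  variety      = c(rep(c("bolero", "danvers", "napoli"), each = 10),
                   rep(c("nantes", "yaya"), each = 15)),
  calcium      = runif(60, min = 20, max = 50)
)

variety_data <- crop_data %>%
  mutate(
    # Trim stray whitespace and capitalize the variety names
    variety = str_to_title(str_trim(variety)),
    # Order varieties by their median calcium value
    variety = fct_reorder(variety, calcium, .fun = median)
  )

p_variety <- ggplot(variety_data, aes(x = calcium, y = variety, fill = variety)) +
  geom_boxplot() +
  # Free Y scales drop varieties with no data in a panel;
  # free space keeps the boxes the same thickness across panels
  facet_grid(organization ~ ., scales = "free_y", space = "free_y") +
  labs(x = "Calcium (mg per 100 g)", y = NULL) +  # check units in the data dictionary
  theme_light() +
  theme(legend.position = "none")

p_variety
```

Because the ordering is computed once for the whole data frame, varieties are sorted by their overall median, so the order within each facet follows that overall ranking.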

Interpretation

Look at your final graphs for your crop. What patterns do you see? Summarize the important take-home messages in a short paragraph. Some questions to consider:

How many varieties are represented in the two datasets? How much replication is there within varieties?
Which nutrients show meaningful variability across varieties of your crop?
By how much does nutrient density change with variety? (if at all)
Could a farmer choose a variety to increase nutrient density in this crop? Why/why not?
What additional data could help clarify this relationship?
Find one outside source that helps you to put your results in context. Integrate it into your interpretation.

Source(s)

List the citation (APA format) for at least one source that informed your interpretation.

Option 2: Relationship of nutrient density to soil status

Work with the other folks creating the same type of graph to use a consistent format. This will allow our report and presentation to be visually cohesive and easier to understand. Everyone should use soil organic matter as the X variable (this was the most popular choice in the last dataset, and it is widely regarded as a holistic measure of soil health). Decide as a group whether to use organic matter in the top 10 cm of soil or the variable om_percent. I suggest checking which variable has more data in each of the two datasets.
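
One way to check which organic matter variable has more data is to count the non-missing values of each, split by organization. In this sketch the data are made up, and om_0_10cm is a hypothetical name for the top 10 cm variable; look up the real column name in the data dictionary.

```r
library(dplyr)  # also attached when you run library(tidyverse)

# Made-up data; om_percent is named above, om_0_10cm is a hypothetical column name
soil_data <- data.frame(
  organization = c("Pasa", "Pasa", "Bionutrient Institute", "Bionutrient Institute"),
  om_percent   = c(4.1, NA, 3.5, 6.2),
  om_0_10cm    = c(NA, NA, 3.8, 5.9)
)

# Count non-missing values of each organic matter variable per organization
om_counts <- soil_data %>%
  group_by(organization) %>%
  summarize(across(c(om_percent, om_0_10cm), ~ sum(!is.na(.x))))

om_counts
```

A variable that is well populated in one dataset but mostly missing in the other will leave one organization nearly absent from the scatterplots, so it is worth looking at both rows of the result.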

This section should contain six total graphs - one for each nutrient. They should each have their own chunk and be listed in a particular order (for consistency across the report):

Antioxidants
Polyphenols
Calcium
Magnesium
Phosphorus
Potassium

The goal of the graph is to explore the relationship of soil status to nutrient density. We will explore whether soil organic matter is related to nutrient density for your crop in the Pasa and Bionutrient Institute datasets. Adapt the example code from Problem Set 4 to meet the following criteria:

Scatterplot with soil organic matter (top 10 cm) on the X axis and nutrient density on the Y axis
- Limit X axis to range from 0% to 25%
Points are colored by data source (Pasa or Bionutrient Institute) using the colors mediumpurple and deeppink, and are semi-transparent (alpha = 0.75)
Y axis is labeled in format Calcium (mg per 100 g) with the correct units (see data dictionary linked above)
X axis is labeled in format % organic matter (0-10 cm)
Legend shows data source
Please do not fit trend lines right now - we will explore that option later in the semester when we learn about linear regression
Apply class theme (theme_light)
No title (unnecessary)
Use chunk options to set the size using fig.height = 3.5 and fig.width = 6
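
A minimal sketch of a scatterplot meeting these criteria, again on made-up data. The column names (organization, om_0_10cm, calcium) are assumptions, and which color goes with which data source is an arbitrary choice here; agree on one assignment with your group.

```r
library(ggplot2)  # also attached when you run library(tidyverse)

# Made-up data; column names are assumptions
set.seed(3)
soil_data <- data.frame(
  organization = rep(c("Pasa", "Bionutrient Institute"), each = 15),
  om_0_10cm    = runif(30, min = 1, max = 12),
  calcium      = runif(30, min = 20, max = 50)
)

p_soil <- ggplot(soil_data, aes(x = om_0_10cm, y = calcium, color = organization)) +
  geom_point(alpha = 0.75) +
  # Named vector ties each color to a specific data source (arbitrary pairing)
  scale_color_manual(values = c("Pasa" = "mediumpurple",
                                "Bionutrient Institute" = "deeppink")) +
  # Limit the X axis to 0-25%; points outside this range are dropped with a warning
  scale_x_continuous(limits = c(0, 25)) +
  labs(x = "% organic matter (0-10 cm)",
       y = "Calcium (mg per 100 g)",  # check units in the data dictionary
       color = "Data source") +
  theme_light()

p_soil
```

Setting limits on the scale removes any points outside 0 to 25% before plotting. If you would rather zoom without removing data, coord_cartesian(xlim = c(0, 25)) is an alternative.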

Interpretation

Look at your final graphs for your crop. What patterns do you see? Summarize the important take-home messages in a short paragraph. Some questions to consider:

What range of values for organic matter are represented in the two datasets?
Does it appear there is any relationship between nutrient density and organic matter in this crop? Is that relationship positive or negative? Linear or non-linear?
By how much does nutrient density change with organic matter? (if at all)
Could a farmer increase nutrient density by building organic matter?
What additional data could help clarify this relationship?
Find one outside source that helps you to put your results in context. Integrate it into your interpretation.

Source(s)

List the citation (APA format) for at least one source that informed your interpretation.

Option 3: Variability in management practices

Work with the other folks creating the same type of graph to use a consistent format. This will allow our report and presentation to be visually cohesive and easier to understand. Make sure you are using the same words/labels to refer to the different management practices.

This section should contain six total graphs - one for each management practice. They should each have their own chunk and be listed in a particular order (for consistency across the report):

Certified organic
Hydroponic
Greenhouse
Cover crops
Regenerative
No-till

The goal of the graph is to explore how much data is available to look at the influence of each management practice. Adapt the example code provided in today’s Posit Cloud project to meet the following criteria:

Horizontal barplot showing the number of observations in each dataset for each management practice
Use colors to distinguish the different levels of the management practice (e.g. till vs. no-till)
Legend shows management practices
Legend title is suppressed (unnecessary)
Standardize labels for management levels with the group
Label the X axis as # of observations
Y axis does not need a label
Set bar colors to #d95f02 and #1b9e77
Set bar width using width = 0.5
Apply class theme (theme_light)
No title (unnecessary)
Use chunk option fig.height = 2 to set a consistent size
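
The example code in the Posit Cloud project is the place to start. As a rough, independent sketch of the same idea, the code below counts observations per dataset for one made-up management practice. The column names (organization, tillage) and the labels Till and No-till are assumptions; use the labels your group has standardized.

```r
library(ggplot2)  # also attached when you run library(tidyverse)

# Made-up data: one row per observation; column names and labels are assumptions
mgmt_data <- data.frame(
  organization = c(rep("Pasa", 6), rep("Bionutrient Institute", 8)),
  tillage      = c("No-till", "Till", "Till", "No-till", "Till", "Till",
                   "Till", "Till", "No-till", "Till", "Till", "No-till", "Till", "Till")
)

# Mapping only y makes geom_bar() count rows and draw horizontal bars
p_tillage <- ggplot(mgmt_data, aes(y = organization, fill = tillage)) +
  geom_bar(width = 0.5) +
  scale_fill_manual(values = c("#d95f02", "#1b9e77")) +
  # fill = NULL suppresses the legend title
  labs(x = "# of observations", y = NULL, fill = NULL) +
  theme_light()

p_tillage
```

By default the bars are stacked, so the total bar length is the number of observations in each dataset. An unnamed color vector is matched to the levels in alphabetical order, so a named vector is safer if your labels might change.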

Interpretation

Look at your final graphs for your crop. What patterns do you see? Summarize the important take-home messages in a short paragraph. Some questions to consider:

How much data is there for your crop in the two datasets?
Which management practices have enough variation to support meaningful comparisons?
What additional data could Pasa collect to increase their ability to study the influence of management on nutrient density?
Find one outside source that helps you to put your results in context. Integrate it into your interpretation.

Source(s)

List the citation (APA format) for at least one source that informed your interpretation.

Submit your problem set!

Knit your R Markdown files using the Knit button at the top of the code editor. This is a good check on whether your analysis is reproducible!

To access your file, navigate to the Files tab in the lower right window. Find the .html file for your problem set and click the box next to it. Navigate to More -> Export to download the file. It will likely go to your downloads folder.

Examine the file closely to make sure that it knitted correctly and contains all parts of your problem set. If you need to make revisions, you can simply revise your code and then knit it again.

Submit BOTH the .html file AND the .Rmd file for each analysis:

Group files: Make sure one member of your group submits these to the appropriate Moodle dropbox
Individual files: Submit your files to the appropriate Moodle dropbox

Maggie Douglas

2024-02-04
