- `dplyr` functions to transform a dataset (e.g. `filter`, `arrange`)
- `ggplot` to describe the distribution of key variables

Our work with Pasa Sustainable Agriculture this semester is focused
on completing an exploratory analysis of data from their Nutrient
Density Study, and related datasets. We will use R Markdown and a
package called workflowr to create a web report to document
and share our work (the website you are reading right now was also
generated using this package!). The report will have several sections,
which we will create through our work together. The first section we
will tackle is a basic description of the main datasets we are working
with.
Building on your responses to last week’s problem set and your reading of background sources related to nutrient decline, each group will generate one paragraph of text for the section of the report that introduces the datasets for our project.
Keep in mind the following as you write:
- Pasa refers to study participants as ‘farmer collaborators’
- The organization name is not written in all capitals (‘Pasa’, not ‘PASA’)
- The audience for our report is likely to include Pasa staff, farmer members (collaborators and non-collaborators), and Bionutrient Institute staff.
Here is a link to a Google doc where you should write your section.
Resources:
Each group will be responsible for one crop in the dataset, as follows:
PS 3: Data description
- Use File -> Save As to save your R Markdown file as `03_ps_Data-description.Rmd`.
- Use `###` headings to organize your document into the following sections (for this to work, there needs to be a space after the `###`).
Last week we learned that Pasa’s guiding question for their Nutrient Density Study is:
What are impacts of crop management and soil status on nutrient density?
As we begin our study, the first question that we need to answer is:
Are the data appropriate to answer the guiding question?
Take a moment before you begin to record your expectations in answer to this question. What kind(s) of data do you think the dataset will contain? Do you expect that the data will be suitable to answer the question? Why or why not? (2-3 sentences)
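- Load the `tidyverse` library using `library()`
- Read in the data (`pasa_data_clean.csv`) using `read.csv()`
- Examine the structure of the data using `str()`
- Summarize the data using `summary()`
- Use `filter()` to include only your crop and store it as a new dataframe
- Summarize the new dataframe using `summary()`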
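
The loading, inspecting, and filtering steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it builds a small toy dataset and writes it to a temporary file so that it runs anywhere, and apart from `farmer_id` every column name (`crop`, `state`, `calcium`) and every value is hypothetical, so check the real names in the output of `str()`.

```r
library(tidyverse)

# Toy stand-in for pasa_data_clean.csv. Only farmer_id is a column name taken
# from the problem set; crop, state, calcium and all values are hypothetical.
toy <- data.frame(
  farmer_id = c("F01", "F01", "F02", "F03", "F03", "F04"),
  crop      = c("carrot", "kale", "carrot", "carrot", "kale", "carrot"),
  state     = c("PA", "PA", "PA", "MD", "MD", "NY"),
  calcium   = c(31.2, 150.4, 28.9, 35.1, 162.0, 30.3)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# In the problem set this call would be read.csv("pasa_data_clean.csv")
pasa <- read.csv(csv_path)

# Inspect column types and basic summary statistics for the full dataset
str(pasa)
summary(pasa)

# Keep only the rows for one crop and store them as a new dataframe
carrots <- pasa %>%
  filter(crop == "carrot")

summary(carrots)
```

Storing the filtered rows under a new name leaves the full dataset available for comparison later.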
Generate tables to summarize the number of samples for your crop according to:
* State
* Farms (farms are indicated by the `farmer_id` column)
* Variety
* Crop management (there are multiple associated variables that can be split up)
Each member of the group should create at least three tables (i.e., divide up the variables among your group members).
You should store each table as a new dataframe, and create it using `group_by()`, `summarize()`, and `n()`.
Your tables should be arranged (using `arrange()`) so that the rows are ordered by number of samples, high to low.
Use `datatable()` to display your table.
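
As a minimal sketch of one such table, the code below counts samples per state on a toy dataframe. The `state` column name and all values are hypothetical, and `datatable()` is assumed here to be the function of that name from the DT package; the same pattern should carry over to `farmer_id`, variety, and the crop management variables.

```r
library(tidyverse)

# Toy stand-in for a single-crop dataframe; the state column name and all
# values are hypothetical.
carrots <- data.frame(
  farmer_id = c("F01", "F02", "F03", "F03", "F04", "F04", "F04"),
  state     = c("PA", "PA", "MD", "MD", "NY", "NY", "NY")
)

# Count the samples in each state, then order the rows from high to low
state_counts <- carrots %>%
  group_by(state) %>%
  summarize(n_samples = n()) %>%
  arrange(desc(n_samples))

print(state_counts)

# Assumption: datatable() refers to DT::datatable(). The call is guarded so
# the sketch still runs where DT is not installed.
if (requireNamespace("DT", quietly = TRUE)) {
  state_table <- DT::datatable(state_counts)
}
```

In an R Markdown chunk, putting `datatable(state_counts)` on its own line should display the interactive table in the knitted report.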
Generate graphs to show the distribution of major nutrient outcomes for your crop:
* Antioxidants
* Polyphenols
* Calcium
* Potassium
* Magnesium
* Phosphorus
Each member of the group should create two to three of these graphs (i.e., divide up the work among your group members).
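
One way to show a distribution is a histogram, sketched below for calcium on toy data. The choice of a histogram, the `calcium` column name, the values, and the number of bins are all assumptions made for illustration; other geoms such as `geom_density()` or `geom_boxplot()` may suit your variable better.

```r
library(tidyverse)

# Toy stand-in for a single-crop dataframe; the calcium column name and
# values are hypothetical.
carrots <- data.frame(
  calcium = c(28.9, 30.3, 31.2, 33.0, 35.1, 36.4, 41.8)
)

# Histogram of one nutrient outcome; bins = 5 is an arbitrary choice for
# this small example and should be adjusted for the real data.
calcium_hist <- ggplot(carrots, aes(x = calcium)) +
  geom_histogram(bins = 5) +
  labs(
    x = "Calcium (units as recorded in the data)",
    y = "Number of samples"
  )

calcium_hist
```

Storing the plot in an object before printing it makes it easy to reuse the same graph elsewhere in the report.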
Revisit the expectations you recorded at the beginning. Examine the outcomes from your data summary and consider them in light of your expectations. Do you think the dataset is suitable to answer Pasa’s guiding question? Why or why not? What do you recommend that we do next in light of what you saw in the data? (3-5 sentences)
Knit your R Markdown file using the Knit button at the top of the code editor. This is a good check on whether your analysis is reproducible!
To access your file, navigate to the Files tab in the lower right window. Find the file called `03_ps_Data-description.html` and click the box next to it. Navigate to More -> Export to download the file. It will likely go to your downloads folder.
Examine the file closely to make sure that it knitted correctly and contains all parts of your problem set. If you need to make revisions, you can simply revise your code and then knit it again. Submit the `.html` file in the appropriate Moodle dropbox.