- Decide how your students will access the data sets. Note: If you are using iPads in your classroom, you will first need to install the Numbers app, because Google Sheets cannot insert graphs on the iPad. For ease of use, we recommend computers that have a mouse so students can easily highlight multiple cells of data. XLminer is a toolpak add-on that can be used with Google Sheets to perform statistical analysis.
- If students do not know how to use Excel or the graphing program of your choice, provide them with some practice tasks (we have an Excel Tutorial available on our website).
- Decide how you will break your class into four groups (pairs or larger groups), with at least one group for each location for the river sampling activity.
- Ask students to work in pairs to answer the question: “What do you need to know in order to answer the focal question: Do PCB levels vary in striped bass based on where they are caught along the Hudson River?” Monitor students’ progress and make sure they aren’t trying to answer the focal question directly, but instead are making a list of things they would need to know. Students don't need to know the answers to these questions! Questions might include:
̶ Size of fish at different sites
̶ Size of fish
̶ How long have fish spent in the Hudson River?
̶ Differences in PCB levels at different sites
̶ fat levels in the striped bass
̶ age of the bass at different sites
̶ food that the bass has been eating at different sites
Data and monitoring information
̶ the “safe” level of PCBs for eating, and whether fish from ALL sites are either above or below this level
̶ Who monitors water quality?
̶ How many fish are collected?
̶ How often is monitoring done?
̶ How reliable is the testing?
̶ Where is the sampling done?
̶ Are there significant differences in PCB levels in fish at different sites … such that some are above and others below the allowed level?
- Ask students to share the ideas that they have generated, and create a list on the board. This will give you an idea of what students are thinking about, and whether they are thinking about the ecological factors (biotic or abiotic), sampling effort, statistical differences, etc.
- As a group discussion, ask students to brainstorm what kinds of data they might want in order to answer the question. Students may want data from the last ten years for every location along the river. Talk about the feasibility of data collection at this scale, and then introduce the activity: “Today, you will be investigating whether there are differences in contamination levels of striped bass at five locations in the Hudson River.”
Explore for Part 1:
- Explain to students that they will be exploring striped bass data from different locations along the Hudson River. Hand out the student worksheets, and ask students to work alone to answer the first question. Assign each student or student group one of the four comparisons, and have them answer question #2 (hypothesis creation). Take a quick look at what each group came up with and/or ask students to share their ideas.
- There are four locations for comparison with Troy (river mile 153):
a) Catskill (river mile 113)
b) Poughkeepsie (river mile 76)
c) Haverstraw Bay (river mile 36)
d) George Washington Bridge (river mile 12)
- Students should open the Excel file that corresponds with their assigned locations. They should calculate the average and standard deviation for each location and create a frequency histogram using their location and the Troy data. While there is a way to graph histograms in Excel, it requires using the Data Analysis "add-in" and may be more trouble than it is worth! Graphing by hand will likely be easier. The answer key shows what each graph should actually look like.
- Stop students after they answer question #6 and ask them to share whether they think the averages between their location and Troy are “significantly different”, using just the visual of the histogram and the summary statistics (mean and standard deviation). Based on the visual, they should see that there is not a lot of overlap in their two data sets, and that is one way to deduce that the means are significantly different. Ask students to explain what they mean by the word "significant". Then, hand out the t-test worksheet. Direct students to answer question #7.
- Ask students to share their answers. Point out to students that all of the comparisons between Troy and the other locations are significant.
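For teachers who want a scripted check of the spreadsheet work, the mean, standard deviation, and frequency counts described above can be sketched in Python. This is only an illustration and is not part of the lesson materials: the PCB values are made up (they are not the lesson's data), and the function names and the 0.5 bin width are assumptions chosen for the example.

```python
import statistics
from collections import Counter

# Made-up PCB values for illustration only; NOT the lesson's data.
troy = [1.8, 2.6, 0.9, 3.4, 2.1, 4.0, 1.5, 2.9]
catskill = [0.6, 0.9, 0.4, 1.2, 0.7, 0.5, 1.0, 0.8]

def summarize(values):
    """Return the mean and the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def frequency_table(values, bin_width=0.5):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter(int(v // bin_width) for v in values)
    return {round(k * bin_width, 2): counts[k] for k in sorted(counts)}

for name, data in [("Troy", troy), ("Catskill", catskill)]:
    mean, sd = summarize(data)
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}")
    # Text histogram: one '#' per fish in each bin.
    for start, count in frequency_table(data).items():
        print(f"  {start:.1f}-{start + 0.5:.1f}: {'#' * count}")
```

The text histogram mirrors what students draw by hand: two distributions that barely share any bins suggest that the means may differ.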
Explain for Part 1:
- A t-test is used when you want to compare the means from two groups to see if they are statistically different from one another. It is one of the simplest tests. Technically, it asks the question, “how likely is it that we could get a difference this large when in reality there is no difference?” The result is expressed as a probability, and we commonly use a cutoff of 5% (p = 0.05) for calling two means significantly different. So, for instance, in the case of the comparison of the means at Troy and Catskill, we would say that the probability of finding that large a difference when in reality both samples are taken from a single distribution is less than 0.004, or less than four tenths of a percent.
- For the full data set from several sites, scientists would use a more complicated ANOVA test (which stands for ANalysis Of VAriance), since we are dealing with comparisons among multiple means. Doing an ANOVA on these data tells you that there are significant differences between the locations. You would then ask which locations are different from each other using either planned comparisons or "post-hoc" tests. However, for the purposes of this lesson, we treat each pair of locations as a separate t-test as a simple way to get students started in thinking about the issue of significance.
- To avoid having students calculate the t-test themselves, we have created a worksheet "key" that provides the results for all pairs of locations.
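For teachers curious about what is behind the numbers in the key, here is a minimal sketch of a two-sample t-test in Python. Several things are assumptions: the lesson does not say which form of the t-test was used, so this sketch uses Welch's unequal-variance version; the PCB values are made up and are not the lesson's data; and the function names are hypothetical. The p-value is found by numerically integrating the t-distribution, so only the standard library is needed.

```python
import math
import statistics

# Made-up PCB values for illustration only; NOT the lesson's data.
troy = [1.8, 2.6, 0.9, 3.4, 2.1, 4.0, 1.5, 2.9]
catskill = [0.6, 0.9, 0.4, 1.2, 0.7, 0.5, 1.0, 0.8]

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean a
    vb = statistics.variance(b) / len(b)   # squared standard error of mean b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: area outside [-|t|, |t|], via Simpson's rule."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    inside = 2 * total * h / 3   # area between -|t| and |t|
    return max(0.0, 1 - inside)

t, df = welch_t(troy, catskill)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

If SciPy is installed, `scipy.stats.ttest_ind(troy, catskill, equal_var=False)` performs the same Welch test in one call. The hand-written version is meant for small classroom-sized samples.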
Explore for Part 2 – Examining Patterns
- In this section, students look at the bigger picture of PCB levels across locations or river miles, and then also across years (besides 2011).
- Work with students on question #8 - thinking about whether fish are more contaminated at Haverstraw Bay as compared to the George Washington Bridge. Based only on the visual of these data, students should notice that there is a lot of overlap in the datasets, and therefore the comparison is likely not significant. Have students vote on whether they think the differences are significant (using just the histogram as a visual), and then talk through the results of the t-test.
- Here are the t-test results for Haverstraw Bay compared with the George Washington Bridge:
t-statistic = 0.705, p = 0.48
* means are not significantly different
- Students will now share their averages from their site and whether their comparison was significant, and create a summary graph of all of the locations, using the average from each location. Allow students to work on the summary graph for a few minutes before checking their work. An example of what the graph should look like is available in the PDF of the Lesson Plan.
- Students may make a scatterplot instead of a bar graph; this is fine, as students may be thinking of asking a question about whether there is a relationship between space and PCB level instead of comparing the groups. This would be a good time to dive into graph choices and discuss why you would make a bar graph instead of a scatter plot, and vice versa. Remind students that they are just looking at one type of fish species for one year. Students should now answer questions #10-14 in Part 2.
- Then, show students all of the raw data by projecting the graph in the PDF of the Lesson Plan and supporting information and have them answer question #15.
- Students should recognize that there is a lot of variability in the data from Troy, which helps explain the high standard deviation. They also should notice the outlier in the data from the George Washington Bridge, which was a fish that was larger than all of the other individuals (7,200g or almost 16 pounds, while the average weight of fish in this group was 2,700g or about 6 pounds). This should help students think about the fact that regardless of location, there is a potential risk of eating a fish that is high in PCBs, because location is just one factor that impacts PCB loading. In Lesson 3, students will explore some of the other factors in more depth.
- Now have students look at the summary data from five years for all the locations, explaining that one way to get more evidence to look for patterns over space is to use data from more years. This graph shows the average PCB level in striped bass, using only the spring sampling dates. In 2009, there were no data collected from Poughkeepsie, Haverstraw, or the George Washington Bridge. Have students use the figure in the PDF of the Lesson Plan to complete questions 16-18.
- Students should notice that in all years, the Albany/Troy locations had higher levels of PCBs on average. This supports their finding in Part 2 that fish from this site are more contaminated than at the other sites. There do not appear to be consistent differences between the other sites from year to year. Help students think about what is missing when they only see an average for a location for one year. Students should point out that:
- there is a lot of variability between fish so the standard deviation or range would also be helpful,
- there is variability between years, and the differences in site may vary in different years
- there may be trends over years that are interesting.
- Ask what would help them be more confident in a difference due to location - more sampling locations (e.g., between the current sites)? More years? More types of fish? More information about the fish themselves? Then, ask students whether they would be willing to eat striped bass from the Hudson River, based on what they’ve learned.
- Have the students look at the health advice for eating striped bass again. For men and older women, the advice is the same from Catskill to New York City: you can eat it up to once a month. Does that line up with the data shown here?
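When discussing what would make students more confident in a location difference, a short calculation can show why more fish per site, or less variable fish, helps. The sketch below assumes the simplest case: two sites with the same number of fish n and the same standard deviation, where the t-statistic is the difference in means divided by (SD × √(2/n)). The numbers and the function name are hypothetical and are not taken from the lesson's data.

```python
import math

def t_for_equal_groups(mean_diff, sd, n):
    """t-statistic for two groups of n fish each that share one standard deviation."""
    standard_error = sd * math.sqrt(2 / n)
    return mean_diff / standard_error

# Same hypothetical difference in means (0.3), increasing sample size.
for n in (5, 20, 80):
    print(f"n = {n:2d}, SD = 1.0: t = {t_for_equal_groups(0.3, 1.0, n):.2f}")

# Same difference and n = 20, but half the variability.
print(f"n = 20, SD = 0.5: t = {t_for_equal_groups(0.3, 0.5, 20):.2f}")
```

Quadrupling the number of fish doubles the t-statistic, and so does halving the standard deviation. This is why a small difference between sites can go undetected with small, highly variable samples.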
Explain for Part 2:
- When students make the comparison between any of the sites other than Troy, they will not find significant differences. This can be challenging and frustrating to them. The means appear different, but the statistics say that you can’t be confident in the difference. There are two important lessons for them to learn here: 1) Increasing sample sizes and/or reducing variability make it more likely to find significance, especially when the effect size of the variable of interest (in this case, location) is small. This should lead them to want more samples, or to think creatively about how to reduce variability (e.g., maybe they should just compare fish all of the same age at each site). Also, variables with small effect sizes may be less biologically relevant than other variables with large effect sizes. Conversely, small sample sizes and highly variable samples make it more likely that you could have found differences just by chance when in reality the two groups don’t actually differ. 2) Failing to find significant differences is NOT the same as concluding that the two groups are not different. In the current example, while it is tempting to say that the PCB levels at the GW Bridge and Haverstraw Bay were “the same,” it is more accurate to say that they were different but not significantly so. Subtle, but crucial!
- Fish accumulate PCBs in their bodies by eating smaller organisms that contain PCBs. Fish like striped bass also migrate throughout the estuary, with adults spending the majority of their lives in the open ocean. Consequently, PCB levels will vary based on the size and age of the fish, the location where it lives and has lived, and what it eats. These ideas are explored more in Lesson 3.
- In general, fish that spend the majority of their lives closer to the most contaminated part of the Hudson River estuary will have higher levels of PCBs, due to the higher loading in their prey.
- When students look at an average instead of all of the data, they may say that it is easier to see a trend, but they should recognize that they lose seeing the variability within the dataset.
- Students may be interested in comparing 2011 data with data from another year. We provide summary data from 2008 in the PDF of the Lesson Plan because the two years were very different, and will help students think about whether one year is really "enough" data for a decision like creating a fish advisory. From these data, students should notice that in 2008, there were more fish caught in Poughkeepsie with higher PCB levels than in 2011. Consequently, students should recognize that there is variability in a location between years, and this makes setting an advisory challenging.
- Students may also want to ask a related question: Is there a relationship between location and PCB level? The graph in the PDF of the Lesson Plan shows all the available data for striped bass (2001-2011) for a variety of locations. River mile, which is on the x-axis, refers to the distance above the mouth of the Hudson River. This means that the lowest numbers are closest to New York City, where the Hudson meets the ocean. A trendline with the equation and the R² value are provided. There are three data points that represent fish with very high PCB levels, but students may be interested in seeing whether there is a strong relationship or not. Remind students that this is just one fish species!
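The trendline and R² value mentioned above come from a least-squares fit, which spreadsheet programs compute automatically. For teachers who want to show where those numbers come from, here is a minimal sketch. The river miles are the five sites named in this lesson, but the PCB values paired with them are made up for illustration, and the function name is hypothetical.

```python
import statistics

def trendline(x, y):
    """Least-squares slope, intercept, and R^2 for y plotted against x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation coefficient
    return slope, intercept, r_squared

river_mile = [12, 36, 76, 113, 153]   # site river miles from the lesson
pcb = [0.5, 0.7, 0.6, 1.1, 2.4]       # made-up values, NOT the lesson's data

slope, intercept, r2 = trendline(river_mile, pcb)
print(f"PCB = {slope:.4f} * river_mile + {intercept:.3f}, R^2 = {r2:.2f}")
```

An R² close to 1 means river mile explains most of the variation in PCB level, while a value close to 0 means it explains very little. A few fish with very high PCB levels can shift both the slope and R² noticeably.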
Evaluate: Using exit slips, ask students to answer the following: Is it more harmful to eat striped bass from different parts of the Hudson River? Explain your answer.
New York State Science Learning Standards
HS-LS2-2. Use mathematical representations to support and revise explanations based on evidence about factors affecting biodiversity and populations in ecosystems of different scales. [Clarification Statement: Examples of mathematical representations could include finding the average, determining trends, and using graphical comparisons of multiple sets of data.]
Science and Engineering Practices
̶ 1. Asking questions and defining problems
̶ 4. Analyzing and interpreting data
̶ 5. Using mathematics and computational thinking
̶ 7. Engaging in argument from evidence
Disciplinary Core Ideas
̶ LS2A: Interdependent Relationships in Ecosystems
̶ LS2C: Ecosystem Dynamics, Functioning and Resilience
̶ LS4D: Biodiversity and Humans
Crosscutting Concepts
̶ 3. Scale, proportion, quantity
̶ 4. Systems and system models
̶ 7. Stability and change