About the Datasets
All datasets have been collected by professional scientists or research agencies who have kindly shared their data with Hudson Data Jam. Each dataset contains a link to a Google Sheet with the data and a brief introductory section describing the research.
You can view and sort the 30+ datasets at the bottom of this page.
For a Google Drive folder of sample graphs to guide student exploration, advisors can contact email@example.com.
New This Year
- Want to use a unique dataset not listed in our collection? Email us (firstname.lastname@example.org) with a link to the proposed dataset or an attached file. Cary educators will evaluate whether the dataset may be used in the competition.
- Some datasets were removed from Data Jam. To improve our collection and the student experience, we carefully selected and updated many of our datasets for the 2023 competition. This included removing datasets that were rarely used or needed substantial updates. If you notice that a dataset you wanted to work with is missing, please email us! Your feedback is important.
How to work with a dataset: When you open a Google Sheet, click "File" in the upper left corner. From the drop-down list, either 1) click "Download," which will download the sheet as a Microsoft Excel file, or 2) click "Make a Copy" and save a copy of the Google Sheet to a personal Google Drive folder. Please do not "Request Access" to the dataset's Google Sheet. Thank you!
You can also work with select datasets within TUVA - see details below.
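For students who would rather explore a downloaded copy in code, here is a minimal Python sketch. It assumes the sheet was saved as a CSV file rather than as an Excel file (the "Download" menu in Google Sheets also offers a comma-separated values option). The function name `column_mean`, the file name, and the column name are hypothetical placeholders and do not come from any Data Jam dataset.

```python
import csv
from statistics import mean


def column_mean(csv_path, column):
    """Return the mean of the numeric values in one column of a CSV file.

    Blank cells and non-numeric cells (for example "n/a") are skipped.
    """
    values = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cell = (row.get(column) or "").strip()
            try:
                values.append(float(cell))
            except ValueError:
                # Skip blanks and text entries instead of failing.
                continue
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return mean(values)
```

A call might look like `column_mean("my_dataset.csv", "value")`, where both names are replaced with the actual file and column header from your copy of the data.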
Tuva & the Hudson Valley Data Portal
We’ve created a partnership with TuvaLabs, Inc. to host many* of our Data Jam datasets on their interactive graphing platform. Students can drag and drop the variables right onto the axes and build graphs in seconds without the complexity of manipulating a spreadsheet.
*Note: Many, but not all, of our datasets are on the TUVA platform. Most of the Level 1 and Level 2 datasets are available on TUVA, but fewer of the Level 3 datasets are. If a Data Jam dataset is available in TUVA, there will be a link to it on its associated webpage.
Explanation of Levels
Dataset levels are derived by looking at the number of factors in the dataset and by the sheer amount of data collected. We suggest that elementary students begin with Level 1 datasets, especially if it's their first competition. Most middle schoolers will be successful with a Level 1 or 2 dataset, and the appropriate level for your high schoolers depends on their data experience and determination. Drop us a line if you need help selecting an appropriate dataset for your student.
Level 1 = Easy
Level 2 = Moderate
Includes an additional PDF with background information and extra resources. These topics are a good starting place for students who are new to data analysis.
Salt Pollution in a Hudson River Tributary
When scientists do a 'budget' of a water source, it helps to think of a bank account. You want to know how much goes into your account and how much comes out of it.
Long-Term Hudson River Fish Surveys (NYSDEC)
The DEC collected a variety of fish in the spring, summer, and early fall when eggs, larvae, and juveniles are more plentiful. This dataset shows their results for tomcod, striped bass, rainbow smelt, and American shad.
Zebra Mussels & Other Organisms
Zebra mussels were first detected in the Hudson in 1991. By 1992 they had spread throughout the freshwater and slightly brackish parts of the estuary and had a biomass greater than the combined biomass of all other consumers.
Historic Pollution and Human Impacts
Wastewater enters the Hudson River from point sources including municipal and industrial wastewater treatment plants, combined sewer overflows, urban storm water, and tributaries of the Hudson River such as Fishkill Creek.
Pharmaceuticals found in the Hudson River Estuary
Understanding how human activity influences the Hudson is a prime concern for the maintenance of the river, especially as the human population grows.
Traffic, Air Pollution, and Human Demographics in New York
Air pollution from traffic can be a major problem in many parts of the world. This dataset examines how traffic congestion and associated pollutants are related to the demographics of the populations that live near traffic.
Blood Lead Levels, Poverty and Housing Trends for Mid-Hudson Valley and NYC
In this dataset, students can explore the relationship between childhood lead levels, county, and poverty level, and explore how these relationships have changed over time.
Primary Productivity in the Hudson River Estuary
Using data from the Hudson River Environmental Conditions Observation System (HRECOS) you can look at how primary productivity changes daily and over the growing season.
Mosquitoes in Two Different Pond Habitats
Mosquitoes play an integral role in the spread of diseases such as malaria, dengue fever, West Nile fever, and encephalitis.
Storm Impacts on Water Chemistry in a Hudson River Tributary
Samples were collected from the East Branch of the Wappinger Creek on Cary Institute grounds in Millbrook, NY.