“On the second floor of the Carnegie Library of Pittsburgh on Sunday, Priya Donti could be found hunched over a laptop, combing through government databases on monthly normal temperatures across different states…”
Kyle Miller is on staff at Carnegie Mellon University and was working on the Checking and Bagging team with the Outdoor Air Quality dataset from the Environmental Protection Agency. While he waited for the large datasets to download and upload (“most of our group’s work is waiting for data to process”), Kyle said he was hopeful that this work would lead to new connections and collaboration, because the agencies that collect these data don’t always talk to each other. Creating this repository of data, he said, could have a significant social and scientific benefit. Current mood: optimistic.
Vishal Dugar, a graduate student in Robotics at Carnegie Mellon University, was working in the Seeder and Sorter group. He participates in a student group called “Tech for Society” that investigates technology for social good. His group had processed 250 links by the middle of the event, and he said that the “sheer scale of the datasets is the most surprising thing that I’ve seen this afternoon.”
Vishal said that 90% of the datasets that he investigated were easy to process, but 10% needed more work to archive. He was impressed by the good organization of the datasets he was looking at and, while he doesn’t typically use government datasets in his work, he hoped that his work would lead to increased use for citizen science and research projects. “It’s nice to use my skills to do something good for people.” Current mood: exhilarated.
Sarah Riccitelli is a student at the University of Pittsburgh pursuing a master’s degree in Library and Information Science. She heard about the Data Rescue event from her professor, Nora Mattern, who is one of its organizers. She studies archives and is working on a research project about how archives are perceived and how events like Data Rescue affect that image. She participated in the Researcher Track, investigating which data could be processed automatically and which needed more attention.
Current mood: hopeful. She was worried that nobody would attend the event, but was happy to see the great turnout and hoped that this would lead to improved connections between data scientists in the area.
Christopher Tracey is an ecologist and conservation planner whose work depends on datasets like those we’re rescuing today. “I can’t do my work without it–I have a vested interest,” he says. As a bagger at today’s event, he’s just uploaded one of those datasets, a set of EPA information on ambient water quality.
Christopher works on state wildlife action plans, which were federally mandated beginning in 2005 and currently enjoy bipartisan support. These plans are designed to keep species from becoming threatened or endangered, and in Pennsylvania we have 664 animal species that might become threatened or endangered–they range from the birds and mammals you might suspect to nearly 200 species of invertebrates.
Christopher told the hopeful story of the “saddest bird”–a male piping plover (a species that hasn’t reproduced in Pennsylvania since 1954) that spent an entire season at Presque Isle near Erie looking for a mate. The story is hopeful because he attributes the bird’s willingness to remain at Presque Isle to conservation efforts that made the habitat hospitable and may, in future years, mean more piping plovers will make the area their home.
As an environmental educator, Miranda Crotsley cares deeply about preserving scientific data. “One of the fundamental aspects of science is that what you do must be reproducible,” she says. “When you don’t have that data to look back at, the process is broken.”
In her work at Jennings Environmental Education Center, a Pennsylvania state park, Miranda uses data mainly for managing resources: Jennings has the only protected prairie ecosystem in Pennsylvania, and it is home to the endangered Massasauga rattlesnake. Data–from temperature, to humidity, to counts of animals–is crucial to keeping the prairie healthy. “We need to be able to use data for public land management. If it disappears we won’t be able to do our jobs effectively.”
As a describer, she has just described her first set of data, a series of biennial reports from various countries to the United Nations Framework Convention on Climate Change. The reports outline progress made in reducing greenhouse gas emissions, plans for the future, and plans for helping other countries make progress.
Current mood: anticipation.
Evan Sherwin is a PhD student at Carnegie Mellon University studying Engineering and Public Policy. His work examines energy consumption and its connections to factors like weather and income: basically, what affects a household’s energy consumption? In particular, he is interested in low-income household energy use, in his words, “to make sure that energy is affordable for low income households.”
He uses NOAA weather data and census data to do his work, and it would be “extremely difficult” for him to conduct his research without these datasets. He is thrilled and hopeful that an entire community of people exists who take data preservation seriously, and he is thankful to all the data rescuers for the work that they do.
Current mood: relieved.
Emily Shawgo is a graduate student at CMU in Public Policy and Management with a focus in Cybersecurity. She works with data and has just started learning to program. She came to this event to hone her programming skills and to do something to help with the disappearing data problem. She is working on the Bagging team, currently processing GIS data from the Department of Housing and Urban Development, and is waiting for a 3.3 GB zip file to download. Current mood: impatient.
Anna Filippova and Priya Donti are collaborating on a weather dataset collected from stations across the United States to make sure it’s complete before archiving it.
They chose this dataset because it is well-researched and valuable to lots of agencies and organizations.
Priya is a graduate student at Carnegie Mellon University in Computer Science and Engineering and Public Policy, and is here because she works on energy research and these data are important to her work and to the public good. Current mood: focused.
Anna is a post-doctoral researcher at Carnegie Mellon University at the Institute for Software Research and studies community building at events like Data Rescue; the loss of these datasets affects the people in her community. Current mood: excited.
Daniel Gingerich is a PhD student in Engineering and Public Policy at Carnegie Mellon University. He studies the “water-energy nexus,” or what the trade-offs are in energy and air quality when treating water. He couldn’t do his research without datasets from the EPA, because private companies would probably not make this information available. Current mood: enthusiastic, because “there are people here who don’t work with these datasets at all, and yet they care enough about data to help preserve it so that work like mine can continue.”