Event Paths

There are lots of ways to participate at Data Rescue Pittsburgh, based on your interests and expertise!

Seeding and Sorting Path

Activities: Seeders canvass the resources of a given government agency and identify important URLs. They then determine whether those URLs can be crawled by the Internet Archive’s web crawler. Using the EDGI Nomination Chrome extension, Seeders nominate crawlable URLs to the Internet Archive, or add them to the Archivers app if they require manual archiving.

Recommended Skills: Consider this path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task.
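
As a rough illustration of the crawlability check Seeders make, the sketch below flags URLs that a web crawler often cannot capture well. The two rules (non-HTTP links such as FTP, and pages driven by a query string such as search results) and the function name `likely_uncrawlable` are assumptions made for this example. They are not the Internet Archive’s or EDGI’s actual criteria, and a real decision still requires looking at the page.

```python
from urllib.parse import urlparse


def likely_uncrawlable(url: str) -> bool:
    """Return True if the URL matches one of two illustrative warning signs.

    These heuristics are hypothetical; they only hint that a Seeder
    should take a closer look before nominating the URL.
    """
    parts = urlparse(url)
    # Links served over FTP (or anything other than HTTP/HTTPS).
    if parts.scheme not in ("http", "https"):
        return True
    # Pages generated from a query, e.g. database search results.
    if parts.query:
        return True
    return False


if __name__ == "__main__":
    # example.gov addresses are placeholders, not real agency pages.
    for candidate in (
        "https://www.example.gov/about",
        "ftp://ftp.example.gov/data/",
        "https://www.example.gov/search?station=42",
    ):
        print(candidate, likely_uncrawlable(candidate))
```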

Researching Path

Activities: Researchers review “uncrawlables” identified during Seeding, confirm the URL/dataset is indeed uncrawlable, and investigate how the dataset could be best harvested. Researchers need to have a good understanding of harvesting goals and have some familiarity with datasets.

Recommended Skills: Consider this path if you have strong front-end web experience and enjoy research. An understanding of how federal data is organized (e.g. where “master” datasets are) would be valuable.

Harvesting Path

Activities: Harvesters take the “uncrawlable” data and try to figure out how to actually capture it based on the recommendations of the Researchers. This is a complex task which can require substantial technical expertise, and which requires different techniques for different tasks.

Recommended Skills: Consider this path if you’re skilled in a programming language of your choice (e.g., Python, JavaScript, C), are comfortable with the command line (bash, shell, PowerShell), or have experience working with structured data. Experience in front-end web development is a plus.

Bagging Path

Activities: Baggers do some quality assurance on the dataset to make sure the content is correct and corresponds to what was described in the spreadsheet. Then they package the data into a BagIt file (or “bag”), which includes basic technical metadata, and upload it to the final DataRefuge destination.

Recommended Skills: Consider this path if you have data or web archiving experience, or have strong tech skills and an attention to detail.
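
To make the idea of a “bag” more concrete, here is a minimal sketch of the BagIt layout using only the Python standard library: a `data/` payload directory, a `bagit.txt` declaration, and a checksum manifest. The function name `make_bag`, the choice of SHA-256, and the version string are assumptions for illustration. This is not the tooling used at the event, and dedicated tools such as the Library of Congress `bagit` package also write richer metadata.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir: str, bag_dir: str) -> Path:
    """Copy source_dir into a new minimal BagIt-style bag at bag_dir."""
    bag = Path(bag_dir)
    payload = bag / "data"
    # copytree creates bag_dir and data/; it raises if data/ already exists.
    shutil.copytree(Path(source_dir), payload)

    # Bag declaration: format version and tag-file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag>" line per file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

The manifest is what lets a later recipient verify that every file arrived unchanged, which is the point of the quality-assurance step.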

Describing Path

Activities: Describers create a descriptive record in the DataRefuge CKAN repository for each bag. Then they link the record to the bag and make the record public.

Recommended Skills: Consider this path if you have experience working with scientific data (particularly climate or environmental data) or with metadata practices.
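
For readers curious what a descriptive record looks like in practice, the sketch below builds (but does not send) a request to CKAN’s `package_create` action, linking a record to a bag through a resource URL. Whether Describers at the event use the CKAN web form or the API is not stated here, so treat this as an assumption. The helper name `build_package_request`, the server address, the API key, and the field values are all hypothetical placeholders.

```python
import json
import urllib.request


def build_package_request(ckan_url, api_key, name, title, notes, bag_url):
    """Build a POST request that would create a public CKAN dataset record."""
    payload = {
        "name": name,        # URL-safe identifier: lowercase, digits, - and _
        "title": title,      # human-readable title
        "notes": notes,      # free-text description of the dataset
        "private": False,    # make the record public
        # The resource links the record to the uploaded bag (format assumed).
        "resources": [{"url": bag_url, "name": "BagIt bag", "format": "ZIP"}],
    }
    return urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network here.
    request = build_package_request(
        "https://ckan.example.org",
        "YOUR-API-KEY",
        "example-air-quality-2016",
        "Example Air Quality Readings, 2016",
        "Hourly readings harvested from an agency site.",
        "https://storage.example.org/bags/example-air-quality-2016.zip",
    )
    print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it to a real CKAN server, given a valid key and permissions.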

Storytelling Path

Activities: You will record stories about the importance of climate and environmental data in our everyday lives, share this work on social media, and document the event.

Recommended Skills: Consider this path if you’re on social media (Facebook, Instagram, Twitter, whatever), if you can use Storify, if you have good listening and writing skills, and/or if you can make creative and engaging materials.

Non-Technical Roles

This event requires individuals who are interested in participating in the meta-narrative of Data Rescue events. Below are some sub-paths.


Surveying Path

Activities: Surveyors identify key programs, datasets, and documents on federal agency websites that are vulnerable to change and loss. Using templates and how-to guides, they create Main Agency Primers to introduce a particular agency, and Sub-Agency Primers to guide web archiving efforts by laying out a list of URLs that cover the breadth of an office.

Recommended Skills: Consider this path if you’re familiar with federal data, interested in particular offices or datasets that don’t already have a primer, or want to help create materials for use at other archiving events!


Documentation Path

Activities: Help groups at this event document their workflows, or improve the DataRefuge documentation to make it clearer and easier to use.

Recommended Skills: Consider this path if you would be comfortable working with a group to capture their process, or if you have an eye for instructional design.