Resources for Data Science & Social Justice
This repository contains a collection of resources that I’ve gathered to facilitate workshops on teaching statistics and data science with a social justice framing. It includes slides, Jupyter notebooks, publicly available datasets, and some recommended reading. My views on this topic are always evolving, so please let me know if you find these materials either useful or lacking; I’ll be grateful for suggestions.
- Google Colab notebooks:
The following books and articles have been really helpful in framing the way that I think and talk about social justice and quantitative methods like data science, statistics, and algorithmic design.
- Data Feminism, Catherine D’Ignazio & Lauren F. Klein, The MIT Press, 2020.
- Invisible Women: Data Bias in a World Designed for Men, Caroline Criado Perez, Vintage Books, 2019.
- Algorithms of Oppression: How Search Engines Reinforce Racism, Safiya Umoja Noble, NYU Press, 2018.
- Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O’Neil, Penguin Random House, 2017.
Articles & Chapters
- Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, Joy Buolamwini & Timnit Gebru, Proceedings of Machine Learning Research, 81:1-15, 2018.
- Machine Bias, Julia Angwin, Jeff Larson, Surya Mattu & Lauren Kirchner, ProPublica, May 23, 2016.
- Critical Questions for Big Data, danah boyd & Kate Crawford, Information, Communication & Society, Vol. 15, No. 5, 2012, 662-679.
- Drawing Theories Apart: The Dispersion of Feynman Diagrams in Postwar Physics, Chapter 1: Introduction: Pedagogy and the Institutions of Theory, David Kaiser, University of Chicago Press, 2005.
Here are some resources for datasets and data-related materials that can be useful in the classroom.
- Social Sciences Data: The Inter-university Consortium for Political and Social Research (ICPSR) hosts a web portal and searchable database containing both public and restricted data specifically targeted to the social science research community.
- Census and Survey Data: Formerly the “Integrated Public Use Microdata Series,” IPUMS provides access to international census and survey data, with a special emphasis on social and economic data.
- Bioclimate Data: The WorldClim database provides climate data for 1970-2000 at high spatial resolution, covering environmental conditions at time resolutions suitable for studying climate change.
- Housing Data: Inside Airbnb is a non-commercial, independently developed set of tools and data for analyzing publicly available information on Airbnb, including listing and host metrics as well as GIS data.
- Mortgage Data: The Freddie Mac Database provides data on mortgages purchased by Freddie Mac from 1999 through 2020, including original loan conditions, property details, and monthly performance. Access requires signing up for login credentials.
- Twitter Data: A workshop on retrieving and analyzing Twitter data with Python. It includes tutorials for several different tools, including Tweepy, Twarc, and Twurl. You will need Twitter developer credentials to run the tutorial workbook.