Data for Social Justice Symposium Explores Usage of Data Science to Expose Injustices

Data science, the field of using scientific techniques to extract insights from unstructured data, can be a powerful tool for shedding light on the oppression minorities face. Held on Friday, March 26 and Saturday, March 27, the Data for Social Justice Symposium focused on the role that data plays in our lives, and how that data can be collected and analyzed in both empowering and misleading ways.

The first day of the symposium included a keynote presentation by Catherine D’Ignazio, the Director of the Data+Feminism Lab at the Massachusetts Institute of Technology, and Lauren Klein, associate professor at Emory University, co-authors of “Data Feminism.” Through graphs, tables, and examples, they explained principles of data feminism: examining and challenging power, elevating emotion and embodiment, and embracing pluralism.

D’Ignazio and Klein began their presentation with a recent example of silencing those who speak out against oppression. Timnit Gebru, former co-leader of ethical artificial intelligence at Google, was fired after failing to resolve complaints Google had with her paper exposing the risks of large-scale racism and sexism in large language models. In response to the incident, over 2,500 Google employees signed a letter condemning Google’s actions, and nine members of the U.S. Congress wrote a letter demanding accountability from Google.

“This treatment is really shameful, and then at the same time, it is almost to be expected… In fact, it’s so common, the way that Google treated her with the firing and then the subsequent treatment of her describes it as a centuries-old playbook that is used against Black women who speak truth to power,” said D’Ignazio.

Alongside discriminatory data, missing datasets are also symptoms of power imbalance and a major obstacle to acquiring concrete evidence of discrimination. According to Klein, many datasets, such as statistics on transgender hate crimes, underpaid undocumented immigrants, and femicide, are either not collected or not made available by the government and large corporations, which can heavily deter future research in these areas.

In response to this dearth of information, some individuals, such as María Salguero, felt obligated to collect data and fill in the missing datasets themselves. In particular, Salguero sought data on femicides in Mexico.

“María Salguero was frustrated by this lack of action. In 2015, she started compiling femicides by reading news reports about women’s deaths, logging them in a Google spreadsheet, and placing them on a Google map… And now, because of the persistence of this work, she has actually amassed the largest public archive of femicide in all of Mexico,” said D’Ignazio.

In discussing the principle of elevating emotion and embodiment in data visualizations, Klein challenged the convention of valuing reason over emotion and minimalistic designs over maximalistic ones. She presented an animated data visualization of U.S. gun deaths by Periscopic, a socially conscious data visualization firm, in which each person killed by a gun is represented by an arc on the screen. Each arc starts out orange, turns grey at the point when the person was killed, and continues on to the end of their projected lifespan. Klein emphasized that even though this visualization draws out emotion, it is no less accurate or valuable.

“Methodologically, [the data visualization of U.S. gun deaths by Periscopic] is no less statistically sound than any other studies. The data about the people derived from a national crime dataset released by the federal government, their projected lifespans are determined by a model developed by the World Health Organization that has 50 different factors. But it was viewed with really intense suspicion from the scientific visualization community because it made us feel things, and a feminist approach would say, ‘it’s not a problem at all that it made us feel things and actually it’s a more compelling visualization precisely because it blends reason with emotion,’” said Klein.

Frank Zhou ’22 especially enjoyed learning about this gun violence visualization, particularly how it took data visualization beyond simple bar graphs and pie charts. He also appreciated how the keynote speakers showcased the application of feminist ideas to data science.

“One thing that the symposium did really well was to draw the theoretical connection between data and feminism, and to show how, in practice, feminist ideas can manifest in data analytics to data collection practices and data visualization practices as well. Everything from a more inclusive data collection mindset to placing more marginalized communities front and center departs from feminist concepts and theories,” said Zhou.

To conclude the talk, D’Ignazio explained that data feminism requires an expanded, unconventional definition of data science and visualization, one that includes works such as sculptures or murals. She added that data science must be defined and thought of in a more inclusive way.

“Our data science is not defined by the size of the data set, it’s not defined by the credentials of the people undertaking the work. These are concerns continually used to exclude women and people of color from the field, as well as to exclude work whose contribution is socio-technical rather than purely technical,” said D’Ignazio.

The second day of the Data for Social Justice Symposium featured a panel of alumni and other speakers: Brittany Kaiser ’05, Alba Disla ’15, Miles McCain ’19, Corrina Wainwright, and Hijoo Son, Instructor in History. The panelists focused on how data is used, who controls it, and who has the rights to it.

According to Kaiser, data surpassed oil as the world’s most valuable asset in 2017. Companies worth hundreds of billions of dollars collect and operate using this data every day. However, the producers from whom data is collected have few rights to ownership of the data they produce, which could lead to unwanted uses of that data, such as for discriminatory or political purposes.

“There’s not a single organization that doesn’t touch data for internal or external strategies. Data drives all decision making, it drives nearly every single industry and everything we’re producing every single day. Yet somehow, we still don’t have those definitions that protect the rights of the producers and that define how that asset class is bought, sold, and traded around the world. Largely, whoever has collected the data owns it, as opposed to the producer,” said Kaiser.

McCain explained how this exploitation of data from underprivileged individuals presents a challenge in collecting data for social justice. He emphasized that low-cost products are generally the most exploitative in terms of data practices.

“There was a huge outcry somewhat recently about Facebook, trying to create free internet worldwide… This was, in my opinion, while perhaps a great opportunity to get people connected, it was also quite exploitative because it would force the users of this very low cost service or free service, to route everything through Facebook. And I think that this is a very clear example of how capital and privilege yields data control of one’s own data,” said McCain.

However, in recent years, through the efforts of advocates including Kaiser, many laws have been passed to protect the rights of data producers.

“I am starting to see these shifts and changes, which is particularly exciting, but when we really talk about data and social justice, what does this mean?… If every single person around the world actually owned their data, actually had rights to the value that they were producing, all of us would at least be able to feed ourselves every single day, take care of ourselves and our family with the value of the information that we produce,” said Kaiser.