Data Conscience: Algorithmic Siege on Our Humanity
Explore how data structures can help or hinder social equity
Data has enjoyed ‘bystander’ status as we’ve attempted to digitize responsibility and morality in tech. In fact, data’s importance should earn it a spot at the center of our thinking and strategy around building a better, more ethical world. Its use, and misuse, lies at the heart of many of the racist, gendered, classist, and otherwise oppressive practices of modern tech.
In Data Conscience: Algorithmic Siege on Our Humanity, computer science and data inclusivity thought leader Dr. Brandeis Hill Marshall delivers a call to action for rebel tech leaders who acknowledge and are prepared to address the current limitations of software development. In the book, Dr. Marshall discusses how the philosophy of “move fast and break things” is, itself, broken, and requires change.
You’ll learn about the ways that discrimination rears its ugly head in the digital data space and how to address them with several known algorithms, including social network analysis and linear regression.
A can’t-miss resource for junior-level to senior-level software developers who have gotten their hands dirty with at least a handful of significant software development projects, Data Conscience also provides readers with:
- Discussions of the importance of transparency
- Explorations of computational thinking in practice
- Strategies for encouraging accountability in tech
- Ways to avoid double-edged data visualization
- Schemes for governing data structures with law and algorithms
Marshall opens the first chapter of this work with the following: "In tech, the approach is to create a solution that works well for 80 percent of users. The remaining 20 percent have to conform, find workarounds, or suffer in silence."
The 80/20 rule, or Pareto Principle, is well known and is applied in many ways. As someone who has worked in the software world for over three decades, the application I'm most familiar with is to strive for a solution that solves 80 percent of a problem rather than waiting to provide a complete solution before release. (That other 20 percent may never be completed, because the next problem may be "more important.")
She continues: "So tech folks don't want to discuss or address racism, sexism, ableism, and otherism."
My work has been in networking/network management. It's a very different world than the applications Marshall is concerned with, and that presented some issues for me with this work. Why? Because she states that all AI and all data software is oppressive. My experience says that is not the case. Network management is moving toward AI solutions. Network devices are monitored; logs of vast sizes are ingested, processed, and interpreted. AI tools are being developed to detect issues and automatically correct them, or offer suggested corrections. These tools have nothing to do with people, race, gender, etc. The one area I'm aware of where such solutions have been discriminatory is ableism. When I started working, UIs were never designed with any thought for how blind, low-vision, or colorblind people would interact with the tools. Nor were tools developed with people who can't easily operate a keyboard and mouse in mind. Fortunately, progress has been made in the last few decades. The tools Marshall is talking about are focused on people's data and used by people in daily life. But that doesn't mean all AI tools are like that.
Setting the above aside, her first chapter is fascinating in a horrifying way. The history of oppression in the country is ugly. For example, this from Greenville, SC, in 1918:
"A number of complaints have come to members of Council of negro women who are not at work and who refuse employment when it is offered them, the result being that it is exceedingly difficult for families who need cooks and laundresses to get them. Wives of colored soldiers, getting a monthly allowance from the GOvernment, have, a number of them, declined to work on the ground that they can get along...without working, according to reports. Others have flatly refused jobs without giving any reason whatever, while still others pretend that they are employed when, as a matter of fact, they derive a living from illegitimate means. The proposed ordinance will require them all to carry a labor identification card showing that they are regularly and usefully employed, and the labor inspectors and police will be charged with the duty of rigidly enforcing the law."
That alone is worth the read. She concludes the first chapter with:
"Of course, the often-contradictory views that are represented in the Mankind Quarterly are those of the individual authors, not those of the journal's publishers of editors. So does it still support eugenics, aka scientific racism aka Darwinism? Check it out and draw your own conclusions."
Any writer who invites the reader to investigate original sources and think for themselves is worth listening to, IMO.
A big issue I had with the work: in describing a resume screening program (algorithm) that Amazon developed, and that Amazon learned in 2015 was not gender-neutral, Marshall never explains how that happened. Marshall mocks up her own resume screening program: she walks through how one could be developed using open source, and even includes the Python code in the Appendix. She spends a good amount of time discussing the problems with open-source libraries: not fully tested or functional, uncertain how they work, not well documented so they can be interpreted to behave differently by different developers, and so on. But she never shows how such issues lead to discrimination. How they can have unintended consequences is clear; they could cause a tool, for example, to return matches for resumes that don't properly match the skill keywords of the job description. Fortunately, Marshall does a great job listing her sources. I went to the Reuters report she cites, and here's what it says:
"But by 2015, the company realized its new system was not rating candidates for software developer jobs and other technical posts in a gender-neutral way. That is because Amazon's computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry. "
Nowhere in her recap does Marshall discuss that this was a machine learning tool trained on a data set that was clearly skewed. The problem had nothing to do with open-source libraries or the issues she writes about. How to fix the problem? More on that later.
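To make the mechanism concrete, here is a minimal sketch of my own (the resumes, labels, and feature names are invented for illustration; none of this comes from the book or the Reuters report): a classifier trained on historically skewed hiring outcomes learns to penalize gender-associated words even though gender is never an explicit input.

# Purely hypothetical sketch (assumed data, not from the book or Reuters):
# a model trained on skewed historical hiring decisions learns to downweight
# gender-associated words, even though "gender" is never a feature.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy "historical" resumes and decisions: most past hires used male-coded
# language, so a term like "women's" co-occurs mostly with rejections.
resumes = [
    "python developer men's rugby club captain",      # hired
    "java engineer chess club president",             # hired
    "python developer women's chess club captain",    # rejected
    "java engineer women's coding society founder",   # rejected
]
labels = [1, 1, 0, 0]  # 1 = hired, 0 = rejected in the historical data

vec = CountVectorizer()
X = vec.fit_transform(resumes)
model = LogisticRegression().fit(X, labels)

# Inspect the learned weights: the token "women" ends up with a negative
# coefficient, i.e. the model faithfully reproduces the skew in its training data.
for word, coef in zip(vec.get_feature_names_out(), model.coef_[0]):
    print(f"{word:12s} {coef:+.3f}")

Under those assumptions, the fix has little to do with library documentation: it's auditing and rebalancing the training data, and inspecting learned weights like this before anything goes into production.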
Other examples of algorithmic oppression Marshall points out include Zoom's background-replacement feature, which wasn't developed with non-white skin tones in mind. The result?
"The machines sees the background and my face as the same color, even though the wall color is almond and the skin tone is caramel. Both colors fall on the brown color palette. ... The washing away of a whole person digitally occurs quite effortlessly." The dehumanization of minorities clearly occurs and is problematic. Marshall would argue it should have been caught at design phase and the way to do that is by having a diverse team. It may or may not have been caught so early. But it definitely would have been caught at unit test phase by a diverse development team. The overarching problem I see here is completely inadequate QA.
Overall, this work had a handful of significant areas that were helpful to my understanding of oppression, both in history and in development. I won't strongly recommend it.
Recommended by Emily M. Bender (the lead author of "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?") during her appearance on the Tech Won't Save Us podcast https://techwontsave.us/episode/163_c...
Good concepts, and definitely a topic that is not written about enough. Gave me a lot to consider as I work with data and learn more about code. Would have liked more examples of the impacts (there were a number of them, but some things highlighted as issues did not have examples). Also felt like it could have been a bit shorter.
Overall, highly recommend reading it if you're working with data.