November 25, 2019

Did No One Audit the Apple Card Algorithm?


(The RAND Blog)

November 21, 2019

Jennifer Bailey, VP of Apple Pay at Apple, speaks about the Apple Card during an Apple special event in Cupertino, California, March 25, 2019

Photo by Stephen Lam/Reuters

by Osonde A. Osoba

In the world of social media, tech executive David Heinemeier Hansson's thread of outrage about Apple Card has been categorized as a viral Twitterstorm.

Data scientists would call it a rather tidy example of an algorithm audit.

Here's what happened: Jamie Heinemeier Hansson, Hansson's wife, asked to increase the line of credit on her Apple Card, a credit card Apple created in partnership with Goldman Sachs. The increase was denied. At the same time, her husband—with whom she shares all assets as a married couple in a community property state—had a credit line 20 times higher. Apple reps' reply: “It's the algorithm.”

So in this mini-audit, does the algorithm produce the same results (credit limits) for the same relevant inputs (reported personal assets)? Not so much.

David Hansson's verdict that the Apple Card algorithm is sexist lit up Twitter earlier this month. But the existence of biased/sexist/racist algorithms is not a new discovery; dozens of scholars have written about the hazards of letting AI mine data patterns for everything from job applicant screening to data-driven policing. Still, Goldman Sachs weakly argued that the company had no discriminatory intent in its credit limit determination process because it does not have information about applicants' gender or marital status. This is an example of arguing for “fairness through unawareness.” But research shows that excluding sensitive attributes (gender, marital status, race, etc.) does not automatically render the algorithm unbiased.

Furthermore, deliberate bias—or even awareness of bias—is usually irrelevant to federal regulations on non-discrimination. When the law tests for discriminatory practices, it often applies the “disparate impact” standard—which means comparing outcomes, regardless of intent. Laws are written this way because, historically, it's been easy to hide discriminatory intent behind practices that seem at face value to be neutral. And as a matter of justice/fairness, intent is rather beside the point. Good intentions that lead to bad outcomes are still harmful to the people adversely impacted.
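To make "comparing outcomes, regardless of intent" concrete, here is a minimal sketch of an outcome comparison in Python. It assumes an auditor holds a list of (group, approved) pairs, where the group label is collected for the audit only and is never fed to the credit model. The function names and the sample data are hypothetical, and the ratio shown is one common summary rather than a statement of what any regulator requires.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return {group: approval rate} from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions):
    """Lowest group approval rate divided by the highest.

    A value of 1.0 means every group is approved at the same rate.
    Intent never enters the calculation; only outcomes do.
    """
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return None  # nobody was approved, so the ratio is undefined
    return min(rates.values()) / highest


# Made-up audit sample: group A approved 8 of 10, group B approved 4 of 10.
sample = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)
ratio = disparate_impact_ratio(sample)
print(approval_rates(sample), ratio)
```

How low the ratio has to fall before it signals a problem is a legal and policy question, not something the code can settle.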

The more troubling takeaway from this event was this: David and Jamie Hansson could not get anyone to give them a clear reason for the credit decision outcome. They heard variations of “credit limits are determined by an algorithm,” “we do not know your gender or marital status during the Apple Card process,” “it's just the algorithm,” etc.

This kind of deep failure of accountability could become increasingly common as opaque algorithms are used for more kinds of decisionmaking (what we have called “automation bias”). The Hanssons' case highlighted a toxic combination: The companies relied on a “black box” algorithm with no capability to produce an explanation, and then abdicated all responsibility for the decision outcomes.

Complex, opaque technologies (like AI, machine learning, and other algorithm systems) provide significant benefits to society. They help speed up complex decisions, enable wider access to services, and in many cases make better decisions than humans. But those benefits do not obviate the need for accountability and transparency. With these complex algorithms, we may not always be able to pinpoint the factor that led to a bad outcome—but there are technical safeguards and procedures that institutions can use to audit their outputs for the most egregious (sexist, racist, biased, etc.) behaviors.

In this case, Goldman Sachs should not have assumed that ignoring gender or marital status automatically made its credit algorithm fair. “Disaggregated evaluation,” which tests an algorithm on sub-demographics and accounts for any differences in outcomes, also likely would have caught this bias.
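A disaggregated evaluation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Goldman Sachs's process: the record fields (gender, assets_band, limit) and the numbers are invented, and the demographic label is assumed to be available to the auditor even though the model itself never sees it. The idea is to group applicants who look alike on the relevant inputs and then compare the algorithm's output across sub-demographics within each group.

```python
from collections import defaultdict
from statistics import median


def disaggregate(records, group_keys, outcome_key):
    """Summarize an outcome separately for each combination of group_keys."""
    buckets = defaultdict(list)
    for record in records:
        key = tuple(record[k] for k in group_keys)
        buckets[key].append(record[outcome_key])
    return {
        key: {"n": len(values), "median": median(values)}
        for key, values in buckets.items()
    }


# Invented records: applicants in the same asset band, split by gender.
records = [
    {"gender": "F", "assets_band": "high", "limit": 1000},
    {"gender": "F", "assets_band": "high", "limit": 1500},
    {"gender": "M", "assets_band": "high", "limit": 20000},
    {"gender": "M", "assets_band": "high", "limit": 18000},
]

report = disaggregate(records, ("assets_band", "gender"), "limit")
for key, stats in sorted(report.items()):
    print(key, stats)
```

A large gap between rows that share an asset band is the kind of difference an institution would then have to account for, either with a legitimate factor or with a fix.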

One other thing would have helped, too: Providing the customer with some insight into the decision. Even if the precise cause isn't known, presenting an explanation—e.g. what the most important factors are in this specific decision—makes the process feel less Kafkaesque. It should be the default procedure. Likewise, organizations need protocols to redress errors in real time. Realistically, the other option is dealing with the PR nightmare of outraged posts on Twitter.
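As a rough illustration of reporting the most important factors in a specific decision, the sketch below assumes the simplest possible case: a linear scoring model, where each factor's contribution is its weight times the applicant's distance from a reference applicant. The weights, factor names, and baseline values are all hypothetical. A genuinely opaque model would need a more elaborate attribution method, but the output handed to the customer could look much the same.

```python
def top_factors(weights, applicant, baseline, k=3):
    """Rank factors by how much they moved this applicant's score.

    Assumes a linear model: contribution = weight * (value - baseline value).
    Returns the k factors with the largest absolute contribution.
    """
    contributions = {
        name: weights[name] * (applicant[name] - baseline[name])
        for name in weights
    }
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return ranked[:k]


# Hypothetical weights and values, chosen only to show the mechanics.
weights = {"income": 0.5, "utilization": -2.0, "account_age_years": 1.0}
baseline = {"income": 60, "utilization": 30, "account_age_years": 5}
applicant = {"income": 80, "utilization": 50, "account_age_years": 2}

factors = top_factors(weights, applicant, baseline, k=2)
for name, contribution in factors:
    direction = "raised" if contribution > 0 else "lowered"
    print(f"{name} {direction} the score by {abs(contribution):.1f}")
```

Even a short ranked list like this gives a customer something to contest or correct, which "it's just the algorithm" does not.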

Finally, government regulations can keep companies that hold data or deploy algorithms accountable to their users. The European Union, for example, has pioneered regulations giving consumers ownership of their data, a right to privacy, a right to rectify incorrect data, and a right to explanation for automated decisions based on user data. In the United States, the tech regulatory landscape is much more fragmented, making such sweeping regulations less feasible. But the need is there regardless.

As a final point, we should spare a thought for the unfortunate agencies charged with regulating organizations as they deploy more complex and unvalidated technologies. Are they sufficiently equipped and informed to regulate the decisionmaking processes of large sophisticated corporations and their black-box algorithms? This is not just about credit cards, after all, but also about unsafe airplanes, dangerous power grids, and a legion of increasingly complex technologies that affect daily life.

Osonde Osoba is an information scientist at the nonprofit, nonpartisan RAND Corporation and the co-director of the RAND Center for Scalable Computing and Analysis.
