Certifying and Removing Disparate Impact

Cited by: 929
Authors
Feldman, Michael [1 ]
Friedler, Sorelle A. [1 ]
Moeller, John [2 ]
Scheidegger, Carlos [3 ]
Venkatasubramanian, Suresh [2 ]
Affiliations
[1] Haverford Coll, Haverford, PA 19041 USA
[2] Univ Utah, Salt Lake City, UT 84112 USA
[3] Univ Arizona, Tucson, AZ USA
Source
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2015
Keywords
Disparate impact; fairness; machine learning
DOI
10.1145/2783258.2783311
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
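
The ideas in the abstract can be illustrated concretely. Below is a minimal Python sketch on synthetic data, assuming the standard 80% rule as the disparate-impact threshold and the balanced error rate (BER) as the classification-accuracy measure the paper refers to; the function names, the synthetic data, and the pooled-quantile repair target are this sketch's own simplifications, not the authors' released implementation (the paper's full repair moves each group toward a per-group median distribution).

# Minimal sketch (not the authors' code): 80%-rule ratio, predictability
# test via balanced error rate, and a rank-preserving repair.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def disparate_impact_ratio(outcome, protected):
    # P(outcome=1 | protected) / P(outcome=1 | rest); < 0.8 fails the 80% rule
    return outcome[protected == 1].mean() / outcome[protected == 0].mean()

def balanced_error_rate(y_true, y_pred):
    # Average of per-class error rates: the accuracy measure linked to
    # disparate impact in the paper.
    err_pos = (y_pred[y_true == 1] != 1).mean()
    err_neg = (y_pred[y_true == 0] != 0).mean()
    return (err_pos + err_neg) / 2

def repair_feature(values, protected):
    # Rank-preserving repair sketch: map each group's values onto pooled
    # quantiles so group membership is no longer predictable from this
    # feature. (The paper repairs toward a per-group median distribution;
    # pooled quantiles keep the sketch short.)
    repaired = np.empty_like(values, dtype=float)
    for g in np.unique(protected):
        mask = protected == g
        ranks = values[mask].argsort().argsort() / (mask.sum() - 1)
        repaired[mask] = np.quantile(values, ranks)
    return repaired

# Synthetic data: one feature that acts as a proxy for the protected class.
rng = np.random.default_rng(0)
protected = rng.integers(0, 2, size=2000)
x = rng.normal(loc=1.0 - protected, scale=1.0)  # lower values for protected=1
outcome = (x + rng.normal(scale=0.5, size=2000) > 0.5).astype(int)

print("DI ratio:", disparate_impact_ratio(outcome, protected))

# Predictability test: a low BER when guessing the protected class from the
# remaining attributes signals potential disparate impact.
X_tr, X_te, y_tr, y_te = train_test_split(x.reshape(-1, 1), protected, random_state=0)
ber = balanced_error_rate(y_te, LogisticRegression().fit(X_tr, y_tr).predict(X_te))
print("BER before repair:", ber)

# After repair, both groups share nearly the same feature distribution,
# so the protected class should be near-unpredictable (BER close to 0.5).
x_rep = repair_feature(x, protected).reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(x_rep, protected, random_state=0)
ber_rep = balanced_error_rate(y_te, LogisticRegression().fit(X_tr, y_tr).predict(X_te))
print("BER after repair:", ber_rep)

On this synthetic proxy feature the DI ratio falls well below 0.8 and the protected class is easy to predict (BER well below 0.5); after the rank-preserving repair the groups' feature distributions coincide, so BER rises toward 0.5 while the within-group ordering of values is preserved.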
Pages: 259-268
Page count: 10