The proliferation of machine learning and its deployment in safety-critical systems has sparked recent interest in the need to embed morality into machines. Trained on huge volumes of data, algorithms learn to output a prediction for a given input, and demonstrate remarkable predictive capabilities. However, a machine learning model generates its hypotheses from training datasets with no semantic consideration, and there is no distinct measure to regulate the interpretability of proprietary models.
Issues arise when datasets with skewed distributions are used to train models: imbalanced datasets that disproportionately represent minority groups make it difficult to identify correlations and patterns independent of the attribute defining that subgroup. For example, underrepresentation of women in a dataset of employees will teach a model to associate successful candidates with male candidates. From this, the model will infer that future candidates are likely to be fit for a role if they, too, are male.
Bias in historical data may be the cause of disparity in the data. Algorithms inherit this prejudice from the training data, absorbing and amplifying pre-existing patterns of inequality and in turn perpetuating, and creating, discriminatory practices against demographic groups. All models are expected to contain bias to an extent; otherwise they would have no preferences on which to base decisions. However, bias resulting in decisions influenced by discriminatory judgements must be mitigated.
When these judgements stem from a particular characteristic, that characteristic is known as a sensitive or protected attribute, and the subpopulation defined by the attribute is subject to discrimination. However, simply removing the sensitive attribute from the training dataset is insufficient, as it may be inferred from other features. Furthermore, iteratively removing attributes to avoid this compromises the accuracy of the model and in turn produces meaningless predictions.
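As an illustration of how a removed attribute can still leak through other features, the following is a minimal sketch that flags features strongly correlated with the sensitive attribute. The function names, the example features and the 0.8 threshold are all assumptions made for illustration. Pearson correlation only catches linear, single-feature proxies, so a clean result here would not show that the attribute cannot be inferred.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the other
    return cov / math.sqrt(var_x * var_y)

def find_proxies(features, sensitive, threshold=0.8):
    """Return the features whose absolute correlation with the sensitive
    attribute meets the (hypothetical) threshold, mapped to that correlation."""
    proxies = {}
    for name, values in features.items():
        r = pearson(values, sensitive)
        if abs(r) >= threshold:
            proxies[name] = r
    return proxies

# Hypothetical data: the sensitive attribute is encoded as 0/1.
sensitive = [1, 1, 1, 0, 0, 0]
features = {
    "height_cm": [180, 178, 183, 165, 160, 168],
    "years_experience": [2, 7, 4, 3, 6, 5],
}
proxies = find_proxies(features, sensitive)
```

In this made-up data, height tracks the sensitive attribute closely and would be flagged, while years of experience would not.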
Attempting to explain the route to a decision isn’t a constructive approach; sufficiently advanced technology is characterised by its ability to produce results beyond human comprehension. As the opaque nature of these black-box models makes it difficult to locate the source of bias, the data used to train models should instead be sufficiently audited to ensure decisions do not rely on sensitive information.
To eliminate the bias from the process, synthetic data can be added to redistribute the original dataset. In theory, if the correlation between the sensitive attribute and the positive outcome is smaller than the correlation between other features and the outcome, the relationship between the sensitive attribute and the positive outcome will become decoupled and patterns between the two will no longer be visible.
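One way to picture this redistribution is the sketch below, which tops up every combination of sensitive attribute value and outcome to the same size. It is a simplified stand-in rather than a definitive method: the added rows are drawn by resampling existing rows with replacement instead of being generated by a real synthetic data model, and the function name, keys and example data are hypothetical.

```python
import random
from collections import defaultdict

def rebalance(rows, sensitive_key, label_key, seed=0):
    """Add resampled rows until every (sensitive value, label) cell is as
    large as the largest cell, so group membership no longer predicts the label.

    Assumption: each cell has at least one row; a cell that is empty in the
    original data cannot be topped up by resampling and is left empty.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for row in rows:
        cells[(row[sensitive_key], row[label_key])].append(row)

    target = max(len(members) for members in cells.values())
    balanced = list(rows)
    for members in cells.values():
        shortfall = target - len(members)
        # Copy each sampled row so the added rows are independent objects.
        balanced.extend(dict(rng.choice(members)) for _ in range(shortfall))
    return balanced

# Hypothetical hiring data skewed towards successful male candidates.
rows = (
    [{"gender": "M", "hired": 1} for _ in range(6)]
    + [{"gender": "M", "hired": 0} for _ in range(2)]
    + [{"gender": "F", "hired": 1} for _ in range(1)]
    + [{"gender": "F", "hired": 0} for _ in range(3)]
)
balanced = rebalance(rows, "gender", "hired")
```

After rebalancing, each gender has the same proportion of positive outcomes in this toy dataset. Whether a model trained on such data then behaves fairly still has to be measured separately.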
To evaluate the bias reduction, the error rate equality difference can be inspected, which measures how far the false positive and false negative rates on each subset containing a sensitive identity term deviate from the rates on the overall dataset. A large deviation is indicative of biased results.
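A minimal sketch of such a comparison is given below, assuming binary labels and predictions and a mapping from each sensitive subgroup to the indices of its examples. It sums the absolute gaps between the overall and per-subgroup error rates, which is one common formulation of the metric. The function names and the toy data are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

def equality_differences(y_true, y_pred, subgroups):
    """Sum, over subgroups, the absolute gap between the overall error rate
    and the subgroup's error rate.

    subgroups maps a subgroup name to the indices of its examples.
    Returns (false positive equality difference,
             false negative equality difference).
    """
    overall_fpr, overall_fnr = error_rates(y_true, y_pred)
    fped = 0.0
    fned = 0.0
    for indices in subgroups.values():
        sub_true = [y_true[i] for i in indices]
        sub_pred = [y_pred[i] for i in indices]
        fpr, fnr = error_rates(sub_true, sub_pred)
        fped += abs(overall_fpr - fpr)
        fned += abs(overall_fnr - fnr)
    return fped, fned

# Toy example: two subgroups with different error profiles.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
subgroups = {"a": [0, 1, 4, 5], "b": [2, 3, 6, 7]}
fped, fned = equality_differences(y_true, y_pred, subgroups)
```

Both values are zero when every subgroup has the same error rates as the dataset as a whole, and they grow as the subgroups diverge.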
However, quantifying the success of a fair model in production isn’t easy. In particular, if the automated service is exposed as an API (application programming interface) for external use, the training dataset won’t be visible and algorithmic accountability becomes ambiguous. Additionally, the process requires human input for data selection, data preparation and model implementation, so even with good intentions algorithms may still exhibit biased tendencies. Fundamentally, objectively proving the morality of a model is difficult.
Machine learning is a valuable tool and has capabilities that far surpass those of any human. In order to maintain the sustainable growth of machine learning in applications, we must develop regulatory practices and ethical inspections at a sufficient rate to ensure the technology is adopted safely and sensibly.
John Li, Lucas Dixon, Nithum Thain, Lucy Vasserman and Jeffrey Sorensen. Measuring and Mitigating Unintended Bias in Text Classification.