Eliminating bias from algorithms

Eliminating bias from the data used to train algorithms is a key challenge for the future of machine learning

Wael Diab, who is leading international efforts to standardize artificial intelligence (AI), has identified the mitigation of data bias as a priority for future standards work. Diab recently told the IEC General Meeting in Busan, South Korea, that a broad standardization approach is necessary.

IEC White Paper: Artificial intelligence across industries

IEC and ISO set up the joint committee that Diab chairs a little over six months ago. It has already formed a working group that is looking into a wide range of issues related to trustworthiness, including robustness, resiliency, reliability, accuracy, safety, security and privacy, within the context of AI.

Leading industry experts believe that ensuring trustworthiness from the outset is one of the essential aspects that will lead to the widespread adoption of AI. Connected products and services, whether in a vehicle, smartphone, medical device or building security system, must be safe and secure or no one will want to use them. The same goes for critical infrastructure like power plants or manufacturing sites.

“One of the unique things about what IEC and ISO are doing is that we are looking at the entire ecosystem and not just one technical aspect,” explains Diab. Combined with the breadth of application areas covered in IEC and ISO technical committees (TCs), this will provide a comprehensive approach to AI standardization with IT and domain experts.

“The resulting standardization efforts will not only be fundamental to practitioners but essential to all stakeholders interested in the deployment of AI,” Diab concludes.

White paper

At the meeting in Busan, IEC officially launched a new White Paper on artificial intelligence. The aim of the authors is to help bring clarity as to the current status of AI and the outlook for its development in the next five to ten years. The paper describes the main systems, techniques and algorithms that are in use today and indicates what kinds of problems they typically help to solve. It provides a detailed overview of four areas that are likely to develop significantly by deploying AI technologies: homes, manufacturing, transport and energy.

On the issue of data bias, the White Paper notes that even removing attributes prone to biases from training data (such as race, gender, sexual orientation or religion) may not be enough as other variables may serve as proxies for bias in the model. The authors call for further interdisciplinary work to develop more refined approaches to controlling bias.
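The proxy problem the authors describe can be illustrated with a toy sketch. In this hypothetical example (all names and figures invented), a sensitive attribute has been dropped from the training features, yet a remaining feature such as a postcode zone still tracks it closely, so a model can effectively recover the removed attribute:

```python
# Hypothetical illustration: dropping a sensitive attribute is not enough
# if a remaining feature acts as a proxy for it.

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy records: the sensitive attribute is removed from the training set,
# but a postcode indicator (hypothetical proxy) stays in and mirrors it.
sensitive = [1, 1, 1, 0, 0, 0, 1, 0]       # dropped before training
postcode_zone = [1, 1, 0, 0, 0, 0, 1, 0]   # remains in the training set

print(correlation(sensitive, postcode_zone))  # high correlation: a proxy
```

A correlation this strong means the model can still condition on the sensitive attribute indirectly, which is why the White Paper's authors call for more refined approaches than attribute removal.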

Mitigating bias

As E.B. White reminds us, bias is difficult to avoid. He is perhaps best known nowadays as the author of children’s books, including ‘Stuart Little’ and ‘Charlotte’s Web’, but he was also a regular contributor to ‘The New Yorker’ magazine and the co-author of one of the best known and most influential writing style guides. In the context of bias, White claimed there was no such thing as objectivity: “I have yet to see a piece of writing, political or non-political, that does not have a slant,” he said. “All writing slants the way a writer leans, and no man is born perpendicular.”

Bias is a fact of life in machine learning. In data science, it usually refers to a deviation from expectation, or an error in the data, but there is more to bias than that. We are all conditioned by our environments and experiences — “no man is born perpendicular” — and carry with us different kinds of social, political or values-based baggage. Sometimes our horizons are not as broad as we would like to think and as a result, the vast volumes of data used to train algorithms are not always sufficiently diverse or variegated. More often than not there is actual human bias in data or algorithms.

The good news is that bias in machine learning can often be detected and mitigated. The bad news is that it can be difficult to get to the bottom of how algorithms make their decisions in order to solve the problem, as more often than not algorithms operate within a "black box".
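One common detection check, sketched here with invented data, is the disparate impact ratio: compare the rate of favourable outcomes across demographic groups. The "four-fifths rule" used in some fairness audits flags ratios below 0.8 as potentially biased. This is a minimal illustration, not a complete audit:

```python
def disparate_impact(outcomes, groups, favourable=1):
    """Ratio of favourable-outcome rates between the best- and
    worst-treated groups. Values below ~0.8 suggest possible bias."""
    rates = {}
    for g in set(groups):
        group_outcomes = [o for o, gg in zip(outcomes, groups) if gg == g]
        rates[g] = group_outcomes.count(favourable) / len(group_outcomes)
    lo, hi = min(rates.values()), max(rates.values())
    return lo / hi if hi else 1.0

# Toy model predictions for two hypothetical groups "a" and "b":
# group "a" is selected 80% of the time, group "b" only 20%.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups   = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

print(disparate_impact(outcomes, groups))  # 0.25, well below 0.8
```

Checks like this can run on a model's outputs even when its internals remain a black box, which is one reason they are widely used.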

There are four common types of bias related to machine learning.

Stereotype bias

Algorithms are only as good as their developers. As ‘New Scientist’ reports, machine learning is prone to amplify sexist and racist bias from the real world. We see this, for example, in image recognition software that fails to identify non-white faces correctly. Similarly, biased data samples can teach machines that women shop and cook, while men work in offices and factories. This kind of problem usually occurs when the scientists who prepare the training data unwittingly introduce their own prejudices into their work.
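The amplification effect can be seen in miniature. In this hypothetical sketch, captioned training data links an activity to one gender 66% of the time; a model that simply predicts the most frequent label turns that 66/34 skew into a 100/0 skew at prediction time:

```python
from collections import Counter

# Hypothetical captioned training data with a skewed activity/gender ratio
training = [("cooking", "woman")] * 66 + [("cooking", "man")] * 34

# Count which gender co-occurs with "cooking" in the training set
counts = Counter(gender for activity, gender in training
                 if activity == "cooking")

# A naive model that always outputs the most frequent label amplifies
# the dataset's 66/34 skew into a 100/0 skew in its predictions.
prediction = counts.most_common(1)[0][0]
print(prediction)  # "woman", every single time
```

Real models are more sophisticated than a majority-label predictor, but the underlying tendency to sharpen statistical regularities in skewed data is the same.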

Sampling bias

Biases can also occur when a sample is collected in such a way that some members of the intended statistical population are less likely to be included than others. In other words, the data used to train a model does not accurately reflect the environment in which it will operate.

A sampling bias could be introduced, for instance, if an algorithm used for medical diagnosis is trained only on data from one population. Similarly, if an algorithm meant to operate self-driving vehicles all year round is trained only on data collected during the summer months, falling snowflakes might confuse the system.  
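The seasonal example above can be sketched numerically. In this toy simulation (all figures invented), sampling only the warm part of a year-round temperature distribution yields an estimate that is systematically too high, because a whole region of the population is never observed:

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical year-round sensor readings: mean ~5 degrees, wide spread
population = [random.gauss(5, 10) for _ in range(5000)]

# Data collected only during warm months: everything below 10 degrees
# is simply never sampled.
summer_only = [t for t in population if t > 10]

print(round(mean(population), 1))   # near the true mean of ~5
print(round(mean(summer_only), 1))  # far too warm: sampling bias
```

A model trained on `summer_only` would have no notion of what cold-weather data even looks like, which is exactly why falling snowflakes could confuse a self-driving system trained on summer footage.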

Systematic value distortion

Systematic value distortion occurs when the true value of a measurement is systematically overstated or understated. This kind of error usually occurs when there is a problem with the device or process used to make the measurements.

On a relatively simple level, measurement errors might occur if training data is captured on a camera that filters out some colours. Often the problem is more complex.

In healthcare, for instance, it is difficult to implement a uniform process for measuring patient data from electronic records. Even superficially similar records may be difficult to compare. This is because a diagnosis usually requires interpreting test results and making several judgements at different stages in the progression of a disease, with the timing of the initial decision depending on when a patient first felt unwell enough to see a doctor. An algorithm must be able to take all of the variables into account in order to make an accurate prognosis.
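At the simple end of the spectrum, systematic value distortion behaves very differently from random noise, as this toy simulation shows: random measurement noise averages out over many readings, while a constant offset from a miscalibrated instrument (a hypothetical example) does not:

```python
import random

random.seed(1)

mean = lambda xs: sum(xs) / len(xs)

true_values = [20.0] * 1000  # the true quantity being measured

# Random noise: errors in both directions cancel over many readings
noisy = [v + random.gauss(0, 0.5) for v in true_values]

# Systematic distortion: a miscalibrated device adds a constant offset,
# so the error persists no matter how much data is collected
distorted = [v + 1.5 + random.gauss(0, 0.5) for v in true_values]

print(round(mean(noisy) - 20.0, 2))      # average error near zero
print(round(mean(distorted) - 20.0, 2))  # average error stuck near 1.5
```

This is why collecting more data is no cure for systematic distortion: the fix has to come from the measurement device or process itself.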

Algorithmic bias

Algorithmic bias is what happens when a machine learning system reflects the values of the people who developed or trained it. For example, confirmation bias may be built into an algorithm if the aim, whether intentional or unintentional, is to prove an assumption or opinion. This might happen in a business, journalistic, or political environment, for example.

There have been several high-profile cases of algorithmic bias related to social media and search engines, and even in the field of corporate recruitment.


In addition to the joint committee with ISO on AI, IEC is a founder member of the Open Community for Ethics in Autonomous and Intelligent Systems (OCEANIS). It brings together standardization organizations from around the world with the aim of enhancing awareness about the role of Standards in facilitating innovation and addressing issues related to ethics and values.

It is vital that machines continue to follow human logic and values, while avoiding human bias, as they replace people in some decision-making processes. International standards offer an answer to many of the concerns. Creating consensus-based standards means opening the 'black box' to provide the transparency needed to ensure the quality of the data used. The standardization process will also require understanding and taking steps to mitigate the impact of potential biases resulting from algorithms. Above all, standardization will increase knowledge about the way algorithms are built and operate, making it easier for the victims of bias to challenge data-supported decisions.