Structural Moves
Definition
Post-training alignment (e.g., RLHF or fine-tuning for inclusivity) applied to mitigate bias instead creates extreme, unintended disparities or “rebound” effects: the model overshoots in the opposite direction of the original imbalance.
Distinct from
- GER-323 — Alignment over-correction creates new bias → this code. Original bias persists due to no pre-deployment audit → GER-323.
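The distinction above, between over-correction and an original bias that persists, can be illustrated with a signed disparity measured before and after mitigation. The following is a minimal sketch, not part of the taxonomy itself. The function name `classify_disparity`, the metric (rate for group A minus rate for group B), the labels, and the 0.05 tolerance are all assumptions chosen for illustration.

```python
def classify_disparity(before: float, after: float, tolerance: float = 0.05) -> str:
    """Classify a signed group disparity measured before and after mitigation.

    Hypothetical metric: rate(group A) - rate(group B), so the sign encodes
    which group is favored. The tolerance is an illustrative threshold.
    """
    if abs(after) <= tolerance:
        # Post-mitigation disparity is negligible.
        return "within-tolerance"
    if abs(before) <= tolerance:
        # No meaningful original imbalance, yet one exists afterwards.
        return "new-disparity"
    if before * after < 0:
        # Sign flipped: the disparity now favors the other group (rebound).
        return "over-corrected"
    # Same direction as the original imbalance: the original bias remains.
    return "persisting"


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any incident listed here.
    print(classify_disparity(0.30, -0.25))  # sign flip beyond tolerance
    print(classify_disparity(0.30, 0.28))   # original imbalance remains
```

Under these assumptions, a sign flip beyond the tolerance corresponds to this code, while a disparity that keeps its original direction corresponds to the case assigned to GER-323. A real audit would need a metric and threshold suited to the task and population.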
Anchor incidents (11)
Gender Biases of Google Image Search
primary — AIID #18 · 2015-04-04
Gender Biases in Google Translate
primary — AIID #59 · 2017-04-13
Researchers find evidence of racial, gender, and socioeconomic bias in chest X-ray classifiers
primary — AIID #81 · 2020-10-21
Twitter’s Image Cropping Tool Allegedly Showed Gender and Racial Bias
primary — AIID #103 · 2020-09-18
Genderify’s AI to Predict a Person’s Gender Revealed by Free API Users to Exhibit Bias
primary — AIID #115 · 2020-07-28
DALL-E 2 Reported for Gender and Racially Biased Outputs
primary — AIID #179 · 2022-04-01
DALL-E Mini Reportedly Reinforced or Exacerbated Societal Biases in Its Outputs as Gender and Racial Stereotypes
primary — AIID #262 · 2022-06-11
Alleged Gender Discrimination in Facebook Job Ads Algorithm
primary — AIID #580 · 2023-06-12
Images of Black People Labeled as Gorillas
secondary — AIID #16 · 2015-06-03
Amazon's Experimental Hiring Tool Allegedly Displayed Gender Bias in Candidate Rankings
secondary — AIID #37 · 2016-08-10
WeChat’s Machine Translation Gave a Racist English Translation for the Chinese Term for “Black Foreigner”
secondary — AIID #216 · 2017-10-10
Tags
discrimination · generative-media
References
- Fulgu, R. A. & Capraro, V. (2024). Surprising gender biases in GPT. Computers in Human Behavior Reports, 16.