Sony AI recently announced a new AI test dataset called "FHIBE" (Fair Human-Centric Image Benchmark), a benchmark that touts fairness and impartiality.
Sony describes it as "the first publicly available, globally diverse, and consent-based collection of human images" designed specifically to assess bias in computer vision models.
In short, the dataset can be used to test whether current AI models treat different groups fairly. Sony's initial conclusion is that no company's dataset fully meets its benchmark.
A consent-based design, in contrast to web-scraped data
Sony emphasizes that FHIBE aims to address the long-standing ethical and bias challenges facing the AI industry. The dataset contains images of nearly 2,000 volunteers from more than 80 countries.
The dataset's most critical feature is that every image was shared with consent, in sharp contrast to the common industry practice of scraping large amounts of public data from the web. FHIBE participants also have the right to request the removal of their images at any time.
In addition, the photos carry a wealth of annotations detailing demographic characteristics, physical features, environmental factors, and even camera settings.
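Annotations like these are what make fine-grained bias testing possible, since a model's errors can then be sliced along any recorded attribute. As a purely hypothetical illustration (the field names below are invented for this sketch and are not FHIBE's published schema), one annotated record might look something like this:

```python
# Hypothetical sketch of a consent-based, richly annotated image record.
# All field names are illustrative; FHIBE's actual schema may differ.
annotation = {
    "image_id": "img_000123",
    "consent": {"granted": True, "revocable": True},  # subjects may withdraw at any time
    "demographics": {"pronouns": "she/her/hers", "age_group": "30-39"},
    "appearance": {"skin_tone_scale": 6, "hairstyle": "braids"},
    "environment": {"lighting": "outdoor_daylight", "scene": "street"},
    "camera": {"aperture": "f/2.8", "focal_length_mm": 35, "iso": 200},
}
```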
Test results: confirming known AI biases and uncovering new contributing factors
Testing with this tool confirms that previously documented biases do exist in current AI models. Sony states, however, that FHIBE can go further and identify the underlying factors that lead to those biases.
For example, the research found that some models are less accurate for people who use "she/her/hers" (feminine pronouns), and FHIBE traced part of that gap to greater hairstyle variability, a factor overlooked in previous analyses of such biases.
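Disaggregated evaluation of this kind is conceptually simple: compute a model's accuracy separately for each annotated subgroup and compare the results. A minimal sketch, using hypothetical per-image records (none of this is FHIBE's actual API):

```python
from collections import defaultdict

# Hypothetical records: ground-truth label, model prediction, and the
# pronoun annotation attached to each image (illustrative data only).
records = [
    {"label": "person", "prediction": "person", "pronouns": "she/her/hers"},
    {"label": "person", "prediction": "hat",    "pronouns": "she/her/hers"},
    {"label": "person", "prediction": "person", "pronouns": "he/him/his"},
    {"label": "person", "prediction": "person", "pronouns": "he/him/his"},
]

def accuracy_by_group(records, attribute):
    """Accuracy computed separately for each value of one annotation attribute."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        group = r[attribute]
        total[group] += 1
        correct[group] += r["prediction"] == r["label"]
    return {group: correct[group] / total[group] for group in total}

print(accuracy_by_group(records, "pronouns"))
# {'she/her/hers': 0.5, 'he/him/his': 1.0}  -> a per-group gap worth investigating
```

Other attributes, such as hairstyle, can be tested the same way, which is how a confounding factor like hairstyle variability can be separated from the pronoun group itself.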
Occupational Stereotypes and Toxic Responses
FHIBE also found that when asked neutral questions about participants' occupations, current AI models reinforce stereotypes. The tested models were particularly biased against certain pronoun and ancestry groups, for example being more likely to describe those individuals as sex workers, drug dealers, or thieves.
More seriously, when prompted about what crime an individual had committed, the models sometimes produced toxic responses at a higher rate for individuals of African or Asian ancestry, those with darker skin tones, and those who use "he/him/his" (masculine pronouns), reflecting stereotypical prejudice against these groups.
Demonstrating the feasibility of ethical data collection
Sony AI states that FHIBE proves "ethical, diverse, and fair" data collection is achievable. The tool is publicly available now and will continue to be updated, and the accompanying research paper has been published in the journal Nature.