Stanford University's Institute for Human-Centered Artificial Intelligence (Stanford HAI) earlier released its Foundation Model Transparency Index, which rates 10 widely used foundation models on how transparent they are. Meta's Llama 2 obtained the highest score, while Amazon's Titan Text ranked lowest. OpenAI's GPT-4 placed fourth and Google's PaLM 2 fifth, behind a model from Stability.ai.
However, the report also pointed out that even though Llama 2 ranked first among the 10 models evaluated, its transparency score was only 54%, while Google's PaLM 2 scored just 40% and Amazon's Titan Text a mere 12%.
The index scores transparency on criteria such as whether a developer publicly discloses how the model works, its scale, and its architecture, and whether it provides monitoring mechanisms and channels for remediation. The level of transparency also shapes how far users can trust these models. The Center for Research on Foundation Models (CRFM) at Stanford HAI, which produces the index, argues that the foundation models in use today are not transparent enough to be fully trusted, and it does not recommend that businesses or government agencies use such models to build services.
Stanford HAI researchers developed a total of 100 indicators for evaluating the transparency of foundation models. Roughly one-third assess how a model is built, including the data used for training and the labor that went into building it. Another third cover the model itself: its actual performance, trustworthiness, risk level, and how it can be improved. The remaining third concern downstream use: the policies adopted by the developer providing the model and whether it offers recourse when users are adversely affected.
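To make the scoring concrete, here is a minimal sketch of how a 100-indicator checklist like this could be aggregated into the percentages the report cites. It assumes each indicator is a simple pass/fail check and that the published score is just the share of indicators satisfied; the domain labels, indicator names, and example counts below are illustrative, not the index's actual items.

```python
# Minimal sketch of aggregating a 100-indicator transparency checklist.
# Assumption: each indicator is pass/fail and the published percentage is
# simply the share of indicators satisfied. All names and counts here are
# hypothetical, chosen only to mirror the report's rough one-third split.

from dataclasses import dataclass

@dataclass
class Indicator:
    name: str        # e.g. "training data disclosed" (illustrative)
    domain: str      # "upstream" (how built), "model", or "downstream"
    satisfied: bool  # did the developer publicly disclose this item?

def transparency_score(indicators: list[Indicator]) -> float:
    """Return the percentage of indicators satisfied."""
    if not indicators:
        return 0.0
    return 100.0 * sum(i.satisfied for i in indicators) / len(indicators)

# Hypothetical example: 20 + 18 + 16 = 54 of 100 indicators satisfied.
example = (
    [Indicator(f"upstream-{k}", "upstream", k < 20) for k in range(33)]
    + [Indicator(f"model-{k}", "model", k < 18) for k in range(33)]
    + [Indicator(f"downstream-{k}", "downstream", k < 16) for k in range(34)]
)
print(f"transparency: {transparency_score(example):.0f}%")  # -> transparency: 54%
```

Under this assumption, a developer satisfying 54 of the 100 indicators scores 54%, the kind of figure the report gives for Llama 2.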





