Vaishaal Shankar, a scientist on Apple's machine learning research team, announced on X that the team has released two small open-source artificial intelligence models under the DCLM (DataComp for Language Models) project, with 7 billion and 1.4 billion parameters respectively, emphasizing that the larger model competes directly with Mistral AI's 7 billion parameter model, as well as Meta's Llama 3, Google's Gemma, and Alibaba Cloud's open-source model Qwen2.
Vaishaal Shankar also touted DCLM as a truly open-source model. The 7 billion parameter version, built on the OpenLM framework, was trained on 2.5 trillion tokens with a context length of 2K tokens. It scored 63.7% on the Massive Multitask Language Understanding (MMLU) benchmark, surpassing Mistral-7B-v0.3's 62.7% and approaching Meta's Llama 3 8B at 66.2%, Google's Gemma at 64.3%, and Microsoft's Phi-3 at 69.9%, while using significantly less computing power.
As for the 1.4 billion parameter version, Apple trained it jointly with the Toyota Research Institute on 2.6 trillion tokens. It scored 41.9% on the same MMLU benchmark, surpassing Microsoft's Phi-1.5 at 35.90%.
We have released our DCLM models on huggingface! To our knowledge these are by far the best performing truly open-source models (open data, open weight models, open training code) 1/5
— Vaishaal Shankar (@Vaishaal) July 18, 2024
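Since the weights are published on Hugging Face, the models can in principle be loaded through the standard transformers API. The minimal Python sketch below assumes the repository id apple/DCLM-7B and a companion open_lm package that registers the OpenLM architecture with transformers; both names are assumptions based on the announcement, not details confirmed by this article.

```python
# Minimal sketch for loading and sampling from DCLM-7B.
# Assumed names: the Hugging Face repo id "apple/DCLM-7B" and the
# open_lm package that exposes the OpenLM architecture to transformers.
from open_lm.hf import *  # assumed: registers the OpenLM model class
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("apple/DCLM-7B")
model = AutoModelForCausalLM.from_pretrained("apple/DCLM-7B")

# Generate a short continuation as a smoke test.
inputs = tokenizer("Open datasets matter because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```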
In addition, Apple has also released a version of the 7 billion parameter model with the context length extended to 8K tokens, with essentially unchanged MMLU performance. This suggests that the design of the training dataset is becoming more important than the design of the language model architecture.
Currently, the "DCLM" project is collaborating with industry researchers in an open source format. Current partners include the University of Washington, Tel Aviv University, and Toyota Research Center. However, the research projects in the "DCLM" project will not be used in Apple's commercially available products to avoid unnecessary controversy. Currently, they are mainly used for research.