According to related reports, technology companies including Apple, NVIDIA, and Anthropic have all used a dataset created by the non-profit artificial intelligence research group EleutherAI to train their artificial intelligence models. This dataset, however, contains text data taken from videos on more than 48,000 YouTube channels, including videos by the well-known creators MrBeast and MKBHD (Marques Brownlee), as well as content from the New York Times, the BBC, ABC News, and others.
Google has previously stated that directly training artificial intelligence models on YouTube video content would violate YouTube's terms of service. Apple, NVIDIA, Anthropic, and the other companies involved have not responded to the reports.
This situation may become a major point of controversy as artificial intelligence technology develops. Although many companies try to avoid infringement by not training their models on other people's data, they may still exploit gray areas to get around terms-of-service restrictions to some extent.
Beyond the recent news that tech companies train on datasets built by third-party organizations, many tech companies still avoid discussing the data sources behind their artificial intelligence models, or have not explained them transparently to the public. For example, many creators have voiced concerns about Apple's upcoming "Apple Intelligence" service because Apple has not clearly explained how the models behind it are trained.


