Reuters reported that several American writers have filed a class action lawsuit in federal court in New York, alleging that Meta, Microsoft, Bloomberg News and other industry players used their works to train artificial intelligence models without permission.
The lawsuit alleges that Meta and Microsoft used the controversial Books3 dataset to train artificial intelligence models, and that Bloomberg News used it as well. The artificial intelligence research organization EleutherAI is also accused, for supplying technology companies with datasets that include Books3.
The Books3 dataset draws its data from "shadow libraries," which host academic papers and novels obtained in violation of copyright, a sourcing practice that is frequently controversial. Major shadow libraries currently include Library Genesis, Z-Library, and Sci-Hub, and most of them operate in a decentralized and anonymous manner.
The authors are suing because Llama 2, a large language model released by Meta in partnership with Microsoft, was allegedly trained on datasets that include Books3. In addition to seeking an injunction against the misuse of their creative works, they are demanding compensation from Meta and Microsoft.
Microsoft declined to comment, and Meta made no statement. Bloomberg News emphasized that it did not use the Books3 dataset to train its commercial large language model, BloombergGPT, and that its training data came from its own database.
In addition to Meta and Microsoft, technology companies including OpenAI and Google have also been accused of using copyrighted data to train artificial intelligence models without permission. In subsequent public statements, they have emphasized that they will train on copyright-free, publicly available online data to avoid further disputes.



