The Wikimedia Foundation recently partnered with Kaggle, Google's data science community platform, to launch a version of Wikipedia optimized for training artificial intelligence models.
This version differs from the standard public version in that it omits bibliographic information, references, and Markdown-coded content. Offering it is meant to reduce the traffic that model training places on the main website, so that access to Wikipedia is not degraded, while also helping more artificial intelligence models be trained on open data.
For now, this version provides only English and French content; more languages are expected to be added in the future.
The Wikimedia Foundation stated that as competition in artificial intelligence technology has intensified, traffic to Wikipedia's servers has risen by a full 50%. Without affecting access to the original Wikipedia content, the Foundation hopes to provide a version better suited to the training needs of artificial intelligence models, thereby reducing the impact of that traffic.
Brenda Flynn, head of partnerships at Kaggle, said she was honored to collaborate with the Wikimedia Foundation to make artificial intelligence model training more efficient through customized Wikipedia content.
In this collaboration, Kaggle will pay for data use through Wikimedia Enterprise, a for-profit platform under the Wikimedia Foundation. The Foundation also expressed the hope that more artificial intelligence companies will abide by Wikipedia's license terms in the future, rather than blindly assuming that all content posted online is free to use.
In previous disputes, many AI companies built their models by crawling the web and retrieving data stored on various websites, using it as the basis for their models' "thinking." This not only adds load to the content site operators' own servers but also reduces users' willingness to click through and browse the pages, since they can instead ask an artificial intelligence service directly.
Therefore, in a recent interview, Reddit CEO Steve Huffman called on companies such as Microsoft, Anthropic, and Perplexity.ai, which use crawler bots to mine data from various websites, to take responsibility for their actions and pay for the data. The network infrastructure provider Cloudflare also recently announced a new feature called AI Labyrinth (AI Maze), which combats unauthorized content scraping by feeding fake AI-generated content to web crawlers.



