Microsoft removes guide on how to train LLMs on pirated Harry Potter books
Following backlash in a [Hacker News thread][1], Microsoft deleted a blog post that critics said encouraged developers to pirate Harry Potter books to train AI models that could then be used to create AI slop.
The blog, which is archived [here][2], was written in November 2024 by a senior product manager, Pooja Kamath. According to her LinkedIn, Kamath has been at Microsoft for more than a decade and remains with the company. In 2024, Microsoft tapped her to promote a new feature that the blog said made it easier to "add generative AI features to your own applications with just a few lines of code using Azure SQL DB, LangChain, and LLMs."
What better way to show "engaging and relatable examples" of Microsoft's new feature that would "resonate with a wide audience" than to "use a well-known dataset" like Harry Potter books, the blog said.
[1]:
[2]: https://archive.is/D9vEN
Microsoft generated an AI image of Harry Potter with a Microsoft logo in a now-deleted blog.

Microsoft guide to pirating Harry Potter for LLM training (2024) [removed] | Hacker News

Ars Technica
Microsoft deletes blog telling users to train AI on pirated Harry Potter books
The now-deleted Harry Potter dataset was "mistakenly" marked public domain.
