Were your books stolen for AI training?

There’s been a lot of buzz on Bluesky over the last couple of days about an article and search tool published by The Atlantic. In brief, there is a site, LibGen, which has pirated over 7.5 million books and research articles. It is not the only pirating site out there, but what bubbled this one to the surface is that Meta and other AI companies have used this site for AI training, in part or fully. Rather than duplicate one author’s summary (Jason Sanford) and a useful post from an advocacy group (The Authors Guild), I am adding the Bluesky and Authors Guild links below. You don’t need to be part of Bluesky to view Jason’s. And I should note, he is one of many authors posting on this topic; I found his thread to have additional useful information. In addition, the article referred to in their links is behind a paywall, but The Atlantic allows viewing and use of the LibGen search box portion of said article.

If you’ve been following the topic in the news, a number of AI training companies have been lobbying governments around the world to loosen or abandon copyright laws around printed and artistic (e.g. photography and artwork) materials. Sam Altman, CEO of OpenAI, whose ChatGPT has confirmed plans to move to a pay-for-use model, has repeatedly stated that his program will not be able to make a profit should existing laws remain as they are. In essence, he is asking for theft to be allowable so he can make money. Other companies, including Meta, have been slightly less obvious as to their core motivation by pointing to countries like China that consistently ignore copyright rules and are thereby beating “us” in the AI race.

On that last point… Look, you will probably make a successful case with me regarding the importance of AI in fields such as scientific advancement, national security, and technological achievement, but you will not succeed when it comes to the applications which these companies are aggressively promulgating for profit. Specifically, the selling of apps and services to generate books, artwork, and music, or to handhold a user writing an email or Twitter post.

As Sanford mentions, last year Meta posted a profit of $62 billion, yet claims that the cost of paying for rights would be prohibitive. Bear in mind also that $62 billion is current profit, not the expected gains over time from use of their AI tools after their rollout. And if anyone wishes to check out the range of user output from one paid-for app, I suggest you search on what people have done with the pay-for-use tool, Grok, much of which can be rated at the cesspool level. And when you do, remind yourselves that this is the kind of output that we are “losing the race in.”

The links are as follows:

Jason Sanford’s Bluesky post (containing The Atlantic’s article and search tool): https://bsky.app/profile/jasonsanford.bsky.social/post/3lkte7equxc2s

The Authors Guild response on their site: https://authorsguild.org/news/meta-libgen-ai-training-book-heist-what-authors-need-to-know/

Published on March 21, 2025 06:55