{"id":12449,"date":"2025-07-03T19:53:27","date_gmt":"2025-07-03T17:53:27","guid":{"rendered":"https:\/\/haimagazine.com\/uncategorized\/training-ai-on-books-is-legal-under-certain-conditions\/"},"modified":"2025-07-10T15:55:50","modified_gmt":"2025-07-10T13:55:50","slug":"training-ai-on-books-is-legal-under-certain-conditions","status":"publish","type":"post","link":"https:\/\/haimagazine.com\/en\/ai-in-industries\/training-ai-on-books-is-legal-under-certain-conditions\/","title":{"rendered":"Training AI on books is legal under certain conditions"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Three authors (Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson) sued Anthropic for illegally using their books to train the Claude model. While the case is still ongoing, <a href=\"https:\/\/storage.courtlistener.com\/recap\/gov.uscourts.cand.434709\/gov.uscourts.cand.434709.231.0.pdf\" target=\"_blank\" rel=\"noopener\"><mark style=\"background-color:#82D65E\" class=\"has-inline-color has-contrast-color\">Judge William Alsup issued a partial ruling<\/mark><\/a> addressing key aspects of the dispute.<\/p><h4 class=\"wp-block-heading\"><strong>Reproduction in the training process is fair use<\/strong><\/h4><p class=\"wp-block-paragraph\">This opinion doesn&#8217;t end the proceedings, but it&#8217;s worth noting because it is partially favorable to AI companies, with some caveats. The court recognized that training a model on legally obtained books meets the criteria for fair use. <strong>It also clearly emphasized that copying content from pirated sources is a violation of copyright law<\/strong>. 
While the court hasn&#8217;t yet decided whether responses generated by the model could constitute a secondary infringement, it suggested that using appropriate filters might be enough to prevent this.<\/p><p class=\"wp-block-paragraph\">In his opinion, Judge Alsup noted that training an LLM on books isn&#8217;t aimed at reproducing them but at creating new value: the ability to generate diverse responses.<\/p><blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p class=\"has-contrast-color has-accent-background-color has-text-color has-background has-link-color wp-elements-9fe12fcbed98a56c3bf75a677cb5f2cf wp-block-paragraph\"><strong><em>&#8220;The model trained upon works, not to race ahead and replicate or supplant them, but to create something different&#8221; \u2013 Judge William Alsup<\/em><\/strong><\/p><\/blockquote><p class=\"wp-block-paragraph\">The fact that some passages might be &#8220;remembered&#8221; by the model doesn&#8217;t automatically mean there\u2019s an infringement, as long as they aren\u2019t mechanically reproduced in the responses.<\/p><p class=\"wp-block-paragraph\">This reasoning is similar to the argument used by the US Court of Appeals for the Second Circuit in the Google Books case, where it was recognized that processing books for searching and indexing constitutes transformative use.<\/p><h4 class=\"wp-block-heading\"><strong>Scanning paper copies and illegal sources<\/strong><\/h4><p class=\"wp-block-paragraph\">The next issue was the digitization of books. Anthropic argued that it only scanned legally acquired paper copies to facilitate processing. The court found that there was no illegal redistribution, just a conversion of format.<\/p><p class=\"wp-block-paragraph\">On the other hand, the court had no doubt that sourcing content from pirate repositories like Books3 or Library Genesis infringes copyright. 
The explanation that it was a &#8220;research library&#8221; simply wasn&#8217;t enough. This is an important signal not just for Anthropic but also for other companies in the industry, such as Meta, which also trained models on Books3.<\/p><h4 class=\"wp-block-heading\"><strong>It&#8217;s not a photocopy, it&#8217;s a creative tool<\/strong><\/h4><p class=\"wp-block-paragraph\">Is the court&#8217;s position surprising? Not really. Models aren&#8217;t designed to copy and store works, but to learn structures and correlations. However, generating responses that imitate a particular author&#8217;s style could still be seen as violating someone&#8217;s rights, and this issue isn&#8217;t fully settled yet.<\/p><p class=\"wp-block-paragraph\">For the AI industry, this court opinion sends a clear message: use legal sources and apply filters. That&#8217;s all it takes to be on the right side of the law.<\/p><h4 class=\"wp-block-heading\"><strong>A judge who understands technology<\/strong><\/h4><p class=\"wp-block-paragraph\">It&#8217;s worth mentioning who issued this partial ruling. It was William Alsup, known from the Oracle v. Google case, where he presided over the trial in which Google&#8217;s use of Java APIs was found to be fair use. This is a judge who has repeatedly shown that he understands the complexity of computer technology.<\/p>","protected":false},"excerpt":{"rendered":"<p>A US federal court has ruled on the training of artificial intelligence on copyrighted works. 
The case is still ongoing, but a partial ruling sheds new light on how copyright law applies.<\/p>\n","protected":false},"author":8,"featured_media":12387,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"rank_math_lock_modified_date":false,"footnotes":""},"categories":[797,805],"tags":[818,817],"popular":[],"difficulty-level":[36],"ppma_author":[660],"class_list":["post-12449","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-in-industries","category-law-and-ethics","tag-ai-law-2","tag-claude-2","difficulty-level-easy"],"acf":[],"authors":[{"term_id":660,"user_id":8,"is_guest":0,"slug":"kamil-swidzinski","display_name":"Kamil \u015awidzi\u0144ski","avatar_url":{"url":"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/06\/freepik__retouch__63609.png","url2x":"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/06\/freepik__retouch__63609.png"},"first_name":"Kamil","last_name":"\u015awidzi\u0144ski","user_url":"","job_title":"","description":"I follow the latest technology trends, including AI. 
As an Innovation Manager, I stay close to new solutions by working with startups."}],"_links":{"self":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/12449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/comments?post=12449"}],"version-history":[{"count":2,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/12449\/revisions"}],"predecessor-version":[{"id":12451,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/12449\/revisions\/12451"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/media\/12387"}],"wp:attachment":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/media?parent=12449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/categories?post=12449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/tags?post=12449"},{"taxonomy":"popular","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/popular?post=12449"},{"taxonomy":"difficulty-level","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/difficulty-level?post=12449"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/ppma_author?post=12449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}