From stone-faced investors discussing industry, to tech hipsters babbling narcissistically at Bushwick parties, to clueless CEOs on morning earnings calls, Artificial Intelligence (AI) and its legal issues stand at the forefront of the Zeitgeist. Already, it’s translating into major changes.
Currently, it is driving speculation about a broad range of possibilities, such as:
- Birthing Star Trek replicators to usher in Utopia, eliminating the traditional 9-5
- Proving my theory that Donald Trump is an algorithmic breeding of L. Ron Hubbard and P.T. Barnum
- Opening our understanding and manipulation of human behavior via the processing of ungodly amounts of information
- Maybe getting another Heavy Metal movie.
However, AI still faces some legal hurdles before it can be seriously appraised. While generative AI raises different questions for image and text content, the two overlap on the most basic issue: copyright.
Currently, the most pressing issues from an intellectual property perspective revolve around web scraping. In other words, how much information can a program take in, and how close to copyrighted material can the generated content be, without infringing on the rights of the writer, artist, or other creator?
While the law addresses some circumstances analogous to web scraping and generative AI, many issues are entirely new, and older issues take on new dimensions or emphasis through new technologies.
Web Scraping and the Ninth Circuit
Generative AI has no original creative powers; everything is taken from pre-existing work. Thus, the more heavily a scraper draws on existing work, the more likely the generated content is to infringe.
What information is off limits to an automated web scraper, even at the request or demand of the information's owner, is a hotly debated topic. For instance, artists must have a public presence or portfolio to showcase their talent. Would they be able to make their paintings, writings, or other copyrightable work unavailable for AI analysis via web scraper? Or is it more like the rules for photographers, where essentially anything in public is fair game?
While Congress must grapple with this question, today it’s the Wild West. There’s little-to-no law or precedent restricting the gathering of public information by automated processes, which hands early scrapers incredible first-mover advantages that may never be lost. I’d be willing to bet that current web scrapers have already gathered every bit of information publicly available (including about yours truly… in my defense, it was dark, I was drunk, and that panda cub was delicious).
Remember, like most technology, this one is “values-neutral.” This means the same technology powering new generative AI can also scrape the internet to help artists find instances of infringement and protect their own copyrights or trademarks.
HiQ Labs v. LinkedIn Corp
The most interesting case came out of the Ninth Circuit Court of Appeals, the federal appellate court whose territory includes California.
In HiQ Labs v. LinkedIn Corp, this court ruled that automated web scraping of information available on public websites (i.e., user data from LinkedIn that you or I could see) does not violate the Computer Fraud and Abuse Act (CFAA), even if the website owner objects.
While this isn’t the most pressing issue concerning scrapers, it represents the first step in developing web scraping caselaw in the circuit responsible for the most copyright production in the nation and breaks in favor of the scrapers.
The CFAA is a cybersecurity law prohibiting unauthorized access to computers and systems. It is basically an anti-hacker law, and the court’s ruling, in essence, holds that automated web scraping isn’t analogous to illegal hacking, rejecting the argument LinkedIn had advanced.
What will be far more interesting are arguments about what legislation may create a right to protect information even when otherwise offered publicly and how it’s challenged. Sadly, that’s a debate for another day.
New Developments as a Result of AI
Should Congress create legislation allowing potential goldmines like Reddit with a bajillion interactions to analyze and monetize their databases by providing them more leverage to deny access to scrapers while still keeping information public?
We could see an entirely new, valuable information technology industry emerge. Or at least, we could see one begin after legislation passes, once enough new material has been created to make the databases already built by scraping the entire internet obsolete. Alternatively, we could see:
- Restrictions on what information can be scraped
- New businesses that make sure websites are scrape-proof or that create information to damage AIs, e.g., by overloading them with nonsense or inaccuracies
- AIs trained only on public domain information, causing them all to speak with an accent and terminology that is roughly 100 years behind our own, giving us all an omniscient consciousness that sounds like an extra in a movie about Chicago gangsters
- A partial ban on the web scraping we know today, after which data entry becomes a bit more analog: think something akin to the intellectual assembly line that is document review.
The current state of the law will allow web scraping to continue confidently on publicly available websites within the Ninth Circuit, with other circuits likely to follow the path it paves on “scraping” not being “hacking” unless and until another case makes its way up the court system.
This does not, however, touch on the incredibly controversial topic of what can be produced with the information that’s being scraped. This vital issue of “fair use” in copyright recently received attention from the Supremes due to a bit of recklessness by Andy Warhol, in a case that will have deep implications for debates on AI.
Copyright Questions and Andy Warhol Synergy
Perhaps the thorniest question in generative AI is this: At what point is infringement infringement?
Copyright infringement is analyzed case by case under the four factors of fair use (catch our in-depth explanation of copyright and AI: At The Intersect of AI and Copyright Law), as much of it involves evaluating art, a thing the courts have come to terms with being poor at, e.g., drawing the line between “art” and “pornography.”
In May 2023, the Andy Warhol Foundation for the Visual Arts (AWF) v. Goldsmith case was decided (see our breakdown of the issues here). This was arguably the most important copyright case since Campbell v. Acuff-Rose, a precedent the court and both certiorari briefs relied on heavily.
Andy Warhol Foundation for the Visual Arts (AWF) v. Goldsmith
In framing the issue, the court stated:
In this Court, the sole question presented is whether the first fair use factor, “the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes,” §107(1), weighs in favor of AWF’s recent commercial licensing to Condé Nast. On that narrow issue, and limited to the challenged use, the Court agrees with the Second Circuit: The first factor favors Goldsmith, not AWF.
Perhaps the best illustration in the case was the distinction between parody and satire, two forms the court treats differently, which it drew from Campbell:
Distinguishing between parody (which targets an author or work for humor or ridicule) and satire (which ridicules society but does not necessarily target an author or work), the Court further explained that “[p]arody needs to mimic an original to make its point, and so has some claim to use the creation of its victim’s (or collective victims’) imagination, whereas satire can stand on its own two feet and so requires justification for the very act of borrowing.”
Under the text and the Supreme Court’s interpretation of Campbell, parody is essentially off the table unless the purpose of a work of generative AI is to make a point about every copyrighted work it drew from infringingly.
Elaborating on its position, the Court held:
…Campbell cannot be read to mean that §107(1) weighs in favor of any use that adds some new expression, meaning, or message. Otherwise, “transformative use” would swallow the copyright owner’s exclusive right to prepare derivative works. Many derivative works, including musical arrangements, film and stage adaptions, sequels, spinoffs, and others that “recast, transfor[m] or adap[t]” the original, §101, add new expression, meaning or message, or provide new information, new aesthetics, new insights and understandings. That is an intractable problem for AWF’s interpretation of transformative use.
Where Do We Go From Here?
While the AWF case does not address anything concerning AI specifically, it offers creators some protection, as changing the meaning or message via satire alone won’t be enough to shield generative art or text from infringement.
At its heart, the debate around AI’s legal issues comes down to two questions: the availability of public information to automated gathering, which is a brand new problem, and fair use, my favorite complex old problem, which the courts have intentionally kept ambiguous.
While the courts and agencies like the Copyright Office will do their best with the patchwork of law they can quilt together to address these situations, generative AI and web scraping are paradigm-shifting technologies that were not contemplated when current copyright and internet law were legislated, and they will require Congress to address them directly.