---
categories:
comments: true
toc: true
date: 2022-11-20 22:17:41+01:00
disable_share: true
draft: true
featured_image: /images/banner.jpg
omit_header_text: true
tags:
title: "An art-challenged individual's perspective on AI (anime) generated art"
---
You might have heard it in the past few months: some areas of the Internet are buzzing with discussions about "generative art", that is, artwork produced by "AI" models that were fed an absurdly large number of images as training data. There are supporters, there are critics, and the technology keeps advancing: amid all this, this post offers my humble experience with computer-generated imagery.
In this post, you'll be guided by Yumiko, Satsuki, and Maya: the first two are characters created by someone I know (who wants to remain anonymous) and that I then expanded, cooperating with their creator on a certain project a few years ago; the third is... well, a character with an interesting history, which will be explained later.
{{< multithumb "images/2022/11/yumiko-hi.png" "images/2022/11/satsuki-hi.png" "images/2022/11/maya-hi.png" >}}
But first, an introduction for those unfamiliar with this world is in order.
## What is this AI you speak of?
{{< imgthumb src="images/2022/11/yumiko_question.png" size="600x" caption="Even Yumiko wants to know!" >}}
Unless you were hibernating in Alpha Centauri waiting for the right moment to start the invasion, you have probably heard about the use of various "machine learning" methods to have a computer program (to put it very simply) "learn" particular features from a data set (chess games, video games, images, sounds...) and use them to perform various tasks, such as playing games, generating text, tackling complex scientific problems, and much more.
Scientific background or not, my mind is limited and I'd need someone more expert than me to come up with a precise definition of these systems (feel free to do so in the comments!). This space sees active scientific research, both by universities and by commercial companies such as DeepMind (now owned by Google) and OpenAI (although they're not really "open"...).
One important point, relevant as background for all of this, is that aside from the algorithms (the instructions required to "learn" and then "apply" the digested data), the other key ingredient of this potentially palatable recipe is the model: you can view it as what the "AI" has learned after processing the data. While most of the algorithms are public (there are even research papers), models often are not, so only their creators can change, improve, or build upon them.
You know, it sounds an awful lot like the prologue to an incident involving a former MIT researcher and a jammed printer, but I'll leave that out for now. We'll get back to it later.
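To make the algorithm/model distinction a bit more concrete, here is a deliberately tiny sketch in plain Python (a toy linear regression, nothing to do with image generators): the code is the "algorithm", while the file it writes at the end, containing the learned numbers, is the "model".

```python
import numpy as np

# Training data: points that happen to lie on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# The "algorithm": plain gradient descent that learns w and b in y = w*x + b.
w, b = 0.0, 0.0
for _ in range(5000):
    pred = w * x + b
    w -= 0.01 * 2 * np.mean((pred - y) * x)
    b -= 0.01 * 2 * np.mean(pred - y)

# The "model": just the learned parameters, saved to disk.
# Whoever has this file can reuse or build upon it without redoing the training.
np.savez("toy_model.npz", w=w, b=b)
print(f"learned w={w:.2f}, b={b:.2f}")
```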
## A robot that can paint
{{< imgthumb src="images/2022/11/maya-confused.png" size="600x" caption="Eeh~ I don't really understand all this stuff..." >}}
In January 2021, OpenAI announced DALL-E, a play on Pixar's WALL-E and renowned artist Salvador Dalí: an extension of their text-generating GPT-3 system that allowed generating images from natural-language descriptions. This meant that the text "an apple on a wooden table" would produce (more or less) an image of an apple on a wooden table. To prevent any potential liability (they said "misuse" or similar terms, but it was mainly about liability), both sexual and violent images were removed from the (massive) data set used for training. However, despite the "Open" in the name, DALL-E was only available to OpenAI's paying customers (like GPT-3). There was no way to alter, modify, or improve the whole deal unless OpenAI wanted to (in part they did, with DALL-E 2).
Similar models were made by other companies, like Midjourney, but likewise, they kept everything to themselves. Some places started offering generation services (free, or at a price), but it looked like some sort of niche interest. Until summer 2022.
## One event that changed everything
That summer, Patrick Esser (from RunwayML) and Robin Rombach (from the Machine Vision & Learning research group at LMU Munich) released Stable Diffusion, another approach to generating images, trained on a data set of about 5 billion reference images. The big deal was that, although with some restrictions, the model was substantially more open than the others: it was publicly available and allowed modification and reuse. That was when AI generation exploded.
{{< imgthumb src="images/2022/11/satsuki-flame.png" size="600x" caption="Explosions? Mess with Satsuki, and you'll get burned." >}}
Many people, from researchers to particularly clever laypeople, started working on improvements to the models, the algorithms, and everything else that revolved around Stable Diffusion. Although Stable Diffusion aimed at all kinds of imagery, specialized models were created, for example, to draw anime-style art. In addition, NovelAI, a company providing a story-writing service built on GPT-style language models, developed a custom (and high-quality) model to draw anime art. Said model was somehow leaked, which prompted further modifications (link in Chinese), although of questionable legality.
The advantage of all of this is that a regular user, given a suitably high-end GPU (NVIDIA or AMD, although Intel's Arc could also prove useful in the future), can generate art on their own machine. There's a plethora of software available to do so, although the most popular one is very ambiguous about its licensing. There is still plenty of movement in the field as well (warning: some of the links there may be NSFW).
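As an illustration (not a recommendation of any particular tool), here is roughly what local generation looks like with the Hugging Face `diffusers` library and the publicly released Stable Diffusion weights; the model name and parameters are just examples.

```python
# Minimal sketch: text-to-image with Stable Diffusion via the "diffusers"
# library. Assumes a CUDA-capable GPU with enough VRAM.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example: the publicly released weights
    torch_dtype=torch.float16,          # half precision to save VRAM
)
pipe = pipe.to("cuda")

image = pipe("an apple on a wooden table").images[0]
image.save("apple.png")
```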
The rest of this post, in fact, deals with what I actually did with all this stuff.
## Art-challenged?
Those who are familiar with me or my background already know this: art was never my forte. In high school, my lowest grades were in art and related subjects. I was never, ever able to draw more than stick figures. It wasn't a big deal: whatever I lacked in art, I made up for in other disciplines (like science).
{{< imgthumb src="images/2022/11/abilities.png" size="600x" caption="Seriously. This is the extent of my skill." >}}
So what does that have to do with AI art? Well, there are a couple of reasons that might be worth telling.
### Ideas, and lack of implementation
As I discussed with the creator of Yumiko and Satsuki several times, it would've been nice to actually see the characters "spring to life", ever since they were conceived thanks to a totally random comment of mine that opened the lid on a train of thought and brought them into their current form.
{{< imgthumb src="images/2022/11/yumiko-writing.png" size="600x" caption="I've noted down your total inability to draw. Go on." >}}
Neither of us could do it: even Yumiko's creator is artistically challenged, although less so than myself. So we asked (and, of course, paid) artists. Despite occasionally troublesome relationships (involving, in one case, a dispute on a payment site), this helped shape the characters and even prompted new ideas for them. One of the issues was that, for cost or time reasons, we would often cut some of the planned ideas for the images.
One example: an image arrived late and there were still adjustments to make, but we glossed over them to avoid waiting even longer. Also, some of the artists didn't trust their skills enough to draw certain types of characters (women, or men, for example). Nevertheless, quite an amount of art was produced over the years (our wallets weren't happy).
For these reasons, AI-generated imagery is a potentially useful way to prototype scenes, fine-tune them, and see whether they "fly" in the context of what they're depicting (sometimes a good idea might have a bad implementation, for example). At that point, having gathered enough information, one can ask an artist to draw the final composition.
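In practice, "prototyping" mostly means keeping the same prompt and letting the random seed vary, then discarding everything that doesn't work. Continuing the hypothetical `diffusers` snippet from above (the prompt here is made up, purely for illustration):

```python
# Sketch: generate a few drafts of the same scene by fixing the prompt and
# varying the seed, then keep only the compositions worth refining.
# Reuses the "pipe" object from the previous snippet.
import torch

prompt = "an anime-style girl writing at a desk, soft lighting"

for seed in range(4):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"draft_seed{seed}.png")
```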
### Building upon a memory
The second reason involves memories, or rather, fading memories. Around twenty years ago, or perhaps a little more (in the 1999-2002 period), when high-speed Internet had just arrived in my country but I was still stuck with a 28.8K modem (or an even more unreliable 56K one), I somehow found myself on Japanese sites, even though at the time I didn't understand a single word of what was written. One of these sites was an aggregator called Surfer's Paradise (or "surpara").
Clicking at random through the list of "newly added" sites landed me on an artist's page: I do not know whose it was. Even more clicking around, and I found myself in the rough sketch (らくがき) section; in particular, my attention was drawn to a rough sketch of a girl with long, light blue hair, wearing a futuristic armor (a mixture of black, white, and blue), with a shield, a blade that sprang out of a vambrace on the right wrist, and two long ribbon-like structures attached to the headgear. A scribbled text next to it read "i-Girl" (note: this is not the more famous "i-Girl" image related to the iMac). I liked the concept and fantasized a bit about it. That's how "Maya" (at the time without a family name) was born.
{{< imgthumb src="images/2022/11/maya-shocked.png" size="600x" caption="I'm... I'm the copy of someone else's memory!?" >}}
However, I was never able to find the drawing again. I don't even remember the web page, and the image was not in any of the many backups I made. I remember saving it, but it was likely lost. So for many years, "Maya" was just an idea that kept going back and forth in my head. At some point, the concept split into a "civilian Maya" (the one you've seen so far) and a "battle Maya", heavily inspired by 1980s anime like 超音戦士ボーグマン (usually translated as Sonic Soldier Borgman), with the ability to transform and don some kind of powered suit (with a bracelet heavily borrowed from Borgman's Baltector).
{{< imgthumb src="images/2022/11/maya-suit.png" size="600x" caption="I'm also Maya, and I will fight whatever evil threatens this town!" >}}