dennogumi/content/post/2022-11-20-an-art-challenged-individual-s-perspective-on-ai-anime-generated-art.md at cc5b298c03c33b74a3aae9a27f2ef98f163e0479

Luca Beltrame cc5b298c03

[ci skip] Update draft

2022-11-22 00:12:38 +01:00

What is this AI you speak of?

{{< imgthumb src="images/2022/11/yumiko_question.png" size="600x" caption="Even Yumiko wants to know!" >}}

Unless you were hibernating in Alpha Centauri, waiting for the right moment to start the invasion, you have probably heard about the use of various "machine learning" methods to have a computer program (to put it very simply) "learn" particular features out of a data set (chess games, video games, images, sounds...) and use them to perform various tasks, such as playing games, generating text, tackling complex scientific problems, and many more things.

Scientific background or not, my mind is limited and I'd need someone more expert than me to find a precise definition of these systems (feel free to do so in the comments!). In this space there is active scientific research, both by universities and commercial companies, such as DeepMind (now owned by Google) and OpenAI (although they're not really "open"...).

One important point, which is relevant for the background of all this, is that aside from the algorithms (the instructions required to "learn" and then "apply" the digested data), one of the key ingredients of this potentially palatable recipe is the model: you can view it as what the "AI" has learned after processing the data. While most of the algorithms are public (there are even research papers), models are often not, so only their creators can change, improve, or build upon them.
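If the split between "algorithm" and "model" sounds abstract, here is a deliberately tiny sketch in Python. It has nothing to do with how image generators actually work: it just fits a straight line to four made-up points. The names `train` and `predict` and the data are my own assumptions, picked only for illustration. The point is that the training code (the algorithm) can be published freely, while the handful of numbers it produces (the model) is the part that can be kept private.

```python
# Toy illustration only: the "algorithm" is ordinary gradient descent,
# the "model" is the pair of numbers it learns. All names and data here
# are hypothetical and chosen just for this example.

def train(data, steps=2000, lr=0.05):
    """The 'algorithm': fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # The 'model': everything the program has learned from the data
    return {"w": w, "b": b}


def predict(model, x):
    """Applying the model: no training data needed, only the learned numbers."""
    return model["w"] * x + model["b"]


# Made-up data lying exactly on the line y = 2x + 1
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
model = train(data)
print(model)
print(predict(model, 10))
```

Anyone with `train` but without the data (or the time and hardware to process it) cannot recreate `model`, and anyone with only access to `predict` through a paid service cannot inspect or modify it. Scale the two numbers up to billions and you get the situation described above.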

You know, it sounds awfully like the prologue to an incident involving a former MIT researcher and a jammed printer, but I'll leave that out for now. We'll get back to it later.

A robot that can paint

{{< imgthumb src="images/2022/11/maya-confused.png" size="600x" caption="Eeh~ I don't really understand all this stuff..." >}}

In January 2021, OpenAI announced DALL-E, a play on Pixar's WALL-E and renowned artist Salvador Dalí: an extension to their GPT-3 system (which generated text) that allowed the generation of images from natural language. This meant that the text "an apple on a wooden table" would produce (more or less) an image of an apple on a wooden table. To prevent any potential liability (they said "misuse" or similar terms, but it was mainly for liability), both sexual and violent images were removed from the (massive) data set used for training. However, despite "Open" in the name, DALL-E was only available to OpenAI's paying customers (like GPT-3). There was no way to alter, modify, or improve the whole deal unless OpenAI wanted to (in part they did, with DALL-E 2).

Similar models were made by other companies, like Midjourney, but likewise, they kept everything to themselves. Some places started offering generation services (free, or at a price), but it looked like some sort of niche interest. Until summer 2022.

One event that changed everything

That summer, Patrick Esser (from RunwayML) and Robin Rombach from a university research group (Machine Vision & Learning, LMU Munich) released Stable Diffusion, another approach to generating images, trained on about 5 billion reference images. The big deal is that, although with some restrictions, the model was substantially more open than the others: it was actually available for download, and its license allowed modification and reuse. That was when AI generation exploded.

{{< imgthumb src="images/2022/11/satsuki-flame.png" size="600x" caption="Explosions? Mess with Satsuki, and you'll get burned." >}}

Many people, from researchers to particularly smart lay people, actually started working on improvements to the models, the algorithms, and everything that revolved around Stable Diffusion. Although Stable Diffusion aimed at all kinds of art, specialized models were made, for example, to draw anime-style art. In addition, NovelAI, a company which provided a service to create stories using GPT-3, developed a custom (and high quality) model to draw anime art. Said model was also somehow leaked, and prompted further modifications (link in Chinese), although of questionable legality.

The advantage of all of this is that a regular user, provided a suitable high-end GPU is available (NVIDIA or AMD, although Intel's Arc could also prove useful in the future), can generate art. There's a plethora of software available to do so, although the most popular one is very ambiguous on licensing. There is still plenty of movement (warning: some links contained there may be NSFW) in the field, as well.

The rest of this post, in fact, deals with what I actually did with all this stuff.

Art-challenged?

Those who are familiar with me or my background know it already: art was never my forte. In high school, my lowest grades were in art and related subjects. I was never, ever able to draw more than stick figures. It wasn't a big deal: whatever I lacked in art I made up for in other disciplines (like science).

So what does that have to do with AI art? Let me tell you two anecdotes.

Ideas, and lack of implementation

As I discussed with the other anonymous creator of Yumiko and Satsuki several times, it would've been nice to actually see characters "spring to life".

But neither of us could do it. As a matter of fact, we did ask (and pay) artists in the past. Despite sometimes troublesome relationships (involving, in one case, a dispute on a payment site), it helped shape the characters and even prompted new ideas for them. One of the issues was that, because of cost or time, we would often have to cut some of the ideas we had planned for the images.
