Machine Learning for Composition
There is some hype around the idea of artificial intelligence (AI) creating art that rivals the work of actual working artists. In many cases, this seems possible. The work created by the current systems is quite good, and it seems to show a great deal of imagination. Still, I think a few things may be going on behind the scenes that give the impression that the images created are greater than the sum of their parts.
Let’s talk about the sum of their parts. By that I mean the image sources and the text identification systems that help categorize the images and image subjects. I am afraid that a good chunk of what is happening is just a mashup of image content plus a smart implementation of keyword or tag association. The base technology of natural language understanding is pretty darn amazing, so the brilliant word and context recognition applied to the typed-in prompts is fantastic. I also see that the systems have a pretty good understanding of what a head looks like. They even “understand” light transmission, physics, and other interactions that happen in the real world. They are pretty brilliant at starting with a mashup, or a cloud of nothing, and slowly, iteratively, making it make sense — taking the jumble of nonsense they start with and turning it into something that humans are happy to qualify as a good result.
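That “start from a cloud of nothing and iteratively refine it” idea is the heart of diffusion models. Here is a deliberately toy numpy sketch of the loop shape, not the real math: an actual diffusion model uses a trained neural network to predict and subtract noise at each step, while this sketch cheats by blending directly toward a known target just to show the iterative-refinement structure.

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative refinement: start from pure noise
    and nudge it a little closer to a 'sensible' image each step.
    (A real diffusion model predicts the noise with a neural network;
    here we cheat and blend toward the target directly.)"""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # the initial "cloud of nothing"
    for t in range(steps):
        blend = (t + 1) / steps            # trust the "denoiser" more each step
        x = (1 - blend) * x + blend * target
    return x

target = np.ones((4, 4))            # stand-in for a coherent image
result = toy_denoise(target)
print(np.allclose(result, target))  # prints True: noise has been refined away
```

The interesting part is only the loop: dozens of small corrections, each conditioned on the current state, rather than one big leap from noise to image.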
Runpod.io – a service where you pay for your compute cycles, and where I can get results by running …
Dreambooth Stable Diffusion – a codebase that allows adding unique images/subjects to a system … which can then be stitched into the results from a trained system
Dreambooth – original research repository from Google
Corridor Digital video – demonstrating some results from the system
Joe Penna’s implementation of Dreambooth, available on GitHub