Why Am I Not that Excited About Text-to-Image AI Art Generators? by Mahmoud Elalfi

I pick up a pencil to draw and am immediately struck by the gap that stands between what I have in mind and what I put on paper. At times, the act of materializing something can make the experience pleasurable enough to distract from the failures. But as the novelty withers away, it becomes harder to ignore that impasse, and I sink deeper into a fundamental frustration: why do I fail to properly exteriorize what is bustling inside? Why do I end up with something that feels dead next to the vividness it aimed for? At times I take this frustration to be a necessary buzz that I have to tolerate in any creative endeavor; at others it reaches paralyzing heights. Yet any hope of achieving anything ultimately requires grappling with it.

So, when the algorithms that usually show me the hopes, struggles, and works of artists brought to my attention that AI (Artificial Intelligence) text-to-image art generators had finally arrived, I expected to be more than thrilled. With what is now a relatively accessible tool, you just need to enter the right prompt to get the image you want. At least that was the main message: it is a genie in the lamp. Just tell it what you want, sift through the options, and you will eventually get what you dream of. Of course, there was an uproar over the issue. What does the machine feed on? Is AI art true art? Are artists going to lose their jobs? Can AI replace us? I sympathized. In fact, I had just started taking up illustration as something that could potentially make me money. In no way, therefore, was I detached from the worries about how AI is going to affect the material conditions of art and artists. Yet one question keeps bugging me: why am I not, aside from those worries, excited about this? Why do I find so lackluster a tool that promises to easily bring out what I have always struggled to free from inside me?

The first question, though, is whether it really brings out what is trapped inside. In its simplest form, the generator receives a "command," or a descriptive text, which it relates to the plethora of images it has digested. I might provide clues about the content, the style, the mood, the lighting, and so on, and ideally, with the perfect prompt, I will get results very close to my description. And if the description happens to be close to what I have in mind, the generator might grant my wish and flesh out my inner life. Yet this process already reveals a gap at two levels: the generator's transformation of text into image (and vice versa), and my own transformation of inner reality into text. I might therefore fail either because I cannot articulate myself in the language the generator needs, or because language itself fails to capture what I want to articulate. Text that exhausts an idea at one moment may overshoot it at another. Besides, isn't this trouble in exteriorizing the interior, ironically, the very heart of my impulse to produce the artwork to begin with? In this sense, the AI generator does not resolve the problem; it just moves it elsewhere. The genie is not a genie after all: an AI generator won't "read my mind," and it is precisely this impossibility of reading minds that is fundamentally frustrating. Maybe even a genie wouldn't be of much help; don't the stories of genies, after all, tease out the troubles of wordplay, of wishing for the wrong thing, and ultimately of knowing what we actually want?

Consequently, I can no longer be, in the language of the market, the passive client whose verbal wish happens to be exactly what he wants. After all, every position as a client leaves something to be desired. Instead, the position shifts toward one that demands active involvement, or create-ivity. The best scenarios I can imagine, whether I am appealing to the generative powers of a machine or of a human, are those where I can give specific feedback to the process in a way that almost renders me a co-author. Of course, there are merits to seeing works by others: art can say things you weren't aware could be said, or make you feel things you weren't aware could be felt, and thus open you up to the worlds of others. But when it comes to the question I am posing here, the question of giving external shape to my own interiority, the artwork seems to be fundamentally an attempt at navigating the relationship between self and world. It becomes a creative quest. It concerns my status as a client who is simultaneously a creator. In this scenario, AI generator technology is better understood as a tool for actualizing this quest: a medium whose suitability depends not only on how I handle it but also on what its nature allows it to offer me. The art is the outcome, while the AI becomes an instrument of image generation rather than art generation, producing images that can be taken up as part of the creative process. Under these conditions, the more fitting question would be: in what ways are AI image generators, as tools for creating art, more or less suited to the task at hand?

If we take what is in the mind to be something more than textual description, it quickly becomes apparent that what is in one's head is not necessarily a concrete image that could be fully described given enough time, but rather a constellation of meanings, feelings, fragments, and blurry, overlapping entities that vary in clarity and definiteness. While some mental contents may seem penetrable and directly translatable, others retain an enigma and may contain what one cannot yet fully understand, recognize, or properly express. Making art thus becomes an act of investigating those contents, discovering their nature, emphasizing some parts over others, configuring a whole, and finding ways to give them shape. Some elements might have pictorial and graphic clarity from the get-go, while others prove to be more ideational, referential, or associative. Some details may be marginal, making them good candidates for automatic "filling in," while others might be so central that figuring out their exact properties is essential for the desired effect. In all cases, the creative act understood in this way is an attempt to interrogate the gap and expose its constituents in the act of bridging it.

In light of this understanding, take drawing with a pencil as an example. As a physical, movable instrument, the pencil offers itself as an extension of the body. Its shape ties it closely to the language of lines and weights, and, faced with a blank page, it allows control over everything that goes on it, opening up the possibility of constructing various formal relationships with regard to position, contrast, emphasis, configuration, and so forth. Unlike definite elements, which are easier to describe in text, these formal relations are hard to capture in words. Furthermore, because a drawing unfolds progressively, it lends itself to greater control and more decision-making over the outcome. It is a constant feedback process: an enmeshment in a state of assessing the parts and the whole. While it might come off as an impossible fantasy of control, and in a way it might indeed be such a reaction to frustration, it is also an encounter with this very impossibility in a revelatory manner. The process becomes a way of knowing and a way of coming to terms with the nature of this futility.

In contrast to drawing with a pencil, an AI image generator doesn't directly confront the gap. It goes around it, and sometimes even enlarges it. The AI image-generating process is in fact frustrating; it keeps producing images that barely translate the inner opaqueness, yet it is the kind of frustration that doesn't readily open itself up to negotiation, building up, or constructive discovery. So while the generator might present me with countless iterations of, say, tigers, what is struggling to take shape might be a slight ferociousness to be captured by a gesture, a hint of majesty to be articulated in a composition, an intense gaze to be accentuated through distortion, or even something still ambiguous that only a little extra line of uneven thickness could channel. Paradoxically, transcribing this spatial map of relations, even if it were possible, requires prior knowledge that usually emerges, at least in satisfactory form, only in the process itself. And even when I want to put something on paper that can simply pass as a tiger without any further layers of meaning, it is probably also an attempt at figuring out what "passing as a tiger" is about. And even when that part is already figured out, when it is a marginal concern, or when it becomes, understandably, boring, it would rarely constitute the totality of the artistic project. There would always, in my case, be an impulse toward a transformational act that mediates the inner and outer spheres. The magical appearance of the tiger is rarely the point; what matters is how it takes part in that personal mediation, and how, by extension, the tool plays a role in it.

This is not to say that an outcome produced without the generator would guarantee "perfection," but it might come closer to revealing what the imperfection is made of. After all, the fundamental drama seems to be precisely that what materializes outside is rarely, if ever, the same as what is inside, for the two are of slightly different natures. Yet it is through engaging with this transformation that I can come to terms with it and with the contours of what I can control, what I cannot, and what I might decide I don't want to control at all. Perhaps it is then that the gap can become a little less frustrating and a little more enjoyable. In my case, it is this freedom to "construct" the artwork that instant text-to-image AI generators fall short of. The forms of AI image generation I find more attractive are therefore those that allow a higher level of involvement and feedback over time, letting me move beyond the "idea of the image" toward its actual fabric. Through models that allow localized edits, that transform images one provides, or that give control over the data they train on and the kind of visual and textual language they develop, AI can become more of an assisting intelligence, able to aid those aspects of the process that suit the mode of creation.

So why am I not that excited about instant text-to-image AI art generators? Because they don't magically tackle the dilemma at the core of my creative impulse, nor do they offer the space to work through my relationship with the world. It is this elusive space that creating an artwork tries to reconstruct and clarify, through a process that is not the expression of a self or a thing that already exists but rather a discovery, transformation, and reconstruction of them as they engage with that space and bridge over it. Such a space might not mean much to many, and to them, resistance to this form of generator might seem unwarranted. What I am sure of, however, is that some of the art that AI generators feed on, learn from, and integrate into themselves is the work of people whose relationship with art has been heavily colored by a similar primary frustration, a need to face the gap, and a constant drive to find a way through.

Mahmoud Elalfi is an MA Researcher in the Philosophy program at GCAS, with a degree in Architecture and Urban Design, experience as a teaching assistant, and a love for illustration. In his research, he is interested in subject-object relations and symbolization from philosophical and psychoanalytic lenses, especially as they relate to aesthetic and spatial dimensions of ideation, experience, and construction of meaning.

Andrew Keltner, July 6, 2023