Making AI Art With Midjourney

The artwork you see above was created using an AI art generator called Midjourney. I typed four words: religious and superstitious terror.

Don’t worry, I’m fine, I plucked that phrase out of a book I happened to be reading.

First, the generator gave me four thumbnails (“thumbs”). I chose which one I wanted to enlarge. Obviously I picked the first one.

I’m also creeped out by that third one, with the elongated head and the tiny little person which I’m sure is a floating child monk ghost.

Full disclosure, this has been very lightly retouched. Spindly branches tend to decorate weird parts of sky in AI generated art. (As a side note, I’m noticing a disproportionate number of human(like) figures with their backs turned. I wonder if the algorithm has been trained on large numbers of back views because it can’t yet do faces and limbs very well?)

All right, that’s it. I’m a convert. The first AI art generator I used was Wombo. I quickly moved on to Deep Dream Generator, and occasionally use NightCafe Studio. I wrote about that here. But I can’t be bothered with any of those now. While the news feed was filled with other news in 2022, it was easy to miss the announcement that the beta of DALL-E 2 had been released. (Midjourney is built on the same kind of technology.)

This is a gamechanger for AI generated art. This engine is NOT LIKE THE OTHERS. This one will… maybe… do exactly* what you tell it to? Midjourney is amazing.

*it doesn’t

HOW DOES IT WORK?

  1. Get yourself a Discord account if you don’t have one already. (This app is generally used for group chats, but Midjourney AI generation all happens within the app. It’s a bit weird, but it works.)
  2. Sign up for Midjourney here.
  3. Keep reading so you don’t waste your free 25 time credits. (I bet you’ll want to sign up after that.)
/imagine [ENTER] a lonely guy drinking coffee inside a bar at night, dark city, rain and thunderstorm through the window, high definition, realism (steam from coffee added in Affinity Photo)

IS MIDJOURNEY EXPENSIVE?

At the moment you can get a plan for US$10 per month. (FYI, I blew through it in two days of heavy use, then upgraded to the Standard Plan.) Some people are saying ten bucks is too much. I’m not sure what they’re comparing it to? Not to commissioned digital artists. What you’re buying: some heavy-duty processing power.

Someone on the forum made this point:

I understand that it might feel a lot of money, but given what kind of hardware (no, you can’t run this on your local GPU even if you have a $2000 3090Ti extra nice) is required to do these large models and run all the processing at reasonable and even fast times … I am just so thankful to have access to it in a flatrate model. And sure … 5 years down the line newer server GPUs and other improvements will make the pricing of today feel expensive … but today it is extremely cheap and we get access to cutting edge technology without having to invest a ton of money.

GeraldT

However, the ethics of charging for access to models trained on other people’s work, run on those supercomputers, is another matter entirely. Regarding DALL-E, David O’Reilly has this to say:

Using millions of investor dollars, Dall-E has harvested vast amounts of human creativity — which it did not pay for and does not own or credit — and demands ownership over whatever you make (find) with it.

Paying for it benefits a tech company on the back of a century of human effort — a bullshit deal.

Dall-E undermines the work of creators of all kinds, most obviously photographers, illustrators, and concept artists who shared their work online, and never asked to be included in a proprietary learning model. It rips off the past generation for the current one and charges them money for it. Scam.

Because it’s a black box, passing off Dall-E images as one’s work is always going to be akin to plagiarism. All controls are hidden and many ordinary words are censored. Like Google it uses opaqueness to conceal ideology, very far from being ‘Open AI’.

@davidoreilly.2

Although I blew through my monthly quota in two days, I got many usable images out of it. The success rate is very good, and I’m new to it. The learning curve is quick. In fact, you don’t need to know much at all if you don’t mind what kind of art Midjourney gives you.

However, if you want to art direct it, you’ll need to make a lot of re-rolls. To get to that next level, you’ll blow through that 200-minute quota quickly, without much to show for it. So on the $10 plan, you sit in this awkward place where you’re pretty sure you could get something better with another couple of re-rolls, but you want to conserve your limit.

Honestly, this must be what gambling addiction feels like. “If I just re-roll one more time, it’ll give me the perfect piece of art!” Do what’s right for your psychology, I guess.

Once you have a paid account (including the basic plan), you’re expected to leave the newbie and general channels and generate art by DMing the Midjourney Bot. (If you stay on the Newbie channel the bot kicks you off, as I found out.)

In case you’re brand new to Discord, you’ll find your direct messages by clicking on the blue Discord icon at the top left-hand corner of the screen.

Chatting with the bot is great, because now you won’t have a feed filled up with other people’s “imagines”. (That’s what people call these artworks. I dunno if it’ll stick. Midjourney calls them “generations”.)

Bear in mind: Although it looks like you’re making stuff that’s entirely private between yourself and a bot, the artworks themselves are not private. They can be seen by others, and do not belong to the users who prompted them. (You can use them however you like. So can others.)

To find all your imagines in one place, go to https://www.midjourney.com/app/

The first page of a picture book which hasn’t been made. It took a while to work out how to art direct this one. I wanted a city vista of skyscrapers, but one skyscraper made out of hamburger. After trial and error I uploaded a png (transparent bg) 512×512 photo of a basic burger, then wrote the following prompt: /imagine [ENTER] [url to the image] hamburger stacked skyscraper building cinematic, detailed --ar 2:3 (Obviously I have done a lot to it! But instead of spending two days, I spent two hours.) The logos/signage were also made in Midjourney with prompts like ‘neon sign retro hamburger restaurant’ and ‘hamburger signage retro 1960s’.
/imagine [ENTER] tiny truck driver, stands next to huge pickup truck :: cinematography, realistic, golden hour

A TIP FOR MAKING THE MOST OF YOUR TIME CREDIT

Legend has it that the thumbs (the 2×2 images) use barely any processing power. Nonetheless, Midjourney snatches a disproportionate amount of your time credits to process them. What really uses their expensive computing power: Your upscales.

If you’re on a Standard Plan you have the option of switching to relax mode.

Hint: Switch to “relax” mode to generate the thumbs. Do this by typing /relax [ENTER] in the Discord server. Generate thumbs to your heart’s content.

When it’s time for upscaling, switch back to “fast” by typing /fast [ENTER] in the Discord server.

If you don’t do this, you will burn through your fast time real quick. Then you’re stuck in relax mode for the rest of the month unless you pay extra for more “fast” processing time.

I’m not sure if this is because I’m on Australian time, generating art when most of the world is asleep, but I find the fast mode isn’t all that much faster than relax mode. Which is good. Midjourney’s relax mode is still faster than other AI generators I can think of.

WHERE DOES MIDJOURNEY FIT IN THE AI ART GENERATION LANDSCAPE?

If you’re interested in where AI art is at right now, the current state of Midjourney, its relationship to DALL-E, why DALL-E is called that, and how fast all this is changing, this video will explain it all. (It’s changing very quickly!)

I’m actually a little scared about what DALL-E 2 will do to the media landscape and our increasingly post-truth world as images become increasingly photorealistic. As for Midjourney, this AI is designed for making art.

DALLE is great at creating more accurate, clear images. But Midjourney makes the BEST album covers. It feels like DALL-E was trained on stock photography, while Midjourney was trained on concept art & renaissance paintings.

@Johnny Motion

RESOURCES AND SOFTWARE TO USE ALONGSIDE MIDJOURNEY

  1. If you’re the sort of person who reads instructions, Midjourney has a QuickStart Guide. It also has a fuller User Manual but not everything gets into it right away. The devs make announcements on Discord. Because of instant notifications, Discord is a good way to keep everyone up to date with changes. (As soon as this post is published it will be slightly out of date.)
  2. Once you subscribe (even on the lowest tier) you get access to community images. What I didn’t realise until I subscribed was how useful this would be: you can copy and paste other people’s prompts. You’ll learn so much simply by doing this. Find other people’s prompts by going to Community Feed on your own Midjourney profile page. You can bookmark the ones you love and copy their prompts.
  3. Photo software. I use Affinity Photo because it has one upfront price and does everything I need. Don’t get Adobe Photoshop unless you need features Affinity Photo doesn’t have, or unless you already own a huge library of presets, plug-ins and brushes (which won’t work in Affinity Photo). Even if you never learn most features of your photo software, a few tools will make AI art look way better, namely: the inpainting tool (paint over to automatically remove or fix) and the clone tool. The open-source option is, of course, good old GIMP.
  4. Your own way of enlarging images. Midjourney can do this but it costs you unnecessary time credit. The open-source way: Real-ESRGAN Inference Demo. People also recommend Cupscale. A paid, prettier way: buy a copy of Topaz Gigapixel.
  5. This Aesthetics Wiki can help you find prompts to make a particular style e.g. the entry for Honeycore: “Honeycore is centered around the rural production and consumption of goods such as honey, bread, and waffles. It is similar to Cottagecore in that rural agricultural imagery and values are emphasized, but the visuals are streamlined to create a color palette of mostly pale yellows and browns…”
  6. Tips For Text Prompts
  7. A Guide To Writing Prompts For Text-To-Image AI
  8. Midjourney Styles and Keywords Reference, an extensive guide on GitHub with lists of theme and design styles, artists, material properties, intangibles and things you haven’t even thought of. WITH IMAGE REFERENCES. When you see people using words like ‘supersonic’, they’ve probably read this document, or borrowed prompts from someone who has.
  9. Stuck for prompts? Try the Promptomania tool.
  10. You can find all sorts of random generators on the web. One that makes me laugh: The Arty Bollocks Generator, which pumps out artist statements to go with pieces of art. This was made long before Midjourney, so it would be interesting to see what AI does with it.
  11. Here’s another Prompter. Save the spreadsheet to your own Google drive. The guy who made Prompter also made his own QuickStart Guide which you may prefer over Midjourney’s own documentation.
  12. A Facebook group which is a spin off of “Procedural / Generative / Tech Art” group, but dedicated to AI generated images.
  13. Midjourney lets you provide an image URL at the beginning of your prompt as a target image. (The algorithm doesn’t pay all that much attention to it?) In any case, if you have an image you’d like to use as a base, it needs a place to live on the Internet. (It needs a URL.) A place to upload images: im.ge. There’s a 50MB limit. You don’t need to create an account to get a temporary link. (Alternatively, upload images within Discord using the plus sign to the left of the message box, then right-click the image in Discord and choose “Copy Link”.)

Also, because this thing is in beta, the algorithm is still learning, we are still learning it, and no documentation will tell you everything. Unfortunately, Discord is great for discussion and hopeless for archiving. This may be regrettable in future, unless someone’s creating an archive.

/imagine [ENTER] extraterrestrial intelligence discovered :: photorealistic, unreal engine, diagram, encyclopedia, scientific breakthrough, editorial --w 724 --h 1024 (Note Midjourney only did the background.)
/imagine [ENTER] dark energy, dark matter :: photorealistic, unreal engine, diagram, encyclopedia, scientific breakthrough, editorial --w 724 --h 1024
/imagine [ENTER] extraterrestrial intelligence, in the style of arthur getz, detailed watercolor, inky --w 1000 --h 1500

NOW TO THE PROMPT ENGINEERING

“Prompt engineering” is catching on in acknowledgement that using Midjourney to direct output is a skill in its own right.

Midjourney does not yet take specific art direction. However, at this rate it will in 5-10 years’ time. For an example of where this could go, there’s a Prompt Clips thread under the Prompt Craft chat on the Discord server, with a growing number of examples where users post “groups of multiple words that reliably/consistently create a known effect on MJ output.” tl;dr: Midjourney is reliable if you can work out what to tell it.

For example, someone worked out how to reliably make sprite sheets using prompts such as ‘full page grid sprite sheet, game assets, asset sheets, sprite sheet’, plus how many times you need to hit the V button to get a sprite sheet.

There tends to be only one generation in a grid that’s close to the “literal” interpretation of any prompt. Now and then you get a grid where all four images are usable. Oftentimes, the excellent art you see on the Community Feed has been “re-rolled” many times (meaning numerous presses of V and U).

GENERAL HINTS FROM THE DISCORD CHAT WHICH MAY OR MAY NOT WORK FOR YOU
  • Try using things like “humanoid robot, metal spider” to get more arms. The basic idea is to try to get MJ to blend two “big” ideas that have the details you want, rather than going for the details directly.
  • When making hybrids, a phrase such as “cat bird hybrid” will confuse Midjourney. Try instead: cat-bird or cat :: bird.
  • Midjourney is bad at snakes. If you want a snake head try the term basilisk.
  • You can’t tell it how many legs to give something but try bipedal and quadrupedal.
  • Ginger emphasises hairstyles (not the root vegetable)
  • --no makeup emphasises faces
  • --hd can do weird stuff, it’s made just for landscapes. (It’s a parameter which gives you an image you can use as a desktop image.)
  • Try “detailed diorama”. Dioramas can have really good spatial coherence if you find the right prompt.
  • 85% of the time, you do not want to enhance. --stop 70 is worth trying. See if you like it.
  • Adding --no background mostly returns a flat/gradient colour background, handy for importing into other image software.
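
To make that concrete, here’s the rough shape of a prompt combining a few of those hints (the subject and the numbers are my own illustration, not a tested recipe):

/imagine [ENTER] humanoid robot :: metal spider, detailed diorama --no background --stop 70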

TIPS FOR MIDJOURNEY PORTRAITS AND HEADSHOTS

/imagine portrait of katherine mansfield 60s retro poster (with minor fix-ups done in Affinity Photo)

If you want to create portraits, I recommend Betcha She Sews for an excellent how-to guide.

You’ll notice a lot of ageism and racism when creating portraits. It’s very difficult to create a middle-aged woman, for instance. Requests for people default to white, with the exception of anime and manga styles. “Woman” defaults to young. “Elderly woman” works, if you want a woman in her eighties. If you want a woman in her forties, your best bet is to prompt with “elderly” then stop it at 80% or so. If you want a full-body render, try the prompts ‘full shot’ or ‘full body’.

There’s a strong white skin bias. Even “Black skin” and “dark skin” don’t work. Better instead to specify a location (city or country) or continent e.g. “in africa“, “african-american female“. Or find an actor with the skin you want and use their name as a prompt.

People (especially women?) get long necks a lot of the time. Try --no long neck. Also try making your image dimensions less tall.

Let’s go one further: --no long neck and double face whenever you’re making portraits.
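
Putting those portrait tips together, a hedged sketch of the kind of prompt I mean (my own wording; tweak the stop percentage to taste):

/imagine [ENTER] elderly woman, full body, portrait, detailed --no long neck --stop 80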

Most results are artistic and stylized. But certain types of subjects (e.g. famous people) tend to push things more in a non-realistic direction.

If you’re getting a lot of distortion (e.g. in a face) try stopping an upscale at 50% (by typing --stop 50) then upscale again from there.

Use FaceApp not just on your own selfies — you can use it to fix AI generated faces as well.

Many people have tried to art direct angles (including me) to no avail. Midjourney returns front-on shots. Every now and then you’ll get someone looking to the side, but no prompt encourages various common portrait photography poses e.g. the 45-degree angle rule.

I tried every photography term I could think of. Every English phrase:

  • 45 degree angle
  • 3/4 angle
  • looking off to the side
  • side-on view
  • etc

I even uploaded a 3D model with a 45-degree angle to use as an image prompt.

Nothing did anything to change the angle. The prompt image did change the aliens to look more human. But then I upped the --iw and got face-on versions of the reference image.

In the end I dispensed with the prompt image and got a bunch of aliens, all staring straight at me.

But I knew it was possible, because I got one measly alien with a 45-degree head pose. I got it entirely by accident. (Also got an extra eye.)

/imagine [ENTER] alien in style of charles harper

FULL BODIES

How long did it take to get a full-body alien, without any body parts cut off? When I used the prompt alien --ar 2:3 on its own, I got four head and shoulders shots. Next I tried full-body alien --ar 2:3 which gave me one head and shoulders shot, and three down to the knees. After a re-roll I got two out of four.

Next I tried full-body alien, bipedal. I got one full-body alien out of four.

‘Bipedal’ is fine and dandy if you’re making fantasy creatures, but what if I want a bog-standard human? Let’s try an elderly man. You wouldn’t want this face, but it took three re-rolls to get four full-body shots of elderly men.

I think when creating a man, bipedal is messing me up.

/imagine [ENTER] elderly man, full-body --ar 2:3

In the case of a human, ‘bipedal’ must have made Midjourney think I wanted something interesting going on with the legs (which would explain the old guy who skipped leg day above).

These four images are a good example of how Midjourney works. It doesn’t know if I want him facing towards me or away. Do I really want the whole body? How about Grandpa Midjourney here, who has cropped up twice? And do I really want a man? Or an elderly woman with a pretty scarf? The first roll frequently gives you one image you can work with. Sometimes none, sometimes two. Occasionally all four.

Hint: I’m finding ‘lonely’ is good as a prompt if you’re after a scene with one person in it.

/imagine [ENTER] grandstand, lonely :: cinematic –ar 3:2

A NOTE ON PUNCTUATION

At time of writing, it’s not yet clear where you should use commas, full-stops and colons in your prompts.

It’s not case sensitive.

Some users think punctuation does nothing.

Here’s another view, from users @fractl and clarinet:

So :: is the only ‘official’ break in a prompt, but comma, plus, pipes all have some (minor) effects as well. Nothing consistent, but in some cases one may be better than another.

Here’s a test I ran a while back:
Red panda clearly shows the animal of that name.
Red, panda separates them a little bit (a red-haired red panda)
Red:: panda gives a panda that is red
Red:: panda:: --no red panda is even more clearly a panda which is red (and not a red panda)

One thing is clear: You’re not writing an essay, so don’t use commas grammatically. Use commas to separate parts of an image, not, say, to list attributive adjectives in front of a noun.

Some people are making use of full stops. I’ve seen: gigantic robot monster. Eating a schoolbus. Octane render. Rainy. Realistic

However, if you want a hard delineation between parts of a prompt, we are advised to make use of colons instead. I’ve seen a single colon, a double colon, a double colon with a space between. I’m not sure if these return different results.

Current wisdom: Commas are soft breaks, :: are hard breaks.

Some users insist the colon does something. But it’s tricky to get right.

Note: the parameter part of a prompt begins with two hyphens (a double dash). THIS IS NOT THE SAME AS AN EM DASH. Some software automatically corrects two hyphens to a single em dash, but the em dash is not recognised by Midjourney.

This: --ar 9:16

Not this: —ar 9:16 (This won’t work.)

Midjourney seems strangely forgiving of typos. Still, spell check your prompt before wasting your credit.

I’m also seeing the Boolean operator + in some of the top trending art. Stuff like:

scary cyberpunk ganguro mech-robot wearing hoodie + yellow honey oozing from body + cool street style + relaxed pose + centre image + Studio Ghibli style + moebius style + pastel color style + line drawing

Perhaps the plus symbol works like old-timey search engines, meaning if it comes after the plus, the algorithm will definitely factor it in? (In case you’ve never heard of Boolean logic, a plus sign means AND.) More likely, it’s a way for the user to keep track of grouping and ordering. (People use square brackets in the same way. It doesn’t mean anything to Midjourney.)

Looking through the Community Feed, I don’t think the algorithm pays any attention to spaces after commas. The spaces seem useful for our own reading ease. The exception is, don’t insert a space after the double dash of an argument.

Anyway, I had to check what that prompt gives us (it’s a modification on someone else’s):

I tried creating four new variations based on thumb 2, but these are no better. I ended up going with the original thumb 2 for the upscale.
What you don’t see in thumbs: The final detail. The quality of the fabric is stunning.

EVOLVING YOUR CREATION: APPROACHES TO THE UPSCALING AND VARIATIONS BUTTONS

Midjourney gives you two rows of buttons: V for ‘create four new variations’ and U for ‘upscale’. But which should you press first, to avoid wasting time credits?

  1. Upscale the generations you like after the first prompt then make variations of the upscales you like most.
  2. Keep picking variations until you get one you want to upscale.
  3. Upscale each image before hitting the V button. (Is this method the best way of achieving detail? Could be.)

Before you make variations, consider doing an upscale first. That extra bit of visualization effort can sometimes get you where you want quicker than just making additional random variations.

Betcha She Sews
IMAGINE, REROLL, VARIATION AND UPSCALE

/imagine renders a grid of possible compositions from your prompt inside a cached session

Reroll 🔄 also renders a grid of possible compositions from your prompt… plus adds another iteration of detail

Variation [V1] renders a grid similar to your selection… plus adds another iteration of detail

Upscale [U1] increases the size of your selection from thumbnail to full size… plus adds another iteration of detail

AFTER THE EVOLUTION

--uplight at the end of your prompt uses a light touch to simplify details when it is rolled, and
--stop 90 halts the whole render process like a handbrake at whatever percentage you specify (replace 90 with your own number)
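
Both just hang off the end of an ordinary prompt. A couple of hedged examples (the subject is my own):

/imagine [ENTER] autumn forest, watercolor --uplight
/imagine [ENTER] autumn forest, watercolor --stop 80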

WHAT IS SAMESEED?

From the Discord forum: as I understand it, --sameseed applies a single seed across all four images in the starting grid, so the four generations come out closely related to one another.

MESSING AROUND WITH WEIGHTS

--iw stands for image weight. Image weight refers to the weight of the uploaded image prompt versus the text portions of your command. Typing --iw 1 keeps the generated image very similar to your original image.

I can’t guarantee this does much, but try cropping the prompt image to 256×256 and maybe touching it up to emphasise the parts you want Midjourney to take notice of.

/imagine less of this ::0.5 but more of that ::5

If you prefer, work with percentages. (My brain does better with percentages.) Don’t use the percentage symbol, though. That doesn’t work. Instead:

/imagine apple ::80 orange ::20
(for 80% apple, 20% orange)

Also this:

dog:: cat:: = dog (once), cat (once), averaged
dog::2 cat::2 = dog dog (twice now) cat cat (twice now), averaged
dog::4 cat::1 = dog dog dog dog cat, averaged
dog::1 cat::3 = dog cat cat cat, averaged

Something::1
Lightly Something::0.5
Eliminate Something::-1

Messing around with image weights is sometimes necessary because Midjourney has some quirks. Like… eyes:

Maybe the algorithm has been trained to focus more on eyes because (neurotypical) humans focus so much on eyes?
IMAGE WEIGHTS

Midjourney lets you tell it what you want by giving it a URL to an image. Compared to how well the text prompts work, image prompting is surprisingly weak right now.

However! With a lot of re-rolling (like maybe ten times?) I did get from this cartoon portrait of Lisa Simpson:

To this 3D realistic portrait of Lisa Simpson:

Lisa Simpson wears a red dress with bare shoulders. When I tried prompting with ‘naked shoulders’ I was warned by the bot: ‘naked’ is not allowed.

But the eyes are creepy and I could not for the life of me manage a white pearl necklace so I retouched in Affinity Photo (took about five minutes):

Here’s the prompt that got me there, but don’t ask me whether the image weight instruction actually works, I don’t know (I used a smaller version of the cartoon of Lisa Simpson above in this prompt):

/imagine [ENTER] realistic girl, lisa simpson, detailed, short spiked blonde hair, large eyes, large pearl necklace, portrait, unreal engine --ar 2:3 --iw 1.5

Yeah, but here’s the true test. Can I also make a Homer Simpson?

I kept getting a whole lot of dudes who look like Jane’s dad off Breaking Bad.

In some of these images, the yellow of the cartoon is coming through. If it weren’t for that, I’d wonder if Midjourney were taking any notice of the Homer cartoon at all.

For some of those I used the prompt ‘smiling’. Do you think I could get him to smile? No. Midjourney is drawing from 3D models (which I asked for when I included ‘unreal engine’ in the prompt). If you’ve spent any time at all on those 3D engines, you’ll know there are two main types of people: Young attractive and sexually appealing women, and cranky men. Maybe if I’d used the term schlubby I’d have got a better result. But I used this one:

/imagine [ENTER] realistic middle-aged man, homer simpson, detailed, comb over hair, large eyes, five o'clock shadow, portrait, unreal engine

I had trouble getting to Homer. Midjourney gave me a bouncer/boxer dad. I had to put ears and a white shirt on him myself. But at least he’s smiling! He’ll do until the real Homer shows up.

When using image prompts, people on the Discord server advise resizing prompt images to a 1:1 square. If you submit a 16:9 landscape, it’ll get squashed. (I predict image prompting will get better soon.) For now, try to use squares as prompts, or something close to a square.

For a comprehensive run-down see this Google Doc: Understanding Midjourney Prompt Combinations and Weights. (Contains pictures. Experiments are done on an AI generated teapot.)

That document is really super detailed; if you need to generate an art-directed image, learning image weights will be of benefit.

If you’re looking for 20th century illustration (in particular), click on the Art category of this blog. You’ll find things like that on Pinterest as well, but I also keep collections sorted by composition (precisely for the purpose of AI art generation), some less common illustration collections, and collections of interesting palettes of specific colours.

Background: ‘muppet lying down, wearing white wedding dress, transparent veil above head, holding bouquet of white flowers, realistic, cinematic, lars von trier, melancholia --ar 2:3’. Muppet heads were done separately: ‘woman muppet portrait, cinematic, lars von trier, melancholia --ar 2:3’ for Ophelia/Justine and ‘the muppets, lars von trier’ for the other muppet heads.

ADDING WEIGHTS TO WORDS

Adding weights to words with ::n can significantly affect the result. Same with --iw n for images. A value of 0.5 will result in taking some small elements, a few shapes and colors, into the resulting image, but the words will take precedence. A value of 10 is almost like telling Midjourney to give you a new version of the image prompt while disregarding any words.

credit to Ivan Mickovski on Facebook (see more of his tips on Reddit)

WORKING WITH IMAGE LINKS

If you’re using both an image and a text prompt in Midjourney, double up. That is, do what you’re not supposed to do when creating picture books: in the text, describe the thing in the image. This seems to work better for getting the result you want.

For best results, make the prompt image the same dimensions as your Midjourney image will be. If you’re keeping it square, create a prompt image at 512×512. (We are told values above 512 are unstable and may cause errors.)

As mentioned above, a good place to upload images is: im.ge
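
Put together, an image-plus-text prompt looks something like this (the URL is a stand-in for wherever your uploaded image lives, and the weight is just a starting point):

/imagine [ENTER] https://im.ge/your-image.png woman in a red linen dress sitting at a table, watercolor --iw 1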

WRITE PROMPTS LIKE YOU’RE TAGGING AN IMAGE FOR ACCESSIBILITY

  • a woman sitting at a table inside a 1970s kitchen
  • an antelope runs across a plain

Prompt engineering tip. Instead of:

  • Imagine the output you want
  • Specify its attributes to the AI

Try:

  • Imagine that you already live in a timeline where the output you want exists
  • If you were using/quoting it in a blog post, what caption or context might you write for it?

@davidad

There are resources on the web about how to alt tag for accessibility. Here’s a good one. Advice applies equally to Midjourney.

Interestingly, the crossovers between prompt engineering and alt tagging for accessibility go further than you might expect. For example, telling someone who has been blind from birth that a man is wearing a red hat isn’t going to be useful to them. What is red? Likewise, Midjourney doesn’t find ‘red hat’ useful either (at the moment). Midjourney knows the colour red exists, and that it is a colour, but clearly doesn’t know what you mean by ‘red hat’.

https://twitter.com/davidad/status/1551148727464771584

THE ‘MIDJOURNEY’ LOOK: REDS AND TEALS

Midjourney has a look. Unless you add prompt images, artist names, and further specifications, you end up with something like below. Reds and blues with yellow tints proliferate. (The algorithm is probably drawing heavily from primary colours.) There’s a dreamlike, fantasy quality to it. Everyone will soon recognise this as Midjourney’s default style. (Until it evolves, that is.)

Someone on the forum has worked out why these colours proliferate. Basically, you get reds and blues (or oranges and teals) when Midjourney doesn’t recognise your style, or if you haven’t set one:

How to check first if Midjourney even understands your sourcing reference:

  1. You want to say in the style of Ren & Stimpy (for example) but you don’t know if Midjourney will understand that.
  2. You think about something that appears commonly in that style. For example, something that appears often in Ren & Stimpy is a cartoon chihuahua (that’s Ren himself).
  3. Do this simple test: /imagine a cartoon chihuahua in the style of Ren & Stimpy
  4. If the output looks like it’s adopted the style you named, you’re golden. If it appears generic with lots of orange and teal colors, you’re looking at Midjourney ‘defaults’, which is effectively an error message meaning NOT FOUND.

See this red and blue palette in action below. I tried messing around using lines from poet Emily Dickinson.

/imagine [PRESS ENTER HERE] that it will never come again is what makes life sweet

Naturally, Midjourney is perfect for making those inspiration images you see all over social media.

/imagine [PRESS ENTER HERE] Unable are the loved to die for love is immortality, 8k (The 8k at the end tells Midjourney to take inspiration from high quality images people use as desktop images and screensavers.)

USE THE LANGUAGE OF PHOTOGRAPHY

  • a close up of
  • extreme closeup
  • macro shot
  • a Canon EF 50mm photograph portrait
  • timelapse
  • sharp focus
  • focal lengths — @davidad tells us: “You can prompt for specific f-stops and focal lengths. Here I generated an orchid at f/1.8, then used inpainting over the out-of-focus parts to get roughly the same shot at f/5.4, f/9, and f/14.”
https://twitter.com/davidad/status/1551298669168807937
He also gives a demo of varying the focal length while staying at f/14.

Good luck directing Midjourney when you want a specific angle, but you can give it a whirl!

ASPECT RATIOS

By default, Midjourney gives you square artworks. But you can change this really easily.

--ar width:height

Be mindful about your aspect ratio. The algorithm will use the canvas and fill it, so if you want, say, a tall skinny house, create a tall skinny canvas. (This Facebook post shows examples of that using generated images of Notre Dame.)
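
For instance, for that tall skinny house you might try something like this (the ratio is my own pick):

/imagine [ENTER] tall skinny house, overgrown garden --ar 9:16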

Portraits on canvases whose ratios leave room for more than one face sometimes develop extra faces or facial features. (You may have noticed this phenomenon if you’ve used the app “This Face Does Not Exist”.)

If you’re hoping to print artworks out and get them framed:

COMMON STORE FRAMING SIZES (in inches):
--ar 4:5 (8×10 & 16×20)
--ar 1:3 (11.75×36) … it’s really 1:3.06 = 47:144
--ar 11:4 (16.5×6)
--ar 11:14 (11×14 & 22×28)
--ar 3:4 (12×16)
--ar 13:19 (13×19)
--ar 7:9 (14×18)
--ar 3:4 (18×24)
--ar 5:6 (20×24)
--ar 2:3 (24×36 & 20×30)

WHAT DO 4K AND 8K MEAN?

You’ll see other people adding 4k and 8k to their prompts. What do they do?

These terms come from the world of TV, cinema and monitors, and refer to resolution. What do they mean for Midjourney, though? People tend to use these resolutions as tags on desktop wallpaper sites, so using them in a prompt encourages the algorithm to draw on the high quality images people use as desktop wallpapers.

TYPES OF LIGHTING

For the best results, be sure to include information on how you want light and shadow to work.

Experiment with:

  • dim light
  • soft light
  • harsh light
  • warm lighting
  • cool lighting
  • cinematic lighting
  • dim lighting
  • candlelit
  • volumetric light (“God Rays” or Crepuscular Rays, beams of light)
  • dramatic neon lighting
  • strong shadows
  • hard shadows
  • bioluminescent (like those jellyfish that glow)
  • noir describes a type of lighting as much as anything (chiaroscuro — strong lights and darks). I’m getting nice results with ‘realistic noir’.
/imagine [ENTER] babysitter urban legend, realistic, noir --ar 3:2 Unretouched. Notice how the babysitter has a babyish face. It did that several times over, not knowing what I meant by ‘baby’ sitter.
TYPES OF LENS
  • telephoto shot of a (A telephoto lens is a long-focus lens which brings distant subjects closer. When talking to AI, you’re asking for an image that is ‘zoomed in’, say a lion but without a mountain vista behind it.)
  • distant view of
DEPTH OF FIELD

Depth of field is the distance between the closest and farthest objects in a photo that appears acceptably sharp.

  • shallow depth of field (A small circle of focus. The foreground object might be in focus but everything in the background blurry. Popular for portraits and macrophotography. The subject stands out from the background.)
  • deep depth of field (Everything in the image is equally in focus. Sharp from front to back. Popular in landscape, street and architectural photography, where artists want to show every detail of the scene.)
  • macro photo of
  • aerial view of a
/imagine [ENTER] a blurring of boundary between self and space
INSIDE OR OUTSIDE?
  • interior of
  • in a room
  • exterior of

USE THE LANGUAGE OF TRADITIONAL ART

  • still life
  • portrait
  • landscape
  • cityscape
  • seascape
MOVEMENTS

You can find art movements on Wikipedia or head over to NightCafe Studio for a full and expansive list. (NightCafe Studio is another AI generator. It gives you a whole heap of options for prompts when you hit ‘Create’.)

  • realistic art deco
  • art nouveau

Others have compiled super comprehensive lists. Try Wikipedia categories. E.g. Art Movements.

MIX AND MATCH MOVEMENTS WITH STYLES
  • Surreal Fantasy
  • anime style
  • abstract painting of
  • painterly
  • highly detailed
  • in the style of a WW2 recruitment poster
  • medieval
  • flat design
  • Japanese painting
  • pinup girl
  • Victorian painting
  • inspired by cartographic maps
  • sketch of
  • photograph of
  • steampunk
  • Renaissance

/imagine [PRESS ENTER HERE] cats in disguise, silk screen printing

Images below have been lightly retouched and cropped. I actually wanted dark trees framing the foreground, like, really close, World of Goo style, but I liked what it gave me anyway:

/imagine [PRESS ENTER HERE] landscape with dark trees in foreground, eric carle, lotte reiniger

Note that the modern Internet era has given us a whole lot of styles which don’t appear in official lists of ‘art styles’. (See the link up top for the Aesthetics Wiki.) Aside from that, we’ve got:

  • selfie (often gives you the arms)
  • knolling (That top-down view people use when making YouTube videos of cooking demos. Sometimes knolling will give you items inside a cardboard box. From unboxing videos, I guess?)

A number of people want their art to look like their favourite games and movies:

  • fortnite
  • ready player one
  • red dead redemption

/imagine [PRESS ENTER HERE] school yard in the style of red dead redemption --ar 3:2

I hit V2 (to make four new variations riffing on the second image in the grid.)
I upscaled number 3. As you can see, AI still doesn’t do all that well with straight lines and grids of buildings, but Midjourney is the best I’ve seen yet.
FAMOUS ARTISTS
  • in the style of [ARTIST NAME] works well
  • drawn by moebius
  • painted by Goya during his black period
  • by gustav klimt fine art painterly
  • lovecraft (actually a writer, but one who continues to inspire many artists)
  • bauhaus style painting (a German art school operational from 1919 to 1933 — this prompt works really well)
  • in the style of studio ghibli (an animation studio rather than a single artist — hugely influential)
  • in the style of steven universe
/imagine [PRESS ENTER HERE] suburban street, summer, in the style of steven universe (The linework is janky. AI is still not very good with the clean lines you see in Atomic Style of Illustration — think Tintin)

I felt like making some food out of a Studio Ghibli film. Like faces, food can be difficult. Artificial intelligence doesn’t need to eat, and has a history of making food look disgusting.

(Try this Random Food Generator for prompts.)

This caramelized broccoli stew is fairly tough with a sugary texture. It has subtle hints of red potato with marjoram and has anise. It smells fragrant with an overwhelming amount of cress. It is sweet and delicate.

Random Food Generator

Anyway here’s what Midjourney did with my request for eggs on toast:

/imagine [PRESS ENTER HERE] eggs on toast, studio ghibli style

I asked for more variations on number one. (What are those blue dots? The things Chihiro eats in Spirited Away, which make her stay in the world?) Here’s what it gave me next.

I hit Upscale on number four. What do you think of the result? I don’t think it’s quite Studio Ghibli level delicious but Midjourney did a better job of food than I expected!

Norman Rockwell was a hugely influential American painter. Somehow the illustration below looks like it was done by Rockwell, even though it completely messed up the faces. (The food looks… disgusting this time. But maybe that’s what Midjourney knows aliens eat.)

/imagine [PRESS ENTER HERE] alien family eating dinner, norman rockwell style

Then I got obsessed with aliens.

/imagine [ENTER] aliens at the cinema

What the hell are they watching? Us?
/imagine [ENTER] alien in the style of charles harper (This 20th century artist is good if you want symmetry)

I made a bunch of these and made myself a bunch of licences.

WELL-KNOWN INTERNET FORUMS WHERE YOU FIND A LOT OF ART
  • artstation (a massive art-sharing site)
  • trending on artstation (It’s not entirely clear whether ‘trending on’ does anything, but many people are using it.)
  • digital art is a good catch-all term

USE THE LANGUAGE OF DIGITAL ART/FILM-MAKING

POPULAR ADOBE PLUG-INS

To learn what these do, it’s best to look for examples on the Internet, because they’re impossible to explain in words!

RENDER
  • octane render (This means you want the AI to draw on art rendered with Octane, the famous render engine known for producing photo-realistic images super fast. It basically means ‘photo-realistic’.)
  • redshift render (Redshift: “The world’s first fully GPU-accelerated, biased renderer”. Click through to the website to see the sort of images the algorithm will be using. Basically, photorealistic 3D.)
  • unreal engine (a 3D creation tool; this prompt conjures art made with it. That art tends to be photorealistic, fantasy, with lots of mood lighting, so you’re really asking the AI to convey a certain mood.)
  • toon boom render
  • physics based rendering (a computer graphics approach that seeks to render images in a way that models the flow of light in the real world)
I wanted a brain and eyes in a jar to illustrate a Roald Dahl story (“William and Mary”). Using the language of rendering gave me something creepily realistic: /imagine [ENTER] brain and eyes inside mason jar on table, detail, unreal engine

The reason I had to specify mason jar is because it kept giving me things like this:

I made the mistake of asking for brain and eyes KEPT ALIVE in a jar and the algorithm thought I wanted a face (as well as the brain and eyes):

I also wanted an illustration for Roald Dahl’s “The Ratcatcher”. The main character is part man part rat. Can Midjourney do that?

The first dude is the spitting image of Grant from over the back fence after he’s been working in the shed and realised he can’t fix his own car after all. Awkward. The third one looks knitted. That second one, though! That’s exactly what I want. Stuff of damn nightmares.

/imagine [ENTER] half man half rat, portrait, detailed, unreal engine, realistic (With very minor retouching)

Can you turn anyone into part man, part animal? YES.

/imagine [ENTER] half man half owl, portrait, detailed, unreal engine, realistic

I had a bit more trouble turning a man into a flower. I kept getting flowers with men, which is lovely.

/imagine [ENTER] part man part flower, unreal engine, portrait, detailed, realistic

The fourth dude doesn’t like flowers and isn’t having any of this.
Four new thumbs based on image one.
He’s welling up! I know! It’s so beautiful.

But I was after something way more creepy than that. So I looked up ‘parts of a flower’ and incorporated the technical terms. (Note: Although I used ‘portrait’ as prompt, I messed up the aspect ratio, got it back to front. Should’ve been 2:3 for portrait, not 3:2):

/imagine [ENTER] man shaped like a flower peduncle, sepal, petal, stamen, pistil, anther unreal engine, detailed, realistic, portrait

This gave me some alien landscape botanical specimens, but NOT PART MAN PART FLOWER. Which is what I’ve decided I need.

Next, I decided to use the magic word. CRYPTOBOTANY is the magic word, folks.

/imagine [ENTER] cryptobotany man shaped like a flower peduncle, sepal, petal, stamen, pistil, anther unreal engine, detailed, realistic, portrait

Now we’re cooking with gas. Number four is getting closer.

Look at that beautiful botanical magazine texture in the upscale. A bit of light retouching and this would be perfect. (Perfect meaning, it’s what I had in mind. I managed to art direct this one.)

Soo… what happens if I use cryptozoology when I’m making my part-man, part-animal portraits? Will that do anything? I’m very happy with the half-man half-rat I already have, but what happens if I shove the word cryptozoology at the start of the same prompt?

/imagine [ENTER] cryptozoology, half man half rat, portrait, detailed, unreal engine, realistic

I kept re-rolling. It’s not obvious from these final thumbs, but very obvious watching the progress images: the word ‘cryptozoology’ returns images of bigfoot. I deduce that’s because the words ‘cryptozoology’ and ‘bigfoot’ go together so frequently. (Less clear: why did I get Grant from over the back fence again? Why do the same people keep cropping up in certain prompt combos?)

Try also: creature anatomy.

/imagine [ENTER] creature anatomy, pebbles, photorealistic --ar 3:2

/imagine [ENTER] creature anatomy, perfume bottles, photorealistic

/imagine [ENTER] tall humanoid made of the universe --ar 2:4

/imagine [ENTER] tall humanoid made of power pylon --ar 2:4

USE THE LANGUAGE OF COMPUTING
  • bitmap
  • low frequency
  • glitch art

/imagine [ENTER] bitmap, suburban street at night --ar 3:2

This is not what I was expecting. But I like it.

/imagine [ENTER] low frequency, suburban street at night --ar 3:2

What does ‘low frequency’ mean, exactly? I don’t know. But this is what I got.

/imagine [ENTER] glitch art, suburban street at night --ar 3:2

glitch art, suburban street at night upscale
USE THE LANGUAGE OF BUILDING AND ARCHITECTURE

Once again, don’t forget Wikipedia has categories. Here is Wikipedia’s list of architecture categories, with way more architecture words than a person could ever use. It gives you lists of woods, roof shapes, building materials, architecture in Europe, you name it.

  • eco brutalist apartment
  • fantasy castle
  • neoclassical architecture
  • dimension
  • symmetry
  • medieval
  • old ruins
  • dark tunnel
  • flat roof
  • made of tungsten
  • carved from sapphire stone
  • curvilinear
  • wooden house
  • --no triangular roof
  • frank lloyd wright style
  • architectural section (interesting when applied to things other than buildings — this is a type of diagram where someone has sliced into the thing to reveal a cross-section)

/imagine [ENTER] the white house in the style of Hayao Miyazaki bath house from spirited away

/imagine [ENTER] rococco treehouse --ar 3:2

CREATING ORNAMENTS

The word baroque returns good results.

  • baroque Italian pastel marble statue of
  • baroque [X] made of ornate gold and pearl
  • a [X] statue, baroque, ornate, highly detailed, gilded, soft lighting, unity engine, octane render
  • A crystal with a [X] inside of it, render, realistic, cgi, cinematic
  • opalescent [X] in the shape of a [Y]
  • baroque bejeweled [X], ornate, intricate, detailed, with gold filigree
  • crystal form

But I’m finding a lot of re-rolls are necessary to get something good. More typically you get something that isn’t quite there:

/imagine [ENTER] ornate baroque Italian marble statue of pineapple

TELL MIDJOURNEY WHAT YOU WANT IN THE BACKGROUND

Midjourney doesn’t take art direction when it comes to perspective. Unlike a human artist, you can’t reliably tell it if you want a side-on view, a low-angle, oblique view and so on. But you can get some of the way there by telling it what you want in the background. (Also, sometimes it does do what you want when you prompt with something like ‘low angle’, ‘a wide view from behind’ etc.)

  • sitting on grassland
  • in flower field
  • in vast field of poppies
  • in empty room
  • in the middle of a town square
  • with forest in background
  • in wilderness
  • roaming the barren plains
  • on a wooden table
  • bokeh background
  • on a small and quiet lake in a dense pine forest
  • on the edge of a frozen canyon surrounded by alpine forest
  • metaverse (describes a virtual-reality space in which users can interact with a computer-generated environment and other users)
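
For example, pairing a subject with one of those background cues (the stag is my own invention):

/imagine [ENTER] red deer stag, in vast field of poppies, with forest in background, cinematic --ar 3:2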

TELL MIDJOURNEY WHAT TIME OF DAY YOU WANT IT

Meaning where you want the sun (or moon).

  • golden hour
  • sunrise
  • in the middle of the night
  • shimmering colourful night sky
  • underneath a moonlit starry dark night sky
  • sunlight filtering through the trees
  • rays of light
  • beautiful raking sunlight

ORDERING THE PROMPT

A command has two basic parts to it:

  1. PROMPT (the keywords and description)
  2. SETTINGS (the parameters, which go last)

The prompt portion itself can be ordered like this:

  1. Name the item (what you’d find if you looked it up in a dictionary)
  2. Describe the item (what it’s made of, its texture, decorative detail)
  3. Style of the art (genre of art etc.)
  4. Describe the art (This is where your aesthetics terminology comes in handy. Move away from what you’d find in an art book or on Wikipedia and describe it like you’d describe it to friends, or how people describe things on ArtStation e.g. ‘crazy detail’)
  5. Photography, mood and composition terms.

So, what it looks like right now: if you load too much info into 1 and 2 (the item itself) you’ll only confuse the algorithm. But you can really go to town on 4 and 5 (the art description and mood terms).
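
Here’s a sketch of that ordering with an invented subject: item, then materials, then art style, then ArtStation-style descriptors, then camera and mood terms, with the settings parameter last.

/imagine [ENTER] teapot, porcelain, gold filigree, still life oil painting, crazy detail, soft lighting, shallow depth of field --ar 3:2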

CONVEY THE MOOD

  • a dark morbid (remember, no comma necessary when listing adjectives)
  • cute
  • kawaii
  • chibi (means short/small in Japanese, similar to kawaii style)
  • moody
  • ethereal
  • glowing
  • futuristic
  • isometric (Also called isometric projection, method of graphic representation of three-dimensional objects, used by engineers, technical illustrators, and, occasionally, architects.)

Try using the words sticker, sticker sheet and die cut sticker for character design inspo. None of these monster bananas below are quite right, yet, but I can see where I’d take them.

/imagine [ENTER] cute banana monster sticker sheet

EXPLAIN THE COMPOSITION

If you have a composition in your head and want the AI to paint that, you’ll have to try and explain that to a computer.

  • romantic comedy (gives you something based on a poster)
  • LP album
  • hip hop album cover
  • artstation album art
  • box office (movie poster)
  • renaissance painting of
  • horror movie poster
  • half blueprint
  • 1970s postcard
  • japanese poster, print
  • portrait
  • waist up photograph of
  • lively movement
  • headshot (a close up of the face)
  • ultrawide shot
  • an expansive view of
  • full color encyclopedia illustration of
  • full page scan hi res techno
  • detailed concept art lithograph
  • antique lithograph
  • leonardo da vinci drawing of 
  • a map of
  • infographics
  • graph data knowledge map made by neo4j
  • tarot card (this is a popular one — gives you symmetry and intricacy)
ovary pelvis chicken, anatomical drawing, encyclopedia --ar 2:3 (With a nod to an article at McSweeney’s Internet Tendency for the term ‘pelvis chicken’.)

/imagine [PRESS ENTER HERE] canberra city in the style of Alphonse Mucha, tarot card --ar 2:3

For the purposes of clarity, I have upscaled the thumbnails in Gigapixel so you can see them better. In fact, it’s difficult to see the detail of thumbnails within Discord. (I have to lean in close to the screen.)
I’ve blown the thumbs on this post up to double the size. Obviously, the quality isn’t great; you can’t blow these up well using other software. Upscale from thumbs within Midjourney, then upscale further (if you need to) with Gigapixel or similar.
I want to remove the signature at the bottom left-hand corner. But I also want to show you an untouched version. (Try adding ‘no watermarks’ to your prompt.)

THE PROBLEM WITH DESCRIBING COLOURS

MAPPING COLORS ONTO SPECIFIC OBJECTS

/imagine [ENTER] a black dog on the left :: a white dog on the right

I re-rolled. I got:

  1. One super long dog in a horse-ghost suit
  2. Two black dogs(ish)
  3. Getting closer. Except the black dog has dressed in a ghost-sheet for Halloween.
  4. Getting closer, except the black dog is actually a white dog wearing a black polo.

Telling the AI which colour to make different objects doesn’t always work but you can try. I asked for a red swimming pool with a white house behind and it gave me a regular blue pool with a red house behind.

I wanted an illustration to match this paragraph from “Afternoon in Linen” by Shirley Jackson:

IT WAS a long, cool room, comfortably furnished and happily placed, with hydrangea bushes outside the large windows and their pleasant shadows on the floor. Everyone in it was wearing linen—the little girl in the pink linen dress with a wide blue belt, Mrs. Kator in a brown linen suit and a big, yellow linen hat, Mrs. Lennon, who was the little girl’s grandmother, in a white linen dress, and Mrs. Kator’s little boy, Howard, in a blue linen shirt and shorts.

“Afternoon In Linen”

Jackson has been specific about who is wearing which colour. I tried a number of prompts.

First I took Shirley Jackson’s exact words, tidied them up a bit and got something totally useless. The hydrangeas are inside, not outside. Midjourney decided not to give me any people at all! (I didn’t specify the art style and it gave me photorealistic.) The manual says to be as descriptive as possible, but actually? If you tell it too much, it seems to forget the first bit.

/imagine [PRESS ENTER HERE] a long, cool room, comfortably furnished, hydrangea bushes outside large windows and their pleasant shadows on the floor, a little girl in a pink linen dress with a wide blue belt, an old woman in a brown linen suit and a big yellow linen hat, another old woman in a white linen dress, a little boy in a blue linen shirt and shorts

Next I tried this. Midjourney doesn’t seem to understand ‘low chroma’. (It does seem to understand ‘psychedelic’. Try low contrast and washed out instead.) It gave me two characters, but I need four: two grannies and two kids. Midjourney killed the damn kids!

two grandmothers, grand-daughter, grandson, tea-party, still life oil painting, low chroma

I still haven’t worked out when to use the colons rather than the commas, so next I tried this. Midjourney took my instructions for long and ran with it. Not the vibe the story calls for at all.

tea party :: two old women :: charlie dye style :: linen :: long purple shadows :: tall windows

So I pared it right down. I gave up on the colours of the clothing, realising I’d have to do that myself in Affinity Photo:

prompt: two elderly women drinking tea, purple shadows

I upscaled this one:

two elderly women drinking tea, purple shadows (upscaled). These poor, poor old ladies.

In Affinity Photo I shifted the right woman’s face up. I added a color layer, set it to color blend mode and made the left granny’s hat yellow, her kimono brownish.

You’d use the same technique if you want a character with, say, blue skin.

But one of my old ladies needs to be dressed in white. Shirley Jackson said so.

We must use a different blend mode to achieve white, because in ‘color’ blend mode a white paintbrush turns the lower layers gray. Paint white onto a layer set to the ‘soft light’ blend mode, merge down, then do it again if the object is still not white enough.

Good enough for my purposes.

Although AI art generators cannot yet take instructions to make certain objects a particular colour, describing colour in more general terms (e.g. muted color) often works well.

DESCRIBING THE PALETTE

Other things to try:

  • red sky
  • made with small to large irregular dashes of muted color
  • warm
  • black-and-white
  • monochrome
  • at golden hour
  • water elemental
  • pastel color style
  • with gold filigree
  • full of golden layers
  • bronze-scaled
  • retro palette
  • various gradient colors (try ombre also)

I tried using colorful in the prompt and got this (a palette which I always find truly extra). psychedelic gets you there, too.

/imagine [PRESS ENTER HERE] canberra cityscape in summer, colorful, Wuhtercuhler --ar 3:2

MAKE USE OF STRONG EMOTIONS

The manual tells us to use the language of strong emotion. This has the effect of making the prompt vague.

There’s also a cognitive bias at work here: When we input something vague, we tend to like whatever we get because we didn’t start with a specific image in mind. The AI can only surprise us in a good way.

So, if you want an illustration to accompany a story, pick the story’s theme, turn it into a partial sentence and go with that. Or try a line from poetry, your favourite song lyric.

MAKE USE OF ADJECTIVES

  • a hyperreal portrait of [person] looking regal and confident
  • girl with a cheeky smile
  • angry and cocky facial expression

Midjourney seems to understand ‘cute’ but doesn’t know what ‘beautiful’ means. When creating portraits, use symmetry as a prompt. artbreeder also works.
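
A hedged sketch of how those adjective and symmetry tips might combine (the wording is mine):

/imagine [ENTER] portrait of a girl with a cheeky smile, symmetry, artbreeder, detailed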

One thing I’m glad of (for now): Midjourney bans use of this tool for making naked renders, but in future of course we’ll see dedicated AI porn generators. Not for a while though, because Midjourney does a horrible job of generating people. (Also bikes, horses and things like that.)

The Midjourney team have also banned a number of words associated with violence. For instance, you can’t use ‘blood’ in a prompt. (Say ‘red paint’ instead.) ‘Cutting’ is also banned, which is annoying because I wanted to make some paper cutting art.

Here’s how that experiment went.

TRYING TO MAKE SHADOWBOX 3D PAPERCUTTING ART

None of these gave me what I wanted, but some of the results are still pretty cool.

/imagine [PRESS ENTER HERE] autumn forest, 3d paper art
/imagine [PRESS ENTER HERE] autumn forest, illuminated shadow box
/imagine [PRESS ENTER HERE] autumn forest, papercutting
/imagine [PRESS ENTER HERE] autumn forest, layered paper art
/imagine [PRESS ENTER HERE] autumn forest, papercut lightbox

TELL MIDJOURNEY WHAT ART MEDIUM TO USE

And perhaps what paper/canvas to paint on:

  • expressive gouache illustration
  • paint (plain old paint works)
  • acrylic or acrylic on canvas
  • oil or oil on canvas
  • impasto and palette knife (work well if you’re after that oil-painting look with large strokes and lumps of paint)
  • watercolor
  • realistic, highly detailed watercolor
  • watercolor and ink
  • sumi-e
  • ukiyo-e
  • woodblock
  • lithograph
  • paper crumpled texture
  • origami
  • paper quilling
  • a paper diorama of X, construction paper texture
  • on cardboard canvas
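
You can stack several of these in one prompt. For example (a made-up prompt I haven’t actually run, so no promises):

/imagine [PRESS ENTER HERE] lighthouse in a storm, impasto, palette knife, oil on canvas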

ORDER OF WORDS: DOES IT MATTER?

Words closer to the beginning of the prompt seem to have a greater effect on the result, unless the description gets too long, at which point Midjourney seems to forget what came first (in my experience).

STOPPING YOUR IMAGES EARLY

--uplight is a parameter, not a prompt, and does not refer to photographic ‘uplighting’. It means you want the upscale to add fewer details. (Presumably short for ‘light upscale’.)

Alternatively, try stopping your imagines early with --stop 90

Why would you want to do this? Oftentimes the generator keeps going just a little too long. The 90 means Midjourney stops at 90 per cent of the work it would otherwise have done. You can of course knock it back further for a simpler design.

See the difference in the images below.

/imagine [PRESS ENTER HERE] a tiny castle floating in a bubble above a suburb, studio ghibli style

The thumbnails came back like this. Although it has decided to ignore my request for ‘above a suburb’, I like number four. So I hit upscale on that.

When it was 90% done I hit save. (Doing it this way, without specifying parameters, is like watching a pot boil, and the image saves as an HTML file, by the way.)

90% cooked
100% cooked. Now it needs retouching.

Studio Ghibli generations look better at 90% because they come out smoother. (That probably applies to other anime and manga styles, too.) Can you see now why you want photo-editing software with retouch capabilities? My fave is the inpainting tool.

Try combining different but similar styles, e.g. your favourite animation studio with your favourite game.

Try a prompt such as lofi, digital art. That might give you the lower level of detail you’re after.

(Add guitars to the list of items AI cannot yet draw.)

/imagine [PRESS ENTER HERE] girl playing guitar in bedroom, lofi, digital art
I upscaled number 2

DEALING WITH SYNONYMS AND ASSOCIATIONS

Generally, avoid telling the AI what you don’t want.

Avoid: ‘a horse but without a head’.

However, there is a way to tell it what you don’t want: put those two dashes before it and treat it like a parameter.

--no

Sometimes you do need to tell it what you don’t want. Say you want an image of an underground railway (a subway), but the generator keeps giving you images of Subway restaurants. Try using the --no parameter. e.g. --no restaurant
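
In full, the prompt might look something like this (a made-up example, untested):

/imagine [PRESS ENTER HERE] underground railway platform at night --no restaurant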

CREATING MINIMALIST ARTWORK IN MIDJOURNEY

Maximalist works proliferate among enthusiasts of AI. Fewer people are generating minimalist art on purpose. So why not buck the trend?

Try adding minimalist cinematography to your prompt.

/imagine [ENTER] suburban street at night, minimalist cinematography --ar 3:2

I upscaled number one. Definitely got a bit of texture in there I wouldn’t want in a ‘minimalist’ piece.

Let’s lower the stylize setting to 625 or 1250. Wait, what does ‘stylize’ mean, you ask? I haven’t mentioned that one yet. Here’s the screen cap from the Midjourney User Manual:

The default setting is 2500. That’s enough styling for Midjourney to think it’s got carte blanche to make everything extra. Let’s knock it back using the --s parameter.

/imagine [ENTER] suburban street at night, minimalist cinematography --s 625 --ar 3:2

I definitely see the difference in the thumbs.

I upscaled number four. Compare with the upscale above and there’s definitely a bit less texture. Mind you, it looks like I’ve run it through Topaz Gigapixel on ‘Very Compressed’ setting. So rather than waste your Midjourney time trying to get something looking a tiny bit more minimalist, you could try that instead.

While we’re talking about stylize, let’s investigate what those higher stylize settings do with the same text prompt. I’ll knock it right up to 20k. Type it like this:

/imagine [HIT ENTER] suburban street at night, minimalist cinematography --s 20000 --ar 3:2

Now let’s knock it all the way up to 60000.

/imagine [HIT ENTER] suburban street at night, minimalist cinematography --s 60000 --ar 3:2

Not a suburban feature in sight. It’s not even dark. However, it is dawn, or dusk. Something crepuscular, anyway. (Is that Midjourney’s small concession to my wishes?)

Well, the documentation wasn’t wrong. ‘S’ may stand for ‘STYLIZE’ but it may as well stand for ‘SURPRISE ME’.

I like to think of Midjourney as a designer’s assistant, on drugs. Using “--stylize” lets me modulate the drugs that my assistant is on!

@JohnnyMotion

CREATING FLAT DESIGN

You know that flat design style which is so popular? The prompt you want is ‘vector art’, or ‘smooth vector’ if you want non-delineated gradients.

Try also: 2D matte illustration

/imagine [ENTER] suburban street at night, 2D matte illustration

As I was watching this upscale cook, I realised I’d have got a flatter design by stopping at about 60% (--stop 60).

Number four, upscaled

/imagine [ENTER] suburban street at night, smooth vector

Ditto for this one. It’s given me an old-timey border, for some reason. Makes it look like a retro newspaper photograph.

What if I use the word ‘gradient’? Does it make any difference?

/imagine [ENTER] suburban street at night, gradient vector

Yes, I can definitely see a difference.

This might be handy for inspiration when creating a vector illustration from scratch, but the shapes aren’t clean enough. I’ll continue to use the Image Vectorizer app (from the Microsoft Store) when I want flat images. With a vectorizer app (who knows if this is the best one), you choose how many colours you want. Here I’m using a photo of a Japanese street from Unsplash, the free stock photo site:

Screen cap of Image Vectorizer. The photo is by Denys Nevozhai.

Obviously these are really rough; I only want to demonstrate how to get genuine flat design. Images export as an .svg file, which opens for editing in Adobe Illustrator or Affinity Designer. It also opens in Affinity Photo, where I convert it into whatever format I want.

Each one of these has an extra colour in it.

Image Vectorizer
Image Vectorizer
Image Vectorizer
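
Image Vectorizer is a point-and-click app, but the colour-reduction half of what it does is easy to approximate yourself. Here’s a rough Python sketch using the Pillow library; it’s my own illustration, not anything the app exposes, and the file names are placeholders:

from PIL import Image

def flatten_colours(in_path, out_path, n_colours=6):
    # Quantize the photo down to n flat colours
    # (Pillow uses median-cut quantisation by default).
    img = Image.open(in_path).convert("RGB")
    img.quantize(colors=n_colours).convert("RGB").save(out_path)

# Each run with one more colour adds another band of detail,
# as in the three examples above.
for n in (4, 5, 6):
    flatten_colours("japanese_street.jpg", f"flat_{n}.png", n_colours=n)

Note this only gets you the flat-colour raster. Tracing those regions into .svg shapes is the vectorizer’s other job; a tool like potrace handles that part.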

PICTURE BOOK STYLE

If you want children’s-style art, you’ll have to specify what that is, or provide the name of an established children’s illustrator (preferably a few piled together, to avoid stealing any one person’s style; not that ‘style’ can be copyrighted under current law).

/imagine [PRESS ENTER HERE] aerial view of a town, mountains and sea in distance, in the style of eric carle, highly detailed

I really liked those thumbs, so I enlarged some of them.

But here’s what happened when I added ‘picture book’ to the prompt: I learned this does not return children’s picture-book styles of artwork. It returns art which looks like it’s been photographed in a scrapbook, complete with bad photography and a shadow over the page.

I kept playing with Eric Carle some more.

/imagine [PRESS ENTER HERE] autumn landscape made of ripped shapes of rice paper, eric carle style
Number 2 looks nice upscaled.
/imagine [PRESS ENTER HERE] suburban street with children playing in the style of eric carle, highly detailed, paper crumpled texture

I hit the V2 button for another four variations on the second thumb:

Then I upscaled the second one. You get a naive, childlike poster which looks like it’s been hanging on the classroom wall in the sun all year, then came home in a backpack on the last day of the year.

Next I wondered if I could change Eric Carle’s distinctive palette up a little and get something in ochres and yellows. But I messed it up. It looks like as soon as you use the prompt golden hour, the algorithm reaches for landscape photography.

/imagine [PRESS ENTER HERE] a selection of adorable early Carboniferous invertebrates, in the style of eric carle, golden hour (I didn’t bother upscaling these but I did get a nice 2.5 dimensional look.)
Prompt: domestic abuse, picture book, silkscreen print, offset style (The design itself is very cool though, right? Could use it as the basis for a logo.)

Prompts like sketchbook, notebook and comic book page will give you a photo of an actual book: a spine down the middle, the edges of a page, possibly poor lighting, in at least one of the images in the first grid. Likewise, if you use painting or print as a prompt, you’re more likely to get a framed piece.

Below, I was going for a metaphor in a Katherine Mansfield story. Perhaps it was the circa 1920 which gave me the frame as much as painting:

/imagine [PRESS ENTER HERE] white terrified blancmange. still life, oil painting circa 1920

The image below looks like a print on creased linen. I actually just typed in a description of the screen-printing process, and I have no idea why it gave me this abstract design.

/imagine [PRESS ENTER HERE] a print made using a stencil process, a design is superimposed on a very fine mesh screen, printing ink is squeegeed onto the printing surface offset art style

TYPOGRAPHIC IMAGES

In my experience so far, attempts to get a pretty poster with typographic elements are hit and miss. I’ve seen some excellent results, but you get a lot of rubbish as well.

Try something like:

  • a monochromatic infographic poster
  • Text Desired : type : font : wordmark --aspect 16:9

(This person is using Wordmark as a prompt. Wordmark is a website that helps you choose fonts by quickly displaying your text in your installed fonts.)

Add prompts such as black line or single color to get a flatter result.
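
Putting those pieces together, a full prompt might look like this (again, a made-up example I haven’t tested):

/imagine [PRESS ENTER HERE] the word WANDER, wordmark, single color, black line --aspect 16:9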

PATTERNS

  • pattern
  • fractal(s)
  • generative art
  • geometric
  • recursive
  • p5js (a JavaScript library for creative coding; do an image search and you’ll see the look)
  • fabric wallpaper by William Morris

PROVIDE A TIME PERIOD

‘Retro’ works well. Providing a decade also works well, e.g. 1920s, 1970s.

  • in the style of a Baen science fiction paperback cover
  • polaroid of
  • 1940 vintage
  • photograph, colorize
  • autochrome (an early-twentieth-century colour photography look)
  • tintype (an old style of photo which looks black and sepia with a black, rough border and/or heavy vignetting)
  • wet plate (another vintage style with scratchy textures)
/imagine [PRESS ENTER HERE] teddy bears ranging in size, picnic in forest at night, tintype
I upscaled number 2. These teddies have seen better days. I’m glad Midjourney doesn’t allow specific prompts to elicit gore because the algorithm tends towards darkness even without encouragement. (Likewise, the better AI gets, the more it becomes like us: racist, sexist, ableist; obsessed with violence and sex.)

ARTISTS AND MIDJOURNEY: COMPOSITION INSPIRATION

I typed the following into Midjourney:

/imagine [PRESS ENTER HERE] steampunk cafe, clean linework, magical atmosphere, honeycore

Not bad, but unusably messy. I love how the lights look like giant droplets of honey. Would I have thought of that? Maybe not! If I spent a week on this I could make something really cool, not exactly from scratch, but tidying it up.

WHAT ABOUT COPYRIGHT?

Artists, especially digital artists, have long had trouble being taken seriously. Midjourney and other DALL-E 2-style generators compound the problem. Even if artists spend many hours re-compositing, tidying things up and adding hand-drawn elements, and even if we eschew AI inspiration for philosophical reasons, end users will soon assume every cool thing they see was made with AI.

And I can imagine how client conversations will go down for digital artists from now on:

“You charge how much? I can get something that gives me, like, 90 per cent of these specs just by running keywords through an AI generator!”

The disgruntled ‘client’ goes away, but they still like the artist’s style, right? So they run their favourite artist’s Behance folio through the Midjourney generator. Next, they pay someone a fiver to composite exactly what they need on a bootleg copy of Photoshop, using uncopyrightable Midjourney generations which very much make use of the original artist’s work. Dodgy af, yes, even though an artist can’t copyright their style.

“The Devil Making Devilled Eggs”. I composited this from three different Midjourney images and painted in the hand. The seed prompt was /imagine [ENTER] devil in a kitchen wearing an apron preparing devilled eggs --ar 3:2

Most of the composition comes from this. Generally with a Midjourney generation, it’s a matter of making it simpler and firming up the edges. Eyes always need to be fixed.

I also took bits from these two generations:

The law is not currently protecting original content creators. What I’ve heard, as a non-lawyer: the artist who composited the Midjourney generation for a fiver does hold copyright in the image because of the modifications they made, even if those modifications were very minor. On the other hand, fixing something up can take a long time and high-level photo software skills, so we shouldn’t undervalue that form of work, either.

I mean, I love this tech. It’s here to stay. But I can also see its problems. Can we make a pact to use this thing as ethically as we can? If resources were spread more ethically, if universal basic income were a thing, artists wouldn’t be hung out to dry.

Jason Silva (storyteller/filmmaker) is all for it.

FURTHER LINKS

Welcome to the Dawn of AI Art by John Lepore, who is conflicted about what AI generated art means for us.

The Unknown Power of AI to Maximize Your Creativity With Gene Kogan, an episode of The Rhys Show podcast. They dive deep into different generative AI interfaces such as DALL-E, Midjourney and Abraham (the one Kogan is currently working on), other generative models, how to create open-source systems, and how these connect to collective intelligence and the environmental niches AI is going to evolve into.

Here is the same show but on YouTube.

Here’s a Getting Started with Midjourney YouTube video with professional photographer Gailya Sanders. They start out explaining how Discord works. Even if you’ve used other chat apps, Discord can look a little complicated when getting started.