Stable Diffusion For Beginners

Update: There is now a one-click installer for Windows and Linux. No tech know-how required.

You may have seen a whole lot of AI generated art come through your feed lately. 2022 has been big for AI art. A whole lot has happened at once.

This post is for non-technical artists who are interested in recent talk about AI art generation. You’re perhaps a little baffled at the lingo. I am, too. But today I will explain what I’ve learned in the past month about how to make art with an AI generator. Last month I had a Midjourney subscription. Now I’m using Stable Diffusion for free.

Disclaimer: I’m still learning it myself. Plus, it’s changing more quickly than I can keep up with. But I can direct you to an excellent feed of the latest in Stable Diffusion. This tech is changing by the hour.

Stable Diffusion Resource Goldmine (not useful to a beginner)

As one wag said on Reddit:

This is advancing hella fast. There is an army of AI scientists, software engineers, and tireless perverts all working non-stop and making contributions towards this.

I’m using the Automatic1111 fork. If you’re using, or want to use, Automatic1111’s Stable Diffusion, this Discord is dedicated to helping anyone and also provides an easy installer.

Why is the official version such a pain to install? Because:

it was intended as a proof-of-concept demonstration, with the assumption that third parties would create better implementations. That’s pretty common within the open source community.

red286

WHAT THE FORK IS A FORK?

Yes. I had that exact question. Stable Diffusion is the code base. Stable Diffusion is open source, meaning other programmers get a hold of it free of charge. They can change it a bit and turn it into something different. Those new iterations are called forks. You might call them spawn of the Devil, depending on how you feel about AI generated art.

(My take: AI generated art ain’t going anywhere, especially now it’s free for anyone to use. Best learn what it can do!)

You also hear ‘repo’, short for repository (code repository). Technically, this word is more correct than ‘fork’ to describe the same thing.

A list of other Stable Diffusion forks.

WHAT IS STABLE DIFFUSION?

Midjourney is a private company which makes use of AI, including Stable Diffusion, actually. If you’ve been playing around with Midjourney’s --test parameter, you’ve already played around with Stable Diffusion.

After playing with SD on our home desktop and fiddling around with a few of the repos, we can confidently say that SD isn’t as good as Dall-E 2 when it comes to generating abstract concepts.

That doesn’t make it any less incredible.

Hackaday

Some generators are “text to image”, others are also “image to image”.

“Text to image” means you type in words and an image comes back.

“Image to image” means you can also base a new work of art on an image you upload or point it to via URL. But each engine does something a bit different with that image. Midjourney doesn’t make direct use of it.

A “diffusion model” is a generative tool which works with *hand waves* Gaussian noise (it turns stuff into static and then back into something humans recognise as meaningful). Basically, during training a diffusion model takes huge numbers of images, destroys them with noise, then learns to build them back up. When you generate, it starts from pure static and builds a brand-new image out of it.

This is as technical as I get. If you understand the thermodynamics of gas molecules you’ll find all of this very fascinating. As for me, I just want to use the darn thing to generate art.
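For the curious, here’s a toy version of the ‘turn it into static’ half in Python. It follows the usual formula from diffusion model research: keep a square-root share of the image and mix in a square-root share of random Gaussian noise. Here it is applied to four made-up pixel values rather than a real picture. The ‘build it back up’ half needs a trained neural network, so it isn’t shown. The name `add_noise` and the `alpha_bar` knob are just my labels for this sketch.

```python
import math
import random


def add_noise(image, alpha_bar, seed=0):
    """Mix an image with Gaussian static (the 'destroy it' direction).

    alpha_bar = 1.0 leaves the image untouched; alpha_bar close to 0.0
    leaves almost nothing but static.
    """
    rng = random.Random(seed)                # seeded, so results repeat
    keep = math.sqrt(alpha_bar)              # share of the original kept
    static = math.sqrt(1.0 - alpha_bar)      # share of noise mixed in
    return [keep * px + static * rng.gauss(0.0, 1.0) for px in image]


# A tiny "image": four greyscale pixels scaled to the range -1..1.
image = [-1.0, -0.5, 0.5, 1.0]

for alpha_bar in (1.0, 0.5, 0.01):
    noisy = add_noise(image, alpha_bar)
    print(alpha_bar, [round(px, 2) for px in noisy])
```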

STABLE DIFFUSION IS OPEN SOURCE AND FREE

At the time of writing (September 2022), setting up a Stable Diffusion fork (repo, whatever) on your home computer requires a bit of tech knowledge. However! If you don’t have the skills to do this, you won’t be waiting long for a user-friendly interface and one-click install. People are making plugins for Photoshop, Clip Studio, Krita… etc. Canva has integrated Stable Diffusion into its free and paid subscriptions. Soon you won’t even know if digital art has been created with the aid of artificial intelligence. Nay! You won’t know if it’s had a hand in hand-painted canvases, either. Canvas is expensive. Paint is expensive. I create digital versions of artworks before propping up a canvas and wetting a brush. (Isn’t that… sensible?) Why wouldn’t artists use AI as part of the brainstorming process?

If you have an NVidia graphics card with at least 6GB of video RAM (VRAM) and want to know how to use Stable Diffusion for free, see my earlier post, which includes a link to instructions I found to install Automatic1111 on your home computer and generate as many AI artworks as you want for free. (I’ll try and keep it updated for a little while, until it has clearly proliferated. I give it a few weeks.)

Update: It’s now possible to make AI art on your home PC with just 4GB of RAM on your video card. (Someone made modifications to Automatic 1111 to make it more accessible to more people.)

In case you missed it, I’ll show you what Stable Diffusion can do!

An oldie but a goodie. Spoof instructions on how to draw a horse. A good test, since AI is currently terrible at horses. (Excellent at landscapes.)

I cropped the meme until I had only the Step 4 horse. I want to use this horse as a prompt. It’s nice and clean. AI art generators aren’t good at counting legs, but this horse clearly has four separate, full-length legs. Lucky horsey!

Using the img2img tab of Automatic1111, I generate a few images until I definitely get a horse with four complete and separate legs.

I got a few like this.

I tinkered with the sliders. Wasn’t even sure what I did at the time. Anyway, a minute later I had this. That meme is now a historical snapshot from a pre-AI era of art.

UPDATE 29 SEPT 2022:

Memory improvements have been made. This was to allow people with less powerful NVidia GPUs to use Stable Diffusion. You can now use it happily with 4GB of VRAM. If you have more than that, you can make larger images.

A ‘dice’ button now resets the Seed to -1, making it more obvious to people without maths/programming backgrounds that -1 means ‘random’.

I happen to be using a fork called Automatic1111. (Some people think this is the best one. Here are some other installation instructions on YouTube.)

Automatic1111 doesn’t update itself automatically. But coders are updating it several times per day. The screenshots I share below are already a few days out of date. This tech is moving so quickly now that it’s been opened up.

Beside the dice button is a new button which allows you to reuse the seed from the previous generation.

Highres.fix is also new, to accommodate the potential for larger canvases. The thing about AI art: it fills the canvas. The bigger the canvas, the weirder the composition can get. Better to start with a small canvas (e.g. 512 by 512, because Stable Diffusion was trained on 512×512 images). You get a much cleaner composition that way. However, a small canvas also means a small, low-detail image. By checking Highres.fix, you get the best of both worlds: the generator composes as if for a much smaller canvas but gives you the quality and dimensions you want with the bigger one.
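Here’s a rough sketch of that ‘compose small, then enlarge’ idea in Python. It works out a first-pass size with about 512×512 pixels’ worth of area and the same shape as the canvas you asked for. The helper name and the rounding up to multiples of 64 are my assumptions for illustration. This is not a copy of Automatic1111’s code.

```python
import math


def first_pass_size(width, height, base=512, multiple=64):
    """Hypothetical helper: a smaller canvas with roughly base x base pixels
    of area and the same aspect ratio, rounded up to a multiple of 64."""
    scale = math.sqrt((base * base) / (width * height))
    small_w = math.ceil(scale * width / multiple) * multiple
    small_h = math.ceil(scale * height / multiple) * multiple
    return small_w, small_h


target = (1024, 768)
print("compose at", first_pass_size(*target), "then upscale to", target)
```

For a 1024×768 canvas, this works out to a 640×448 first pass, which then gets enlarged to the full size.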

WHAT IS AI ART GOOD FOR?

Professional and hobbyist artists will find ways to integrate it into the process, especially at the brainstorming stage. Just a few other ideas:

  • Generate neat images for TTRPGs (tabletop role playing games)
  • Students can generate characters for novel and poetry studies
  • Create art for your fanfic, or original creative writing
  • Desktop and phone backgrounds
  • Memes, of course
  • Turn your favourite cartoon characters into real-looking people and creatures
  • Turn yourself into your favourite cartoon characters
  • Turn yourself into your favourite teapot
  • Turn your middle manager into your least favourite teapot
  • Satisfy the need to know what your cat would look like if she were made of LEGO
  • Generate new art quickly using your very own style (using the img2img tab)

MAKING TILED PATTERNS WITH STABLE DIFFUSION

Open Affinity Photo, create a New Fill Layer, throw that tile in and boom: We now have my new dining room wallpaper.
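A pattern fill simply repeats the tile edge to edge. That’s why the tile has to be seamless: its right edge butts up against the next copy’s left edge, and its bottom edge against the next copy’s top. Here’s a toy version in Python using a tiny grid of pixel values. `tile_canvas` is a made-up name for illustration.

```python
def tile_canvas(tile, width, height):
    """Repeat a small tile (a list of pixel rows) across a bigger canvas.

    The modulo (%) wraps back to the tile's first column/row, the same way
    a pattern fill starts a fresh copy of the tile where the last one ends.
    """
    tile_h = len(tile)
    tile_w = len(tile[0])
    return [[tile[y % tile_h][x % tile_w] for x in range(width)]
            for y in range(height)]


# A 2x2 checkerboard tile: 0 = dark pixel, 1 = light pixel.
tile = [
    [0, 1],
    [1, 0],
]

for row in tile_canvas(tile, width=5, height=3):
    print(row)
```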

LISA SIMPSON AS A REAL GIRL

I did this experiment with Midjourney. When I tried the same prompt with Stable Diffusion, I learned this: I could get something as realistic, but the process was completely different. For Midjourney I used an image prompt of the cartoon. But with Stable Diffusion, that only made things worse. Stable Diffusion doesn’t seem to know what to do with 3D-rendering prompts such as ‘Unreal Engine’, so I was forced to do a workaround: I used the names of the world’s most famous portrait photographers. Honestly, this was way better, because Unreal Engine doesn’t do a good job of an eight-year-old girl.

First, the creepy version. Now, I feel this is actually realistic because if Lisa Simpson were still inside an eight-year-old body she’d be looking mighty weathered by now. I mean, she’s as old as I am.

(I hadn’t checked the ‘Restore faces’ box before generating these images. I’ve since learned to use it whenever I’m expecting faces. It definitely makes a difference.)

But here’s the one I like. I redid the ears in Affinity Photo and also tidied up the hair with a digital pencil. Minor retouching is usually necessary around the eyes. Stable Diffusion also gave her red lips. I changed them to a natural colour. Gave her eyelashes. But this is mostly the work of AI!

(She reminds me of Sheldon’s genius nemesis on Young Sheldon. The artifacts weren’t created by Stable Diffusion but by me when making it smaller for the web.)

If you’re wondering already what those brackets do, unlike when I was using Midjourney (crapping in the dark), there’s documentation about this! Round brackets give something more weight. Square brackets give it less. I kept getting MASSIVE eyes, so I put large eyes in square brackets. I also kept getting cartoons, which is why realistic is in round brackets.

The bracket functionality was refined in an update. Now instead of using a whole lot of brackets, you can add a number as weighting e.g. (highway at night:1.2).

Shortcut trick: highlight ‘highway at night’, then hold down Control and hit the up or down arrow key. This will fill in the brackets, colon and number weighting for you.
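Here’s a small Python sketch that shows how the old nested brackets translate into the newer number syntax. It assumes the rule given in Automatic1111’s documentation as I understand it: each pair of round brackets multiplies attention by 1.1, and each pair of square brackets divides it by 1.1. So ((word)) works out to about 1.21. The function names are hypothetical helpers and are not part of Automatic1111.

```python
def bracket_weight(round_pairs=0, square_pairs=0, step=1.1):
    """Attention weight for words wrapped in nested brackets.

    Assumption: each ( ) pair multiplies by 1.1, each [ ] pair divides by 1.1.
    """
    return step ** round_pairs / step ** square_pairs


def explicit(text, weight):
    """Write the same emphasis in the newer (text:weight) form."""
    return f"({text}:{weight:.2f})"


print(explicit("realistic", bracket_weight(round_pairs=1)))     # (realistic)
print(explicit("ugly", bracket_weight(round_pairs=4)))          # ((((ugly))))
print(explicit("large eyes", bracket_weight(square_pairs=1)))   # [large eyes]
```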

Stable Diffusion kept giving me hats, so I put ‘hat’ in the negative prompt field.

If you use those exact prompts and parameters I’ve given you, you’ll see for yourself the image of Lisa Simpson I got, before modification.

PROMPT FIELD

Update: A Stable Diffusion Prompt Book has now been released. You can download the free PDF and use it to create your own styles. The current version of Automatic1111 lets you apply two saved styles to a prompt. I’ve been getting some excellent results with it.

Apart from the parameters section, what I wrote about text prompts in Midjourney works for Stable Diffusion, too.

In Midjourney, ‘re-roll’ means ‘regenerate another version’. Here it doesn’t mean that at all. ‘Roll’ is short for ‘automatically insert a random artist from our database’.

I’m finding Stable Diffusion doesn’t generally recognise my favourite artists unless they’re in the database. (My favourite artists are perhaps a little niche.)

A workaround: Add “by artist First name Surname” after the main prompt, not just “by First name Surname” or “First name Surname”. This can sometimes have a huge impact on the result.

I have learned more about artists and art in two months in the latent space than in my 20 years of graphic design.

Gno Man Silva

I recommend adding “by artist” even when the artist is well-known.

If you want a photograph, “photography of [subject] by artist [Name]” works.

What’s a prompt token?

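Each word (and sometimes each chunk of a longer or unusual word) becomes a token. Punctuation, commas included, counts as a token too.

There is an upper limit on prompt tokens, though I haven’t hit it yet: the text encoder in the original Stable Diffusion reads 75 tokens (77 counting its start and end markers). If you get the warning that your prompt is too long, try taking out some commas. (Warning: too many input tokens; some (x) have been truncated.)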
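If you want a ballpark token count before hitting ‘Generate’, here’s a rough Python sketch that counts each word and each punctuation mark as one token. The real tokenizer (CLIP’s) splits unusual words into several pieces, so treat the result as a lower bound. The 75-token limit in the code is my assumption, based on the original Stable Diffusion text encoder, which reads 77 tokens including a start marker and an end marker.

```python
import re

TOKEN_LIMIT = 75  # assumption: 77 tokens minus the start and end markers


def rough_token_count(prompt):
    """Count each run of letters/digits and each punctuation mark once.

    The real tokenizer splits uncommon words into several pieces,
    so this is a lower bound, not an exact figure.
    """
    return len(re.findall(r"\w+|[^\w\s]", prompt))


prompt = "A man chasing a dog. HD, midday, fields, taken on iPhone"
count = rough_token_count(prompt)
print(count, "tokens (roughly):", "too long" if count > TOKEN_LIMIT else "fits")
```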

Try experimenting with using a period instead of a comma for your main subject and commas for all the descriptors after. For example “A man chasing a dog. HD, midday, fields, taken on iPhone”

inbetweenthebleeps on Reddit

MAKING USE OF ACTORS

If you want a character who looks like a fictional movie/TV character, try using the name of the actor who plays them. There will be more images on the net of the actor than of the character.

Also try following the name (of the actor/character) with ‘man’, ‘woman’ or ‘person’, e.g. Tom Cruise man, Nicole Kidman woman, Tilda Swinton person. (Actually, the capitalisation doesn’t matter. I do it out of habit.)

TEXT PROMPT WEIGHTS: USE BRACKETS

Remember: ( ) Increases attention to enclosed words, [ ] decreases it. Highlight the prompt, hold control, hit the up and down arrow.

For a reference library of different styles and the settings used to produce them, see lexica.art.

STABLE DIFFUSION PROMPT CRAFT TIPS AND TRICKS

FOR FULL BODY SHOTS

Try:

  1. (ensure you’ve checked Highres.fix, even though processing takes longer)
  2. full body portrait
  3. full body shot
  4. wide angle
  5. standing
  6. zoom lens zoomed out (position at beginning of prompt)
  7. specify the body parts you want included
  8. change the dimension of your canvas (bodies tend to get cut off in landscape sizes)
  9. generate the environment/background first then inpaint the full body into it.
  10. Mentioning hairstyle/color or headgear will, more often than not, solve decapitation

CREATING CREATURES

When creating a creature of any sort, if you get multiple mouths in the picture, put multiple jaws in the negatives instead of multiple mouths. It will actually solve the issue and create a higher quality picture.

GenericMarmoset on Reddit

IMPROVING EYES
  • Add rendered eyes, iris, contacts, or eye color like hazel eyes. If you get iris flowers, type flower into the negative prompt field.
  • highly detailed symmetric eyes
  • Type cross-eyed or cross eyed into the negative field.
  • Inpaint the eyes at high res after you’ve generated the image. (You may need to type eyes into the inpainting field, and be as specific as possible about what kind of eyes you would like.)
  • More generally, face details can be improved externally via Faceswap on any generated image. “Faceswap is the leading free and Open Source multi-platform Deepfakes software.”
  • Using the new VAE (end of October 2022) also helps improve the eyes quite a bit.

MATERIAL AND FABRIC

The model understands material types quite well for many things. Clothing for example can be made of different materials and fabrics. A fun one you can try is wearing (pearlescent:1.2) or even (translucent:1.2) clothing which works pretty well believe it or not… 😉

Increasing the number increases the amount in my testing. Substitute clothing with the type of clothes you’d prefer. If you wanted something reflective and shiny then latex or glittery for example works. There’s many more interesting and weird material possibilities you get into as well.

Magneto– on Reddit

I’m finding floral lace at anything over :1.21 starts making flowers pop up everywhere. It’s not what I was looking for, but it usually makes an image look cool.

Chief_Broseph

TOO MANY ROMANTIC COMPOSITIONS WITH WOMEN IN FRILLY DRESSES?

I have definitely noticed this myself. Stable Diffusion must have been trained on many wedding photos. (Makes sense. The life blood of photographers is weddings.)

Someone else noticed the same thing:

I noticed a lot of the images I was generating had an ugly wedding photo look to them (bad lighting; bad composition; women sometimes wearing frilly formal dresses even if that wasn’t in the original prompt; generally cheap/amateur looking). […] Images with a female subject will work the best.

tjw2469

Try adding wedding photo to the negative prompt field. This may improve the look of realistic images.

ART DIRECTING FOR COLOUR

An issue with hires fix: the denoising strength is set high. If you ask for ‘blue flowers’, this tends to create a blue wash across the entire canvas. Workaround: turn the denoising strength down.
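Why does turning the denoising strength down help? Roughly speaking, it controls how much of the existing image gets repainted. Here’s a sketch in Python. It assumes that, as in Automatic1111’s default img2img behaviour as I understand it, only about `steps × denoising strength` of your sampling steps are actually run on the image. The function name is hypothetical.

```python
def repaint_steps(steps, denoising_strength):
    """Roughly how many sampling steps actually rework the image.

    Assumption: like Automatic1111's default img2img behaviour as I
    understand it, only steps * denoising strength steps are run.
    """
    strength = min(max(denoising_strength, 0.0), 1.0)  # keep within 0..1
    return int(steps * strength)


for strength in (0.75, 0.5, 0.3):
    print(f"denoising {strength}: about {repaint_steps(20, strength)} of 20 steps")
```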

Make use of negative prompts. You wanted a green shirt but you got green hair? Type green hair into the negative prompt field. However, this may remove all of the green you do want.

If you prefer a specific hair color and a specific eye color, make sure to put the hair color in front of the eye color so that the eye color doesn’t, say, turn your hair green.

Yes: A black haired woman with green eyes

No: a woman with green eyes and black hair.

GenericMarmoset

WORDING THE PROMPT

Overspecifying or being overly verbose gives nothing and can sometimes make things worse. Try saying `Woman standing in the rain, street photography` rather than `Woman standing in the street as it is raining`. This has mixed results, sometimes the model works alright with verbose prompts, but I rarely see benefits in them.

PM5k on Reddit

KEEPING THE SUBJECT IN FRAME

Adding square image can help to keep the subject in the frame, even in non-1:1 images.

tul-lb on Reddit

FOR REALISTIC PHOTOGRAPHY

To make photos more realistic:

  • Nikon Z9
  • Canon 5D
  • Other camera models

Tends to make photos more historic/realistic:

  • Historical photo
  • Associated Press
  • High resolution scan

Tends to make images better as drawings (especially cartoon art/editorial art):

  • Cartoon
  • Editorial illustration
  • New York Times (or other famous newspaper/magazine) cartoon

Improves aesthetic in general:

  • Hasselblad Award Winner
  • Award-winning photograph
  • Masterpiece

eric1707 on Reddit

Camera models are supported and can affect results.

Focus: Phrases like soft focus, light depth of field, motion blur can add to the prompt, experiment with this.

Lighting: I sometimes see prompts using rim lighting when it’s not needed and it results in a washed out part of the subject or entirely does not fit the prompt. Try to experiment with other types of light for instance soft diffuse lighting and so forth.

PM5K on Reddit

MORE TRICKS TO TRY WITH PHOTOGRAPHY
  • add depth of field and an aperture as F5.6_aperture_lens
  • tilt-shift

ARTISTS

Use “by artist” or “by photographer” rather than simply their name. (Rolling the dice would suggest you only need the name, but that doesn’t work nearly as well.) If the artist is extremely famous, you may get an image of the artist themselves.

SAMPLING METHOD (“SAMPLERS”)

Is there any difference between Euler A, Euler etc.? Some scripts will only work with certain sampling methods (e.g. with Euler). Some people are starting to see patterns. Current wisdom is basically this: DDIM does the best job of inpainting and outpainting.

The outpainting script lets you do all sides in one go, but people are getting better results doing one side at a time.

I use Euler_a when I want more photorealistic faces/bodies. Otherwise DDIM or LMS for fantastical landscapes (don’t ask me why though)

thunder-t on Reddit

I switch between DDIM, Euler A and DPM 2a. DPM 2a seems to give the most texture; Euler A seems the most smoothed out. DDIM is my default at 20 steps. It is fast and it gives comparable or better results than klms at lower step counts (at least for what I use it for).

reddit22sd

I like the output from Euler A best, not sure why. I haven’t done any extensive testing.

I’ll use DDIM if I’m getting too many duplicated elements in the photo, e.g. many faces stacked together, as this sampler definitely reduces the effect.

FascinatingStuffMike

`Euler_A` works best between 10 and 40 for me, it is also an incredibly unpredictable (read: creative) sampler which means that raising CFG to high levels won’t always yield good results. Sometimes stuff’s gonna come out cursed. It’s also hella fast if you don’t stick it to 80 samples (because for one, that won’t do anything and secondly it is wasted compute).

`DDIM` is a fast denoiser and for getting composition initially for a seed you may want to reuse works well with low samples and almost any sensible CFG. However it needs a high number of sample steps to produce something decent by itself. It also greatly varies on what that is. Portraits of faces seem to suggest `DDIM` holds up to `Euler_A` and sometimes gives better results than even `DPM2_A`.

`DPM2_A` This one has been a bit mixed for me. Needs a decent amount of sampling steps (60-90) and playing around with other settings to get good results, it’s far slower than the others I’ve mentioned, but when it gets something right, it’s super nice.

`Heun` is another one I have had good results with when treating it like `LMS` or `DDIM` with some sampling variation.

PM5K on Reddit

SAMPLING STEPS

You may be wondering at first if increasing this does anything, except increase the processing time.

Steps do start to mean something once you use more complicated prompts and are sometimes vital when making use of scripts.

Square brackets do more than decrease attention to enclosed words. They also allow you to tell the computer:

  1. What to start drawing
  2. What to switch to
  3. At what point

The instruction looks like this: [from : to : when]

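Say you’ve set the sampling steps to 100. The ‘when’ refers to the steps. (A number between 0 and 1 is read as a fraction of the total steps instead, so 0.5 would also mean step 50 of 100.)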

In conversational English: Start drawing me a cat, then halfway through start changing the cat into a dog.

In the prompt text field: [cat : dog : 50]

This doesn’t mean the cat itself will morph into some kind of chimera. You’re likely to get a cat and a dog, each separate. But if you keep generating, you’ll start to get cats which look kind of beefed up, like cats with dog genes CRISPRed in. Your cat not doggy enough? Bring the dog in early, e.g. [cat : dog : 10]

Use the brackets to create more complex artwork

AI art generators currently have a hard time generating complex images such as a family having a picnic at the beach, or a rat riding a turtle. Using brackets, you can get around that by kicking off the first thing, letting it make a decent start on that, then kicking off the second thing.

Related to this issue, consider separating setting from style. For example:

[detailed description of setting : detailed description of style : 0.4]
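Here’s a small Python sketch of how the ‘when’ in [from : to : when] might be resolved, including the fraction form used just above (0.4 means 40% of the way through). It assumes that a whole number means a step and that a number between 0 and 1 means a fraction of the total steps. That’s how I understand Automatic1111’s prompt editing to work. The function names are made up, and the exact step where the switch lands may differ by one in the real thing.

```python
def switch_step(when, total_steps):
    """Turn the 'when' in [from : to : when] into a step number.

    Assumption: a number between 0 and 1 is a fraction of the total steps;
    anything else is a step number.
    """
    if 0 < when < 1:
        return int(when * total_steps)
    return int(when)


def prompt_at_step(step, start, end, when, total_steps):
    """Which part of [start : end : when] is being drawn at a given step."""
    return start if step < switch_step(when, total_steps) else end


total = 100
schedule = [prompt_at_step(s, "cat", "dog", 50, total) for s in range(total)]
print(schedule.count("cat"), "steps of cat,", schedule.count("dog"), "steps of dog")
print("0.4 switches at step", switch_step(0.4, total))
```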

NEGATIVE PROMPTS

Examples of the sorts of things you might enter into the negative prompts field:

  • low detail
  • closeup
  • low quality
  • bad lighting
  • out of frame
  • multiple people
  • Disfigured
  • bad art
  • amateur
  • poorly drawn
  • ugly
  • flat
  • deformed
  • extra limbs
  • close up
  • b&w
  • weird colors
  • lowres
  • bad anatomy
  • bad hands
  • text
  • error
  • missing fingers
  • extra digit
  • fewer digits
  • cropped
  • worst quality
  • normal quality
  • jpeg artifacts
  • signature
  • watermark
  • username
  • blurry
  • stacked torsos and totem pole in the negatives will help with the stacked torso issue which comes about because Stable Diffusion is trained on square images.
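Lists like this one pick up repeats easily. If you keep your go-to negative terms in a list, a few lines of Python can join them into the comma-separated text the negative prompt field expects, dropping accidental duplicates along the way. `build_negative_prompt` is a hypothetical helper and is not part of any Stable Diffusion tool.

```python
def build_negative_prompt(terms):
    """Join terms with commas, skipping repeats.

    Repeats are matched ignoring case and stray spaces, and the first
    spelling of each term is the one kept.
    """
    seen = set()
    kept = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            kept.append(cleaned)
    return ", ".join(kept)


terms = ["low quality", "bad hands", "Disfigured", "low quality ", "disfigured", "watermark"]
print(build_negative_prompt(terms))
```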

(Stable Diffusion is great at rendering materials and lighting.)

People are using many negative prompts, and also making use of brackets in the prompts. For example:

((nipple)), ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). (((more than 2 nipples))). out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))

(Multiple negative prompts of the same word happen when you fuse styles together. Sometimes you accidentally realise something works better when repeated, or given more weight.)

You won’t know what to put in this field until it keeps giving you a weird thing you don’t want. When generating a landscape view including a meatworks (for a short story I was writing), it kept giving me landscapes made of saveloys or something. So I had to put meatworks in the prompt and meat in the negative prompt field.

When creating cityscapes and townscapes, I put ‘powerlines’ in the negative prompt field because otherwise you get way too many and they’re never in the right place. Powerlines are easy to add yourself.

RESTORE MESSED UP FACES

‘Restore faces’ uses Generative Facial Prior GAN (GFPGAN) to maintain left-right symmetry. Seems to work best for me with close ups. Mid-shots of faces, not so well.

In the settings, you get a choice between which type of face restoration algorithm you want to use.

I’ve heard this: CodeFormer produces good photorealistic eyes. It was designed to fix photos. Unfortunately, it can sometimes change the line of sight.

GFPGAN produces better eyes when you’re using an art style (hyper-realism, anime etc). Try both.

Here’s an example of no face restoration compared to GFPGAN and CodeFormer.

I like to start with about 0.5 GFP-GAN, and 0.25 CodeFormer (weight, I always do 1.0 on visibility or you get ghosting). Too much of either one can cause artifacts, but mixing both at lower settings can yield great results. I adjust from there.

I’ve also found that running it through GFP-GAN and/or CodeFormer again after upscaling makes the face even clearer at high resolution (perhaps because on the first run it cleans up the face before upscaling?)

jonesaid on Reddit

For stylized work I recommend doing another few iterations in img2img after face restoration. It can restore the style.

Manesus on Reddit

BATCH COUNT AND BATCH SIZE: WHAT’S THE DIFFERENCE?

For me, the distinction between ‘batch count’ and ‘batch size’ is not intuitive.

Batch count: How many separate generations.

Batch size: How many images to generate per generation (or ‘batch’).

Here’s the important thing: A higher batch size uses more VRAM (video card memory), but a higher batch count does not. The process runs more times. For me the batch size tops out at two.

If your batch count is high, keep the batch size at 1. This is slower, sure, but it’s much easier on your video card’s memory.

tl;dr: Move the first slider but don’t touch the second slider.

Basically, rather than sit around hitting ‘Generate’ every forty seconds or however long it takes to generate one image, increasing the first slider allows you to make 9 or 12 (or however many) at once while you go away and make a cup of coffee. Of course, you’ll want to do this after fine-tuning the prompt somewhat. Or, do it right at the beginning and it will become super clear what needs to go in the negative prompt field. (e.g. my saveloy landscapes when I was wanting a meatworks. Every single image I got back after returning with a cuppa: buildings made of saveloys.)
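Here’s the arithmetic behind the two sliders, sketched in Python. The total number of images is batch count × batch size, but only the batch size decides how many images your video card holds in memory at once. `batch_plan` is a made-up helper for illustration.

```python
def batch_plan(batch_count, batch_size):
    """Batch count = runs, one after another (no extra memory).
    Batch size = images made side by side in each run (uses more VRAM)."""
    return {
        "total_images": batch_count * batch_size,
        "images_in_vram_at_once": batch_size,
    }


print(batch_plan(batch_count=12, batch_size=1))  # 12 images, gentlest on VRAM
print(batch_plan(batch_count=6, batch_size=2))   # also 12, but heavier per run
```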

CFG SCALE: “Creative Licence”

When first starting out, pick a CFG number between 7 and 11.

Higher CFG scale values require more steps in order to produce better images. You see a kind of grey “fog” when CFG and iterations are low.

When you’re trying to find a good seed or fine-tuning your prompt, start with a low CFG. A low CFG needs fewer steps, and fewer steps get you the speed. Put it just above where you get the grey fog. That’s the sweet spot between speed and ‘good image’.
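For the curious, CFG stands for classifier-free guidance. At each step the model makes two predictions, one ignoring your prompt and one following it, and the CFG scale stretches the gap between them. Here’s that formula as a Python sketch on three made-up numbers; the real thing works on large arrays of noise predictions. At a scale of 1 you get the prompt-following prediction as it is. Higher values push further in the prompt’s direction.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance on a list of model predictions.

    uncond: prediction made while ignoring the prompt
    cond:   prediction made while following the prompt
    The scale stretches the gap between the two.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 0.2, -0.1]
cond = [0.1, 0.1, 0.3]

for scale in (1, 7, 15):
    print(scale, [round(v, 2) for v in apply_cfg(uncond, cond, scale)])
```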

CFG: For txt2img, I tend to go between 5 and 15 because I’ve noticed that I get a wide range of different results that might inspire me. For img2img, I crank it up to 25ish because I want to start from something and really arrive at my desired result.

thunder-t

OVERCOOKING IT

There are only two settings to pay attention to:

Scale (CFG) and Steps. The rest, leave as default.

These values seem dependent on one another.

As far as I have seen, adding too many Steps to the default values either ‘overcooks’ the process or does nothing; it seems that SD either gets confused, or thinks it got it right in the earlier steps.

While too much Scale (CFG > 11) seems to ‘burn’ the image (at a lower step count) as it ‘guesses harder’ at what you prompted, if you then add more steps (towards or above 100) the image might get fully realized, and that’s given me some of the best results I’ve got.

franzsanchez on Reddit

HEIGHT AND WIDTH

I am running Stable Diffusion on an NVIDIA 3060 graphics card with 12 GB of VRAM and, before the latest update, could only reliably make images with a maximum of 640px on the longest side.

However, this is getting better with updates. I’m now making bigger images, and need to make use of the Highres.fix checkbox.

SEED

Now for some gardening tips!

Notice the default is set to -1. What does this mean?

In the programming world, negative one is often a special value. Here it means ‘pick one at random, bitch!’ When generating AI images, imagine all the billions of possible images inside one big lottery barrel.

When you leave the seed number at negative one, you’re telling the computer to pick one of those billions of images inside your barrel. Your barrel? Yes, yours. You’ve already chosen your very own barrel, of all the billions of barrels, by specifying the text prompt, steps etc.

So, each seed number refers to one of the images inside your barrel. But you have no idea until you hit ‘Generate’ what each seed number even refers to. So why would you specify a seed number?

When you’re close but no cigar.

Let’s say the algorithm returns something you kinda like (something you were hoping for), but it’s not quite right. You wish you’d used a different keyword now, because you can see what the algorithm is doing with it. You need different dimensions, actually. Oh, and that’d look better with landscape dimensions, not portrait. You want to give the computer a smidge more creative licence, so you wish you’d lowered the CFG.

In that case, you copy the seed number. It’s right there, below the image. Paste it into the ‘Seed’ text field. Make your adjustments and try again!

In case that makes no sense, I’ll show you an ugly example.

I’m after a kite in the night sky with its tail on fire. When starting out, I obviously leave the seed number set to completely random. (Negative one.) It makes no sense to specify the seed when I can’t see the seeds (images) inside the barrel! After a few generations I get something a little interesting.

I want that general shape but I’ll modify the text prompt and see what a different sampling method might throw. I copy and paste the seed number into the ‘seed’ text field, change a few other settings and try again. (Update: There’s now a button for that, rather than copying and pasting the seed number you can press what looks like a ‘recycling’ icon and it ‘copy and pastes’ the last seed for you.)

Okay, for the sake of this experiment, let’s just pretend this is exactly what I wanted. I could hit ‘Generate’ a million times now, and unless I change at least one of the settings/prompts, the exact same image will keep coming back. That’s because I have its seed number and other parameters are left unchanged. (It’s valuable to have a seed number, which is why some people are trying to sell them online, and also why some people think it’s unethical to share AI art without attaching its seed number.)
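Here’s a Python sketch of why a seed makes an image repeatable. The seed feeds a random number generator, and the same seed always produces the same starting static for the sampler to work on. Treating -1 as ‘pick a random seed’ mirrors Automatic1111’s convention. The 32-bit range and the function names are my assumptions for illustration.

```python
import random

RANDOM_SEED = -1  # the "pick one for me" value in the Seed field


def resolve_seed(seed):
    """Swap -1 for an actual random seed; keep any other seed as-is."""
    if seed == RANDOM_SEED:
        return random.randint(0, 2**32 - 1)  # assumed 32-bit seed range
    return seed


def starting_static(seed, size=4):
    """Stand-in for the real starting noise: same seed, same numbers."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


seed = resolve_seed(RANDOM_SEED)
print("Copy this seed to get the same starting point again:", seed)
print(starting_static(seed) == starting_static(seed))  # always True
```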

YOU’RE SO EXTRA!

Notice the ‘Extra’ check box beside the Seed? That gives you more options. Wait, what do they do? WTH IS A SEED VARIATION?

Keep reading. It all becomes clear when you start messing around with the scripts to generate test grids.
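As I understand it (treat this as an assumption), Automatic1111’s variation seed blends the starting static from your main seed with static from the variation seed. It does this with spherical interpolation (‘slerp’), and the variation strength slider sets how far along that blend you go. Here’s a Python sketch of the blend on tiny four-number ‘noise’ lists. The function names are mine.

```python
import math
import random


def static(seed, size=4):
    """Stand-in for the starting noise a seed produces."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(t, a, b):
    """Spherical interpolation: t = 0 gives a, t = 1 gives b.

    Blending along an arc (rather than a straight line) keeps the mix
    looking like the same kind of noise as its two ingredients.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))  # angle between them
    if math.sin(omega) < 1e-6:  # (anti)parallel: fall back to a straight blend
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1 - t) * omega) / math.sin(omega)
    w_b = math.sin(t * omega) / math.sin(omega)
    return [w_a * x + w_b * y for x, y in zip(a, b)]


main = static(1234)       # the seed you liked
variation = static(5678)  # the variation seed
print(slerp(0.1, main, variation))  # variation strength 0.1: a small nudge
```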

SCRIPTS

If you’re trying to art direct an image — and trust me, you will want to art direct once you get over the amazingness of your completely random images — you’ll want to be a bit more strategic about it. Scripts allow you to do just that. Generate grids of images side-by-side, each a little different from each other. Pick your favourite and reproduce.

There are a few different ways of generating test grids. The prompt matrix is slightly easier to get your head around.

PROMPT MATRIX

Normally you type something like this as a text prompt, separated by commas:

suburbs, illustration, cinematic lighting, Charles Ginner

This function requires you to swap out your commas for pipes. (I never normally type pipes. Had to look up how! On my keyboard it’s Shift + the key above ENTER.)

suburbs | illustration | cinematic lighting | Charles Ginner

Don’t worry about spaces. (Spaces are for the reading ease of humans.) Those pipes tell Stable Diffusion you want a grid.

The first part of the text prompt (i.e. ‘suburbs’) is considered the most important part. You’ve just told Stable Diffusion you want a picture of the suburbs. It gave you some suburbs, just suburbs. (Top left.) Next it gave you suburbs as an illustration. Beside that, suburbs with cinematic lighting (but not an illustration). See how it works?

Note: The first part of the prompt is treated differently by the algorithm. The first part gets priority. It doesn’t change. Everything after that first pipe is considered a variable. That’s why the word ‘suburbs’ is left off the grid. It assumed you knew you were looking at ‘suburbs’. I mean, that’s what you definitely asked for.
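Here’s a rough sketch in Python of how a prompt matrix grid can be worked out. It’s my own illustration of the idea, not the script’s actual code: the part before the first pipe always stays in, and each part after a pipe is switched on or off, so every possible combination gets its own image.

```python
def prompt_matrix(prompt: str) -> list[str]:
    """Sketch: expand 'fixed | var1 | var2 ...' into every on/off combination."""
    parts = [p.strip() for p in prompt.split("|")]  # spaces around pipes don't matter
    fixed, variables = parts[0], parts[1:]
    prompts = []
    # Each number from 0 to 2**n - 1 acts as a row of on/off switches, one bit per variable.
    for combo in range(2 ** len(variables)):
        chosen = [v for i, v in enumerate(variables) if combo & (1 << i)]
        prompts.append(", ".join([fixed] + chosen))
    return prompts


for p in prompt_matrix("suburbs | illustration | cinematic lighting | Charles Ginner"):
    print(p)
```

With three variables that’s 2 × 2 × 2 = 8 prompts. The first is plain ‘suburbs’ and the last has every prompt switched on. Add a fourth variable and the grid doubles to 16 images, so keep the list short.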

Use this script when you’re testing different prompt combos. Instead of generating a whole lot of separate images, you’ll speed up the process by rolling them into one. Side by side, the differences become clear.

As it turns out, I like the combination with every single one of those prompts the best. (The bottom right.) Stable Diffusion has already generated all of those images for us. All I need to do is click on it, and send it to the enlargement tab to make a bigger version.

Of course, once enlarged it looks nowhere near as good. The AI beer goggles come off…

But now that you have the seed, prompts and parameters, you can play with it. Put its seed into the ‘seed’ text field, then create minor variations by checking ‘Extra’ (next to the seed text field) and adjusting the first slider, or change a few of the text prompts.
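What does that slider actually do? My understanding (a simplification, so treat it as an assumption) is that the variation seed makes a second batch of starting noise, and the slider decides how far to blend from your original seed’s noise towards it. At 0 you get your original image back; at 1 you get the variation seed’s image. The sketch below uses spherical interpolation (‘slerp’), a common way to blend noise. The helper names `noise` and `slerp` are made up for illustration.

```python
import math
import random


def noise(seed: int, size: int = 8) -> list[float]:
    """Hypothetical stand-in for the starting noise made from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(t: float, a: list[float], b: list[float]) -> list[float]:
    """Spherical interpolation: t=0 returns a, t=1 returns b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    dot = sum((x / norm_a) * (y / norm_b) for x, y in zip(a, b))
    omega = math.acos(max(-1.0, min(1.0, dot)))  # angle between the two noise batches
    if omega < 1e-6:  # nearly identical: a plain straight-line blend is fine
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return [wa * x + wb * y for x, y in zip(a, b)]


original = noise(1234)  # your seed
variation = noise(42)  # the variation seed
slight_change = slerp(0.1, original, variation)  # slider near 0: a close cousin of the original
```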

X/Y PLOT (A GRID FOR COMPARING)

Here’s a slightly different way of comparing different but similar generations. You get more functionality with this method. I had trouble generating a grid in which all images are different, so I ended up using the very same artists and parameters in the Automatic1111 instructions at GitHub.

What I was doing wrong (I think): My text prompt was too short. But maybe that wasn’t it. (This may have been fixed; I’ve yet to try it with the latest update.)

This test changes up the CFG scale. (Remember, that basically means ‘creative licence’: the lower the number, the more licence it takes; the higher, the more closely it sticks to your prompt.) And as it does that, it also tests three different artists as text prompts. (Roy Lichtenstein is already used in the text prompt above, so every image includes a bit of him. The first column doubles up the Roy Lichtenstein-ness.)
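To picture what the X/Y plot is doing, here’s a small Python sketch that lists every cell in a grid like this one: one row per CFG value, one column per artist, with the seed held steady. The CFG values, the base prompt and the third artist are placeholders I’ve made up; swap in your own.

```python
from itertools import product


def xy_plot_cells(base_prompt: str, artists: list[str], cfg_values: list[float]) -> list[dict]:
    """Sketch: one grid cell per (CFG value, artist) pair, read row by row."""
    cells = []
    for cfg, artist in product(cfg_values, artists):  # rows = CFG, columns = artist
        cells.append({"prompt": f"{base_prompt}, {artist}", "cfg_scale": cfg})
    return cells


# Placeholder values for illustration only.
artists = ["Roy Lichtenstein", "Sherree Valentine Daines", "ANOTHER ARTIST"]
grid = xy_plot_cells("YOUR PROMPT, Roy Lichtenstein", artists, [5, 10, 15])
print(grid[4])  # the dead-centre cell: Sherree Valentine Daines at CFG 10
```

Notice the first column ends up with ‘Roy Lichtenstein’ twice in its prompt, which is the doubling-up described above.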

Which is your favourite? I’ll go with the one in the dead centre. But how do I get it??

First, I copy ‘Sherree Valentine Daines’ and paste it to the end of my original prompt. Then I move the CFG slider to 10. I copy and paste the seed into the seed field. And now the variation seed makes sense. I copy and paste that too, after checking the Extra box.

Generate! There we are. I’ve embiggened the middle image exactly by copying and pasting both the seed AND the variation seed.

THE AUTOMATIC1111 LOOPBACK SCRIPT

To use this script, increase the batch count. This script is for creating a succession of images with each new image based on the previous one.

(If you put these all together in a short video, it looks like something morphing into another thing.)
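Here’s the idea of the loopback script as a toy Python sketch. The `fake_img2img` function is a made-up stand-in (real img2img doesn’t blend pixel values like this); the point is only the loop. Batch count sets how many passes run, and each output becomes the next input, so the changes pile up frame by frame.

```python
import random


def fake_img2img(image: list[float], denoising: float, rng: random.Random) -> list[float]:
    """Made-up stand-in for one img2img pass (real img2img does NOT just blend pixels).

    Keeps (1 - denoising) of each input value and fills the rest with new random 'detail'.
    """
    return [(1 - denoising) * px + denoising * rng.random() for px in image]


def loopback(first_image: list[float], batch_count: int,
             denoising: float = 0.4, seed: int = 1) -> list[list[float]]:
    """Feed each result back in as the input for the next pass."""
    rng = random.Random(seed)
    frames = [first_image]
    for _ in range(batch_count):
        frames.append(fake_img2img(frames[-1], denoising, rng))  # last output -> next input
    return frames


frames = loopback([0.0, 0.5, 1.0], batch_count=4)
print(len(frames))  # 5: the starting image plus one new frame per pass
```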

DENOISING STRENGTH: HOW MUCH DO YOU WANT TO CHANGE THE ORIGINAL?

“Denoising strength” has a range between 0 and 1.

0 tells the computer: “Don’t change anything and keep the original.”

1 tells the computer: “Replace everything with something new.”

You may want to muck around with this when you’re inpainting. (See below.) But don’t go below point five (0.5). (Because what’s the point of regenerating pretty much the same thing?)
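One way to picture denoising strength, as I understand it: img2img adds noise to your picture and then only runs part of the sampling back, so the strength decides roughly how many of your sampling steps get used. The function below is a simplified sketch of that arithmetic, not Automatic1111’s code, and the exact rounding in the real thing may differ.

```python
def img2img_steps(sampling_steps: int, denoising_strength: float) -> int:
    """Rough sketch: how many sampling steps actually get run in img2img.

    0 = keep the original (no steps run), 1 = start from pure noise (all steps).
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising strength must be between 0 and 1")
    return int(sampling_steps * denoising_strength)


for strength in (0.0, 0.5, 0.75, 1.0):
    print(strength, img2img_steps(20, strength))
```

So at 0.75 with 20 steps, only about 15 steps of redrawing happen, which is why the original’s composition can still show through.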

Low denoising keeps the image, but if iterated (run the resulting low-denoising image through again and again) you can get refined results. High denoising, about 0.7-0.9, is when you start getting details and dramatic departures from the original image, and the result might just include general color information.

I have found that CFG above 11 doesn’t add much, but that might depend on the prompt and the level of denoising.

Jcaquix on Reddit
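CFG stands for classifier-free guidance. At every step the model makes two guesses, one with your prompt and one without, and the CFG scale sets how far to push from the no-prompt guess towards (and past) the prompt guess. Here’s that standard formula as a tiny Python sketch, with plain made-up lists standing in for the model’s real outputs. It may help explain why cranking CFG higher and higher stops helping: you’re only exaggerating the same difference.

```python
def apply_cfg(uncond: list[float], cond: list[float], cfg_scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


no_prompt_guess = [0.0, 0.2, -0.1]  # made-up numbers
prompt_guess = [0.1, 0.1, 0.3]
print(apply_cfg(no_prompt_guess, prompt_guess, 1.0))  # scale 1: just the prompt guess
print(apply_cfg(no_prompt_guess, prompt_guess, 7.0))  # scale 7: pushed well past it
```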

When inpainting and outpainting, current wisdom is this: select the DDIM sampler.

IMG2IMG INPAINTING

What’s this for?

Don’t sleep on inpainting. It’s a powerful way to add detail to a gen where the initial sampler fell short. Great for experimenting further and fine-tuning.

PM5k on Reddit

You’ve generated a busy scene which looks pretty good at first glance, but then you take your beer goggles off and it looks pretty rubbish, actually. The people don’t look right. The architecture wouldn’t work in real life. You can’t make use of images like these, even if the basic composition and color palette work.

When you get one of these images, send it to img2img.

Basically, you draw a mask over the part you want the algorithm to make a better job of.

Say you’ve got a great landscape, but you asked for a kombi van and it made a pig’s ear of the kombi van. Send the image to img2img and check the box for inpainting. Use the brush to paint over the rubbish kombi van. You may need to hit Generate ten or twenty times to get a good kombi van facing the right direction. (I did this exact experiment. I ended up with an excellent kombi van and used the end result in this post.)
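Here’s the basic idea of the mask as a toy Python sketch, with an image flattened to a short list of made-up brightness values: wherever you’ve painted, the new attempt is used; everywhere else, the original stays put. This only shows the final stitching-together step. Real inpainting also uses the surrounding picture as context when it regenerates the masked area, and a soft (blurred) mask edge would use in-between values.

```python
def composite(original: list[float], generated: list[float], mask: list[float]) -> list[float]:
    """Blend per pixel: mask 1.0 = painted over (use the new pixel), 0.0 = keep the original."""
    return [m * g + (1 - m) * o for o, g, m in zip(original, generated, mask)]


landscape = [0.2, 0.4, 0.6, 0.8]  # made-up pixel values
new_attempt = [0.9, 0.9, 0.9, 0.9]
mask = [0.0, 1.0, 1.0, 0.0]  # you painted over the middle two pixels (the kombi van)
print(composite(landscape, new_attempt, mask))  # [0.2, 0.9, 0.9, 0.8]
```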

Further instructions are here. (“Strength” refers to “denoising strength”.)

An experiment for you: Try generating hands by asking for them “zoomed all the way”. I’m not sure if this works, but others say it does.

The masking paintbrush is way less janky in the update. But for serious work, you’ll need to make a mask in your art software because that brush won’t change size or shape.

Automatic1111 webui users: if inpainting does not work for you (it generates the exact same image), it can be due to your adblocker and DuckDuckGo privacy extension. Add 127.0.0.1 to your list of “unprotected sites”.

cowkb on Reddit

I’ve also seen inpainting fail if you set the masked area to “original” and there’s not enough variation for it to play with (and the denoising is too low).

draqza

MAKING THE IMAGE BIGGER

The great thing about Automatic 1111: You can upscale images in the img2img tab. (Also in the extras tab.)

Comparison of the Three Upscalers from Automatic1111 (comparing Real ESRGAN, SwinIR and LDSR)

LDSR takes a long time to process but is many people’s favourite.

Let’s talk about the img2img tab for a sec. Once you have an image in the viewport (either because you sent it over from txt2img or uploaded it directly here), choose SD upscale. Update: This option was previously a radio button, but I see it’s now been moved into the Scripts dropdown.

Check ‘Just resize’.

Next, check tiling. You can leave the tile overlap where it is. Now pick an upscaler. I don’t see much difference between them but other people do. (By default it’s set to None.)

Once you’ve done those three things you’re all set to upscale. Hit Generate. You now have a much bigger image. Still not big enough for you? Send your newly enlarged image back to img2img and do it again.

Not that I need to know this, but SD upscale works by breaking an image into tiles and making each one individually more detailed before stitching them back together for us. I don’t care, except that’s why I check the box that says ‘Tiling’. So, it’s not just upscaling the image. It’s regenerating the image, adding new detail that would only be seen at higher resolution.
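To give you an idea of the tiling, here’s a Python sketch that works out where the tiles go on an enlarged image. The 512-pixel tile size and 64-pixel overlap are assumptions for illustration (check the tile overlap slider for yours), and the even spacing is my own simplified version, not Automatic1111’s exact code. Neighbouring tiles overlap so the seams can be blended when the tiles are stitched back together.

```python
import math


def tile_starts(length: int, tile: int = 512, overlap: int = 64) -> list[int]:
    """Start positions of tiles covering `length` pixels, spaced evenly.

    Neighbouring tiles share at least `overlap` pixels so the seams can be blended.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile")
    if length <= tile:
        return [0]  # the whole edge fits in a single tile
    count = math.ceil((length - overlap) / (tile - overlap))
    # Integer arithmetic keeps the last tile flush with the far edge.
    return [i * (length - tile) // (count - 1) for i in range(count)]


def tile_boxes(width: int, height: int, tile: int = 512,
               overlap: int = 64) -> list[tuple[int, int, int, int]]:
    """(left, top, right, bottom) for every tile, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


print(tile_starts(1024))  # [0, 256, 512]
print(len(tile_boxes(1024, 1024)))  # 9 tiles for a 1024 x 1024 image
```

Each of those tiles gets redrawn at 512 pixels, which is where the extra detail comes from, and also why a high denoising strength can make each tile wander off in its own direction.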

Apply a low denoising strength with SD Upscale, otherwise you’ll end up with something you didn’t want!

Upscale to huge sizes and add detail with SD Upscale, it’s easy! on Reddit

Note that ‘Tiling’ in the img2img tab refers to SD upscaling and is completely different from the ‘Tiling’ button you can check in the txt2img tab, which is not an upscale feature at all. That’s a button to create tiles for seamless patterns and textures.

(More generally, the wording on this Automatic 1111 user interface is not what I’d have chosen. For example, it says ‘loading’ when it’s really ‘processing’.)

If you have a good graphics card, the best is to render at 512, img2img at 1024 and then keep using SD upscale while pushing the strength and CFG down.

hapliniste

WHAT IS TEXTUAL INVERSION?

Word on the street as of October 2022: Textual inversion isn’t working well in Automatic1111 and has been abandoned for now in favour of improving other things.

You may have seen the phrase ‘textual inversion’ floating about.

Textual inversion: Training the algorithm on your own images rather than relying only on what it learned from the (large but limited) set of images Stable Diffusion was trained on.

Why would you want to do this? Well, because you’re an artist and you want to use your own art to create more art which has your own, unique style.

Or you’re not an artist at all and would like to turn your boss into a teapot.

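At this time, you need a graphics card with at least 24GB of VRAM (such as an RTX 3090), so this is not a possibility for me on my home computer. However, you can rent GPU power. Here’s a YouTube tutorial by Aitrepreneur with instructions. (Strictly speaking, the method in that tutorial is DreamBooth, a related way of training Stable Diffusion on your own images, rather than textual inversion.)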

If you’re not a coder, this may seem daunting because then you’re wholly reliant on someone giving you step-by-step instructions, yet the tech is changing daily, in small but crucial ways.

To give you an idea:

  • Each training will cost you about 50c (so this is cheap but not free)
  • Select photos of your subject from the Internet. (He uses Rhaenyra, a character from the Game of Thrones prequel.) You’ll need around twenty images of her, from various angles and distances (close-ups, mid-shots, full-body shots). Save them all in a folder on your computer. Crop them if they include other characters or too much background.
  • Now batch crop them all as 512×512 squares. The website birme.net does this for free.
  • Once you have the cropped images in a folder, you need to rent GPU time from a service such as RunPod or vast.ai. (This looks like a pain in the neck because the tech keeps changing day by day.)
  • Import your own training images to the GPU you’re renting. These need to be URL links to images, so use an image hosting service like Imgur.
  • You’ll have to put the name of your character into the code. Rename the project.
  • You set the thing going, to train. This will take about 15 minutes.
  • Make some adjustments to the code
  • Prune (change a 12GB file into a ~2GB file)
  • Download the training model file and use it in your favourite Stable Diffusion repository (e.g. Automatic 1111). This takes another 20 minutes.
  • You’ll be going into Settings in the browser UI and also into console. Requires a restart.
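If you’d rather not rely on a website for the batch-cropping step, here’s a Python sketch of the arithmetic behind a centred square crop: take the largest square that fits in the middle of each photo, then shrink it to 512×512. This is my own illustration, not how that website works. The box follows the (left, top, right, bottom) order that the Pillow imaging library’s `Image.crop` expects, so if you have Pillow installed you could follow it with `img.crop(box).resize((512, 512))`.

```python
def centre_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centred square inside a width x height photo, as (left, top, right, bottom)."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


for size in [(1920, 1080), (800, 1200), (512, 512)]:
    print(size, centre_square_box(*size))
```

A centred crop is a blunt instrument: if your subject sits off to one side of a wide shot, crop that one by hand.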

FOR FURTHER INVESTIGATION

The Stable Diffusion Subreddit can be pretty good. It’s where I found my help. You can also join Stable Diffusion Discord chats. A Stable Diffusion search on Twitter was good until a few dudes started filling it up with you-know-exactly-what, but it depends on time of day. (Lucky for me, their late night is my morning.)

If you find the maximum values for CFG Scale, Batch Count and Sampling Steps are too low