Dall-E, Midjourney, Stable Diffusion (etc) - who's playing, and with which?


  • yuekit

    Are there any good articles on how Midjourney works in terms of the mechanics of rendering the images?

    • it rearranges pixel noise till it's happy.
      diffusion modeling is what they call it.
      uan
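      To make "rearranges pixel noise till it's happy" concrete, here is a toy sketch of a diffusion sampling loop; `denoise_step` is a hypothetical stand-in for a trained network, not any real model's API:

      ```python
      # toy sketch: start from pure noise and repeatedly remove the noise a
      # trained model predicts, until an image remains. denoise_step is a
      # hypothetical placeholder for the real neural network.
      import numpy as np

      def denoise_step(x, t):
          # a real model returns its estimate of the noise present in x at step t
          return np.zeros_like(x)

      x = np.random.randn(64, 64, 3)        # pure noise (pixels or latents)
      for t in reversed(range(1000)):       # walk the noise schedule backwards
          predicted_noise = denoise_step(x, t)
          x = x - 0.01 * predicted_noise    # strip away a little predicted noise
      # after enough steps, the noise has been "rearranged" into an image
      ```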
    • this article is on how GPT (the language part) works:
      https://writings.ste…
      uan
    • So I wonder what then explains its inability to render certain things? Some people and characters are dead on, others it misses the mark completely.
      yuekit
    • I also noticed after using it a while that the same stock models and scene compositions seem to show up.
      yuekit
    • when you get a few months in and you start to recognise the facial features of prominent (circa 2019) celebrities in your renders... like looking at the matrix: "there's Benedict Cumberbatch"... "there's Don Rickles"
      kingsteven
    • with SD you notice with text2img how even the cropping of training data to 1:1 influences the layout, so you have that to deal with before realising that layout in art is overwhelmingly generic and text2img diffusion only really does generic well.
      kingsteven
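      A minimal sketch of the 1:1 preprocessing being described: training images were resized and centre-cropped to a square (e.g. 512x512), which systematically trims the edges of wide or tall compositions. The file name is a placeholder:

      ```python
      # resize + centre-crop an arbitrary-aspect image to 1:1, as early SD
      # training pipelines did; the crop discards composition at the edges.
      from PIL import Image, ImageOps

      img = Image.open("photo.jpg")                  # any aspect ratio
      square = ImageOps.fit(img, (512, 512),         # resize, then centre crop
                            method=Image.LANCZOS,
                            centering=(0.5, 0.5))
      square.save("photo_512.jpg")
      ```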
    • it's good to have an eye for how it works but i maintain that typing text prompts in to MJ is a bit of a fad. guys using SD in production are effectively creating their own MJs (custom embeddings and interpretation) and then using guided images and inpainting alongside several AI processes in a workflow. it really requires that bottom-up understanding to lessen the randomness when you have an image in mind you want to create, otherwise you become a little goblin man generating 1000 images to pick the one that's randomly 'correct'. there's no merit in it long term.
      kingsteven
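      A minimal sketch of that kind of staged SD workflow using the Hugging Face diffusers library; the prompts, file names and masks are illustrative placeholders, not the poster's actual setup:

      ```python
      # stage 1: guide layout with a rough sketch (img2img) instead of text
      # alone; stage 2: inpaint a masked region (e.g. fix hands) in isolation.
      import torch
      from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipeline
      from PIL import Image

      device = "cuda" if torch.cuda.is_available() else "cpu"

      img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5").to(device)
      sketch = Image.open("layout_sketch.png").convert("RGB").resize((512, 512))
      draft = img2img(prompt="a watercolour harbour at dusk",
                      image=sketch, strength=0.6).images[0]

      inpaint = StableDiffusionInpaintPipeline.from_pretrained(
          "runwayml/stable-diffusion-inpainting").to(device)
      mask = Image.open("hands_mask.png").convert("RGB").resize((512, 512))
      final = inpaint(prompt="detailed hands",
                      image=draft, mask_image=mask).images[0]
      final.save("final.png")
      ```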
    • humans are rubbish at identifying randomness, and images from latents are exponentially random. there's not much difference between sharing a MJ image and a numbers guy sharing every permutation of pixels in a 9 pixel cube. just more pixels.
      kingsteven
    • technically the most impressive MJ image would be a prompt+seed that pulls a high resolution artstation image from the training data and fixes the hands. but if i wanted to do that, i could use my AI workflow in SD for fixing hands. and as phanlo said below, if you want a layout - sketch it in, if you want a likeness...
      kingsteven
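      A side note on the "prompt+seed" idea above: with the same model, prompt, seed and sampler settings, the diffusion process is deterministic and reproduces the same image. A minimal diffusers sketch (model ID and prompt are placeholders):

      ```python
      # fixing the seed pins down the starting noise, so the whole
      # generation becomes reproducible.
      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5").to("cuda")

      generator = torch.Generator("cuda").manual_seed(1234)   # the "seed"
      image = pipe("high resolution fantasy castle, artstation",
                   generator=generator,
                   num_inference_steps=30).images[0]
      image.save("seed_1234.png")   # rerunning with seed 1234 gives the same image
      ```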
    • Thanks, that makes sense. Are there any tools other than Stable Diffusion you think are good?
      yuekit
    • ...train a model. the more you dig the more you find the tools are out there to overcome the limitations reliably.
      kingsteven
    • the big SD platforms are adding workflow editors, which is a big deal because staged processing can automate the removal of almost all the identifiable traits.
      kingsteven
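      A minimal sketch of using a custom trained embedding (textual inversion) of the kind mentioned above; the embedding file and its trigger token are placeholders, and training one takes a set of example images of the concept:

      ```python
      # load a learned concept so its trigger token can be used in prompts.
      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5").to("cuda")

      # "<my-style>" is a hypothetical trigger token for the learned concept
      pipe.load_textual_inversion("./my_style_embedding.bin", token="<my-style>")

      image = pipe("a portrait in <my-style> style").images[0]
      image.save("custom_concept.png")
      ```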
    • I found this article that explains how Stable Diffusion works with illustrations...
      https://jalammar.git…
      yuekit
    • The part I was most curious about is how SD and other AIs come up with the idea for the layout of the image in the first place. How do they decide what to put in the foreground, what to include and what to leave out, etc.
      yuekit
    • https://jalammar.git…
      The article doesn't actually go into a great deal of detail about this, but it must be happening during the "Conditioning" phase in that diagram above.
      yuekit
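      A sketch of what that conditioning step does mechanically: the prompt is tokenized and run through the CLIP text encoder that SD v1 ships with, and the resulting per-token embeddings are what the UNet's cross-attention layers attend to at every denoising step (the prompt here is illustrative):

      ```python
      # encode a prompt the way SD v1 does; layout emerges from how the
      # UNet's cross-attention correlates these token embeddings with
      # regions of the (initially random) latent.
      import torch
      from transformers import CLIPTokenizer, CLIPTextModel

      tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
      text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

      tokens = tokenizer("a cat in the foreground, mountains behind",
                         padding="max_length", max_length=77,
                         return_tensors="pt")
      with torch.no_grad():
          embeddings = text_encoder(tokens.input_ids).last_hidden_state

      print(embeddings.shape)   # (1, 77, 768): one vector per prompt token
      # in SD these embeddings are passed to the UNet as encoder_hidden_states
      ```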
    • some Computerphile explanations I remembered watching:
      https://www.youtube.…
      https://www.youtube.…
      uan
    • and about the composition part... I remember watching a clip from about a year ago where they said it has composition rules (from data) built in. I think they use something like Google image likes or something similar to create those models of pleasing compositions, and they are built in. but you can override them... the description you use to generate the image you want can override those 'default' compositions.
      uan
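      A sketch of what overriding those defaults looks like in practice: explicit composition directives in the prompt, plus a negative prompt for framings you don't want (model ID and prompt text are illustrative):

      ```python
      # push the model away from its default framing with explicit
      # composition terms and a negative prompt.
      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5").to("cuda")

      image = pipe(
          "a lighthouse, low-angle shot, subject in the lower third, wide empty sky",
          negative_prompt="close-up, cropped, centered portrait framing",
      ).images[0]
      image.save("composed.png")
      ```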
    • i think what you're looking for in that conditioning phase is the aesthetic scoring in the laion datasets. an AI trained on human responses to images has rated every image. for example SD favours watercolours because (along with other conditioning) the aesthetic AI that tagged the training data loves watercolours.
      kingsteven
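      The LAION aesthetic predictor is, roughly, a small linear model on top of CLIP image embeddings that predicts a human-preference score. A rough sketch of scoring an image that way; the trained head weights are a placeholder, but the structure (CLIP embedding, then a linear layer, then a score) is the real idea:

      ```python
      # score an image with a linear head over normalised CLIP embeddings,
      # the shape of the LAION aesthetic predictor. Weights are hypothetical.
      import torch
      from PIL import Image
      from transformers import CLIPModel, CLIPProcessor

      clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
      processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

      inputs = processor(images=Image.open("sample.jpg"), return_tensors="pt")
      with torch.no_grad():
          emb = clip.get_image_features(**inputs)        # (1, 768)
          emb = emb / emb.norm(dim=-1, keepdim=True)     # predictor expects unit norm

      head = torch.nn.Linear(768, 1)                     # placeholder for trained weights
      # head.load_state_dict(torch.load("aesthetic_head.pt"))  # hypothetical file
      print(f"aesthetic score: {head(emb).item():.2f}")  # dataset keeps high scorers
      ```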
    • maybe read about markov chains and GANs for an understanding of diffusion. i think a lot of these articles assume some understanding of AI.
      kingsteven
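      A minimal sketch of the Markov-chain view being pointed at: the forward diffusion process adds Gaussian noise step by step, and each step depends only on the previous one; the generative model is trained to reverse that chain. The beta schedule values are illustrative:

      ```python
      # forward diffusion as a Markov chain: repeatedly mix in a little
      # Gaussian noise until the image is destroyed.
      import numpy as np

      x = np.random.rand(64, 64, 3)            # pretend this is a training image
      betas = np.linspace(1e-4, 0.02, 1000)    # noise schedule (illustrative)

      for beta in betas:                        # each step depends only on the last
          noise = np.random.randn(*x.shape)
          x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
      # after all steps x is (near) pure Gaussian noise; sampling runs the
      # learned reverse of this chain one step at a time
      ```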
