AI Image Generation Tools

AI image generation tools collage showing outputs from various platforms
The landscape of AI image generation: from photorealistic to artistic, every tool has its strengths

AI image generation has transformed visual content creation. Whether you need cinematic stills, concept art, product mockups, or social media graphics, there is an AI tool designed for your use case. This chapter covers nine leading AI image generation tools, their particular strengths, the prompt structures that suit each one, and practical tips for getting the best results.

📝 Note: AI-generated images are only as good as the prompts you write. Each tool interprets prompts differently, so understanding each platform's strengths and syntax is essential for consistent, high-quality output.

| Tool | Best For | Key Strength | Access Method |
| --- | --- | --- | --- |
| Midjourney | Cinematic, stylized, emotionally rich visuals | Artistic quality and aesthetic control | Discord / Web App |
| Runway Image | Film production, consistent character generation | Video-to-image and multi-frame consistency | Web App |
| DALL-E | Photorealistic, design-ready, inpainting | OpenAI integration, editing capabilities | ChatGPT / API |
| Gemini | Context-aware, educational, data-integrated | Google ecosystem, multimodal understanding | Web App / API |
| Stable Diffusion | Open-source, fully customizable | Local deployment, model fine-tuning | Local / Web UIs |
| Dream Studio | Accessible Stable Diffusion interface | Credit-based simplicity, SDXL models | Web App |
| Adobe Firefly | Commercial-safe, design workflow integration | Creative Cloud integration, content credentials | Web App / Photoshop |
| Meta AI | Social media content, casual generation | Free access, Facebook/Instagram integration | Web App / Social |
| Grok | Less restricted generation, real-time context | X (Twitter) integration, fewer restrictions | Web App / X Platform |

1. Midjourney

Midjourney interface showing Discord bot and web app side by side
Midjourney operates through Discord and its dedicated web application

Midjourney is widely regarded as the gold standard for AI-generated art. It excels at producing stylized, cinematic, and emotionally rich visuals with exceptional aesthetic quality. Midjourney's latest models (V6 and V7) have dramatically improved photorealism, text rendering, and prompt coherence.

Midjourney: Discord vs Web Access

Midjourney can be accessed in two ways, each with distinct advantages:

| Feature | Discord | Web App |
| --- | --- | --- |
| Access | Via /imagine command in Discord server | midjourney.com direct interface |
| Community | See others' generations in real-time | Private workspace |
| Organization | Threads can get cluttered | Clean gallery and folders |
| Parameters | Full parameter support | GUI sliders and dropdowns |
| Batch Generation | Generates 4 images per prompt | Generates 4 images per prompt |
| Best For | Community interaction, inspiration browsing | Focused work, portfolio building |

📝 Note: The web app at midjourney.com provides a more streamlined experience with visual settings controls. Discord remains popular for community engagement and seeing what others are creating in real-time.

Midjourney: Settings & Configuration

Midjourney offers several configurable settings that dramatically affect output quality and style. Understanding these settings gives you fine-grained control over your generations.

| Setting | Options | What It Controls |
| --- | --- | --- |
| Image Size | Square (1:1), Landscape (16:9, 3:2), Portrait (9:16, 2:3), Custom | Aspect ratio and dimensions of generated images |
| Aesthetics (Stylize) | 0-1000 (default 100) | How much artistic stylization is applied; higher = more artistic |
| Model Version | V5, V5.2, V6, V7, Niji (anime) | The AI model used; newer versions have better coherence |
| Speed Mode | Fast, Relax, Turbo | Generation speed; Turbo is fastest but costs more GPU time |
| Weirdness | 0-3000 (default 0) | How unconventional or experimental the output should be |
| Variety | Low, Medium, High | How different the 4 generated images are from each other |
| RAW Mode | On/Off | Less automatic beautification, more literal prompt interpretation |

Midjourney settings panel showing sliders for aesthetics, weirdness, and variety
The Midjourney settings panel allows precise control over generation parameters

Midjourney: Upload Functionality

Midjourney supports image uploads as references. You can upload a photo and use it as a style reference, character reference, or composition guide. This is invaluable for maintaining consistency across a series of images.

| Upload Type | Parameter | What It Does |
| --- | --- | --- |
| Image Prompt | Upload URL at start of prompt | Uses the uploaded image as a visual influence on the generation |
| Style Reference | --sref [URL] | Copies the artistic style (colors, mood, technique) of the reference |
| Character Reference | --cref [URL] | Maintains character appearance consistency across generations (V6 and Niji 6; V7 uses Omni Reference, --oref, instead) |
| Image Weight | --iw 0-3 (0-2 in V5) | Controls how strongly the uploaded image influences the result |

Midjourney Image Reference Prompt
https://example.com/my-character.png A warrior standing on a cliff at sunset, epic fantasy landscape, dramatic lighting --cref https://example.com/my-character.png --sref https://example.com/style-reference.png --ar 16:9 --v 6

Midjourney: Creation Actions (Vary & Upscale)

After generating your initial 4 images, Midjourney provides powerful post-generation actions to refine your selections.

| Action | Button | What It Does |
| --- | --- | --- |
| Vary (Subtle) | V1-V4 (Subtle) | Creates small variations of the selected image, maintaining core composition |
| Vary (Strong) | V1-V4 (Strong) | Creates significant variations, keeping the general concept but changing details |
| Vary (Region) | Select area + prompt | Regenerates only a selected region of the image (inpainting) |
| Upscale (Subtle) | U1-U4 (Subtle) | Increases resolution with minimal changes to the image |
| Upscale (Creative) | U1-U4 (Creative) | Increases resolution while adding new fine details |
| Zoom Out | Zoom Out 1.5x / 2x | Expands the canvas around the image, generating new surrounding content |
| Pan | Arrow buttons | Extends the image in a specific direction |

Midjourney vary and upscale buttons interface
Post-generation actions: Vary, Upscale, Zoom, and Pan controls

Midjourney: Prompt Codes (Parameters)

Midjourney parameters are appended to the end of your prompt using the -- prefix. These give you precise control over technical aspects of the generation.

| Code | Syntax | Description | Example |
| --- | --- | --- | --- |
| --ar | --ar W:H | Sets the aspect ratio | --ar 16:9, --ar 1:1, --ar 9:16 |
| --q | --q 0.25/0.5/1 | Quality level: higher = more detail, slower | --q 1 (default, best quality) |
| --v | --v 5/5.2/6/7 | Model version to use | --v 7 (latest) |
| --s | --s 0-1000 | Stylize amount: how artistic vs literal | --s 750 (high stylization) |
| --no | --no [items] | Negative prompt: things to exclude | --no text, watermark, blur |
| --tile | --tile | Creates seamless tileable patterns | --tile (for backgrounds/textures) |
| --c | --c 0-100 | Chaos: how varied the 4 results are | --c 50 (moderate variety) |
| --w | --w 0-3000 | Weirdness: how unconventional results are | --w 500 (slightly weird) |
| --seed | --seed [number] | Reproduce a specific generation | --seed 12345 |
| --stop | --stop 10-100 | Stop generation partway for abstract effects | --stop 50 (half-rendered look) |
| --niji | --niji | Switch to anime/manga specialized model | --niji (for anime style) |

Midjourney Full Prompt with Parameters
A lone samurai standing in a field of red spider lilies, fog rolling in from ancient mountains, golden hour lighting, cinematic composition, hyper-detailed armor with intricate engravings, volumetric light rays, Studio Ghibli meets dark fantasy --ar 16:9 --v 7 --s 750 --q 1 --no text, watermark
Midjourney Tileable Pattern Prompt
Japanese wave pattern, navy blue and gold, traditional ukiyo-e style, seamless repeating design, woodblock print texture --tile --v 7 --s 500 --ar 1:1
📝 Note: Combine --sref (style reference) with --cref (character reference) to maintain both visual style consistency and character appearance across a series of images — essential for storyboarding and video pre-production.
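
The ordering rules in this section (image URLs first, descriptive text next, parameters last) can be sketched as a small string builder. This is only an illustration: `build_prompt` and `RANGES` are hypothetical names, not part of any Midjourney API, and the function does nothing more than assemble the text you would paste into Discord or the web app. The numeric ranges are copied from the parameter table above.

```python
# Hypothetical helper: it only assembles prompt text, it does not talk to Midjourney.

# Allowed numeric ranges, copied from the parameter table in this section.
RANGES = {
    "s": (0, 1000),     # stylize
    "c": (0, 100),      # chaos
    "w": (0, 3000),     # weirdness
    "stop": (10, 100),  # stop percentage
}


def build_prompt(text, image_urls=(), no=(), flags=(), **params):
    """Return a prompt string: image URLs, then text, then -- parameters."""
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            raise ValueError(f"--{name} must be between {low} and {high}, got {value}")

    parts = list(image_urls) + [text.strip()]
    # Keyword arguments keep their call order, e.g. ar="16:9" becomes "--ar 16:9".
    parts += [f"--{name} {value}" for name, value in params.items()]
    if no:
        parts.append("--no " + ", ".join(no))
    # Value-less switches such as --tile go last.
    parts += [f"--{flag}" for flag in flags]
    return " ".join(parts)


print(build_prompt(
    "Japanese wave pattern, navy blue and gold, woodblock print texture",
    flags=["tile"], ar="1:1", v=7, s=500,
))
```

Checking ranges before generating is a design choice for catching typos early; Midjourney itself reports out-of-range values when the prompt is submitted.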

2. Runway Image (Gen-2 / Gen-3)

Runway ML image generation interface
Runway's AI tools bridge the gap between still images and video production

Runway is primarily known for AI video generation, but its image capabilities are powerful for film production workflows. It excels at generating consistent characters, scenes, and storyboard frames that can then be animated into video.

Key Features: Text-to-image, image-to-image, style transfer, frame interpolation, multi-frame consistency for storyboarding.

Runway Image Prompt
A detective in a rain-soaked noir city, standing under a flickering neon sign, trench coat and fedora, film grain texture, 1940s atmosphere, dramatic chiaroscuro lighting, cinematic still frame
📝 Note: Runway's greatest strength is that images generated on the platform can be directly animated into video using Gen-2 or Gen-3 Alpha, making it ideal for end-to-end AI video production.

3. DALL-E

DALL-E image generation and editing interface in ChatGPT
DALL-E integrates directly into ChatGPT for conversational image generation

DALL-E (by OpenAI) focuses on photorealistic, design-ready images with strong inpainting and editing capabilities. Integrated directly into ChatGPT, it allows conversational image generation — describe what you want, see the result, then refine with natural language.

Ideal Prompt Structure: Be descriptive and specific. DALL-E responds well to detailed scene descriptions, lighting specifications, and style references.

DALL-E Photorealistic Prompt
A cozy independent bookshop on a rainy autumn evening, warm golden light spilling through the windows onto wet cobblestone streets, a hand-painted wooden sign, vintage bicycles parked outside, photorealistic, shot on 35mm film, shallow depth of field, tilt-shift effect
DALL-E Editing/Inpainting Prompt
Take the generated bookshop image and:
1. Add a cat sitting in the window display
2. Change the sign text to read "The Wandering Page"
3. Add string lights hanging between the buildings
4. Make the sky a deeper twilight purple

| Feature | DALL-E 3 | Details |
| --- | --- | --- |
| Inpainting | Yes | Edit specific regions of an image with text prompts |
| Outpainting | Yes | Extend images beyond their original borders |
| Text in Images | Improved | Much better text rendering than previous versions |
| Style Control | Via prompt | No parameter codes; all control is through descriptive text |
| Integration | ChatGPT, API | Seamless conversational workflow |
| Safety Filters | Strict | Strong content filters, no public figure generation |
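
For the API route listed in the table, a minimal sketch using the OpenAI Python SDK might look like the following. It assumes the `openai` package is installed, that an `OPENAI_API_KEY` environment variable is set, and that the `dall-e-3` model is available to your account. `build_request` and `generate_image_url` are hypothetical helper names introduced here for illustration.

```python
# Sizes accepted for the "dall-e-3" model in the OpenAI Images API.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}


def build_request(prompt, size="1024x1024", quality="standard"):
    """Build keyword arguments for client.images.generate (no network call)."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in {"standard", "hd"}:
        raise ValueError(f"unsupported quality: {quality}")
    # DALL-E 3 returns one image per request, so n is fixed at 1.
    return {"model": "dall-e-3", "prompt": prompt, "size": size,
            "quality": quality, "n": 1}


def generate_image_url(prompt, **options):
    """Send the request and return the URL of the generated image."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**build_request(prompt, **options))
    return response.data[0].url
```

Calling `generate_image_url("A cozy independent bookshop on a rainy autumn evening, photorealistic", size="1792x1024")` would request a single landscape image. As the table notes, there are no parameter codes, so style is still controlled through the prompt text itself.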

4. Gemini (Imagen)

Gemini image generation interface showing text and image output
Gemini combines Google's search knowledge with Imagen's generation capabilities

Gemini (powered by Google's Imagen model) generates images with strong contextual awareness and factual grounding. It excels at educational visuals, diagrams, and images that need to accurately represent real-world concepts.

Ideal Prompt Structure: Gemini works well with descriptive, context-rich prompts. It understands references to real locations, historical periods, and scientific concepts better than most tools.

Gemini Image Prompt
Generate an educational infographic-style image showing the water cycle. Include labeled arrows showing evaporation, condensation, precipitation, and collection. Use a clean, modern illustration style with a blue and white color palette. The image should be suitable for a high school science presentation.
📝 Note: Gemini integrates with Google Search, so it can generate images that reflect current events, real locations, and up-to-date cultural references. It is particularly strong for educational and informational content.

5. Stable Diffusion

Stable Diffusion running locally with ComfyUI interface
Stable Diffusion's open-source nature enables local deployment with custom interfaces like ComfyUI

Stable Diffusion is the most flexible and customizable AI image generation tool available. As an open-source model, it can be run locally on your own hardware, fine-tuned on custom datasets, and extended with community-built models (LoRAs, embeddings, ControlNets).

Key Features: Fully open-source, local deployment (no cloud dependency), thousands of community fine-tuned models, ControlNet for pose/composition control, LoRA models for specific styles or characters.

| Interface | Description | Best For |
| --- | --- | --- |
| Automatic1111 (A1111) | Feature-rich web UI with extensions | Power users who want maximum control |
| ComfyUI | Node-based visual workflow builder | Complex pipelines and automation |
| Fooocus | Simplified interface, Midjourney-like ease | Beginners who want local generation |
| InvokeAI | Professional creative tool with canvas | Artists and designers |

Stable Diffusion Prompt (SDXL)
masterpiece, best quality, ultra-detailed, 8k uhd, a cyberpunk street market at night, holographic signs in Japanese, vendors selling bioluminescent food, rain-slicked streets reflecting neon colors, dense crowd of diverse characters with cybernetic augmentations, atmospheric fog, volumetric lighting

Negative prompt: low quality, blurry, distorted, deformed, watermark, text, signature, out of frame
📝 Note: Local generation with Stable Diffusion typically needs a GPU with around 6 GB of VRAM, and SDXL models generally need 8 GB or more. If you do not have suitable hardware, use cloud-based options like Dream Studio or RunPod.
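
Outside the web UIs listed above, the same prompt format can be run from a script. The sketch below assumes the Hugging Face `diffusers` and `torch` packages, a CUDA GPU, and the publicly hosted `stabilityai/stable-diffusion-xl-base-1.0` weights; none of these are prescribed by this chapter. `split_prompt` and `generate` are hypothetical helper names.

```python
def split_prompt(block):
    """Split a web-UI style prompt block into (positive, negative) strings."""
    marker = "Negative prompt:"
    positive, _, negative = block.partition(marker)
    return positive.strip(), negative.strip()


def generate(block, steps=30, guidance_scale=7.0, seed=None):
    """Render one SDXL image locally; needs torch, diffusers, and a CUDA GPU."""
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    positive, negative = split_prompt(block)
    generator = None
    if seed is not None:
        # A fixed seed makes the starting noise, and so the image, repeatable.
        generator = torch.Generator("cuda").manual_seed(seed)
    result = pipe(
        prompt=positive,
        negative_prompt=negative or None,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    return result.images[0]  # a PIL image; call .save("out.png") to keep it
```

Keeping the prompt parsing separate from the pipeline call means the text handling can be checked on a machine without a GPU.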

6. Dream Studio

Dream Studio is the official web interface from Stability AI for Stable Diffusion. It provides a user-friendly, credit-based system that makes Stable Diffusion accessible without any technical setup or local hardware.

Ideal Prompt Structure: Similar to Stable Diffusion — use descriptive language with quality modifiers. Dream Studio supports negative prompts and various generation settings through its GUI.

| Setting | Options | What It Controls |
| --- | --- | --- |
| Model | SD 1.5, SDXL, SD3 | Which Stable Diffusion model to use |
| Style Preset | Photographic, Cinematic, Anime, etc. | Pre-configured style modifiers |
| CFG Scale | 1-30 (default 7) | How strictly the AI follows your prompt |
| Steps | 10-150 (default 30) | More steps = more refined (but slower) |
| Seed | Random or specific number | For reproducing exact results |

Dream Studio Prompt
A serene Japanese zen garden in morning mist, raked sand patterns, moss-covered stones, a single cherry blossom tree in full bloom, soft diffused sunlight, photographic style, shot on Hasselblad medium format, f/2.8, golden hour

Style Preset: Photographic
CFG Scale: 7
Steps: 40
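
To see why a higher CFG Scale follows the prompt more strictly, it helps to look at the classifier-free guidance formula commonly used by Stable Diffusion samplers. At each denoising step the model predicts noise twice, once with the prompt and once without, and the scale pushes the result away from the unconditioned prediction. The sketch below uses short lists of made-up numbers in place of image-sized tensors; `apply_cfg` is a hypothetical name, and Dream Studio's exact internals are not documented here.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions (made-up numbers standing in for image-sized tensors).
uncond = [0.0, 1.0, 2.0]  # prediction with an empty prompt
cond = [1.0, 1.0, 0.0]    # prediction conditioned on the prompt

for scale in (1, 7, 15):
    print(scale, apply_cfg(uncond, cond, scale))
```

At a scale of 1 the result equals the prompt-conditioned prediction. Larger scales exaggerate the difference between the two predictions, which is why very high values tend to look over-saturated or harsh.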

7. Adobe Firefly

Adobe Firefly integrated into Photoshop workspace
Adobe Firefly brings AI generation directly into the Creative Cloud workflow

Adobe Firefly is designed for commercial safety and professional design workflows. According to Adobe, it is trained on licensed Adobe Stock images, openly licensed content, and public domain material, so Firefly-generated images are intended to be safe for commercial use with reduced copyright risk.

Key Features: Content Credentials (provenance tracking), Generative Fill in Photoshop, Text Effects, commercial-use licensing, seamless Creative Cloud integration.

| Feature | Details |
| --- | --- |
| Training Data | Adobe Stock, licensed content, public domain only |
| Commercial Use | Fully cleared for commercial projects |
| Content Credentials | Embedded metadata showing AI generation provenance |
| Photoshop Integration | Generative Fill, Generative Expand directly in PSD files |
| Illustrator Integration | Text-to-vector, recolor artwork, generative patterns |
| Best For | Designers, agencies, brands needing IP-safe content |

Adobe Firefly Prompt
Professional product photography of a luxury watch on a dark marble surface, dramatic studio lighting with a single key light from the upper left, subtle reflections, shallow depth of field, magazine advertisement quality, clean and minimal composition
📝 Note: Adobe Firefly is the safest choice for commercial projects where copyright and IP concerns are paramount. Content Credentials provide transparent provenance tracking, which is increasingly important for brand trust.

8. Meta AI

Meta AI image generation within social media platform
Meta AI makes image generation accessible directly within social media platforms

Meta AI provides free, accessible image generation integrated into Facebook, Instagram, WhatsApp, and Messenger. It is designed for casual content creation and social media graphics.

Ideal Prompt Structure: Simple, conversational descriptions work best. Meta AI is optimized for social media contexts, so prompts do not need to be as technically detailed as those for Midjourney or Stable Diffusion.

Meta AI Image Prompt
A golden retriever puppy wearing a tiny graduation cap and gown, sitting on a stack of books, confetti falling, celebration background, bright and cheerful, perfect for a social media congratulations post

| Feature | Details |
| --- | --- |
| Price | Free to use |
| Access | meta.ai, Facebook, Instagram, WhatsApp, Messenger |
| Best For | Social media content, casual images, quick visual ideas |
| Limitations | Less control than specialized tools, safety filters, lower resolution |
| Unique Feature | Integrated into social platforms: generate and share without leaving the app |

9. Grok (xAI)

Grok AI image generation interface on X platform
Grok's image generation integrates with X (Twitter) for real-time context-aware generation

Grok, developed by xAI (Elon Musk's AI company), provides image generation with fewer content restrictions than most competitors. It is integrated into the X (Twitter) platform and can generate images with real-time context from trending topics and conversations.

Key Features: Fewer content filters, real-time X platform data integration, ability to generate images of public figures (with some limitations), humorous and irreverent style options.

Grok Image Prompt
A dramatic editorial magazine cover featuring a robot CEO giving a TED talk to an audience of surprised humans, photojournalistic style, TIME magazine aesthetic, bold headline typography, dramatic stage lighting
📝 Note: While Grok has fewer restrictions than some competitors, it still has safety guidelines. Its real-time integration with X makes it particularly useful for generating topical, meme-worthy, or culturally relevant imagery.

Choosing the Right Tool

Selecting the right AI image tool depends on your specific needs. Use the decision guide below to match your project requirements to the best tool.

| If You Need... | Use This Tool | Why |
| --- | --- | --- |
| Highest artistic quality | Midjourney | Unmatched aesthetic control and stylization |
| Photorealistic images | DALL-E / Midjourney V7 | Best photorealism with natural lighting |
| Commercial-safe images | Adobe Firefly | IP-clear training data, Content Credentials |
| Full customization/control | Stable Diffusion | Open source, LoRAs, ControlNet, local deployment |
| Video production pipeline | Runway | Images that convert directly to video |
| Educational/factual visuals | Gemini | Context-aware, Google search integration |
| Free social media content | Meta AI | Free, integrated into social platforms |
| Fewer content restrictions | Grok | More permissive content policies |
| Easy Stable Diffusion access | Dream Studio | No setup required, credit-based system |

Decision flowchart for choosing an AI image generation tool
A decision flowchart to help you select the right AI image tool for your project
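
If you want the decision guide in a script (for example, to tag briefs in a content pipeline), it can be transcribed as a simple lookup. The mapping below copies the guide above; the `recommend` function and its substring matching are illustrative assumptions, not part of any tool.

```python
# Requirement -> tool, transcribed from the decision guide in this section.
TOOL_FOR_NEED = {
    "highest artistic quality": "Midjourney",
    "photorealistic images": "DALL-E / Midjourney V7",
    "commercial-safe images": "Adobe Firefly",
    "full customization/control": "Stable Diffusion",
    "video production pipeline": "Runway",
    "educational/factual visuals": "Gemini",
    "free social media content": "Meta AI",
    "fewer content restrictions": "Grok",
    "easy stable diffusion access": "Dream Studio",
}


def recommend(keyword):
    """Return (need, tool) pairs whose need contains the keyword, ignoring case."""
    keyword = keyword.lower()
    return [(need, tool) for need, tool in TOOL_FOR_NEED.items() if keyword in need]


print(recommend("commercial"))
print(recommend("video"))
```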

Universal Prompting Tips

Regardless of which tool you use, these prompting principles will improve your results across all platforms:

| Tip | Example |
| --- | --- |
| Be specific about subject | Instead of 'a dog' say 'a border collie with heterochromia' |
| Specify lighting | 'golden hour backlighting' not just 'good lighting' |
| Define camera/lens | 'shot on 85mm f/1.4, shallow depth of field' |
| Set the mood/atmosphere | 'melancholic, foggy, muted tones' |
| Reference art styles | 'in the style of Studio Ghibli watercolor backgrounds' |
| Include composition details | 'rule of thirds, subject on left, negative space right' |
| Specify what to exclude | Use negative prompts or --no to remove unwanted elements |
| Iterate and refine | Treat first generations as starting points, not final outputs |

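
One way to apply these tips consistently is to treat each one as a named slot and join the filled slots into a prompt. The sketch below is a hypothetical template, not a format any tool requires; `PromptSpec` and its field names are invented for illustration, and the example values come from the table above.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """One field per prompting tip; only the subject is required."""
    subject: str
    lighting: str = ""
    camera: str = ""
    mood: str = ""
    style: str = ""
    composition: str = ""

    def render(self):
        """Join the non-empty fields into one comma-separated prompt."""
        fields = [self.subject, self.lighting, self.camera,
                  self.mood, self.style, self.composition]
        return ", ".join(part.strip() for part in fields if part.strip())


spec = PromptSpec(
    subject="a border collie with heterochromia",
    lighting="golden hour backlighting",
    camera="shot on 85mm f/1.4, shallow depth of field",
    mood="melancholic, foggy, muted tones",
)
print(spec.render())
```

Leaving a slot empty simply drops it from the prompt, which makes it easy to iterate by changing one field at a time and comparing results.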
Exercise:
Which AI image tool is best suited for commercial projects where copyright safety is the top priority?
Exercise:
What does the Midjourney parameter --sref do?
Exercise:
Which tool allows you to run AI image generation locally on your own hardware?
Exercise:
What is the primary advantage of Runway's image generation compared to other tools?