AI image generation has evolved significantly, enabling artists, designers, and enthusiasts to create stunning visuals.
Flux AI, one of the leading generative image models, combined with ComfyUI's node-based visual interface, provides a seamless image creation experience.
This article will show you how to navigate ComfyUI, set up your own environment, and maximize your use of these tools, while also exploring advanced features and real-world applications.
Why upgrade from Midjourney or DALL-E to ComfyUI?
Midjourney and OpenAI’s DALL-E are great AI image generators, and Midjourney has become even more approachable since it moved from a cumbersome Discord bot to a proper website. Both are suitable for beginners and advanced users alike.
However, ComfyUI offers several advantages over Midjourney and DALL-E for users looking for more control and customization in their AI image generation process:
- Local processing: Unlike Midjourney, which runs on remote servers, ComfyUI can be run locally on your own hardware, giving you more privacy and control over your creations. Inference tokens are on the house!
- Customizable workflows: ComfyUI’s node-based interface allows for intricate customization of the generation process, enabling fine-tuned control over various aspects of your image creation.
- Model flexibility: ComfyUI supports multiple AI models, including different versions of Stable Diffusion and Flux, allowing you to switch between models or even use custom ones.
- Cost-effective: After the initial setup, using ComfyUI is free, whereas Midjourney requires a subscription for continued use.
- Transparency: The open-source nature of ComfyUI allows users to understand and modify the underlying processes. For example, you can develop and add your own custom nodes to build unique workflows.
How to get started with ComfyUI
A word of warning: while ComfyUI is arguably the most powerful and modular Stable Diffusion GUI and backend, it is not easy to set up, and it needs a powerful graphics card, preferably from Nvidia, to run inference smoothly.
Getting started with ComfyUI involves a few key steps:
- System Requirements: Ensure your computer meets the minimum requirements, including a compatible GPU with sufficient VRAM (8 GB or more; the more, the better).
- Installation:
- Download and install Python
- Clone or download the ComfyUI repository from GitHub
- Install the required dependencies using pip
- Download models:
- Obtain Stable Diffusion or Flux model checkpoints from Hugging Face (e.g., SD 1.5, SD 2.1, Flux.1-dev, or Flux.1-schnell). Here’s a corresponding guide I compiled on Perplexity.
- Place the models in the appropriate folders within the ComfyUI directory; the README files explain where the different files belong.
- Launch ComfyUI:
- From a command shell in the ComfyUI folder, run the provided start script, preferably the one for Nvidia GPUs (run_nvidia_gpu.bat in the Windows portable package), or python main.py if you cloned the repository
- Access the interface through your web browser. The batch file should automatically open your browser at http://127.0.0.1:8188/
- Familiarize yourself with the interface:
- Explore the available nodes
- Learn how to connect nodes to create basic workflows
- Start with a simple text-to-image workflow to generate your first image.
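Before going further, it's worth confirming that your graphics card is actually up to the job. Here's a small diagnostic sketch (not part of ComfyUI itself) that uses PyTorch, which ComfyUI's requirements pull in anyway, to report the detected GPU and its VRAM:

```python
# check_gpu.py - quick sanity check for the GPU ComfyUI will use
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 8:
        print("Less than 8 GB of VRAM: expect slow generations or out-of-memory errors.")
else:
    print("No CUDA-capable GPU found; ComfyUI will fall back to the CPU, which is very slow.")
```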
ComfyUI image-to-image workflow
After you have managed your first text-to-image workflow, you might want to add an image-to-image workflow.
Here’s a basic image-to-image workflow in ComfyUI:
- Load Image node: Use this to import your starting image. Simply copying and pasting an image from your drive does the trick.
- VAE Encode node: Connect the Load Image node to the VAE Encode node to convert the image’s pixels into the latent representation the model actually works with.
- KSampler node: This is where the actual sampling happens. 8 to 16 steps are a good starting point; more steps generally improve image quality but also increase render time. Connect the LATENT output to the VAE Decode node’s samples input.
- VAE Decode node: This decodes the processed image back into a viewable format.
- Save Image node: Connect the VAE Decode node’s IMAGE output to the Save Image node’s images input to save your result to the output folder in your ComfyUI folder hierarchy.
- Conditioning nodes: Add CLIP Text Encode nodes for your text prompts (positive and negative) and connect them to the KSampler.
- Add a Checkpoint Loader node to load your chosen AI model and connect it to the KSampler. I used the Flux Dev model available from here. Put the downloaded file in the checkpoints folder (models/checkpoints) of your ComfyUI installation.
Play around with the numerous parameters to find the settings that suit you best. This is the most time-consuming part of the whole process.
The denoise factor in the KSampler is crucial in the image-to-image context, as it controls how much of the input image is preserved: a value near 1.0 largely ignores the input, while lower values such as 0.3 to 0.6 keep more of the original composition.
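If you prefer scripting to dragging nodes, the same graph can also be sent to ComfyUI's local HTTP API as a JSON description. The sketch below mirrors the node setup above in ComfyUI's API workflow format; the checkpoint and image file names are placeholders for your own files, and the sampler settings are just reasonable starting values:

```python
# img2img_api.py - post a basic image-to-image graph to a locally running ComfyUI
# Assumes ComfyUI is running at http://127.0.0.1:8188 and the referenced files exist.
import json
import urllib.request

workflow = {
    # Load model, CLIP and VAE from a single checkpoint file (placeholder name)
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "flux1-dev.safetensors"}},
    # The starting image must already be in ComfyUI's input folder
    "2": {"class_type": "LoadImage", "inputs": {"image": "my_photo.png"}},
    # Encode the pixels into the latent space the model works with
    "3": {"class_type": "VAEEncode", "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
    # Positive and negative prompts
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a watercolor painting of the scene", "clip": ["1", 1]}},
    "5": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["1", 1]}},
    # The sampler: denoise below 1.0 preserves part of the input image
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["3", 0], "seed": 42, "steps": 16, "cfg": 1.0,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 0.6}},
    # Decode back to pixels and save to the output folder
    "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "img2img"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```

You don’t have to write this JSON by hand: once you have a working graph in the browser, ComfyUI can export it in exactly this format via «Save (API Format)» (which may require enabling the developer mode options in the settings).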
How to add your photos to the AI model: Train a LoRA (Low-Rank Adaptation)
You can easily train a LoRA (Low-Rank Adaptation) to incorporate your own photos into the image model.
Think of a LoRA as a lightweight form of fine-tuning for diffusion models like Stable Diffusion or Flux.1.
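For the technically curious: LoRA keeps the base model’s weights frozen and only learns a small low-rank update for selected layers, roughly

```latex
W = W_0 + \Delta W = W_0 + B A, \qquad
B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Because only the small matrices A and B are trained and stored, a LoRA file typically weighs a few hundred megabytes at most, instead of a multi-gigabyte checkpoint.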
By using LoRA, you can train the model on your own images, allowing it to generate pictures in your specific style or of particular subjects like your portrait photos. Here’s how to do it:
- Prepare your dataset:
- Collect 10 – 20 high-quality images that represent the style or subject you want to train.
- Ensure images are diverse but consistent in style or subject.
- If you use Replicate or Fal.ai, you can skip the resizing and captioning of each image that many tutorials insist on; these platforms handle it for you automatically. Here’s a good tutorial for Replicate.
- Set up the training environment locally or, better yet, use Replicate or Fal.ai:
- You will probably train on your portrait photos only once, so it’s usually not worth the hassle of setting up training on your own machine.
- Training a Flux.1 Dev model on Replicate with 10 portrait photos takes approximately 20 minutes on their high-end Nvidia H100 GPUs and costs as little as $2.50.
On your local machine, the fine-tuning can easily take a few hours.
- Important: Set a trigger word before you start training. You will need it later to invoke your LoRA in your image prompts.
- After the training process:
- Download the LoRA weights file from Replicate or Fal and place it in your ./models/loras/ folder.
- Using your LoRA in ComfyUI:
- In your ComfyUI workflow, add a «LoRA Loader» node and select your LoRA (see the sketch after this list for the same step in API-workflow form).
- Connect the LoRA Loader to your model checkpoint in the workflow.
- Adjust the LoRA strength to control how much influence it has on the generation.
- Generate images:
- Use your trigger word in prompts related to your trained subject or style.
- Experiment with different LoRA strengths to find the right balance.
- Ethical considerations:
- Ensure you have the right to use the images in your training set.
- Be mindful of potential biases in your training data.
- Consider the implications of generating images that closely mimic real individuals.
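If you script your workflows through the local API, as in the image-to-image sketch earlier, loading a LoRA means inserting one extra node between the checkpoint loader and everything that consumed its MODEL and CLIP outputs. A minimal continuation of that sketch (the LoRA file name and trigger word are placeholders):

```python
# Continuation of the 'workflow' dict from the image-to-image sketch above.
# A LoraLoader node sits between the checkpoint loader ("1") and the nodes
# that previously consumed its MODEL and CLIP outputs.
workflow["9"] = {
    "class_type": "LoraLoader",
    "inputs": {
        "model": ["1", 0],                            # MODEL from the checkpoint loader
        "clip": ["1", 1],                             # CLIP from the checkpoint loader
        "lora_name": "my_portrait_lora.safetensors",  # placeholder file in models/loras/
        "strength_model": 0.8,                        # influence on the diffusion model
        "strength_clip": 0.8,                         # influence on the text encoder
    },
}

# Re-route the sampler and both prompts through the LoRA node
workflow["6"]["inputs"]["model"] = ["9", 0]
workflow["4"]["inputs"]["clip"] = ["9", 1]
workflow["5"]["inputs"]["clip"] = ["9", 1]

# And don't forget the trigger word you chose during training, for example:
workflow["4"]["inputs"]["text"] = "portrait photo of TOK, soft studio lighting"
```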
Remember, while LoRA allows you to customize the AI model with your own images, it’s still building upon the base model’s capabilities.
The quality of your results will depend on both the base model and the quality of your training data. Flux Dev or Flux Schnell are among the best at the time of writing.
By incorporating LoRA into your ComfyUI workflow, you can create unique, personalized images that blend the power of large AI models with your specific visual style or subjects of interest.
If I didn’t know better, I would think the following picture was actually a photo of yours truly. But it isn’t; it’s entirely generative AI.
Outlook: What to add from here
To further enhance your ComfyUI experience, consider exploring:
- Advanced workflows: Learn to use more complex nodes such as ControlNet for precise control over a subject’s pose and composition.
- Custom nodes: Develop your own or incorporate community-made custom nodes to extend ComfyUI’s functionality. For example, I want to add a «Created with AI» label that gets exported to its own dedicated Photoshop layer.
- Model merging: Experiment with merging different AI models to create unique styles and capabilities.
- Batch processing: Set up workflows for generating multiple images with variations (a small scripted example follows after this list).
- Animation workflows: Explore techniques for creating animated sequences using ComfyUI.
- Integration with other tools: Learn how to use ComfyUI in conjunction with photo editing software, for example by exporting layers and masks to Photoshop for post-processing.
- Community engagement: Join ComfyUI forums and Discord channels to share knowledge and stay updated on new developments.
- Contributing to the project: As an open-source tool, you can contribute to ComfyUI’s development or documentation.
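As a taste of the batch-processing idea above: once a workflow lives in a Python dict, queuing variations is a short loop. A minimal sketch that loads a workflow exported from ComfyUI via «Save (API Format)» (file name is a placeholder) and randomizes the seed for each run:

```python
# batch_seeds.py - queue the same graph several times with different seeds
import json
import random
import urllib.request

def queue_prompt(workflow: dict) -> None:
    """Send one workflow to the local ComfyUI API."""
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Load a workflow exported in API format (placeholder file name).
with open("workflow_api.json", encoding="utf-8") as f:
    workflow = json.load(f)

# "6" and "8" are the KSampler and Save Image node ids from the image-to-image sketch;
# adjust them to match the ids in your exported workflow.
for i in range(4):
    workflow["6"]["inputs"]["seed"] = random.randint(0, 2**32 - 1)
    workflow["8"]["inputs"]["filename_prefix"] = f"variation_{i:02d}"
    queue_prompt(workflow)
```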
Use cases I plan to work on next:
- Creating ad variations of a given photo for different platforms.
- Adding layers of company logos or labels like «Created with AI» that are automatically exported to Photoshop for final touches.
Have fun and keep learning!
Alternative, simpler services
There are also other, easier-to-use online services such as Remini or Pixelup (Android | iOS).
The entry barrier here is very low, but you don’t have as much control over image generation.
Remini’s AI-powered service offers several key capabilities:
- Unblurring and sharpening images
- Denoising photos
- Restoring old and damaged photographs
- Enlarging images while maintaining quality
- Enhancing colors and tones
- Improving facial details