While OpenAI opened the door to daily AI usage for text-to-text use cases, AI image generation hasn't yet experienced the same massive boom.
Image generation lags behind because image models haven't been trained for as long as text models, and the use cases that need them use the image directly as the final output.
The lack of quality, consistency, and scalability is a brake on the adoption of AI for image generation. So far, the only player getting close to the required quality has been Midjourney.
However, the experience there is daunting and not friendly to all types of users:
You need to create a Discord account and interact with a bot in a public channel to get your image generated, and many people are not ready to go through this for their use case.
The solution to that is a Midjourney API: it would give startups an easy way to integrate the foundational model, fine-tune it for specific use cases, and simplify the user experience.
However, the Midjourney API is not out yet, which is frustrating for many players in the AI ecosystem. Even OpenAI, with its image generation model (DALL-E), cannot yet compete with the Midjourney foundational model.
To deal with that, OpenAI released its Plugins feature, which lets you use other AI companies' solutions directly from ChatGPT.
At Argil we released a Plugin for one of our features (Image Studio), but to comply with OpenAI's rules we had to use DALL-E instead of SDXL (Stable Diffusion's foundational model), and the results are far from what we can get with SDXL.
Let's explore in this article the state of image generation and the role Argil is playing.
Industries whose business operations are built on images haven't yet adopted AI because image generation faces many challenges.
Here are a few reasons why image generation has been so difficult for agencies and corporations to adopt:
1/ Consistency of the image quality
While prompting is a daunting process, once you have successfully obtained the results you wanted, how do you keep them consistent?
How do you push this one step further and make the AI understand which model you are looking to generate?
Fine-tuning AI on specific datasets wasn't possible until a few months ago, or at least wasn't possible for everyone. On Argil, it's a feature you can access now and start using for all your use cases.
Try it here.
Every generated image has a prompt associated with it: a unique configuration of words. Let's say you've been playing with an image-generation studio and want to generate rooms in a specific mood: Harry Potter.
After generating 100 pictures, you decide that 20 of them are the best, and you would like to build a template you can reuse whenever you want your generations to have that Harry Potter mood.
On Argil you can select those 20 images and build a template from them. Here's a video that explains this in more detail: Here
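One way to think about such a template is as a reusable style descriptor distilled from the prompts of the images you selected. Here's a minimal Python sketch of that idea, purely for illustration — the prompt format, the keyword-frequency heuristic, and the function names are assumptions, not Argil's actual implementation:

```python
from collections import Counter

def build_template(prompts, top_n=5):
    """Extract the style descriptors that recur across a set of prompts."""
    words = Counter()
    for prompt in prompts:
        # Treat each comma-separated segment as one descriptor.
        words.update(segment.strip().lower() for segment in prompt.split(","))
    return [descriptor for descriptor, _ in words.most_common(top_n)]

def apply_template(subject, template):
    """Combine a new subject with the saved style template."""
    return ", ".join([subject] + template)

# Prompts behind two of the "best 20" generations (illustrative).
favorites = [
    "a cozy common room, harry potter style, candlelight, stone walls",
    "a wizard's library, harry potter style, candlelight, floating books",
]
template = build_template(favorites, top_n=2)
print(apply_template("a potions classroom", template))
# → a potions classroom, harry potter style, candlelight
```

The point is that once the shared descriptors are captured, every new generation only needs the subject; the mood comes along for free.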
Finally, generating high-quality images at scale was impossible:
These three challenges together made it impossible to build a scalable use case. Now, with the Argil API, you can.
Here’s a visual representation of how our API works:
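In code, a call to an image-generation API of this kind typically boils down to POSTing a JSON job description and reading back image URLs. The sketch below is a hypothetical illustration: the endpoint, field names, and template ID are invented for this example and do not describe Argil's actual API contract:

```python
import json

# Placeholder endpoint — not a real Argil URL.
API_URL = "https://api.example.com/v1/generations"

def build_generation_request(prompt, template_id=None, count=1):
    """Assemble the JSON payload for a text-to-image generation job."""
    payload = {"prompt": prompt, "count": count}
    if template_id is not None:
        # Reuse a saved style template instead of re-describing the style.
        payload["template_id"] = template_id
    return json.dumps(payload)

body = build_generation_request("a potions classroom",
                                template_id="hp-mood", count=4)
# An HTTP client would POST `body` to API_URL with an Authorization
# header; the response would typically contain the generated image URLs.
print(body)
```

Because the job is just data, it is easy to queue thousands of such requests from an existing pipeline — which is what makes an API the scalable path, compared with typing commands into a chat interface.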
The challenge remains, but at Argil we push the realm of possibilities daily to improve the quality and experience of image generation so it fits our multimodal automation approach.
The two main foundational models for image generation are:
While SDXL is open source, has an API, and gives room for optimization and fine-tuning, it does not yet match the image quality and detail you can get with Midjourney.
Midjourney, however, falls short elsewhere: it is not open source, there is no Midjourney API, and the user experience is daunting. You need to create a Discord account and interact with a chatbot using parameters made of letters and numbers.
This heavy, inefficient experience led corporate and agency users to believe that AI was not yet ready to offer an efficient, scalable solution for their image generation use cases.
But that was until the two latest SDXL updates:
Both showed that the gap between Midjourney and Stable Diffusion is narrowing and that many use cases are already possible at scale.
The success of ChatGPT is mainly due to the ease of use of its interface. People don't use the best solution or the one that solves their problems; people use the solutions they enjoy using.
That's why, when building a SaaS, user experience should be one of the top priorities, even if features are lacking or don't yet solve a concrete challenge.
If people enjoy the experience, they will stay and give you the feedback you need to iterate on your product.
Our vision at Argil is based on three industries:
The intersection of these three is creating a new industry: hyper-automation.
Our goal at Argil is to build the application that best represents this new industry and all the use cases it covers. For that, we have identified several components we're building on top of:
To get the right results with generative AI you need context, so we're designing the SaaS so that you can upload documents and use them as a basis for reasoning.
The main limitation of automation was that it focused exclusively on repetitive tasks; now we can also automate creative and contextual ones.
Product builders, solopreneurs, and entrepreneurs want to scale their applications. For that, they need a solution that integrates easily into their current processes.
Argil's API lets you do that easily and outsource vertical feature development to our automations.
Productivity means understanding, and hyper-automation aims at increasing productivity. Being able to share the progress of a specific part of your work with the rest of your team is essential.
Generating text and images requires not only context but also training, also called fine-tuning. On Argil, you can upload pictures of yourself and let the AI build a model of you, which you can then use to generate pictures of yourself in different settings.
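That personal fine-tuning flow can be sketched as two steps: describe a training job over your photos, then prompt the resulting model with a token that refers to you. Everything below — the field names, the token convention, the minimum photo count — is an illustrative assumption, not Argil's real API:

```python
def create_finetune_job(image_paths, subject_token="me"):
    """Describe a training job that binds a token to a set of photos."""
    if len(image_paths) < 5:
        # Fine-tuning on a person usually needs several varied photos.
        raise ValueError("provide at least 5 photos")
    return {
        "kind": "finetune",
        "images": list(image_paths),
        # The token is what you later write in prompts to mean "yourself".
        "subject_token": subject_token,
    }

def generation_prompt(subject_token, setting):
    """Ask the fine-tuned model for the subject in a new setting."""
    return f"a photo of {subject_token} {setting}"

job = create_finetune_job([f"selfie_{i}.jpg" for i in range(8)])
print(generation_prompt(job["subject_token"], "hiking in the Alps"))
# → a photo of me hiking in the Alps
```

Once the model is trained, any new setting is just a one-line prompt away.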
5/ Natural language
No one wants to learn how to prompt; it's daunting and unnatural. What people want is a simple flow: write down what you want, let the AI understand it, and get the right output.
We're building Argil on this principle: no one will need to learn how to prompt to use Argil.
While ChatGPT focused on text, our vision is broader: you'll be able to create in a multimodal setting (images, text, documents, etc.) directly from our chat.
If you haven't tried Argil yet, what are you waiting for? Try it here
We're here to build proactively: we help people looking for a Midjourney API, and we'll help you even more if you give us feedback and let us hear your voice.