GIMPIX: Video AI Software Analysis |
We have tested (or are fixing to) some of the currently available video AI software to ascertain its suitability for our clients' needs.
Text-to-image software has already reached the point of being able to produce highly realistic retro cast images a decent percentage of the time. Does it still generate rubbish? Yes, frequently, but it will likely only improve. It's certainly a quantum leap over the crude image editing techniques we used yesteryear. Video will add a whole new dimension, and at Gimpix we don't think it's hyperbole to say that, coupled with VR, video AI is the closest we will likely come to inventing a time machine. So where do we stand as of late 2024? Sora from Open AI is still in in-house beta testing despite a promised 2024 release date. There are some other video AI software products available, though, so how do they stack up?

images used for testing
The prompt(s) we are using for these tests will be:
full view of 1960s woman walking in supermarket (simple 'control' prompt to test basic concepts)
and
1960s woman with leg in plaster cast from toes to upper thigh walking on wooden crutches in supermarket (more complex)
|
Product |
Control |
Sample |
Comments |
Aitubo.ai |
Ctrl 01 |
Ctrl 01 |
There does not appear to be a free trial of the video generator. Using the image generator it appears that this product also does not grasp the key concepts we're looking for. |
|
Haiper |
Ctrl 01 |
|
Has options for both text to video and image to video. Cast video results were among the worst we've seen so far. We don't think it's even worth the space on the server to upload the cast sample! The control sample is not horrible but lacks the smoothness of other products. We actually tested this product with text to video back in March 2024. It does not appear to have improved by using an image. |
|
Kling |
Ctrl 01 |
Cast 01 |
The first challenge is signing up! On the human verification screen it is far from obvious that the way to drag the puzzle piece is via the arrows below! Once into the interface one gets 365 or so free credits to use, although the pricing plans seem quite reasonable. Our first sample took a good 30 minutes to generate, and it wants both a source image & prompt. The first control sample looks very promising! The cast sample, however, was a disappointment. The software does not appear to grasp the concept of crutching motion or of the cast remaining rigid. We went ahead and sent that feedback. However, when we substituted the word 'hobbling' for 'walking', the results became much better: Cast 02 This might also have been a factor with Minimax? Kling has a variety of monthly plans ranging from under $9/mth (660 credits) to $81/mth (8000 credits). One also gets sixty or so free credits each day just for logging in, but these expire every 24 hours. For reference, our samples used around 35 credits. Update 22/12/24: We have been experimenting with Kling's voice dub/lip sync feature and have found it to be quite good. One has to create a video first and then run the lip sync feature, so it's a two-pass process. The lip sync typically takes longer to process than the initial video! We have been unable to find a comparable feature in Minimax. |
|
Krea.ai |
|
|
This appears to be the 'Trivago' of: Hailuo, Luma, Runway and Kling (std/pro). We tried to give it our complex prompt using the Hailuo model but there does not appear to be a free trial. |
|
LTX Studio |
|
|
Appears to only do storylines? |
|
Luma Labs Dream Machine |
Ctrl 01 |
Cast 01 |
The control sample gets the basics right although the motion is jerky but once the complexity of a cast & crutches are introduced the software does not know how to interpret those items. This one has a long way to go to be useful. |
|
Mid Journey |
|
|
.. |
|
Minimax aka Hailuo |
Ctrl 01 |
Cast 01 |
Site is hailuoai.video Signup is simple and you get a free trial. We gave it the same sample photo & prompt as Kling. Video generation was quite a bit quicker. This one looks promising. The standard plan at $9.99/mth gives 1000 credits or they can be purchased in various packages. $10=1000 so it's a bit of a wash but might make sense for continued evaluation purposes. They also give you 100 credits each day just for logging in but these expire every 24 hours. Now, if you've resisted clicking on the cast sample, OMG!, this is both exciting and a bit bloody scary! Notice how her expression changes from smiling to grimacing as she struggles in her big cast! The crutching motion is not quite there but that might also be a function of video length. We'll see if the paid version allows longer video. This one looks very promising. They might've beaten Sora to the punch. What about if we try animating a real cast image? Cast 02 Holy s**t. Not sure what she's doing right at the end but this has so much potential. It does not quite get the concept of descending stairs though: Cast 03 |
|
Pika |
|
|
We also tried this one some months back using a woman skiing (from a text prompt). Whilst it got the basics right, the motion quality left a great deal to be desired. We were unable to generate a new sample as the server was too busy. |
|
Pixverse |
|
Cast 01 |
At $8, the standard plan gives 30 daily renew credits and 1200 monthly so this one shows some promise. We tried 'hobbling' and 'walking with crutches' but the results differed little from the sample provided. |
|
Pollo |
|
Cast 01 |
We used a slightly enhanced version of the prompt for this one: '1960s woman with leg in plaster cast from toes to upper thigh hobbling on wooden crutches in supermarket keeping cast rigid and non bending'. The results in 01 are quite promising despite some cast flex and the 'floating' along that Kling & Hailuo (Minimax) sometimes succumb to. We also tried the first sample with only 50% prompt strength (read: adherence). Upping it to 75% actually produced worse results. One gets the impression that the interface is very similar to Kling and Hailuo, so is this truly a different product? The pricing structure would suggest so. The free version uses only 10 credits for standard videos, as does the lite version at $10/month, which provides 300 credits. There does not seem to be any (significant) bonus for daily login. If this one emerges as a winner, the pro version is quite flexible, allowing one to tailor the credits desired to the monthly outlay, up to 50,000 credits for $659! |
|
PromeAI |
Ctrl 01 |
Cast 01 |
Site is promeai.pro It also uses a combination of image and prompt. Whilst it did OK on the control image, albeit with nowhere near the smoothness of Kling & Minimax, it totally fell down when the cast concept was introduced.
|
Runway (Gen.3) |
Ctrl 01 |
Cast 01 |
Re-directed to imagine.art Both the control and the cast samples are disappointing. This product has a long way to go still. |
|
Stable video diffusion |
Ctrl 01 |
Cast 01 |
Site: stablevideo.com v1.1 Anything but stable in our view. The faces and other features are horribly distorted and the concept of crutching non-existent. Pass on this one for now. |
|
Sora Open AI Chat GPT+ |
30/03/25 |
Sora Screen |
Sora was released on Dec. 9th 2024. We finally gained access on the morning of Dec. 12th and now we know why the excitement has died down. We will try some more text prompts but honestly at this point our conclusion is that it's a massive failure and wasn't worth the wait. Dall-E3 has a decent concept of what we're looking for with just a text prompt (even compared to Kling & Minimax, which also struggle with the idea of a cast), so we expected Sora to shine in comparison. As can be seen from the screen image, sadly that isn't so. What kind of B.S. is it that you can't create a video with people (using a source image)? Furthermore, our test image was created with Open AI's own bloody product Dall-E3 (via Bing). If it's so clever why can't it tell that? Sora is also not cheap. Basic access requires a ChatGPT Plus subscription ($20/month). Pro is ten times that. We would pay for Pro if it could do the job. Clearly, thus far it cannot, as we have seen online that Pro does not solve the people problem either. Their people policy must change or we cancel GPT. (As an aside, the stills that GPT produces are generally not as good as Bing and there are usage limits & more censorship nonsense to contend with.) Kling & Minimax have the upper hand to date as Sora is basically useless to us.
Update: It seems there has been some improvement on the Sora front. They are now allowing uploads of images with people so we shall test the video generation and see if it stacks up to Kling. Early indications are that it might. Also, it looks like 4o image generation is possible within Sora and so far it seems to accept casts. Good thing too, because we were fixing to really let them have it over this image from their site, which we find highly offensive. However, this is America. No matter how abhorrent we may find that image, it is your 1st Amendment right to create it. In light of the censorship in Chat GPT over casts, we shall hold off registering a formal complaint with Open AI unless Sora image generation is affected too. |
|
VEO 2 (Google) |
|
|
22/12/24: Don't know much about this one yet. It's reputed to give Sora a run for its money. The signup process is involved. We had to fill out a form and wait for an email. Nothing thus far. |
|
videoexpress.ai |
|
|
This is old school software in that it appears to be a one time price for the license. Sadly, there did not appear to be any kind of trial period to enable us to test this made in Canada product which looks like it might be able to easily add voice to video? We'll email them 'aboot' that eh? |
|
Videogen.io |
|
|
Has zero concept of elements we're looking for. |
|
Vidu.ai |
|
|
Appeared to re-direct to: aihugging.org No free trial, so we were not able to generate samples, and the starter plan at $7.99/month doesn't even allow downloads. Not much use we reckon. |
|
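The credit pricing quoted above for Kling, Hailuo, Pixverse and Pollo is easier to compare as cost per credit and, where we know roughly how many credits a video uses, cost per video. The following is only a rough sketch under stated assumptions: Kling's "under $9" is treated as $9.00, the 35-credit Kling figure is assumed to be per sample, daily login bonuses are ignored, and the function and dictionary names are our own inventions for illustration.

```python
# Plan figures as quoted in the review: (monthly price in USD, monthly credits).
# Assumption: Kling's "under $9" entry plan is rounded up to 9.00.
PLANS = {
    "Kling entry": (9.00, 660),
    "Kling top": (81.00, 8000),
    "Hailuo standard": (9.99, 1000),
    "Pixverse standard": (8.00, 1200),  # monthly credits only, daily credits ignored
    "Pollo lite": (10.00, 300),
}

# Credits consumed per video, only where the review gives a figure.
# Assumption: Kling's "around 35 credits" is per sample video.
CREDITS_PER_VIDEO = {
    "Kling entry": 35,
    "Kling top": 35,
    "Pollo lite": 10,
}


def cost_per_credit(price, credits):
    """Dollars paid for each credit on a plan."""
    return price / credits


def videos_per_month(credits, credits_per_video):
    """Whole videos that fit in a month's credit allowance."""
    return credits // credits_per_video


def cost_per_video(price, credits, credits_per_video):
    """Dollars per video if every credit in the plan is spent on videos."""
    return price * credits_per_video / credits


if __name__ == "__main__":
    for name, (price, credits) in PLANS.items():
        line = f"{name}: ${cost_per_credit(price, credits):.4f} per credit"
        per_video = CREDITS_PER_VIDEO.get(name)
        if per_video is not None:
            line += (
                f", about {videos_per_month(credits, per_video)} videos/month"
                f" at ${cost_per_video(price, credits, per_video):.2f} each"
            )
        print(line)
```

Credits are not interchangeable between products, so cost per credit alone says little; the per-video figure is the more useful one, and it is only as good as the credits-per-video assumption.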
Static Image Generators: Since many of the video generators enable the use of an image as a starting point a review of the current offerings is probably in order. |
Dall E-3 |
|
|
This is the one we have been using via Bing. It certainly understands the concept of a cast and has generated some decent images. However, the ones we have published are the better ones (although there are other good ones awaiting upload). Many, many have been rejected as outright rubbish. We have recently been able to convey to it the concept of writing on a cast, although sometimes it ends up all over the walls as well! Some concepts that it really struggles with are:
1. Eliminating the non casted leg in sitting poses. We have tried all sorts of different language to no consistent avail. It does what it wants, and getting both legs is a fluke that happens a small percentage of the time.
2. Cast elevated in traction: Again, we've tried to convey this in many ways without much success.
3. Walking cast with (rubber) heel. Must admit we've not really tried the older style walking iron.
4. Getting basics wrong (sometimes) like the toes going the wrong way for the leg side. Add to that too many or (less often) too few toes.
All things considered this is a decent product and it's free. Even when one uses up the daily 15 'boosts', it's still possible to keep going for quite a long time. We've not really noticed any difference in image accuracy with or without the boosts, in fact, some time after boosts run out is when it can produce a near gem. Learning a new concept?
Generally, Dall E3 does best with standing poses, where it almost never makes the missing-leg mistake described in number one above.
|
|
Deep AI |
15/03/25 |
Supermarket |
Used the standard prompt at top of page for this. |
|
Freepik (Freep!k) |
15/03/25 |
|
Anything but free beyond the daily free credits (20?) and has no idea what a cast or for that matter crutches is/are. |
|
Gemini (Google) |
15/03/25 |
Supermarket |
At first it drew only one leg so we had to modify the prompt to: "1960s woman with one of her legs in a plaster cast from toes to upper thigh walking on wooden crutches in supermarket". This is not a terrible result compared to some sites, however one has to re-prompt each time. |
|
Microsoft Copilot |
15/03/25 |
Supermarket |
Used the standard prompt at top of page for this. Out of four tries it refused to generate an image on three of them. It seemed to settle down with a different, more complex prompt. It still requires a new prompt or a request each time though, beyond just a click. |
|
Microsoft Designer |
15/03/25 |
Supermarket |
Used the standard prompt at top of page for this. This is a paid service. Looks like we got 15 credits free and it took one to get three images. This one is by far the best and the page took massive system resources - fan was roaring - had to force close. This test image then showed up in Bing so they are one & the same. |
|
Mid Journey |
|
|
Used openart.ai with both the prompt at the top of the page (standing instead of walking) and one of our more detailed prompts. Dismal results for both. |
|
Open AI 4o Chat GPT Plus |
30/03/25 |
|
How can we be somewhat polite about this? It's not easy. Open AI is anything but. In fact of all the software we've looked at it has the most censorship B.S. to contend with by far. We were excited to see what their new 4o image generation could do only to find out that it censors virtually everything with 'cast' so it's useless. Really sad since it's an American product. It lulled us in with a couple of decent results in the free version only to refuse requests after renewing Plus which we promptly cancelled again. We shall annoy them as much as possible for the remainder of the subscription. It did allow some enhancement of existing images with mixed results. This is even worse since it is an American product. Someone really needs to get creative and find a way to sue these companies for violation of basic free speech. If it's legal, one should be able to create it-end of discussion. We don't give a damn if they're a private company. They operate as near monopolies with arguably more power than government. Their censorship is totally unacceptable. |
|
Perchance |
15/03/25 |
|
Cartoonish characters (anime I think they call it) and zero concept of cast/crutches. |
|
Pixlr |
15/03/25 |
|
You get at least some free credits to start (50) but it has zero concept of cast/crutches. |
|
Sora 4-o (Open AI) |
06/04/25 |
|
This is a major game changer. A vast increase in the quality of images over Dall-E-3 (Bing). It's called 4o but don't try to use it through Chat GPT as we found the censorship there impossible to work with. Whilst the quality is greatly improved overall, and aside from nudity it appears that everything else is fair game, it does still struggle with some of the same concepts with regard to casts that Bing does: walking heels; getting things like the arrangement of toes wrong; the other leg missing in sitting poses; and two casts when only one was asked for. It also only seems to know what regular underarm crutches are. We have seen a few successful renderings of forearm crutches but it has no idea on the Unistrut style. Generally though, prompt adherence is much improved, perhaps most notably in the ability to specify a certain type of footwear for the non casted limb. That can also mitigate the problem of missing limbs in sitting poses. Also, subtle changes in wording remain important: we were trying to re-create the Vicky Powell wagon news photo and a change from simply "in" to "on" made a huge difference! This new software does not come cheap though. A Sora subscription via ChatGPT is required as far as we know. Two levels are offered: Plus at $20/month and Pro at $200/month. We opted for the latter (at least for now) after quickly hitting the image limit in Plus. We may go back to Plus depending upon how well Sora's video works in the short term. At least they allow people in source images now, so that's progress. |
|
Here's what ChatGPT had to say when asked about the problem of crutching in AI videos:
1. Limited Physical Reasoning
Even though these models generate realistic-looking video, they often lack deep understanding of physics and biomechanics. Crutch use involves:
Coordinated movement between the arms, legs, and torso
Weight shifting and balance
Gait changes depending on injury type
If the model has not seen enough examples, it will not accurately recreate that complexity.
2. Training Data Gaps
The model is trained on video/image datasets scraped from the internet. If:
Crutch use is underrepresented
Or if those examples are simplified/staged
Then the model will not learn the nuanced mechanics of how people actually use crutches.
3. Pose Estimation Limitations
Generating realistic motion means predicting human poses frame by frame. When crutches are involved:
Extra objects (the crutches) change limb constraints
There is more dependence on correct contact with the ground
If the model doesn't anchor the crutches to the floor or sync movement realistically, you get floaty, awkward, or just plain wrong results.
4. Lack of Constraint-Aware Animation
Most tools do not yet simulate:
Object-person interaction constraints (like pressure, resistance)
Injury-aware movement adaptations
So the AI might generate a person with crutches, who still walks normally or even runs - because it does not "know" that a person using crutches would shift weight off one leg or move slower.
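The "cast remaining rigid" problem we keep running into could in principle be measured rather than just eyeballed. The following is a minimal sketch, assuming one already has per-frame 2D hip, knee and ankle positions for the casted leg from some pose-estimation tool (none of the products above provide this as far as we know). The function names and the 10 degree tolerance are hypothetical choices for illustration only.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("joint points must not coincide")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against tiny floating point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def cast_flex(frames):
    """Spread (max minus min) of the knee angle across all frames.

    frames: list of (hip, knee, ankle) tuples, each point an (x, y) pair.
    A long leg cast should hold the knee angle nearly constant.
    """
    angles = [joint_angle(hip, knee, ankle) for hip, knee, ankle in frames]
    return max(angles) - min(angles)


def cast_stays_rigid(frames, tolerance_deg=10.0):
    """True if the knee angle varies by no more than the (assumed) tolerance."""
    return cast_flex(frames) <= tolerance_deg
```

A clip where the knee angle swings through a wide range would fail this check, which matches what we see when the software lets the cast bend like a normal leg.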
|
|
|
|
|
.. |
|
|
Copyright (c) Gimpix. All rights reserved
|