some oddities with your new extend video workflow

#152
by BallisticAI - opened

I just tried it out and generally it works well, but I'm stuck on a few things. I don't know if you're still working on them or not.

  1. I sometimes get color shifting in the original extension, and I can't seem to find how to enable color matching in those nodes. Is it hidden behind a node among the slew of get/set nodes you use?
    Example:

    Input video:

  2. The original extension keeps the exact vocal type and sound of the original video about 50% of the time, but the extended videos more often than not change the voice entirely. Other times it's the exact opposite: the original extension changes the voice but the rest stay close to consistent. Are you planning to integrate the ID LORA bits to keep the voice consistent?

  3. Occasionally in the first extended extension, the background will be entirely swapped out with some swipe transition effect, or replaced with what looks like the end of a commercial segment. I've never gotten this before with any LTX generation. Is there a way to prompt to avoid this?

Same input video as above

Extend Attempt 1:

Extend Attempt 2 (without color matching):

  4. I'm not worried about captions. I know that's an LTX thing for vertical video and not related to your workflow.

Yes the extended video has its weak points.

  • It uses 73 frames as reference frames (video and audio). That is the last 3 seconds of the previous video.

Anything that doesn't happen in those last 3 seconds, the next video part has no idea about. So if the voice is not part of the last 3 seconds, it's likely to change in the next part.
You can set the reference frames higher though. 73 is the minimum recommended by LTX (and I put that in the wf, since many will probably try extending short 5s videos).
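
As a rough sketch of the arithmetic behind that number: 73 frames only equals about 3 seconds at a particular frame rate. The 24 fps default below is an assumption (the frame rate is not stated in this thread), and so is the rounding to frame counts of the form 8n + 1, which is the constraint LTX-Video is commonly documented with. The function names are made up for illustration.

```python
import math

def frames_to_seconds(frames: int, fps: float = 24.0) -> float:
    """Duration covered by `frames` at `fps` (fps is an assumed value)."""
    return frames / fps

def seconds_to_reference_frames(seconds: float, fps: float = 24.0, minimum: int = 73) -> int:
    """Smallest frame count of the form 8*n + 1 that covers `seconds`,
    never going below `minimum` (73 is the minimum mentioned above)."""
    needed = math.ceil(seconds * fps)
    n = math.ceil(max(needed - 1, 0) / 8)
    return max(8 * n + 1, minimum)

print(round(frames_to_seconds(73), 2))   # 73 frames divided by 24 fps
print(seconds_to_reference_frames(5.0))  # 120 frames needed, rounded up to 8n + 1
```

If your clips run at a different frame rate, pass it as `fps`; the 3 second figure shifts accordingly.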

The color drift/changes are strongest in the 2-pass workflow, since it first generates a low-resolution version and then runs it through a 2nd-pass upscale.
This introduces new details (often better than the original), but since the result differs from the original, the change can be quite noticeable.

  • The workflow works best with single pass (the workflow has a single pass / 2pass mode toggle)
  • With 2-pass, it tries to blend the frames in an overlap so the changes are a little less noticeable
  • With 2-pass there is also a color match toggle at each group that can help color shifts a little bit (only use this with 2-pass, at least in theory single pass shouldn't need it)
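
The overlap blending mentioned for 2-pass can be pictured as a simple crossfade. This is only an illustrative sketch: the workflow's actual blending nodes and curve are not described in this thread, so the linear ramp, the function name, and the `(frames, height, width, channels)` array layout are all assumptions.

```python
import numpy as np

def crossfade_overlap(prev_tail: np.ndarray, next_head: np.ndarray) -> np.ndarray:
    """Blend two equally long frame stacks shaped (frames, height, width, channels).

    The weight of `next_head` ramps linearly from 0 to 1 across the overlap,
    so the first blended frame equals the old clip and the last equals the new one.
    """
    if prev_tail.shape != next_head.shape:
        raise ValueError("overlap segments must have the same shape")
    n = prev_tail.shape[0]
    # One weight per frame, broadcast over height, width and channels.
    w = np.linspace(0.0, 1.0, n, dtype=np.float32).reshape(n, 1, 1, 1)
    return (1.0 - w) * prev_tail.astype(np.float32) + w * next_head.astype(np.float32)

# Tiny demo: fade from black frames to white frames over 5 frames.
old = np.zeros((5, 2, 2, 3), dtype=np.float32)
new = np.ones((5, 2, 2, 3), dtype=np.float32)
blended = crossfade_overlap(old, new)
print(blended[:, 0, 0, 0])  # per-frame brightness across the overlap
```

A crossfade like this hides a hard jump, but it cannot remove a color difference between the two clips; it only spreads it over the overlap.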

That being said, the color match is set really low to be subtle (0.25 if I remember). You can peek inside the subgraph and set the color match stronger (it's near the top right of the subgraph wf).

And for any strangeness (that will happen: with only 3 seconds of context the model has to "guess" what to do next... sort of), the best fix when the output is not desirable is often just changing the seed and prompt.

That all being said, it's the newest workflow. I will take a look and see if it can be made to work even better, maybe with some latent guiding nodes or something.

Also, the color matching node is "dumb", so it takes the whole image. In your examples the green (background) is very strong, so it will likely make everything a shade of green.
So it might not be the right node for that particular video at least, and is best turned off.
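
To see why a whole-image color match tints everything when one color dominates the reference, here is a minimal sketch. It is not the node's actual implementation; it uses a plain per-channel mean/std transfer as a stand-in, and the function name, the `strength` parameter, and the float `(H, W, 3)` layout in the 0 to 1 range are assumptions.

```python
import numpy as np

def global_color_match(image: np.ndarray, reference: np.ndarray, strength: float = 0.25) -> np.ndarray:
    """Pull the per-channel mean/std of `image` toward `reference`.

    Statistics are taken over the whole frame, so a dominant background
    color in the reference shifts every pixel, subject included.
    """
    img = image.astype(np.float32)
    ref = reference.astype(np.float32)
    img_mean, img_std = img.mean(axis=(0, 1)), img.std(axis=(0, 1)) + 1e-6
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    matched = (img - img_mean) / img_std * ref_std + ref_mean
    # strength 0 returns the input unchanged, strength 1 applies the full match.
    out = (1.0 - strength) * img + strength * matched
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
neutral = rng.uniform(0.4, 0.6, (8, 8, 3))                            # roughly gray frame
greenish = rng.uniform(-0.05, 0.05, (8, 8, 3)) + [0.2, 0.7, 0.2]      # green-heavy reference
print(global_color_match(neutral, greenish, 0.25).mean(axis=(0, 1)))  # per-channel means
```

Even at a low strength the green channel mean ends up above red and blue, which is the tint described above.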

A little test run with color matching turned off, and single pass... seems to hold up pretty well. Ideally this type of workflow should run in single-pass mode.

But it's the 2-pass workflow that's probably the challenge, as 2 samplers introduce new details (and colors).
Will test a bit with 2-pass to see if it can be made smoother.

Yep, second pass ruins it. Keep it as single. Also best to use a video editor when using this workflow.

Edit and cut the last 5 seconds of the recently created video and continue from that for a new generated video. Merge/Join a 5 second talking part from the original video and add it at the front of that newly cut 5 second video beforehand to keep the voice. Once that's done, edit the new longer video and remove the 5 second talking part at the beginning and merge the rest with the first edited video.

Can all be done easily through a freeware program like avidemux. It doesn't take long. Seems to be the only way, as far as I'm aware, to make really long videos without losing character or voice.

Stick with the single extend workflow. Don't use the multi extend. Unfortunately that one works horribly and will mess up the characters and audio.

Also use the new OmniNFT lora and the Licon-VBVR-I2V-Video Reasoning390K-R32 lora and have them set at 1 strength. These are needed in order to actually follow the new prompt. I found not using them in this workflow makes the character just stand there like an idiot and not do anything.

Also, make certain the reference setting at the bottom of this workflow is not set higher than the length of the added video, or it will not transition properly. Setting it higher than the video length will cause it to skip/cut scene. To keep a smooth transition between the old and new, set it 1 second below the length of your added video.

Something else to keep in mind, the lower your reference number is, the faster the generation will be. Useful if the talking takes place near the end of the video. You don't need to make the reference as long if that is the case.

Works great in single-pass, and that's really the mode for such a wf.
It's nice for something quick and easy ...

For more serious video editing, using an external editor and doing the extensions one by one is better.
That way you can redo each extended part over and over until you get the one you like... and keep going from there.

Thanks for all the comments. I've been busy with other things so I haven't been able to refocus on this, but I should have some time this upcoming week. :)

Good advice for sure when it comes to keeping voice and motion consistency, but how are you dealing with frame jumps on the final merge? Extending the same video twice with this method and then splicing them together won't be seamless, as the last frame never aligns perfectly with the next starting frame.

Not sure why it doesn't align properly for you. Using avidemux for the cutting and merging is working perfectly for me. You can't tell it's been edited at all. I'm making 5+ minute videos. Perhaps I didn't explain it properly in my earlier post.

Let's say I have a 15 second clip. I will remove the first 10 seconds of that clip, save the last 5 seconds, and use that for the next extended generation. You must remember exactly where it was cut. This could be a flat 10.000 seconds or something like 10.004 (just an example; the fractional part will often be different). Go to roughly the 10 second mark and click the B- icon (Set end marker) at the bottom of avidemux, then click Edit, then Delete. Then save your 5 second clip. If you get an error warning about a "keyframe", click Yes to proceed. Make sure you save your 5 second clip with the video compression option (x264). This prevents problems caused by that error, like frame skips. Just set the compression to Quality (0) so it doesn't degrade the video in any way. *** Very important: select MP4 Muxer as your output format in avidemux. Comfy won't recognize avidemux's default setting, which is MKV.

After you have generated the new video from the cut 5 second clip, open the old video and cut/delete its last 5 seconds. Go to the 10 second mark, or 10.004 (whatever the case may be), from your earlier cut point, click the A- icon (Set start marker) at the bottom of avidemux, and select Edit, then Delete. Click Append (Merge), select your newly generated video, and save with the x264 compression option. Be sure to set x264 to high quality (0) so it doesn't change the quality of the video. Sometimes this compression option is needed if you don't want cuts/skips in the video.

The same applies when you add the first 5 seconds of talking to the last 5 seconds of the video you want to continue from. The talking scene does not have to align with the last 5 seconds that you're continuing from. Hell, it doesn't even have to be the same scene. It can be a totally different scene. It's only there to preserve the voice of the character. The 5 second talking part will be removed before you merge the new extended part into your first created video. Just keep doing this over and over and you can have very long perfect videos.
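
For anyone who prefers a command line over the avidemux GUI, the same cut-and-merge bookkeeping can be sketched with ffmpeg. This is an alternative, not the method described above: the helper names and file names are placeholders, and treating `-crf 0` (x264's lossless mode) as the counterpart of the "Quality (0)" setting is an assumption. The snippet only builds and prints the argument lists; it does not run ffmpeg.

```python
from typing import List

def encode_args(crf: int) -> List[str]:
    # Re-encode instead of stream-copying so cuts need not land on keyframes.
    return ["-c:v", "libx264", "-crf", str(crf), "-c:a", "aac"]

def keep_tail(src: str, dst: str, tail_seconds: float, crf: int = 0) -> List[str]:
    """Keep only the last `tail_seconds` of `src` (the clip to extend from)."""
    return ["ffmpeg", "-sseof", f"-{tail_seconds}", "-i", src, *encode_args(crf), dst]

def keep_head(src: str, dst: str, cut_point: float, crf: int = 0) -> List[str]:
    """Keep everything before `cut_point` seconds (old video minus its tail)."""
    return ["ffmpeg", "-i", src, "-t", str(cut_point), *encode_args(crf), dst]

def drop_head(src: str, dst: str, skip_seconds: float, crf: int = 0) -> List[str]:
    """Drop the first `skip_seconds` (for example a prepended talking part)."""
    return ["ffmpeg", "-ss", str(skip_seconds), "-i", src, *encode_args(crf), dst]

def concat_list(paths: List[str]) -> str:
    """Text for ffmpeg's concat demuxer, one `file '...'` line per clip."""
    return "".join(f"file '{p}'\n" for p in paths)

def concat(list_file: str, dst: str, crf: int = 0) -> List[str]:
    """Join the clips named in `list_file` into one re-encoded MP4."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, *encode_args(crf), dst]

# Example for the 15 second clip described above.
total, tail, voice = 15.0, 5.0, 5.0
cut_point = total - tail  # 10.0, the number you have to remember
commands = [
    keep_tail("old.mp4", "tail.mp4", tail),              # clip fed to the extend workflow
    keep_head("old.mp4", "head.mp4", cut_point),         # old video without its last 5 s
    drop_head("generated.mp4", "new.mp4", voice),        # remove the talking part, if one was added
    concat("list.txt", "joined.mp4"),                    # list.txt holds concat_list([...])
]
print(concat_list(["head.mp4", "new.mp4"]), end="")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Writing `.mp4` outputs keeps the result loadable in Comfy, in line with the MP4 advice above; raise `crf` if lossless files turn out too large.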

Don't skip those two loras I suggested earlier. The prompting just doesn't seem to work (at least for me) when extending videos without them. They're not needed for your first created video, but are necessary if you want to extend it. Not sure why this is. Very odd.

Hope I explained this a little better. Just know that it is possible to make a perfect 5+ minute video by doing it this way. Again, make certain that the reference number at the bottom of this workflow is at least 1 second below the length of the video you want to extend/generate from, or you will end up with skips/cut scenes. You also can't set the ref # higher than 21 or it won't work. RuneXX explains that in his workflow.
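
The two rules above (stay 1 second under the clip length, never exceed 21) fit in a small helper. This is a sketch under assumptions: it treats the reference setting as a value in seconds, takes the cap of 21 as reported in this post, and uses a made-up function name. It does not enforce any minimum reference length.

```python
def pick_reference(clip_seconds: float, margin: float = 1.0, upper_limit: float = 21.0) -> float:
    """Largest reference value that stays `margin` below the clip length
    and does not exceed `upper_limit` (both limits as described above)."""
    if clip_seconds <= margin:
        raise ValueError("clip is too short to leave the requested margin")
    return min(clip_seconds - margin, upper_limit)

print(pick_reference(5.0))   # 5 second clip: stay 1 second under its length
print(pick_reference(30.0))  # long clip: capped at the upper limit
```

A smaller value than the one returned is also valid and generates faster, as noted earlier in the thread.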

Something else to keep in mind. LTX is pretty damn good at cloning voices, but it doesn't always work perfectly. If you used a certain lora that has its own audio and you created your first video with it, it's probably a good idea to keep using that lora if you wish to extend the video and keep the character's voice. Some loras come with their own voice-overs. Also, you can probably get away with extending from just a 2 or 3 second clip instead of 5 like I do. I just stuck with 5 seeing as it's working great for me so far.

One other thing. The camera in my long videos is always fairly close to the character(s). I haven't tried extending any videos where the camera is far away from the character, so I can't say if it's any good with faraway shots.

Edit

Sorry, one final suggestion. This post of mine is getting way too damn long.
If you find it annoying to keep track of the numbers where you made your cuts, you can always keep avidemux open (be mindful of your RAM while generating) and just merge from there. Just select Undo to bring back the full scene, and the part you highlighted for the 5 second end should still be marked by avidemux. Delete that part, then merge from that point with the newly generated clip. Of course, if you added a talking part at the beginning, you will need to open a second avidemux and do your editing there to remove the talking scene. Save it, then merge it in the first avidemux window. It can maybe save you a little bit of time.

@RuneXX Seems like an error but maybe intentional? If you start with a video with no audio (e.g. a 5-second WAN clip) and toggle INPUT VIDEO HAS AUDIO to false but extend the clip, the first 5 seconds will remain silent and audio kicks in after the 5 second mark. Shouldn't it also be generating audio for the initial 5-second clip, too?

Yes... "intentional", as in this workflow doesn't add audio to a silent video (only to the extended part).
You could use the "foley, add sound to any silent video" workflow first.

But maybe I should find a way to make this workflow also do audio for the silent part. It's doable, with some extra spaghetti noodles and switches ;-) Will take a look.

Hey RuneXX, if you're updating this workflow, you may want to change the video loader to 'Load Video FFmpeg'. I found that using this instead of your current one doesn't mess with the colors/brightness. Just a suggestion. Thanks for all you do here.

Ah yes, true. Will do.
I also had quite a few videos that won't load at all unless I use the FFmpeg version. So if the workflow isn't using that already, it's a slip ;-)
