The video below is completely generated. Kuaishou, a popular Chinese social media company, just teased Kling, its text-to-video generator that looks to go head-to-head with Sora. The diffusion-based model is said to be able to create realistic 1080p videos up to two minutes long.
How do you think this competes with Sora?
The early peeks look impressive. I'm excited for more competition in the video generation space — and I can't wait to see what the world creates.
Generative video has a long way to go before it gets down to real use cases and becomes controllable. It also relies heavily on the data used for training, where China has both advantages (lack of copyright constraints) and disadvantages (diversity and tolerance). It'd be hard to compare the two if this goes on, as they may grow down different paths.
Bytedance introduces "Boximator: Generating Rich and Controllable Motions for Video Synthesis" 🎥 with a new approach for fine-grained motion control ✨ in video synthesis.
The authors propose a method called Boximator, which introduces two types of constraints: hard box 📦 and soft box 🌫️.
Users select objects in the conditional frame using hard boxes 📦 and then use either type of box to roughly or rigorously define the object’s position, shape, or motion path 🛤️ in future frames. Boximator functions as a plug-in for existing video diffusion models 🔄. Its training process preserves the base model’s knowledge 🧠 by freezing the original weights ⚖️ and training only the control module 🕹️.
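To make the plug-in idea concrete, here is a minimal, hypothetical PyTorch sketch (not the paper's actual code or architecture): a stand-in backbone is frozen, a small box-conditioning module is bolted on, and only that module's parameters are handed to the optimizer.

```python
import torch
import torch.nn as nn

class BoxControlModule(nn.Module):
    """Hypothetical control branch: embeds per-frame boxes (x1, y1, x2, y2, hard/soft flag)."""
    def __init__(self, channels: int):
        super().__init__()
        self.box_mlp = nn.Sequential(
            nn.Linear(5, channels), nn.SiLU(), nn.Linear(channels, channels)
        )

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, frames, num_boxes, 5) -> per-frame control vector (batch, frames, channels)
        return self.box_mlp(boxes).mean(dim=2)

class ControlledVideoModel(nn.Module):
    """Frozen stand-in 'base' video model plus a trainable control module, in the spirit of the paper."""
    def __init__(self, channels: int = 8):
        super().__init__()
        # Stand-in for the pretrained video diffusion backbone; its weights stay frozen.
        self.base = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        for p in self.base.parameters():
            p.requires_grad = False
        self.control = BoxControlModule(channels)

    def forward(self, latents: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # latents: (batch, channels, frames, height, width)
        ctrl = self.control(boxes)                     # (batch, frames, channels)
        ctrl = ctrl.permute(0, 2, 1)[..., None, None]  # (batch, channels, frames, 1, 1)
        return self.base(latents) + ctrl               # inject the box signal as a residual

model = ControlledVideoModel(channels=8)
# Only the control module's parameters are optimized; the base model's knowledge is preserved.
optimizer = torch.optim.AdamW(model.control.parameters(), lr=1e-4)

latents = torch.randn(2, 8, 16, 32, 32)  # toy batch: 16 frames of 32x32 latents
boxes = torch.rand(2, 16, 3, 5)          # 3 boxes per frame: normalized coords + hard/soft flag
print(model(latents, boxes).shape)       # torch.Size([2, 8, 16, 32, 32])
```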
Paper: https://lnkd.in/gTPETYV5
#bytedance #llm
THE PULSE ON AI - CSPARNELL
Harnessing OpenAI's Sora Model for the Next Wave in Music Videos
With the unveiling of the first music video generated by OpenAI's unreleased Sora model, we're witnessing a new horizon in the digital media landscape. This innovative AI model isn't just altering how music videos are produced; it's reshaping the entire creative process.
What is the Sora Model?
The Sora model by OpenAI represents a groundbreaking approach in generative AI technology, specifically tailored for video production. This tool can synthesise elements like motion, timing, and visual effects to create cohesive and dynamic music videos from scratch, based on text prompts.
Why is it important?
Sora's ability to automate complex video editing and production processes means that artists and creators can focus more on their artistic vision without the constraints of technical execution. This democratizes music video production, making it accessible to artists at various skill levels and budgets.
How does it impact society?
By reducing the barriers to entry for high-quality video production, Sora empowers a broader range of artists to share their visions and voices. This could lead to a more diverse and vibrant cultural landscape, as creators from different backgrounds can present their work on a global stage without the need for expensive resources.
How can we use this today or in the future?
Currently, the technology is still under wraps and not widely available. However, as it develops, we could see it being used not only in music video production but also in filmmaking, advertising, and other areas of multimedia. The potential for Sora to aid in educational content, where engaging visuals are crucial, is particularly promising.
Conclusion
The development of OpenAI's Sora model signifies a transformative shift in creative media production. It offers a glimpse into a future where AI and creativity merge to enhance and amplify human expression, making it more inclusive and accessible than ever before.
#OpenAISora #AIinMusic #CreativeAI #DigitalMediaRevolution
https://lnkd.in/gHynCUJP
If you missed our June stream, it is right here!
We discussed ads in video games and when people are in a shopping mood, the latest Google search leak, Google's new AI overviews, and much more.
Watch the full stream here:
https://lnkd.in/ew8-iSA6
#Ads #AI #GoogleAI
Hedra offers text-to-speech-to-video, all in one interface. I had a tremendous amount of fun with this one. While it's exciting to see this kind of technology surface in research papers, it's really thrilling to see it make its way to consumer tools so quickly.
These aren't just static zombie heads; there is real anima to be had. I love the nuanced performances. I'm curious if this is licensed tech from Microsoft's Vasa-1 or Alibaba's EMO or if it's all homegrown, but the results speak for themselves, quite literally.
I created all my character portraits directly in Midjourney and generated speech on ElevenLabs with a little cleanup thanks to Audacity Team. All of this could be done within Hedra, but I prefer the fine-grained control you get at the source. The music was from Udio, and the edit was CapCut.
Some minor frustration on the following points:
• Glitching around the eyes on some animations
• Excessive blinking; I would like to see sliders for ramping down general twitchiness
• Not getting great results on three-quarters view faces
• Restricted to 512×512 pixels on free preview tier
• I had a Steve Jobs headshot, but the generation was blocked on account of being "of a public figure"
• Age constraints on my teenager at a birthday party; I had to use progressively older headshots to get a video generation going
Still, this is the first of the talking head tools that didn't want me to hurl my laptop out the window, so it's definitely one to watch.
Congratulations to Michael Lingelbach, Mustafa Işık, and Hongwei Yi at Hedra on an outstanding launch.
#ai #hedra #speechtovideo #midjourney #udio #filmmaking #hollywood
Interesting... Still, look at the number of tools needed to achieve this - each one requiring knowledge to operate - and then it is the eye of the "director" (Rupert in this case) who has the final say.
#AI #VirtualActing
Workflow:
1. Text-to-image for the faces
2. Write a script for what they say
3. Generate the voice
4. Generate the talking head
The promise is to do this in a single workflow, but today the most performant tools for controlling generation are spread across different platforms.
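To illustrate how the stages chain together, here is a rough orchestration sketch in Python. Every helper is a hypothetical placeholder standing in for a separate tool (image model, TTS, talking-head generator); none of these are real SDK calls.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    character_prompt: str  # text-to-image prompt for the face/portrait
    script_line: str       # what the character says

def generate_portrait(prompt: str) -> str:
    """Placeholder: call an image model of your choice and return a file path."""
    return f"portraits/{abs(hash(prompt))}.png"

def synthesize_voice(text: str) -> str:
    """Placeholder: call a text-to-speech service and return an audio file path."""
    return f"audio/{abs(hash(text))}.wav"

def animate_talking_head(portrait_path: str, audio_path: str) -> str:
    """Placeholder: call a talking-head video service and return a clip path."""
    return f"clips/{abs(hash(portrait_path + audio_path))}.mp4"

def build_clip(shot: Shot) -> str:
    portrait = generate_portrait(shot.character_prompt)
    audio = synthesize_voice(shot.script_line)
    return animate_talking_head(portrait, audio)

if __name__ == "__main__":
    shots = [Shot("portrait of a weathered sea captain, studio lighting", "We sail at dawn.")]
    clips = [build_clip(s) for s in shots]
    print(clips)  # hand these clips off to an editor for the final cut
```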
This video is powerful because it captures the breadth and quality of simulated characters and voices.
It does not address the challenge of character consistency across multiple runs. As a creator, working with Midjourney to establish your shots with a consistent character is still the best approach. We should expect a unified 3D model (or its latent space, or splatted equivalent) to eventually be more useful; my crystal ball says four months until that surfaces. At the moment, building up a library of Midjourney characters or poses that are "ready for action" or "ready for speech" is a great way to go.
Then switch programs based on the shot: if it's action, use Luma or Gen-3; if it's front-facing speech or expression, use Hedra.
Notably, Hedra has a great face model, which captures facial expression, lip sync, and head movement.
We already know Sora's ability to create realistic videos from text.
Now, ElevenLabs has taken it a step further by adding sound effects that match the video scenes, enhancing the sample videos from OpenAI with background sounds. 🎵
This advancement could significantly speed up video production, especially in post-production, where adding sound effects is time-consuming and costly. Imagine creating a video about a beach and having the sound of waves automatically added to it. ⏳️
This tool represents a big leap in making video creation more efficient and accessible. I personally believe it could accelerate delivery by 3x.
Read more below ⏬️⏬️
https://lnkd.in/ecX3NxZE
💡💹 Microsoft Copilot for Finance!
With the pace of business accelerating every day, becoming a disruptor requires investing in technology that will drive innovation and support the bottom line. In the next three to five years, 68% of CFOs anticipate revenue growth from generative AI (GenAI).³ By implementing next-generation AI to deliver insight and automate costly and time-intensive operational tasks, teams can reinvest that time to accelerate their impact as financial stewards and strategists.