How to Generate Videos with Open-Sora-Plan Video Generation Model

Author: Murphy

In this article, you will learn how to use the Open-Sora-Plan video generation model [1], a powerful model that allows you to create your own videos, similar to what you have seen from OpenAI Sora. I will discuss different tasks to which you can apply the model, like generating animated videos and creating synthetic datasets. I will then give my thoughts on the model's performance and discuss its strengths and weaknesses. Ultimately, you should have a clearer picture of the utility you can get from this model.

Open-Sora-Plan generated image from the prompt "Writing an article." The model seems to understand the prompt since it showcases handwriting on paper, but the quality of the image is low. Image by the author made with the Open-Sora-Plan model.

Motivation

An essential part of working as a data scientist is keeping up with the latest machine-learning models, and following the constant advances in AI takes time and effort. That is my motivation for this series of articles, in which I highlight promising models I discover on pages like PapersWithCode, GitHub, and HuggingFace. After finding a model interesting, I test it myself to see if it can be helpful for anything I am working on, whether for work or personal projects. If the model looks beneficial, I then consider applying it to the tasks I need it for.

Keeping up with the newest models also makes me better at solving future problems, since I become more aware of which models can be applied to them. My first article in this series covered time series forecasting with Amazon's Chronos model, and my second covered the fantastic DocOwl vision-language model, which I highly recommend checking out below:

Using a Multimodal Document ML Model to Query Your Documents

Table of contents

· Motivation · Tasks you can apply the model to · Running the model locally ∘ Installing the model ∘ Creating videos · Testing the model ∘ Test 1: The woman walking down a Tokyo street ∘ Test 2: Golden Retriever ∘ Test 3: A young man reading a book · My thoughts on the model · Conclusion · References

Tasks you can apply the model to

Considering this is a video generation model, the main task you can apply it to is creating new videos. These videos, however, can be used for a variety of purposes. One example is creating images or GIFs, as I have done with the first image in this article, which was created with the Open-Sora-Plan model. Content like this can make for beautiful art while saving you an enormous amount of time compared to producing the video yourself.

Creating synthetic datasets is another interesting use case for a video generation model. Synthetic data generation can quickly produce massive datasets, which you can then use to train your machine-learning models. I have previously worked a lot on creating synthetic data, for example to fine-tune an OCR engine, which you can read about below.

How to make a synthesized dataset to fine-tune your OCR

Generating synthetic data will, in most cases, be much faster than manually collecting and annotating a dataset, which makes synthetic data generation an exciting use case for video generation models. You should, however, be aware that it can be tricky to make the model generate the exact types of videos or images you need for your dataset. But if you manage to do this, a large, high-quality dataset can be created quickly.
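One way to approach this is to script the prompts instead of typing them by hand. The sketch below is only an illustration of that idea, with made-up classes, scenes, and file names: it composes prompt variations for each class you want in the dataset and writes a manifest that maps every planned output file to its label, which can later double as the annotation file for training.

import csv
import itertools
from pathlib import Path

# Hypothetical label classes and scene variations for the dataset to be built.
classes = ["golden retriever", "tabby cat"]
scenes = ["running on a beach", "sleeping on a sofa", "playing in the snow"]

out_dir = Path("synthetic_dataset")
out_dir.mkdir(exist_ok=True)

# Write a manifest that maps each planned video file to its label and prompt.
with open(out_dir / "manifest.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "label", "prompt"])
    for i, (label, scene) in enumerate(itertools.product(classes, scenes)):
        prompt = f"A {label} {scene}, photorealistic, daylight"
        writer.writerow([f"video_{i:04d}.mp4", label, prompt])

Each prompt in the manifest can then be fed to the model, for example through the Gradio demo described later in this article, with the finished videos saved under the file names listed in the manifest.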

Video generation can also be used to create animated movies. This is an interesting use case because creating animated movies today is a labor-intensive process in which humans must put a lot of effort into designing the animated world, and a video generation model could do this much more quickly. There are some issues here as well, however. The first issue is that the output of the video generation model must match the standard we see in today's animated movies; few people will want to watch an animated movie of lower quality than they are used to.

The second issue is creating the exact video you want. The video generation model is a black box, meaning that even though the Open-Sora-Plan project is entirely open source, it is still tough to understand what is happening inside the model. In some scenarios, generating the exact video you want from a given prompt can be difficult. One way of dealing with this is to allow video editing via prompting, another task video generation models can be applied to.

Running the model locally

Installing the model

The authors of the Open-Sora-Plan model explain on the GitHub page how to run the model locally. However, I will still offer a minimalistic approach to running the model locally on your computer.

First, clone and enter the GitHub repository with:

git clone https://github.com/PKU-YuanGroup/Open-Sora-Plan
cd Open-Sora-Plan

You should then install the required packages with the following:

pip install -e .

After you have all the required packages installed, you can run the code with:

python -m opensora.serve.gradio_web_server

This will start a Gradio UI on a localhost address. Open this address in your web browser, and you can use the UI to create videos locally. The Gradio UI will look like the image below:

The Gradio demo you see when running the Open-Sora-Plan model. Image by the author.

When running the code for the first time, it will download the model from HuggingFace for you. The model is around 22GB, so make sure you have enough space for it on your computer. In my case, I did not have enough space on my laptop and therefore had to attach an external SSD. To download directly to an SSD (and not to your .cache directory), you can open the file located at Open-Sora-Plan/opensora/serve/gradio_web_server.py and change the four 'cache_dir' strings to the directory where you want to store the model, for example 'E:', if E is your external drive.
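As an alternative to editing the file, you can pre-download the weights to a folder of your choice with the huggingface_hub library and then point the 'cache_dir' strings at that same folder. This is my own workaround rather than something from the project's README, and the repo id below is the one the v1.0.0 weights were published under on HuggingFace, so double-check it against the README before running:

# Pre-download the ~22GB of weights to the external SSD instead of ~/.cache.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LanguageBind/Open-Sora-Plan-v1.0.0",  # verify against the project's README
    cache_dir="E:/open-sora-cache",                # any folder on the external drive
)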

Creating videos

I then enter the prompt "Writing an article", convert the resulting video to GIF format so it can be displayed on Medium, and get:

Open-Sora-Plan video generated from the prompt "Writing an article" and running the model for 50 sample steps. As you can see, the result, in this case, is not too impressive. Video by the author made with Open-Sora-Plan.
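For reference, a conversion like this can be done with a few lines of Python using the imageio library. The snippet below is a minimal sketch with placeholder file names (imageio-ffmpeg is needed to read MP4 files); it is only a convenience step for displaying the result in an article, not part of the Open-Sora-Plan code:

import imageio

# Read the generated MP4 (requires the imageio-ffmpeg plugin) and
# re-save its frames as a GIF so it can be embedded in an article.
reader = imageio.get_reader("generated_video.mp4")
fps = reader.get_meta_data().get("fps", 24)
frames = [frame for frame in reader]
imageio.mimsave("generated_video.gif", frames, fps=fps)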

You can also tune some parameters for the model:

  • Sample steps – The number of sampling steps the model runs. Note that increasing this also increases inference time. The authors recommend 50 steps for creating an image and 150 for making a video.
  • Guidance scale – This variable indicates how much creative freedom you give the model. A lower number gives Open-Sora-Plan more creative freedom, meaning the video will correspond less to your prompt, while a higher number makes the generated video correspond more closely to the prompt.
  • Generate image – Use this if you only want an image and not a video, as I did with the first image in this article. A sketch of setting these parameters from a script is given below the list.
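If you prefer to set these parameters from a script instead of the web UI, the running Gradio demo can also be called programmatically with the gradio_client package. The endpoint name and the argument order below are assumptions that depend on the version of the demo, so inspect the API with view_api() first and adjust the call accordingly:

from gradio_client import Client

# Connect to the locally running Gradio demo started earlier.
client = Client("http://127.0.0.1:7860/")

# Print the available endpoints and the arguments they expect,
# since these differ between versions of the demo.
client.view_api()

# Example call with prompt, sample steps, and guidance scale.
# The argument order and api_name are assumptions; match them to view_api().
result = client.predict(
    "A woman walking down a Tokyo street",  # prompt
    50,                                     # sample steps
    7.5,                                    # guidance scale
    api_name="/run",
)
print(result)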

Testing the model

I will now test the model by giving it a variety of prompts and seeing how it performs. I will also compare it with the OpenAI Sora model by looking at samples of the Sora model and using the same prompts to generate videos with the Open-Sora-Plan model. Unfortunately, I cannot show OpenAI Sora videos directly in this article due to copyright reasons. Still, I will link directly to a page where you can see the results from the OpenAI Sora model.

Test 1: The woman walking down a Tokyo street

OpenAI Sora example from the OpenAI website (scroll down to the first video you see)

This is a video generated of a woman walking down a Tokyo street. Video by the author made with the Open-Sora-Plan model.

This video is interesting. My initial thoughts are that the model gets the setting right with the signs in the background and a woman walking down the street. This implies the model understands the world well and how the world can be described in text. The humans also look natural at the beginning of the video. After a second or so, however, the model loses track of the physical world, and you see some unnatural movement by the woman. Still, this looks impressive, and the setting reminds me of the corresponding OpenAI video.

Test 2: Golden Retriever

OpenAI Sora example from YouTube

Open-Sora-Plan example:

This is a video of some golden retrievers. Video by the author made with the Open-Sora-Plan model.

Wow, this video is more impressive than the last video. The Open-Sora-Plan model gets the details that make up golden retrievers close to perfect, creating a real-looking world around the dogs. The video looks very impressive with the two closest dogs, while the dogs in the background blend into each other and blur out for a bit.

Test 3: A young man reading a book

OpenAI Sora example from YouTube

Open-Sora-Plan example:

Again, the model understands the query by placing a young-looking guy with a book in the clouds, which is impressive. Unfortunately, the guy's face looks relatively flat, as if it is 2D, and the model struggles a bit with creating an accurate representation of a human foot. I still think that the model has done an excellent job here!

My thoughts on the model

I think this is an interesting project that makes video generation models more accessible to the public. The authors' examples on their website look well-made and accurate, though the videos are currently relatively short, limited to a few seconds.

When looking at the model's performance, I see one primary strength and one main weakness. The strength is that the model understands the prompts it is given. Asking the model for a woman walking down the street creates exactly that environment in the video. When I ask for a video of writing an article, it makes a world with a human hand, a pen, and a notebook. This shows that the text-understanding side of the model is working well.

The main weakness, however, is that the model cannot create videos that stay consistent with the physics of the real world. At the beginning of the video of a woman walking down a Tokyo street, for example, everything looks fine. Then, after around a second, the physical laws of our world seem to disappear entirely. The model clearly struggles to grasp the physical aspects of our world, which is naturally challenging to achieve in a video generation model. Still, if the model gains a further understanding of the physical world, it could soon create ultrarealistic videos.

If the Open-Sora-Plan model could further grasp our world's physical aspects, it could reach its full potential. Photo by Igor Omilaev on Unsplash

Your interpretation of the model's performance will depend on your expectations. If you expect this model to perform as well as OpenAI Sora, you will be disappointed, as the generated videos from Open-Sora-Plan are far from the quality of OpenAI Sora's. If you avoid that comparison, however, this is really impressive work. Managing to run an open-source project capable of creating completely synthetic videos is astonishing, and the authors of the GitHub repository have done a tremendous job. I look forward to seeing how the repository improves.

Consider contributing to the Open-Sora-Plan project; a list of open tasks is given on their GitHub README page. Even though the performance of the Open-Sora-Plan model is nowhere close to the OpenAI Sora model, the Open-Sora-Plan model has a lot of potential. An open-source video generation tool can be hugely beneficial in many scenarios, such as content creation. Still, it is crucial to be aware of the dangers, for example deepfakes. A powerful way to fight the rising issue of deepfakes, however, is to be aware of the capabilities of today's AI models, which makes spotting deepfakes easier.

Conclusion

In this article, I have discussed the Open-Sora-Plan open-source video generation model, the different tasks to which you can apply it, and how you can run it locally on your own computer. Furthermore, I tested the model on a series of prompts to determine its capabilities and gave my own opinion of the model. Overall, the model's performance is nowhere close to OpenAI Sora, but Open-Sora-Plan is a very impressive project, and I expect further improvements to its performance. The world of open-source video generation has a lot of potential and can immensely benefit many people.

You can also read my articles on WordPress.

References

[1] PKU-Yuan Lab and Tuzhan AI et al. (2024). Open-Sora-Plan. GitHub. https://doi.org/10.5281/zenodo.10948109
