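What is CogVideoX?
CogVideoX is a large video generation model developed by Zhipu AI that produces video from a text prompt or an image. The open-sourced release is CogVideoX-2B, a 2-billion-parameter model and the first in the CogVideoX series of video generation models. It shares its origins with Qingying (清影), Zhipu's AI video generation product. More capable models with more parameters are expected to follow.
CogVideoX-2B accepts English prompts of up to 226 tokens, uses 36 GB of GPU memory, and generates 6-second videos at a resolution of 720×480.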
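The limits above can be written down as a small pre-flight check before a prompt is sent to the model. This is a minimal sketch, not part of CogVideoX: the names `CogVideoX2BLimits`, `rough_token_count`, and `fits_prompt_budget` are hypothetical, and whitespace splitting is only a crude stand-in for the model's real tokenizer, which will usually report more tokens than there are words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CogVideoX2BLimits:
    """Limits quoted in the text for CogVideoX-2B (hypothetical container)."""
    max_prompt_tokens: int = 226
    width: int = 720
    height: int = 480
    duration_seconds: int = 6


def rough_token_count(prompt: str) -> int:
    # ASSUMPTION: word count approximates token count. The real tokenizer
    # splits text into subwords, so treat this as a lower bound.
    return len(prompt.split())


def fits_prompt_budget(
    prompt: str, limits: CogVideoX2BLimits = CogVideoX2BLimits()
) -> bool:
    """Return True if the rough estimate is within the prompt token limit."""
    return rough_token_count(prompt) <= limits.max_prompt_tokens


short_prompt = "A wooden toy ship glides over a plush blue carpet."
long_prompt = " ".join(["ship"] * 300)

print(fits_prompt_budget(short_prompt))
print(fits_prompt_budget(long_prompt))
```

Because the estimate is a lower bound, a prompt that fails this check is certainly too long, while one that passes may still be truncated by the real tokenizer.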
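Core technologies of CogVideoX
- 3D variational autoencoder (3D VAE): developed in-house by Zhipu AI, this structure compresses raw video data to 2% of its original size, which lowers the cost and difficulty of training. Combined with a 3D RoPE positional encoding module, it captures relationships between frames along the time dimension more effectively and establishes long-range dependencies within a video.
- End-to-end video understanding model: improves the model's understanding of text and its ability to follow instructions, so that generated videos match the user's request more closely and very long, complex prompts can be handled.
- Transformer architecture that fuses text, time, and space: an Expert Block aligns the text and video modality spaces, and a Full Attention mechanism improves the interaction between the modalities.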
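The 2% compression figure quoted for the 3D VAE can be sanity-checked with simple arithmetic. The sketch below rests on assumptions the article does not state: 4× temporal and 8×8 spatial downsampling, 16 latent channels, a 49-frame clip, and the first frame being encoded on its own. The function names are hypothetical.

```python
def latent_shape(frames, height, width,
                 t_factor=4, s_factor=8, latent_channels=16):
    """Shape of the latent video as (frames, channels, height, width)."""
    # ASSUMPTION: the first frame is encoded alone and the remaining
    # frames are grouped in blocks of t_factor.
    latent_frames = (frames - 1) // t_factor + 1
    return (latent_frames, latent_channels,
            height // s_factor, width // s_factor)


def compression_ratio(frames, height, width, rgb_channels=3):
    """Number of latent values divided by number of raw pixel values."""
    t, c, h, w = latent_shape(frames, height, width)
    latent_values = t * c * h * w
    pixel_values = frames * rgb_channels * height * width
    return latent_values / pixel_values


shape = latent_shape(49, 480, 720)
ratio = compression_ratio(49, 480, 720)

print(shape)            # (13, 16, 60, 90)
print(f"{ratio:.2%}")   # 2.21%
```

Under these assumptions the latent holds roughly 2.2% as many values as the raw clip, which is consistent with the 2% figure above.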
Examples generated by CogVideoX
Prompt used to generate this video:
A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship’s hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children’s items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship’s journey symbolizing endless adventures in a whimsical, indoor setting.
Prompt used to generate this video:
The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from its tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
Prompt used to generate this video:
A street artist, clad in a worn-out denim jacket and a colorful bandana, stands before a vast concrete wall in the heart, holding a can of spray paint, spray-painting a colorful bird on a mottled wall.
Prompt used to generate this video:
In the haunting backdrop of a war-torn city, where ruins and crumbled walls tell a story of devastation, a poignant close-up frames a young girl. Her face is smudged with ash, a silent testament to the chaos around her. Her eyes glistening with a mix of sorrow and resilience, capturing the raw emotion of a world that has lost its innocence to the ravages of conflict.
How do I use CogVideoX?
CogVideoX is available as a model download, an online experience, and an official API service.
1. Model and code download:
- CogVideoX-2B model download: https://huggingface.co/THUDM/CogVideoX-2b
- CogVideoX GitHub repository: https://github.com/THUDM/CogVideo
2. Enterprises and developers: call the API service through Zhipu's large-model open platform, bigmodel.cn: https://open.bigmodel.cn/dev/howuse/cogvideox
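3. Individual users: the CogVideoX model is live in the PC, mobile app, and mini-program versions of Zhipu Qingyan (智谱清言) and can be tried for free through Qingying (清影).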
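For the open-source weights listed under item 1, one possible way to run the model locally is through the Hugging Face `diffusers` library. The following is a sketch under assumptions rather than the official recipe: the sampling settings (50 steps, guidance scale 6.0, 49 frames, 8 fps) are illustrative values not given in this article, and the helper names `build_generation_kwargs` and `generate_video` are hypothetical. It requires `torch`, `diffusers`, and a CUDA GPU with enough memory.

```python
def build_generation_kwargs(prompt: str) -> dict:
    """Collect the arguments passed to the pipeline call."""
    return {
        "prompt": prompt,
        "num_inference_steps": 50,  # ASSUMPTION: illustrative value
        "guidance_scale": 6.0,      # ASSUMPTION: illustrative value
        "num_frames": 49,           # ASSUMPTION: about 6 s at 8 fps
    }


def generate_video(prompt: str, output_path: str = "output.mp4") -> str:
    """Generate a clip with CogVideoX-2B and write it to output_path."""
    # Heavy imports stay inside the function so this module can be
    # imported on machines without torch or diffusers installed.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-2b", torch_dtype=torch.float16
    )
    pipe.to("cuda")

    frames = pipe(**build_generation_kwargs(prompt)).frames[0]
    export_to_video(frames, output_path, fps=8)
    return output_path


kwargs = build_generation_kwargs(
    "A wooden toy ship glides over a plush blue carpet."
)
print(sorted(kwargs))
```

Calling `generate_video("...your English prompt...")` on a suitable GPU machine downloads the weights on first use and writes an MP4 file; nothing is downloaded until that call is made.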