Neural Networks in AMV (part 2)

Friday, 23 June 2023

As neural networks grow in popularity, you may want to apply them to your videos to create unique scenes that make viewers question the very notion of originality. This article will guide you through what you can do with neural networks (the tools are presented from easiest to hardest).


1. RunwayML

Overview

RunwayML is a service that lets you perform various image and video processing tasks quickly and easily. It does not offer advanced controls, but it is very easy to work with.

Requirements & Installation

All you need to do is register for the service; it gives you some credits as a sign-up bonus, so you can play with the tools.

Experimentation

#1: Gen-1

Allows you to transfer the style of one video onto another video.

#2: Gen-2

Allows you to create a video from a text prompt. The results are quite unstable.

#3: Remove Background (Automasking)

Allows you to remove the background from shots. It works quite well for garbage matting, but does not handle dark or noisy shots well. Requires Google Chrome.




You can also do inpainting, frame interpolation, image expansion, and so on for different kinds of experiments.

2. AnimeGAN


Transforms video into an anime style. You can select from several styles: Hayao style, Shinkai style, Disney style, etc.

All you need to do is download the repository (https://github.com/TachibanaYoshino/AnimeGANv3, click Code -> Download ZIP), open the AnimeGANv3 folder, run AnimeGANv3.exe, and use it to transform your video into anime.

3. Stable Diffusion

Requirements

First of all, let's start with compatibility. As you know, neural networks consume a lot of power and need high-end hardware (specifically, high-end GPUs), which unfortunately not everyone can afford. So, to keep things accessible for everyone, we will start with Google Colab.
Google Colab is a service with high-end GPUs on which you can run code for free. The only drawback is that the runtime can be unstable and shuts down after some time.
To use it, all you need is a Google account.

Secondly, a bit about Stable Diffusion. It is an open-source image-generation model driven by text prompts. The results are often not as good as those of its alternative, Midjourney, but hey, it is completely free. On top of that, you can tweak various parts of an image, making your prompts much more consistent. It can be installed on your own computer, but here we will use it through Google Colab.

Stable Diffusion starts from noise and generates an image using text-prompt conditioning (information extracted from a language model that tells the U-Net how to modify the image). At each step the model adds detail and removes noise; over the successive steps in latent space, what was once noise becomes more and more like an image. Finally, the decoder transforms the result from latent space into an image in pixel space. [source]
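The noise-to-image trajectory described above can be illustrated with a toy sketch in plain Python. There is no real neural network here: the `target` list simply stands in for the direction the U-Net would predict at each step.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative denoising: start from pure noise
    and move a fixed fraction toward the 'predicted' image each step."""
    rng = random.Random(seed)
    latent = [rng.uniform(-1.0, 1.0) for _ in target]  # pure noise
    for _ in range(steps):
        # each step removes some noise: blend the latent toward the target
        latent = [l + 0.2 * (t - l) for l, t in zip(latent, target)]
    return latent

target = [0.5, -0.3, 0.8, 0.0]
result = toy_denoise(target)
print([round(x, 3) for x in result])  # close to target after 50 steps
```

A real diffusion model predicts the noise to remove with a U-Net conditioned on the text embedding, and the decoder then maps the latent to pixels; the blend above only mimics the overall trajectory from noise to image.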

Thirdly, there is a library called Gradio. It turns console applications into web applications and lets you share them via a link.

Now that we know a bit of the theory, let's move on to the practical part.

Installation

Take a look at this project: https://github.com/TheLastBen/fast-stable-diffusion
It is Stable Diffusion with a web UI running on Google Colab, using the Gradio library to share the link.
In other words, log in to your Google account and go here: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb
After it loads, run all the cells from top to bottom (you can press Runtime -> Run All for convenience).
Once Colab finishes running all the cells, you will be presented with a link to a Gradio session that leads to the web UI.


Experimentation

Basic Stable Diffusion lets you create an image from a text prompt. For example, if you write "Anime, boy with glasses in a black costume in the sky", it will try to create an image matching this description.

A negative prompt describes what you do not want to see in the image. For example, if you write "Anime, boy with glasses in a black costume in the sky" in the prompt field and 'clouds' in the negative prompt field, the generated image should not contain clouds.
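If you later run the same web UI with its `--api` flag, prompts and negative prompts can also be submitted programmatically to the `/sdapi/v1/txt2img` endpoint. A minimal sketch, assuming a local install on the default port (the field names are the standard AUTOMATIC1111 parameters):

```python
import json
import urllib.request

def txt2img_payload(prompt, negative_prompt="", steps=20, width=512, height=512):
    """Build a request body for the AUTOMATIC1111 /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # what should NOT appear
        "steps": steps,
        "width": width,
        "height": height,
    }

payload = txt2img_payload(
    "Anime, boy with glasses in a black costume in the sky",
    negative_prompt="clouds",
)

# Sending the request (only works while the web UI runs with --api):
# req = urllib.request.Request(
#     "http://127.0.0.1:7860/sdapi/v1/txt2img",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# image_b64 = json.load(urllib.request.urlopen(req))["images"][0]
```

The response returns generated images as base64 strings, so batch experiments with prompt variations can be scripted instead of clicked through.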

Many different models have been trained for Stable Diffusion, each specializing in certain styles. They can be downloaded from https://civitai.com/. After downloading, place them in <folder with stable-diffusion-webui>/models/Stable-diffusion and restart the program; you can then select the model at the top left. On civitai.com, each model's page also shows example images and the prompts from which they were obtained. There are many models that can generate anime. For example:
https://civitai.com/models/7240/meinamix
https://civitai.com/models/6755/cetus-mix
https://civitai.com/models/30240/toonyou
https://civitai.com/models/11866/meinapastel
etc.

Expert mode (by Turbo, generated using Dark Sushi Mix: https://civitai.com/models/24779/dark-sushi-mix-mix):

For extensions and models specialized in anime, refer here: https://civitai.com/

Example prompts from civitai: https://civitai.com/images/1099572?modelVersionId=93208&prioritizedUserIds=494339&period=AllTime&sort=Most+Reactions&limit=20

img2img

Img2img is a technique that generates new images from an input image and a corresponding text prompt. The output retains the original colors and composition of the input. Importantly, the input image does not need to be intricate or visually appealing: the focus should be on color and composition, as these are the elements that carry over to the final output.

Source and more information can be found here: https://www.greataiprompts.com/guide/how-to-use-img2img-in-stable-diffusion/?expand_article=1
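The key extra parameter in img2img is denoising strength: near 0 the output stays close to the input image, near 1 the prompt dominates. A sketch of the corresponding AUTOMATIC1111 `/sdapi/v1/img2img` request body (field names per the web UI API; the image path is just an example):

```python
import base64

def img2img_payload(image_path, prompt, denoising_strength=0.5):
    """Build a request body for the AUTOMATIC1111 /sdapi/v1/img2img endpoint.

    denoising_strength: 0.0 keeps the input almost unchanged,
    1.0 lets the prompt dominate over the input image.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "init_images": [image_b64],  # input image, base64-encoded
        "prompt": prompt,
        "denoising_strength": denoising_strength,
    }
```

For AMV-style restyling of footage, a common trick is to keep denoising strength moderate (around 0.3-0.6) so the composition of the original frame survives while the style changes.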


Inpainting

With inpainting, you can change any part of the image to your preference. Just brush over the part to change (or to keep) and describe the alteration you want there.

 


Sketching


With sketching, you can draw a rough idea of what the image should look like, and ControlNet converts it into an actual image.

 



Resizing


You can denoise, resize, and do other image processing from the Extras tab.

You can also play with ControlNet, which gives far more control over the output, but you will need to install it locally or buy a Google Colab paid subscription.

For local installation of Stable diffusion, follow steps here: https://github.com/AUTOMATIC1111/stable-diffusion-webui

ControlNet

ControlNet allows you to change part of an image while keeping coherence between images. For installation, refer here: https://github.com/Mikubill/sd-webui-controlnet

Examples: https://pikabu.ru/story/preobrazovanie_tantsa_realnogo_cheloveka_v_animatsiyu_s_ispolzovaniem_stable_diffusion_i_multicontrolnet_10135049

Guide: https://journal.tinkoff.ru/controlnet/

Deforum

Deforum is an extension for the Stable Diffusion web UI that lets you animate a given image. For installation, refer here: https://github.com/deforum-art/sd-webui-deforum

Example:

Addendum

There is a library called Audiocraft whose MusicGen model generates music from text. You can try describing music to your taste and get output for your prompt.

Repo: https://github.com/facebookresearch/audiocraft

Colab: https://github.com/camenduru/MusicGen-colab

Author: okhostok

Comments (4)

mwDeus
  13.07.2023 09:00
Not bad. A new round with neural nets has begun - yes, we'll have to adapt.
ProSetup
  24.06.2023 21:40
I guess story-driven videos can really take off now. At least there will be less hassle: no need to draw everything from scratch or apply masks precisely.
But even so, a lot of work remains, which means few will see it through.
I'll keep waiting for the "Make it pretty" button.
Disengager
  24.06.2023 10:12
Thanks for the guide.
S.A. Robert
  23.06.2023 20:59
A good selection, but personally my eyes already ache from neural networks: these default-looking images are everywhere, and their quality gives them away at once.

https://i.imgur.com/erwfZR8.jpg
