**Abstract:** This article introduces a method that uses OpenCVSharp to extract a portrait from green screen camera video and replace the background in real time, and it analyzes the algorithm used in the project. It also presents a way to simplify the release of resources held by objects such as Mat and MatExpr in OpenCVSharp, describes an efficient camera playback control, and covers OpenCVSharp performance optimization techniques, including efficient reading and writing of Mat memory and how to avoid inefficient code.
1. Why I developed my own image matting software
For my work, I need an application that can remove the background behind the portrait in the camera image, replace it with a new background, and display the new image in a window, all in real time. Many conferencing and live streaming applications have this function. For example, Zoom, Microsoft Teams and others all have image matting functions, but these can only be used inside their own applications. I tried several other programs, including XSplit Vcam, Douyin live-streaming assistant, and OBS. They all have excellent features, including AI portrait matting that works without a green screen. They are powerful, but they could not meet my special requirements, so I needed to develop such software myself.
Typical image matting places a green screen behind the subject. The computer then removes the green screen so that only the portrait remains, and the cut-out portrait is drawn onto a new background. Many film and television productions rely on this principle, as shown in Figure 1 [1].
Figure 1
As long as the lighting is good, matting with a green screen is very accurate. Its disadvantage is that it places high demands on how the site is set up. For this reason, “matting without a green screen” is now popular: AI identifies the foreground portrait and the background. XSplit Vcam has this function and can output the matting result as a virtual camera, but when the background is complex, its background removal is still not ideal. I am not an expert in computer vision, let alone AI, so with limited time I did not think I could write software more powerful than Vcam. I therefore decided to stay with the traditional green screen approach; after all, a piece of green screen costs only a few bucks.
Before I get to the implementation, let me show you how the software works. Figure 2 is the original image captured by the camera, with a green screen in the background. Figure 3 is the result of matting, which runs in real time at about 20 FPS (frames per second).
Figure 2 Original image
Figure 3 The image after matting
2. Architecture
The software uses OpenCV, which is a very mature and feature-rich computer vision library. OpenCV supports C/C++, Python, .NET, Java and other mainstream programming languages. Python is the language most widely used for OpenCV development, so most of the available learning resources are written in it. Personally, I don’t like Python syntax very much, so I developed this software on .NET 5 using C#. Because OpenCV is used in much the same way across programming languages, the C# code shown here is easy to rewrite in another language.
There are two OpenCV binding libraries for .NET: OpenCVSharp and Emgu CV. Because OpenCVSharp has no restrictions on commercial use, I use OpenCVSharp in this project. However, if you use Emgu CV, the code in this article can be adapted to it.
3. How to get source code
Image matting is only one part of my software, and it is not convenient to open source the whole application for the time being, so I separated the core code for replacing the green screen background into a new open source project.
Source code repository: https://github.com/yangzhongke/Zack.OpenCVSharp.Ext
GreenScreenRemovalDemo.cs contains the main code. Executable binaries for the various operating systems are available at the bottom of the page, and GreenScreenRemovalDemo is the entry executable.
On Windows, for example, running GreenScreenRemovalDemo.exe opens the console shown in Figure 4.
Figure 4
If you enter V, the program plays a built-in green screen video file, monster.mp4 [2], for those who don’t have a green screen. The program removes the green screen from the video and replaces it with the background image Bg.png. If you enter a number at the prompt in Figure 4, the web camera with that index is used; if you have only one camera connected, enter 0. Press any key to exit the program.
Figure 5 Video with green screen (monster.mp4)
Figure 6 Background image (Mt. Eden, New Zealand)
Figure 7 Result
4. Core principles
Figure 8 Original frame
Figure 8 is an original frame from the camera. First, I call the RenderGreenScreenMask(src, matMask) method I wrote, which converts the original frame ‘src’ to a black and white image ‘matMask’. In matMask, the green parts are rendered black and the rest is rendered white, as shown in Figure 9.
The main code of RenderGreenScreenMask is as follows [3]:
To speed up pixel access, pointer operations are used. C# can manipulate memory through pointers, which greatly boosts performance here. Because of ambient light, the points on the green screen do not all have the same color, so I use green == max(blue, green, red) && green > 30 to check whether a point is sufficiently green. The value 30 is a threshold, and changing it adjusts how strictly green is recognized.
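The per-pixel rule described above can be sketched without OpenCVSharp on a raw BGR byte buffer. This is only a minimal illustration, not the project's RenderGreenScreenMask: the names GreenMaskSketch and BuildMask and the tightly packed 3-bytes-per-pixel layout are assumptions made for the example, and plain array indexing stands in for the pointer access that the real code uses on a Mat.

```csharp
using System;

public static class GreenMaskSketch
{
    // Builds a one-byte-per-pixel mask from tightly packed BGR data:
    // 0 (black) where the pixel counts as green screen, 255 (white) elsewhere.
    // The buffer layout is an assumption made for this sketch.
    public static byte[] BuildMask(byte[] bgr, int width, int height, byte threshold = 30)
    {
        if (bgr.Length != width * height * 3)
            throw new ArgumentException("Expected 3 bytes (B, G, R) per pixel.", nameof(bgr));

        var mask = new byte[width * height];
        for (int i = 0; i < mask.Length; i++)
        {
            byte blue = bgr[i * 3];
            byte green = bgr[i * 3 + 1];
            byte red = bgr[i * 3 + 2];

            // Maximum of the three channels, computed without allocating a collection.
            byte max1 = blue > green ? blue : green;
            byte max = max1 > red ? max1 : red;

            // The rule from the text: green is the dominant channel and exceeds the threshold.
            bool isGreenScreen = green == max && green > threshold;
            mask[i] = isGreenScreen ? (byte)0 : (byte)255;
        }
        return mask;
    }
}
```

Raising the threshold makes dark greenish pixels count as foreground; lowering it removes more of the shadowed parts of the screen.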
Figure 9 green removed
Then, FindContoursAsArray() is called to find the contours in Figure 9. Folds in the green screen or uneven lighting produce small areas of interference, so the contours with a small area are discarded and only those with a large area are kept. This filter is easy to write with LINQ in C#, as follows:
The minBlockArea above represents “minimum allowable contour area”.
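The filter amounts to a single LINQ Where over the contour list. The sketch below is self-contained, so it uses a hypothetical Pt struct and a shoelace-formula PolygonArea helper as stand-ins for OpenCVSharp's Point and Cv2.ContourArea; in the project the contours come from FindContoursAsArray(). Whether the comparison is inclusive is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for OpenCVSharp's Point type.
public readonly struct Pt
{
    public int X { get; }
    public int Y { get; }
    public Pt(int x, int y) { X = x; Y = y; }
}

public static class ContourFilterSketch
{
    // Area of a closed polygon via the shoelace formula
    // (stand-in for Cv2.ContourArea in this sketch).
    public static double PolygonArea(IReadOnlyList<Pt> contour)
    {
        long twiceArea = 0;
        for (int i = 0; i < contour.Count; i++)
        {
            Pt a = contour[i];
            Pt b = contour[(i + 1) % contour.Count];
            twiceArea += (long)a.X * b.Y - (long)b.X * a.Y;
        }
        return Math.Abs(twiceArea) / 2.0;
    }

    // Keeps only the contours whose area is at least minBlockArea.
    public static Pt[][] KeepLargeContours(IEnumerable<Pt[]> contours, double minBlockArea)
        => contours.Where(c => PolygonArea(c) >= minBlockArea).ToArray();
}
```

A good value for minBlockArea depends on the frame resolution, since the same wrinkle covers more pixels in a larger frame.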
Next, an empty black Mat named matMaskForeground is created, and the large contours obtained above are drawn into it and filled with white. The code is as follows:
matMaskForeground.DrawContours(contoursExternalForeground, -1, new Scalar(255),
thickness: -1);
The content of matMaskForeground is shown in Figure 10. It contains only the contours with a large area; all the small areas of noise have been removed.
Figure 10 Find the largest contours and fill them with white
Filling the contours also fills the large hollow areas enclosed by the body outline, such as the gaps around the arms, hands, shoulders, and neck that are visible in Figure 9, and these need to be cut out again. An “xor” operation on Figure 9 and Figure 10 yields these hollow areas, as shown in Figure 11.
Figure 11 XOR of the two images, giving the hollow areas inside the body
Green light reflected from the screen onto your glasses, or small green details on your clothes, can also be identified as small hollow areas. As can be seen in Figure 11, there are some small white areas at the lower right corner, so FindContoursAsArray and DrawContours are used once more to remove the small areas in Figure 11. Combining the cleaned-up Figure 11 with Figure 10 then gives Figure 12, in which the white part is the body area and the black part is the green screen background.
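The two mask operations in this step can be sketched on plain byte arrays whose values are 0 or 255. With OpenCVSharp the same element-wise work would typically be done with calls such as Cv2.BitwiseXor. The text does not show how Figure 10 and Figure 11 are merged, so the CutHoles step below is an assumption: it clears every pixel of the filled body mask that lies inside a remaining hollow. The names MaskOpsSketch, Xor and CutHoles are hypothetical.

```csharp
using System;

public static class MaskOpsSketch
{
    // Element-wise XOR of two masks of equal size (values 0 or 255).
    // White in the result marks pixels where the two masks differ.
    public static byte[] Xor(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Masks must have the same size.");
        var result = new byte[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = (byte)(a[i] ^ b[i]);
        return result;
    }

    // Assumed merge step: keep the filled body mask, but clear every pixel
    // that belongs to one of the remaining (large) hollow areas.
    public static byte[] CutHoles(byte[] filledBody, byte[] largeHoles)
    {
        if (filledBody.Length != largeHoles.Length)
            throw new ArgumentException("Masks must have the same size.");
        var result = new byte[filledBody.Length];
        for (int i = 0; i < filledBody.Length; i++)
            result[i] = (byte)(filledBody[i] & ~largeHoles[i]);
        return result;
    }
}
```

In the full pipeline, the small-area filter runs between the two calls, so small green spots such as reflections in glasses stay white in the final mask while large hollows are cut out.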
Figure 12 Remove the small hollows and merge with the body mask
Then, Figure 12 is used as a mask to make the background of the original frame in Figure 8 transparent. The output is Figure 13, the portrait with a transparent background. The main code is as follows:
Here ‘src’ is the original frame image, ‘dst’ is the merged result, and ‘alpha’ is the transparency mask shown in Figure 12.
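As a rough picture of this step, the sketch below combines packed BGR pixels with a one-byte-per-pixel mask into packed BGRA pixels. It works on plain byte arrays rather than Mat objects, and the names AlphaSketch and AddAlphaChannel as well as the buffer layout are assumptions made for the example, not the project's code.

```csharp
using System;

public static class AlphaSketch
{
    // Copies B, G and R from the source and takes the alpha value from the mask:
    // 255 keeps the pixel opaque, 0 makes it fully transparent.
    public static byte[] AddAlphaChannel(byte[] srcBgr, byte[] alpha)
    {
        if (srcBgr.Length != alpha.Length * 3)
            throw new ArgumentException("Expected 3 source bytes per mask byte.");

        var dst = new byte[alpha.Length * 4];
        for (int i = 0; i < alpha.Length; i++)
        {
            dst[i * 4] = srcBgr[i * 3];         // blue
            dst[i * 4 + 1] = srcBgr[i * 3 + 1]; // green
            dst[i * 4 + 2] = srcBgr[i * 3 + 2]; // red
            dst[i * 4 + 3] = alpha[i];          // alpha from the mask
        }
        return dst;
    }
}
```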
Finally, the transparent-background portrait in Figure 13 is drawn onto our custom background image to produce the image in Figure 14. The core code is as follows:
The parameter ‘bg’ is the custom background image, and ‘overlay’ is the transparent-background portrait in Figure 13. After DrawOverlay has drawn the overlay, the content of ‘bg’ becomes Figure 14, which can then be output to the user interface.
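Drawing a transparent overlay onto a background is standard alpha blending: each output channel is overlay * alpha + background * (1 - alpha). The sketch below shows that formula on plain byte arrays. It is an illustration under assumptions (packed BGR background, packed BGRA overlay of the same size, hypothetical names OverlaySketch and BlendOnto), not the project's DrawOverlay.

```csharp
using System;

public static class OverlaySketch
{
    // Blends a BGRA overlay onto a BGR background in place.
    public static void BlendOnto(byte[] bgBgr, byte[] overlayBgra)
    {
        int pixelCount = bgBgr.Length / 3;
        if (bgBgr.Length % 3 != 0 || overlayBgra.Length != pixelCount * 4)
            throw new ArgumentException("Background and overlay sizes do not match.");

        for (int i = 0; i < pixelCount; i++)
        {
            int a = overlayBgra[i * 4 + 3];
            for (int c = 0; c < 3; c++)
            {
                int fg = overlayBgra[i * 4 + c];
                int bg = bgBgr[i * 3 + c];
                // Integer alpha blend with rounding; a = 255 yields fg, a = 0 yields bg.
                bgBgr[i * 3 + c] = (byte)((fg * a + bg * (255 - a) + 127) / 255);
            }
        }
    }
}
```

With a strictly black and white mask, alpha is only ever 0 or 255, so the blend reduces to a per-pixel copy; the general formula matters once the mask edges are softened.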
Figure 13 A portrait with transparent background
Figure 14 Result
The code above is located in ReplaceGreenScreenFilter.cs of the GreenScreenDemo project. Here is the main body of the ReplaceGreenScreenFilter:
5. Highlights
Due to limited space, the basics of OpenCV are not explained here. This section covers only some key techniques in the project and some points to watch when using OpenCVSharp. Since I have only used OpenCVSharp for a few days, please correct me if you find any mistakes.
1) Simplifying the release of OpenCVSharp objects
In OpenCVSharp, objects of classes such as Mat and MatExpr hold unmanaged resources and need to be released manually by calling the Dispose() method. Worst of all, the +, -, *, and other operators create a new object each time, and these objects also need to be disposed, or there will be a memory leak. The code needed to release all these objects is very verbose.
For example, the following code accesses OpenCV in Python:
In contrast, the equivalent code in C# is as verbose as the following:
using (Mat mat1 = new Mat(new Size(100, 100), MatType.CV_8UC3))
So I wrote a ResourceTracker class to manage OpenCV’s resources. Its T() method simply adds the wrapped OpenCV object to the tracking records and returns the same object. The core code of the T() method is as follows:
ResourceTracker implements the IDisposable interface, and when its Dispose() method is called, all the resources it tracks are disposed. The T() method can track a single object or an array of objects, and the NewMat() method is shorthand for T(new Mat(…)). Because the +, -, *, and other operators create a new object each time, the result of every step needs to be disposed, so each result can be wrapped with T(). For example: t.T(255 - t.T(picMat * 0.8))
Therefore, the verbose C# code above can be reduced to the following:
When execution leaves the using block of the ResourceTracker, the resources of all the objects it manages are released.
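The tracking idea can be sketched in a few lines. The class below is a hypothetical stand-in named SimpleTracker, not the ResourceTracker shipped in Zack.OpenCVSharp.Ext: it records every object passed to T(), returns that object unchanged so the call can wrap any sub-expression, and disposes everything when the tracker itself is disposed. Disposing in reverse order of creation is a design choice of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tracker, for illustration only.
public sealed class SimpleTracker : IDisposable
{
    private readonly List<IDisposable> tracked = new List<IDisposable>();

    // Records the object for later disposal and returns it unchanged.
    public TObj T<TObj>(TObj obj) where TObj : IDisposable
    {
        if (obj != null)
            tracked.Add(obj);
        return obj;
    }

    // Disposes every tracked object, most recently created first.
    public void Dispose()
    {
        for (int i = tracked.Count - 1; i >= 0; i--)
            tracked[i].Dispose();
        tracked.Clear();
    }
}
```

Any IDisposable works with this sketch, for example `using (var t = new SimpleTracker()) { var stream = t.T(new System.IO.MemoryStream()); }`. With OpenCVSharp, the wrapped objects would be the Mat and MatExpr instances produced by each step of an expression.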
I placed the ResourceTracker class in the NuGet package Zack.OpenCVSharp.Ext, so it can be installed in your project with the following NuGet command:
Install-Package Zack.OpenCVSharp.Ext
Source code repository: https://github.com/yangzhongke/Zack.OpenCVSharp.Ext
2) Efficient way to access data in Mat
OpenCVSharp provides many ways to access the data in a Mat. In my tests, the At() method was the slowest, and GetGenericIndexer was also slow; because both go entirely through managed code, their performance is bound to suffer. GetUnsafeGenericIndexer, which accesses memory directly, is much faster, but the fastest way is to use mat.Ptr(x) with pointers, because it reads and writes the Mat’s memory directly. Code that uses pointers must be marked as unsafe, and “allow unsafe code” must be enabled for the project. Because this approach reads and writes memory directly, check your code carefully to avoid incorrect memory access or an AccessViolation.
For reference code using this pointer approach, see the RenderGreenScreenMask() and DrawOverlay() methods above. The ‘where()’ method of the np class in the open source project Zack.OpenCVSharp.Ext also demonstrates a combination of C# generics, pointer manipulation, and lambdas.
In OpenCVSharp, the element type (Vec4b, Vec3b, byte, and so on) must be chosen to match the number of channels of the Mat being accessed, because these types have different byte lengths. Choosing the wrong one not only hurts performance but can also scramble the data, and the most direct symptom of scrambled data is a garbled picture.
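Why a mismatched element type scrambles the data can be shown without OpenCVSharp. The sketch below views the same 4-channel byte row once through a 4-byte struct and once through a 3-byte struct, using MemoryMarshal.Cast from the .NET base library. The Bgr and Bgra structs are hypothetical stand-ins for Vec3b and Vec4b, and the row contents are made up for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for Vec3b and Vec4b.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Bgr { public byte B, G, R; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Bgra { public byte B, G, R, A; }

public static class PixelViewSketch
{
    // Correct: a 4-channel row is read through a 4-byte element type.
    public static byte GreenOf4Channel(byte[] row, int index)
        => MemoryMarshal.Cast<byte, Bgra>(row.AsSpan())[index].G;

    // Deliberately wrong: the same row read through a 3-byte element type,
    // so every element after the first starts at the wrong byte offset.
    public static byte GreenOf4ChannelReadAs3(byte[] row, int index)
        => MemoryMarshal.Cast<byte, Bgr>(row.AsSpan())[index].G;
}
```

For the row { 1, 2, 3, 255, 4, 5, 6, 255 }, the green value of pixel 1 is 5, but the mismatched view returns 4, which is really the blue value of that pixel. Applied to a whole image, that shift is what makes the picture look garbled.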
3) CameraPlayer
My software needs to capture images from the camera and display them on the user interface, and before displaying them it also needs to cut out the portrait and replace the background. At first, I used AForge.NET to capture and display the camera images, but its performance was very low, because I had to convert each Bitmap captured by AForge.NET into an OpenCVSharp Mat instance and then, after matting, convert the Mat back to a Bitmap for display. I therefore switched to the VideoCapture class of OpenCVSharp for camera capture. Since the captured frames are already Mat objects, the conversion between Mat and Bitmap is eliminated and performance improves significantly.
I created a CameraPlayer control that fetches data from the camera and displays it on the interface. I provide WinForms versions of the control for both .NET Core and .NET Framework, which can be used directly. In addition, a SetFrameFilter(Action<Mat> frameFilterFunc) method lets you register a callback that processes each frame’s Mat with OpenCVSharp before it is drawn to the user interface.
Image capture, image processing and image display in the CameraPlayer run in different threads and are processed in parallel, so the stages do not have to wait for one another.
I have made the CameraPlayer control open source. Please check out the project documentation for usage.
Project repository: https://github.com/yangzhongke/Zack.CameraLib
When developing CameraPlayer, I found that if I did not set the FourCC (video encoding) property of VideoCapture, fetching each frame took about 100 ms, whereas with FourCC set to “MJPG” it took only about 50 ms. I don’t know whether this is specific to my camera. If reading frames is slow for you, try a different FourCC value.
4) Watch out for code that may cause performance problems
The RenderGreenScreenMask() method has to calculate the maximum of blue, green, and red. At first, I used the LINQ extension method: ‘new byte[]{blue,green,red}.Max();’. But I found that ‘byte max1 = blue > green ? blue : green; byte max = max1 > red ? max1 : red;’ performs better, and it reduced the processing time per frame by 50%.
A LINQ operation involves creating a collection object, putting the data into it, and then reading the data back, so it is slower than the plain comparison. In ordinary data processing this gap can be neglected, especially when LINQ is used to query a database, where the difference is negligible next to the time-consuming IO operations. Here, however, the expression runs inside two nested loops, once per pixel, so the gap becomes huge.
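The two variants from the text are shown side by side below so that the difference is easy to see: the LINQ version allocates a new 3-element array on every call, while the comparison version works only on local values. The class and method names are hypothetical, and no timings are claimed here beyond what the article reports.

```csharp
using System;
using System.Linq;

public static class MaxOfThreeSketch
{
    // Allocates an array and enumerates it with LINQ on every call.
    public static byte MaxWithLinq(byte blue, byte green, byte red)
        => new byte[] { blue, green, red }.Max();

    // Two comparisons, no allocation.
    public static byte MaxWithComparisons(byte blue, byte green, byte red)
    {
        byte max1 = blue > green ? blue : green;
        return max1 > red ? max1 : red;
    }
}
```

Both methods return the same value for every input. At 640 × 480, the LINQ version would create 307,200 short-lived arrays per frame, which is where the cost comes from.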
Therefore, when manipulating images with OpenCVSharp, be careful with any operation that runs once per pixel, because small inefficiencies add up quickly.
5) Initialization of Mat memory
When creating an empty Mat object, it is better to initialize its memory, just as in C it is better to clear memory obtained from malloc with memset(), so that old residual data does not interfere with later operations. For example, an instance created by ‘new Mat(srcSize, MatType.CV_8UC1)’ may reuse the memory of objects that were released earlier, in which case its data is dirty. Unless your next step overwrites every element of the Mat, pass a Scalar to the Mat constructor to initialize the memory, for example: new Mat(srcSize, MatType.CV_8UC1, new Scalar(0))
6. Future work
When I have time in the future, I may work on the following:
Improve the speed of reading images from the camera. My Logitech C920 camera is rated at 30 FPS, so in theory fetching a frame should take about 33 ms, but it currently takes about 50 ms. I am going to see whether this can be sped up further.
Add beautification features such as face slimming and skin brightening. The current matting algorithm needs about 20 ms to process a frame, while fetching a frame from the camera takes 50 ms, so about 30 ms per frame is still available for this kind of work.
Use an AI algorithm to achieve portrait matting without a green screen. This is hard to do on my own, but I found a very powerful open source project called MODNet, a library implemented in Python and Torch that uses neural networks for portrait recognition and includes pre-trained models. There is also a .NET port of Torch, so in theory this could be done.
7. Conclusion
When using OpenCVSharp, as long as you access memory efficiently as described in this article and call functions appropriately, you can process images with very high performance; the software I developed needs only about 20 ms to process each frame. With the ResourceTracker class in my package Zack.OpenCVSharp.Ext, releasing OpenCVSharp resources is very easy: resources are released in time without restructuring expressions or code, which avoids memory leaks.
Source code repository: https://github.com/yangzhongke/Zack.OpenCVSharp.Ext
[1] “BMPCC4K – Green Screen,” 26 July 2019. [Online]. Available: https://australianimage.com.au/bmpcc4k-green-screen/.
[2] “Green Screen Monster 2 – monster attacks / feeds / eats,” [Online]. Available: https://www.youtube.com/watch?v=GEhPo1n15H4.
[3] B. Friederichs, “Processing an image to extract green-screen mask,” [Online]. Available: https://codereview.stackexchange.com/questions/184044/processing-an-image-to-extract-green-screen-mask.
Bro this is nuts! Props to you. I did a project with OpenCVSharp back in the day to try and parse out tic tac toe grids and that was hard as hell. It’s pretty interesting how the OpenCVSharp platform acts as a bridge class for between C# and the C++ OpenCV. Very, very cool.
This works surprisingly well! In fact on my setup, it actually works better when my greenscreen is unlit, which is odd, but hey, I’m down to have less lights in my face and making heat.
OBS has a Virtual Camera function
OBS can create a Virtual Camera, that can be used by every other app that can handle a camera source.
So you can do something like this:
Webcam -> OBS -> keying the background -> OBS Virtual Camera -> Skype, Zoom, etc
Extremely cool.
Cool
great article ! thanks for sharing on this sub.
This is super cool, and a great read!