(by Jon Hogins, on behalf of Soma Games and Code-Monkeys)

This article is part of a series that documents our ‘Everyman’ experience with the new RealSense hardware and software being developed by Intel.

Full disclosure, Intel does pay us for some of this stuff but one of my favorite aspects of working with them is that they aren’t asking us to write puff-pieces. Our honest, sometimes critical, opinions are accepted…and even seem to be appreciated…so we got that going for us.

I recently got the fantastic opportunity to use a pre-alpha version of Intel’s new RealSense camera to build a full-fledged app. It’s still a work in progress, but let me share my experiences and a few tips on getting the most out of RealSense’s video APIs.

The App

My mission has been to create a video conferencing app with a few interesting finger tracking interactions using the RealSense camera. After a bit of research, I decided on the Intel Media SDK for real-time H264 encoding and decoding and OpenCV for the initial display, moving to Unity/DirectX later.

Getting Started

Getting the RealSense SDK installed and creating projects based on the samples is straightforward, even in its pre-alpha state. The installer adds the RSSDK_DIR environment variable, and each VC++ project using RealSense only needs to add a property sheet via Visual Studio’s Property Manager. The documentation and samples are fairly comprehensive, and the APIs are the most accessible of any of the Intel C++ APIs I’ve worked with.

RealSense -> Intel Media SDK

At first glance, sending frames from the RealSense camera to the Media SDK was going to be painful. The RealSense samples all show that the native format from the camera is YUY2, while the Media SDK developer guide makes it clear that NV12 is the format required for high-performance encoding.

Here’s the big trick: You can ask the RealSense SDK for video frames in any supported video format and it will handle the conversion internally!

Here are the calls you need:



     PXCImage::ImageData data;
     PXCImage *color = pp->…                      // grab the color frame from the pipeline
     color->AcquireAccess(PXCImage::ACCESS_READ,  // lock the frame, converting it to NV12
          PXCImage::ColorFormat::COLOR_FORMAT_NV12, &data);

As far as I can tell, there is very little performance overhead to requesting the data in this non-native format. I imagine they are using the Intel Performance Primitives, the fastest CPU-based image processing library out there, to drive this conversion. Very cool.

Copying the frame into the encoder is fairly straightforward. I used an intermediate format in my code, but here’s the gist:

     mfxFrameSurface1* pSurf = …
     MFXFrameAllocator* pMFXAllocator = …
     // Lock the surface so its planes are CPU-addressable
     pMFXAllocator->Lock(pMFXAllocator->pthis,
          pSurf->Data.MemId, &(pSurf->Data));
     // Luma plane: full resolution, one byte per pixel
     memcpy(pSurf->Data.Y, data.planes[0],
          data.pitches[0] * frame.height);
     // Interleaved CbCr plane: half the rows of the luma plane
     memcpy(pSurf->Data.CbCr, data.planes[1],
          data.pitches[1] * frame.height / 2);
     pMFXAllocator->Unlock(pMFXAllocator->pthis,
          pSurf->Data.MemId, &(pSurf->Data));

* Note that in NV12, the luma plane holds one byte per pixel, while the interleaved CbCr plane has half as many rows (hence the height / 2), averaging out to half a byte per pixel. See fourcc.org for more details.
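One caveat about the copy above: a single memcpy of pitch × height bytes only works when the surface the encoder allocated has the same row pitch as the camera frame. When the pitches differ (encoders often pad rows for alignment), the copy has to go row by row. Here is a minimal sketch using plain buffers — CopyNV12 and its parameter names are my own illustration, not RealSense or Media SDK types:

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// Copy an NV12 frame between buffers whose row pitches may differ.
// NV12 layout: a full-resolution Y plane, then an interleaved CbCr
// plane with the same byte width but half the rows.
void CopyNV12(uint8_t* dstY, size_t dstPitchY,
              uint8_t* dstUV, size_t dstPitchUV,
              const uint8_t* srcY, size_t srcPitchY,
              const uint8_t* srcUV, size_t srcPitchUV,
              size_t width, size_t height)
{
    // Luma: height rows of width bytes each
    for (size_t row = 0; row < height; ++row)
        std::memcpy(dstY + row * dstPitchY,
                    srcY + row * srcPitchY, width);
    // Chroma: height/2 rows, each width bytes (Cb,Cr interleaved)
    for (size_t row = 0; row < height / 2; ++row)
        std::memcpy(dstUV + row * dstPitchUV,
                    srcUV + row * srcPitchUV, width);
}
```

When both pitches happen to match, this degenerates to the two big memcpy calls shown above.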

RealSense -> OpenCV

Unfortunately, OpenCV only supports displaying RGBA frames in its HighGui library. Since we are already grabbing frames in NV12 format for the encoder, a conversion into RGBA is necessary at this point. I found the actual conversion code in this StackOverflow question. It works out of the box, except that you have to switch the channel order in the last line from

     argb[a++] = 0xff000000 | (r << 16) | (g << 8) | b;

to

     rgba[a++] = 0xff000000 | (b << 16) | (g << 8) | r;

While this conversion didn’t demolish performance, in the end I expect it to be a significant bottleneck that will require replacing with either the corresponding Intel Performance Primitives calls or a Pixel Shader for displaying NV12 in DirectX or Unity.
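For reference, the whole conversion can be written as one self-contained routine. This sketch uses the common integer BT.601 math for NV12; NV12ToRGBA and its signature are my own illustration rather than anything from the SDKs:

```cpp
#include <cstdint>
#include <cstddef>
#include <algorithm>

static uint8_t Clamp8(int v) { return (uint8_t)std::min(255, std::max(0, v)); }

// CPU NV12 -> RGBA conversion (BT.601, integer math).
// yPlane/uvPlane are the luma and interleaved CbCr planes;
// rgba receives width*height packed 32-bit pixels.
void NV12ToRGBA(const uint8_t* yPlane, size_t yPitch,
                const uint8_t* uvPlane, size_t uvPitch,
                uint32_t* rgba, size_t width, size_t height)
{
    for (size_t row = 0; row < height; ++row) {
        const uint8_t* yRow  = yPlane  + row * yPitch;
        const uint8_t* uvRow = uvPlane + (row / 2) * uvPitch;  // chroma is half-height
        for (size_t col = 0; col < width; ++col) {
            int y = (int)yRow[col] - 16;
            int u = (int)uvRow[(col / 2) * 2]     - 128;  // Cb
            int v = (int)uvRow[(col / 2) * 2 + 1] - 128;  // Cr
            int r = (298 * y + 409 * v + 128) >> 8;
            int g = (298 * y - 100 * u - 208 * v + 128) >> 8;
            int b = (298 * y + 516 * u + 128) >> 8;
            // Same packing as the corrected rgba[a++] line above:
            // bytes in memory come out R, G, B, A on little-endian.
            rgba[row * width + col] =
                0xff000000u | ((uint32_t)Clamp8(b) << 16)
                            | ((uint32_t)Clamp8(g) << 8)
                            |  (uint32_t)Clamp8(r);
        }
    }
}
```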

Parting Thoughts

With the tricks outlined above, you should have no problem integrating RealSense video feed into any other image processing or display library. Intel seems to have put some serious thought into making the RealSense SDK easy to use, which is a breath of fresh air in a world where most C++ APIs are nothing more than C functions taking in giant structs of undocumented parameters.

There are a few software issues that need working out, though. The depth feed seems solid, but the skeletal finger-tracking module flips out from time to time, making precision interactions difficult. I also had issues using 1080p video and depth/finger tracking at the same time; on a more powerful machine it worked, but only at 15 fps. Despite that, I expected many more headaches on this super-early alpha build than I actually got.

There are a number of features on this camera I haven’t even touched, but just playing with the samples is giving me way more cool ideas than I’ve got time for.

…more to come…
