Motion Detection Using Computer Vision

I'm working on a project where it would be useful to detect people in a camera feed. I use OpenCV for computer vision, so I thought I would try the built-in person detector. (Always try the easy way when it comes to writing software.)

The detector worked, but there were a couple of problems:

  • The people need to be relatively prominent in the image. In practice, that means you either can't detect people at long distances or you can only cover a small area.
  • More importantly, the technique used (histogram of oriented gradients or HOG) is very slow on the inexpensive hardware I'm using. That means low frame rates and missed motion.

Since inexpensive hardware is a requirement for this project, I needed another way. I want to detect people, but I can settle for detecting movement. Given that, I figured I would give background subtraction a try. Background subtraction is relatively simple, very fast even on cheap hardware, and very versatile.

Here is a simplified rundown of how background subtraction works. (Assume that every pixel in an image is represented by a number.)

  1. Take two images and convert both to grayscale.
  2. Subtract each pixel in the second image from the pixel in the same position in the first image.
  3. Pixels that are exactly the same will give a difference of zero, which basically means "no change" or "no motion".
  4. The more different two pixels are, the higher the difference will be (and the lighter the resulting difference pixel).
  5. Apply a threshold to the resulting pixels. Pixels greater than some value are converted to white. All others are converted to black.
  6. White pixels represent movement. Group those pixels into contours.
  7. If a contour is larger than the selected minimum size, draw the smallest possible rectangle around it on the ORIGINAL image.
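
Here's a minimal sketch of those steps in Python with OpenCV (assuming OpenCV 4 and a camera on index 0; the threshold and minimum-area values are placeholders you'd tune). It's not my exact code, just the technique in its simplest form:

import cv2

DIFF_THRESHOLD = 25  # per-pixel difference (0-255) that counts as "changed"
MIN_AREA = 500       # smallest contour area, in pixels, worth boxing

cap = cv2.VideoCapture(0)  # placeholder source: camera index or video file

# Grab a starting frame to serve as the background (steps 1-2 compare
# against this). Blurring suppresses sensor noise.
_, background = cap.read()
background = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)
background = cv2.GaussianBlur(background, (21, 21), 0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Steps 1-4: grayscale, then per-pixel absolute difference.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)
    diff = cv2.absdiff(background, gray)

    # Step 5: threshold -- pixels that changed enough become white.
    thresh = cv2.threshold(diff, DIFF_THRESHOLD, 255, cv2.THRESH_BINARY)[1]
    thresh = cv2.dilate(thresh, None, iterations=2)  # fill small holes

    # Steps 6-7: group white pixels into contours, box the big ones
    # on the ORIGINAL frame.
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < MIN_AREA:
            continue
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("Motion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()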

(I use the fantastic resource pyimagesearch.com to help me learn computer vision. Adrian writes simple, useful code that makes learning easy. He's also responsive to questions and comments. Check him out. I used this demo as a starting point for my code.)

Initial Motion Detection

As you can see, we are definitely detecting motion. The code draws a box around the detected movement. This is done in real time. I can tune the algorithm to change the size of objects detected. In the image above, I have set it to detect even very small changes. 

There are still issues with this, however. The detector is biased toward things that are close to the camera. The leaves on the tree have the same apparent size as a person across the street, so if I'd like to detect a pedestrian some distance away, I will also be detecting the leaves on the tree. I need to be able to treat far-away areas of the image differently from things that are close.

I'd also like to be able to ignore noisy things that aren't of any interest to me. (The foreground tree is a perfect example.)

To do this, I designed a simple class in Python that describes different areas of the image. Each detection zone has a mask as well as the minimum size of object I want to detect in that area. Far-away zones have lower thresholds than close ones. Each zone draws its boxes in a different color. Lastly, the class includes a simple flag that tells me when movement has been detected in that zone. (I can use this to set a digital output to control a switch.)
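
The actual class isn't shown here, but a minimal sketch of the idea looks something like this (the names and fields are mine, chosen for illustration):

import cv2


class DetectionZone:
    """One region of the frame with its own mask, sensitivity, and state."""

    def __init__(self, name, mask, min_area, box_color):
        self.name = name            # e.g. "near", "mid-range", "distant"
        self.mask = mask            # black-and-white image; white = watch here
        self.min_area = min_area    # smallest contour area (px) to report
        self.box_color = box_color  # BGR color for this zone's boxes
        self.triggered = False      # True when movement is seen in the zone

    def check(self, thresh, frame):
        """Look for motion inside this zone's mask and box it on the frame."""
        zone_thresh = cv2.bitwise_and(thresh, thresh, mask=self.mask)
        contours, _ = cv2.findContours(zone_thresh, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        self.triggered = False
        for c in contours:
            if cv2.contourArea(c) < self.min_area:
                continue
            self.triggered = True  # this flag could drive a digital output
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), self.box_color, 2)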

Next, I build masks for the different detection zones. A mask is simply a black-and-white image that tells the computer which parts of the image to consider; masked parts are ignored. I create a global mask to cover areas I never want to detect, for things like leaves and flags, and then a mask for each zone. Python makes it easy to create as many zones as I want.
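
Building the masks is just drawing white shapes on black images (and black shapes on white ones for the global ignore mask). The polygon coordinates below are invented for illustration; in practice you'd trace them from your own camera view:

import numpy as np
import cv2

H, W = 480, 640  # frame size; match your camera

# Global ignore mask: start all white, then black out permanently noisy
# areas like the foreground tree. Hypothetical coordinates.
ignore_mask = np.full((H, W), 255, dtype=np.uint8)
tree = np.array([(40, 0), (180, 0), (160, 300), (60, 300)], dtype=np.int32)
cv2.fillPoly(ignore_mask, [tree], 0)

# Zone mask: start all black, then paint the zone white.
distant_mask = np.zeros((H, W), dtype=np.uint8)
street = np.array([(200, 100), (440, 100), (440, 160), (200, 160)],
                  dtype=np.int32)
cv2.fillPoly(distant_mask, [street], 255)

# Combine with the global mask so ignored areas stay ignored everywhere.
distant_mask = cv2.bitwise_and(distant_mask, ignore_mask)

# Far-away zones get smaller minimum areas (using the class sketched above).
zones = [
    DetectionZone("distant", distant_mask, min_area=50, box_color=(0, 0, 255)),
    # ... build "mid-range" and "near" masks the same way
]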

For this example, I created three masks for this image: near, mid-range and distant. The most distant mask will only include the sidewalk and street in the distance. The mid-range mask will include the cul-de-sac and the near mask will include the small grassy area at the bottom of the screen and the driveway to the right. This will limit the number of false alarms. See video.

You can see that objects are detected easily and movement in each zone is color-coded. There is some overlap due to the low viewing angle of the camera. There are tails on fast-moving objects because of the way I time-average the background. This can be tuned to suit your situation. I am primarily concerned with people, so the timing is pretty good here.
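
For reference, a time-averaged background in OpenCV can be as simple as the sketch below, using accumulateWeighted. The blend weight is the knob: a small value absorbs slow lighting changes but leaves short-lived tails behind fast movers. (This is one common way to do it, not necessarily my exact code.)

import cv2
import numpy as np

ALPHA = 0.05  # blend weight per frame; higher = faster adaptation, shorter tails
avg = None    # running-average background model (float32)

def update_background(gray):
    """Blend the new grayscale frame into the model; return the difference."""
    global avg
    if avg is None:
        avg = gray.astype(np.float32)  # seed the model with the first frame
    # Each frame nudges the model toward the current scene. Slow changes
    # (clouds, lighting) get absorbed; fast movers leave brief "tails".
    cv2.accumulateWeighted(gray, avg, ALPHA)
    return cv2.absdiff(gray, cv2.convertScaleAbs(avg))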

I tried refining the motion detection technique to place objects into the correct zones based on the position of the bottom center of the drawn rectangle. This worked well and gave me better depth determination, but at the expense of sensitivity. If the camera were mounted higher, this would matter much less. (That feature is not enabled in the video above.)
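
A sketch of that refinement, as a hypothetical helper: test the bottom center of each bounding box against the zone masks, since the bottom of the box approximates where the object meets the ground:

def zone_for_box(x, y, w, h, zones):
    """Assign a detection to a zone by the bottom center of its box."""
    foot_x = x + w // 2   # horizontal center of the box
    foot_y = y + h - 1    # bottom edge, roughly where feet touch the ground
    for zone in zones:
        # A zone's mask is white (255) wherever the zone applies.
        if zone.mask[foot_y, foot_x] == 255:
            return zone
    return None  # fell outside every zone; ignore it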

Some notes:

  • As you can see, the detection regions overlap. This is due to the very low camera angle I'm using. Ideally the camera would be high enough to allow defining non-overlapping zones.
  • This same camera is available in an infrared model. Assuming there is enough infrared light, this process would work in near total darkness.
  • High winds and fast-moving clouds can cause problems. This can be mitigated by increasing the minimum detection feature size or by requiring a minimum frame count prior to triggering, which I didn't do in this example. (See the sketch after this list.)
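
The frame-count idea from the last note could look something like this (a hypothetical sketch; the counter threshold is arbitrary):

TRIGGER_FRAMES = 5  # consecutive motion frames required before triggering
motion_frames = 0   # counts consecutive frames with motion in a zone

def debounced_trigger(motion_seen):
    """Return True only after motion persists for TRIGGER_FRAMES frames.

    Wind-blown leaves and cloud shadows flicker on and off, while a
    pedestrian stays in view for many frames in a row.
    """
    global motion_frames
    motion_frames = motion_frames + 1 if motion_seen else 0
    return motion_frames >= TRIGGER_FRAMES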

Background subtraction is a relatively simple yet powerful technique for finding movement in a video stream. The hardware I used for this demo is very inexpensive and fairly robust. This would be a good starting point for customization or a proof of concept.

Thoughts?