
Webcam image compositor, creating PNG48s

How would it look if you took a webcam or camera, captured several frames of the same scene in quick succession, and then averaged them together into one composite image? My guess was that the result would retain most of its sharpness but lose most of the noise. I made this quick mini-app with that thought in mind.

Here is an example of my results (before and after). The images were captured with a Logitech Quickcam Pro 9000 at a resolution of 1600×1200 (averaged from 128 (!) still images), and were not scaled or retouched (just cropped).


[Images: a single frame, the composited version, and an example with movement]

As a side benefit, if the camera — or something in the foreground of the scene — moves during the image capturing process, you can get photography-like effects that look like a long exposure…

A PNG48 encoder class…

Taking multiple samples in this way yields a wider range of color channel information than a single snapshot provides. For example, a device that saves 8 bits per channel can only store a red value between 0 and 255. If you add together the color information from 16 “identical” images, you get a range from 0 to 4095 (in other words, 12 bits per channel instead of 8). The value of that extra information is open to debate, but I think it’s clear that, at least in the case above, there’s more there there than there was before… If nothing else, heavy Photoshop-style post-processing of the image would probably lead to less color degradation, banding, etc.

To store that data internally, I’m using a long, flat array of floating-point values. Although calculations on floats are much slower than on integers, floats offer much more granularity, essentially freeing the image information from the constraints of a fixed bit-length representation. Multiple calculations can then be done on the data set without losing information to rounding, as would happen with an integer-based format. When you want to output the image back to the screen, a function converts the color array into a data structure with 8 bits per color channel, i.e., into BitmapData.
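The accumulate-in-floats-then-quantize idea can be sketched like this (a Python illustration with hypothetical names, not the actual ImageCompositor code):

```python
def average_frames(frames):
    # frames: a list of equal-length channel arrays, 8-bit values per entry
    n = len(frames)
    acc = [0.0] * len(frames[0])
    for frame in frames:
        for i, v in enumerate(frame):
            acc[i] += v
    # keep the result as floats: no rounding loss until final output
    return [a / n for a in acc]

def to_8bit(channels):
    # final conversion for display, clamped to the 0-255 range
    return [min(255, max(0, int(round(v)))) for v in channels]
```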

All these considerations inevitably led to the idea of outputting to a 16-bit image file format. It turns out that, thankfully, PNG fits the bill. I was able to modify Tinic Uro’s PNGEnc class to output PNGs with 16 bits per channel.
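PNGEnc48.as is the real implementation; as a language-neutral sketch of what the 16-bit change amounts to, here is a minimal PNG48 writer in Python (unfiltered scanlines only): the IHDR bit-depth field becomes 16, and each sample is written as a big-endian 16-bit value.

```python
import struct
import zlib

def chunk(tag, data):
    # a PNG chunk: length, tag, data, then CRC-32 over tag + data
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

def encode_png48(width, height, rows):
    # rows: list of scanlines, each a list of (r, g, b) tuples of 16-bit values
    # IHDR: width, height, bit depth 16, color type 2 (truecolor), then
    # compression, filter, and interlace methods all 0
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 2, 0, 0, 0)
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) for each scanline
        for r, g, b in row:
            raw += struct.pack(">HHH", r, g, b)  # big-endian 16-bit samples
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(bytes(raw))) + chunk(b"IEND", b""))
```

The bit-depth and color-type bytes sit at fixed offsets inside IHDR, which is the first chunk after the 8-byte signature.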

Here are the two interesting utility classes coming out of this experiment:

ImageCompositor.as | PNGEnc48.as

Licensed under a Creative Commons Attribution 3.0 License.

6 Responses to “Webcam image compositor, creating PNG48s”

  1. restoration says:

    Great article. Thanks for sharing this informative article.

  2. makc says:

    I had a fight with google to dig this out today, and I’ve lost it. I could only find this at flashbookmarks.com, do something about google indexing please.

  3. Mitchell says:

    Lee, you have a wonderful site. I greatly enjoy your articles.

    Just thought I’d comment on this one.

    Several years ago I implemented a similar image stream effect for a dazzling improvement in webcam clarity, but with a slight algorithmic difference from your approach. You might find it interesting for future experimentation. Instead of a mean I used a sort of binary logarithmic convolution.

    The algorithm is very simple, works in integers, and doesn’t have the roundoff issue that you had been concerned about.

    for each pixel (r, g, b):
        shift the current camera pixel right by 1 bit (i.e., divide by two)
        shift the previously displayed frame’s pixel right by 1 bit (again, divide by two)
        add the two values together to yield the new display-frame pixel value

    This approach is basically a weighted average of all preceding frames, in which each frame counts twice as much as the one before it, so the most recent frames dominate.

    The effect is similarly crisp, much faster to compute, and the artifacts from motion disappear very rapidly.
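    The recurrence Mitchell describes is compact enough to sketch (a Python illustration over per-channel integer values):

```python
def temporal_smooth(prev_display, camera_frame):
    # new pixel = prev/2 + current/2, using integer right shifts per channel
    return [(p >> 1) + (c >> 1) for p, c in zip(prev_display, camera_frame)]
```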

  4. Mitchell says:

    Pixel values could also be added together in a larger integer buffer, and then shifted right. It is probably slightly more accurate that way, but a tiny bit slower.
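    The accuracy difference is the low bit truncated by each shift; a one-pixel example (assumed values) shows where the two orderings diverge:

```python
prev, cur = 101, 101
shift_first = (prev >> 1) + (cur >> 1)  # 50 + 50 = 100: both half-bits lost
sum_first = (prev + cur) >> 1           # 202 >> 1 = 101: exact in this case
```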

  5. admin says:

    Mitchell, I’d really like to play with that idea, thanks for sharing that.

    Two questions –

    Does what you describe share something in common with the process used for bilinear filtering or for GPU anti-aliasing?

    Also, are you describing a routine that works on a single frame? I wonder if that same concept could be useful over two or more frames…

  6. Mitchell says:

    >> Also, are you describing a routine that works on a
    >> single frame? I wonder if that same concept
    >> could be useful over two or more frames…

    Although the algorithm I described uses only two frames (current actual camera pixels & the displayed pixels) it implicitly represents data from multiple frames over time. It is sort of like recursively defined values in mathematical proofs.

    Each new pixel is the weighted average of that same pixel from all preceding frames.

    (1/2)P0 + (1/4)P1 + (1/8)P2 + (1/16)P3 + …

    i.e. based on this mathematical series:

    1/2 + 1/4 + 1/8 + 1/16 + 1/32 + … = 1
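    Feeding an impulse through the recurrence confirms those weights (a float sketch, so shift truncation doesn’t obscure the arithmetic):

```python
def smooth(prev, cur):
    return prev / 2 + cur / 2

# one bright frame followed by black frames: its weight halves each step
display = smooth(0.0, 1.0)      # the new frame enters with weight 1/2
display = smooth(display, 0.0)  # now 1/4
display = smooth(display, 0.0)  # now 1/8
```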

    >> bilinear filtering or for GPU anti-aliasing?

    Those are essentially nearest-neighbor pixel averages. They operate within a single frame to smooth the image, but they are not based on temporal data (at least as far as I know).

    I have, however, experimented with combining my temporal smoothing with various 3×3 pixel transformations for contrast enhancement or smoothing of the current frame (and also for edge enhancement, emboss effects, and so forth; by the way, I’m sure you’d really enjoy experimenting with these algorithms!). The temporal filtering provides a less noisy image to start with, so the results with the other effects are improved.
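    A generic 3×3 convolution of the kind mentioned above might look like this (a Python sketch; the kernel choice, e.g. edge enhancement or emboss, is up to you):

```python
def convolve3x3(img, kernel):
    # img: 2-D list of grayscale values; border pixels are left at zero
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += img[y + ky - 1][x + kx - 1] * kernel[ky][kx]
            out[y][x] = min(255, max(0, acc))  # clamp to the 8-bit range
    return out
```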

    I was particularly fascinated with the notion that I could create green-screen effects by identifying clusters of pixels with negligible changes and compositing in real time with those as transparency. It worked nicely up to a point, but if the subject stayed relatively still I’d end up with a disembodied hand or face floating around.
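    The negligible-change test could be as simple as a per-pixel threshold on the temporal difference (a hypothetical sketch; the threshold value is an assumption):

```python
def motion_mask(prev, cur, threshold=8):
    # True where a pixel changed enough to count as foreground;
    # False pixels would be composited as transparent
    return [abs(c - p) >= threshold for p, c in zip(prev, cur)]
```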
