What bar code scanners can tell us about perception
It may not be obvious, but a basic bar code scanner does something that machine vision researchers would love to see their own systems do: find objects amidst noisy backgrounds of visual information. What is an "object" to a bar code scanner? To answer that, let's start by explaining what a bar code is.
What is a bar code?
You've probably seen bar codes everywhere. Typically, they are represented as a series of vertical bars with a number or code underneath. There are many standards for bar codes, but we'll limit ourselves to one narrow class, typified by the following example:
This sort of bar code has a start code and a stop code. Each typically features a very wide bar, and one of that bar's main purposes is to serve as a reference for bar widths: it is sometimes 4x the unit width for a bar. The remaining bars and the gaps between them will be some multiple of that unit width (e.g., 1x, 2x, or 3x). Each sequence of bars and gaps maps to a unique number (or letter or other symbol) that is specified in advance by the standard for that kind of bar code.
A bar code scanner, like the handheld version pictured at right, doesn't actually care that the code is 2D, as you see it. To the scanner, the input is a stream of alternating light and dark signals, typically furnished by a laser beam bouncing off white paper or being absorbed by black ink (or reflecting / not reflecting off an aluminum can, etc.). If you're a programmer or Photoshop guru, you could visualize this as starting with a digital snapshot of a bar code, cropping away all but a single pixel line that cuts across the bar code, then applying a threshold to convert that line into a black and white image devoid of color and even shades of gray.
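The crop-and-threshold idea above can be sketched in a few lines of Python. This is only an illustration: the function names, the cutoff of 128, and the sample pixel values are assumptions, not anything a real scanner is known to use.

```python
def threshold_row(pixels, cutoff=128):
    """Turn one row of grayscale values (0-255) into dark/light flags.

    True means dark (ink), False means light (paper). The cutoff of 128
    is an arbitrary assumption chosen for this sketch.
    """
    return [p < cutoff for p in pixels]


def run_lengths(bits):
    """Collapse a row of flags into (is_dark, length) runs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            # Same color as the previous pixel: extend the current run.
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            # Color changed: start a new run of length 1.
            runs.append((bit, 1))
    return runs


# One hypothetical pixel row cutting across part of a bar code.
row = [250, 10, 20, 240, 240, 30]
bits = threshold_row(row)
runs = run_lengths(bits)
```

The list of runs is the "stream of alternating light and dark signals" the scanner works with; everything else about the 2D picture has been thrown away.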
The size of the bar code doesn't much matter, either. Within a certain, wide range, a bar code scanner will take any string of solid black as a potential start of a bar code, whether it's small or large and whether it's off to the left or the right of the center of the scanner's view.
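One way to picture this indifference to size and position: measure every run relative to the first dark run the scanner meets. The sketch below assumes that first dark run is the wide start bar and that it is 4 unit widths wide (the "sometimes 4x" figure mentioned earlier); `normalize` and its parameter are hypothetical names for illustration.

```python
def normalize(runs, start_bar_units=4):
    """Express run lengths as multiples of the unit bar width.

    `runs` is a list of (is_dark, length) pairs. The first dark run is
    assumed to be the wide start bar, `start_bar_units` units wide.
    Leading light runs (blank paper) are dropped, which removes the
    left/right offset; dividing by the unit width removes the scale.
    """
    first_dark = next(i for i, (dark, _) in enumerate(runs) if dark)
    unit = runs[first_dark][1] / start_bar_units
    return [(dark, round(length / unit)) for dark, length in runs[first_dark:]]


# The same hypothetical pattern seen small and near the left edge...
small = [(False, 3), (True, 8), (False, 2), (True, 4), (False, 6)]
# ...and seen five times larger, much further to the right.
large = [(False, 50), (True, 40), (False, 10), (True, 20), (False, 30)]

same = normalize(small) == normalize(large)
```

After normalization the two views are identical, so whatever decodes one decodes the other.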
What the scanner is doing with this stream of information is looking for the beginning and ending of a black section and using that first sample as a cue to look for the rest of the start code (or stop code; the bar code could be upside down) following it. If it finds that pattern, it continues looking for the patterns that follow, translating them into the appropriate digits, letters, or symbols, until it reaches the stop code.
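Here is a rough sketch of that search loop. The symbol table, `START`, and `STOP` patterns are invented for this example and do not belong to any real bar code standard. The input is assumed to be a list of widths already expressed in unit widths, alternating bar and gap; for brevity the sketch tracks only widths, not which runs are dark.

```python
# A made-up toy symbology, for illustration only.
START = (4, 2, 1, 1)   # begins with the wide bar
STOP = (3, 1, 4)       # ends with the wide bar
SYMBOLS = {
    (1, 1, 2, 2): "0",
    (1, 2, 2, 1): "1",
    (2, 1, 1, 2): "2",
    (2, 2, 1, 1): "3",
}


def decode_forward(widths):
    """Find START, then read 4-width symbols until STOP. None if no code."""
    widths = tuple(widths)
    for i in range(len(widths) - len(START) + 1):
        if widths[i:i + len(START)] != START:
            continue  # not a start code here; keep scanning the stream
        pos, out = i + len(START), []
        while pos < len(widths):
            if widths[pos:pos + len(STOP)] == STOP:
                return "".join(out)
            symbol = SYMBOLS.get(widths[pos:pos + 4])
            if symbol is None:
                break  # invalid pattern: reject this candidate
            out.append(symbol)
            pos += 4
    return None


def scan(widths):
    """Try the stream as seen, then reversed (an upside-down code)."""
    result = decode_forward(widths)
    if result is None:
        result = decode_forward(list(widths)[::-1])
    return result


code = [4, 2, 1, 1,  2, 1, 1, 2,  1, 1, 2, 2,  2, 2, 1, 1,  3, 1, 4]
noisy = [1, 3, 1] + code + [2, 1]       # clutter on either side
upside_down = code[::-1]                # scanned right to left
damaged = code[:4] + [2, 2, 2, 2] + code[8:]
```

With this toy table, `scan(code)`, `scan(noisy)`, and `scan(upside_down)` all return `"203"`, while `scan(damaged)` and a stream of pure clutter return `None`. Because only 4 of the many possible width combinations are valid symbols, a stray pattern is far more likely to be rejected than misread.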
Now, bar codes are often damaged. And they often appear in a noisy background of information. In fact, the inventors of bar code standards are very aware that a random pattern on a printed page could be misinterpreted as a bar code. They dealt with this by adding in several checks. For instance, one or more of the digits in a bar code are reserved as a "check code" (often called a check digit), the output of a mathematical function applied to the other data. The scanner applies the same function to the data it read. If the result doesn't match the check code it read, the candidate scan is rejected as corrupt. Even the digit representations themselves use only a small subset of all possible bar/gap combinations, which reduces the chances that an errant spot or other invalid information could be misconstrued as a valid bar code. In fact, the odds that a bar code scanner could misread a bar code like the one above are so small that engineers and clerks can place nearly 100% confidence in their bar codes. A bar code either does or does not scan. There's no "kinda".
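To make the check-code idea concrete, here is the check digit calculation used by UPC-A, the 12-digit retail bar code. That is a different standard from the start/stop style of code discussed here, and I'm using it only because its function is simple and well known; the function names are my own.

```python
def upc_check_digit(data_digits):
    """Compute the UPC-A check digit from the 11 data digits.

    Digits in odd positions (1st, 3rd, ...) are weighted by 3, digits
    in even positions by 1. The check digit is whatever brings the
    weighted total up to a multiple of 10.
    """
    odd = sum(data_digits[0::2])
    even = sum(data_digits[1::2])
    return (10 - (3 * odd + even) % 10) % 10


def accept_scan(digits):
    """Accept a 12-digit candidate only if its last digit checks out."""
    if len(digits) != 12:
        return False
    *data, check = digits
    return upc_check_digit(data) == check


good = [0, 3, 6, 0, 0, 0, 2, 9, 1, 4, 5, 2]
# The same code with one data digit misread (5 read as 6).
misread = [0, 3, 6, 0, 0, 0, 2, 9, 1, 4, 6, 2]
```

`accept_scan(good)` is `True` and `accept_scan(misread)` is `False`: a single misread digit changes the weighted total, so the candidate is thrown out rather than reported with the wrong value.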
Bar codes have been engineered so well that it's possible to leave a scanner turned on 24/7, scanning out over a wide area, seeing all sorts of noise continuously, and be nearly 100% certain that when it thinks it sees a bar code in the environment, it is correct. Some warehouses feature stationary bar code scanners that scan large boxes as they are moved along by forklifts, for instance.
What does this have to do with machine vision? Isn't it amazing that a bar code scanner can deal with an incredibly noisy environment and still have nearly 100% accuracy when it finds a bar code? This is very much like how you can pick out a human face in a busy picture with nearly 100% accuracy. There are all sorts of things that may ping your face recognition capacity, but when your focus is brought to bear on them, your skill at filtering out noise and correctly identifying the real faces is incredible, just like the bar code scanner. What's more, it doesn't matter where in your visual field the face is or how near or far it is, within a reasonable range. Just like the scanner.
Vision researchers are still hard pressed to provide an accounting of how we perceive the world visually. Machine vision researchers have been doing all sorts of neat things for decades, but we're still barely scratching the surface here for lack of a comprehensive theory of perception. Yet engineers creating bar codes decades ago actually solved this problem in a narrow case.
A good bar code scanner has an elegant solution to the problems of noise, scale invariance (zoom & offset), and bounds detection (via start and stop codes). The engineers even made it so a single bar code could represent one of billions of unique messages, not just be a simple there/not-there marker.
The bigger picture
Of course, I don't want to suggest that bar code scanners hold the key to solving the basic problem of perception. You probably have already guessed that the secret to bar codes is that they follow well engineered standards that make it almost easy to pick bar codes out of a noisy environment. Vision researchers have likewise made many systems that are quite capable of picking out human faces, as well as a variety of special classes of clearly definable objects.
It's pretty much accepted wisdom in human brain research now that much of what we see in the world is what we are looking to find. A bar code scanner works because it knows what to look for. Obviously, one key difference between your perceptual faculty and a bar code scanner is that the scanner is "born" with all the knowledge it needs, while you have to learn how faces, chairs, and cars "work" for yourself.
Still, for people wondering how to approach the question of perception, bar coding is not a bad analogy to start with.