Showing posts from 2007

Confirmation bias as a tool of perception

I've been trying to figure out where to go next with my study of perception. One concept I'm exploring is the idea that our expectations enhance our ability to recognize patterns. I recently found a brilliant illustration of this from researcher Matt Davis, who studies how humans process language. Try out the following audio samples. Listen to the first one several times. It's a "vocoded" version of the plain English recording that follows. Can you tell what's being said? [Audio: vocoded version] Give up? Now listen to the plain English version once and then listen to the vocoded version again. [Audio: clear English version] Davis refers to this a-ha effect as "pop-out": Perhaps the clearest case of pop-out occurs if you listen to a vocoded sentence before and immediately after you hear the same sentence in vocoded form. It is likely that the vocoded sentence will sound a lot clearer wh

What bar code scanners can tell us about perception

It may not be obvious, but a basic bar code scanner does something that machine vision researchers would love to see their own systems do: find objects amidst noisy backgrounds of visual information. What is an "object" to a bar code scanner? To answer that, let's start by explaining what a bar code is. What is a bar code? You've probably seen bar codes everywhere. Typically, they are represented as a series of vertical bars with a number or code underneath. There are many standards for bar codes, but we'll limit ourselves to one narrow class, typified by the following example: This sort of bar code has a start code and an end code. These typically feature a very wide bar, one of whose main purposes is to serve as a reference for bar widths. This wide bar is sometimes 4x the unit width for a bar. The remaining bars and the gaps between them will each be some multiple of that unit width (e.g., 1x, 2x, or 3x). Each sequence of bars and gaps relates to a unique number (or letter or other
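To make the unit-width idea concrete, here is a minimal Python sketch. It is not the scanner's actual algorithm. It assumes a scan line has already been reduced to 1s (dark) and 0s (light), that the first dark run after the light margin is the wide reference bar, and that this bar is 4x the unit width. The function names `run_lengths` and `widths_in_units` are made up for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def run_lengths(scanline: List[int]) -> List[Tuple[int, int]]:
    """Collapse a scan line of 1s (dark) and 0s (light) into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(scanline)]


def widths_in_units(scanline: List[int], reference_multiple: int = 4) -> List[Tuple[int, int]]:
    """Express each bar and gap as a whole multiple of the unit width.

    Assumption (not from the original post): the first dark run is the wide
    reference bar, and it is `reference_multiple` unit widths wide.
    """
    runs = run_lengths(scanline)
    # Drop the light margins on either side of the code.
    while runs and runs[0][0] == 0:
        runs.pop(0)
    while runs and runs[-1][0] == 0:
        runs.pop()
    if not runs:
        return []
    # Derive the unit width in pixels from the reference bar.
    unit = runs[0][1] / reference_multiple
    # Round every run to the nearest whole multiple, never below 1x.
    return [(value, max(1, round(length / unit))) for value, length in runs]
```

Because every width is measured against the reference bar, the same code scanned at twice the pixel size should produce the same list of multiples.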

Perception as construction of stable interpretations

I've been spending a lot of time lately thinking about the nature of perception. As I've said before, I believe AI has gotten stuck at the two coastlines of intelligence: the knee-jerk-reaction of the sensory level and the castles-in-the-sky of the conceptual level. We've been missing the huge interior of the perceptual level of intelligence. It's not that programmers are ignoring the problem. They just don't have much in the way of a theoretical framework to work with, yet. People don't really know yet how humans perceive, so it's hard to say how a machine could be made to perceive in a way familiar to humans. Example of a stable interpretation I've been focused very much on the principle of "stable interpretation" as a fundamental component of perception. To illustrate what I mean by "stable", consider the following short video clip: [Video clip] This is taken from a larger video I've used in

Rebuttal of the Chinese Room Argument

While discussing the subject of Artificial Intelligence in another forum, someone brought up the old "Chinese Room" argument against the possibility of AI. My wife suggested I post my response to the point, as it seems a good rebuttal of the argument itself. If you're unfamiliar with the CR argument, there's a great entry in the Stanford Encyclopedia of Philosophy. It summarizes as follows: The argument centers on a thought experiment in which someone who knows only English sits alone in a room following English instructions for manipulating strings of Chinese characters, such that to those outside the room it appears as if someone in the room understands Chinese. The argument is intended to show that while suitably programmed computers may appear to converse in natural language, they are not capable of understanding language, even in principle. Searle argues that the thought experiment underscores the fact that computers merely use syntactic rules to manipulate symb

Video stabilizer

I haven't had much chance to do coding for my AI research of late. My most recent experiment dealt more with patch matching in video streams. Here's a source video, taken from a hot air balloon, with a run of what I'll call a "video stabilizer" applied: [Video: full video with "follower" frame] [Video: contents of the follower frame] The colored "follower" frame in the left video does its best to lock onto the subject it first sees when it appears. As the follower moves off center, a new frame is created in the center to take over. The right video is of the contents of the colored frame. (If the two videos appear out of sync, try refreshing this page once the videos are fully loaded.) This algorithm does a surprisingly good job of tracking the ambient movement in this particular video. That was the point, though. I wondered how well a visual system could le

"Conscious Realism" and "Multimodal User Interface" theories

I recently sent an email to Donald Hoffman, professor at the University of California, Irvine, with kudos for his book, Visual Intelligence, which has had a profound impact on my thinking about perception. Understandably, he's very busy kicking off the new school year, so I was grateful that he sent at least a brief response and a reference to his latest published paper, titled Conscious Realism and the Mind-Body Problem. Naturally, I was eager to read it. Much of the study of how human consciousness arises stems from the assumption that consciousness is a product of physical processes in the brain. This paper starts from the opposite assumption: that "consciousness creates brain activity, and indeed creates all objects and properties of the physical world." When I read this in the abstract, I must have largely ignored its significance. Having read Visual Intelligence, I'm familiar with Hoffman's focus on

Plan for video patch analysis study

I've done a lot of thinking about this idea of making a program that can characterize the motions of all parts of a video scene. Not surprisingly, I've concluded it's going to be a hard problem. But unlike other cases where I've smacked up against a brick wall, I can see what seems a clear path from here to there. It's just going to take a long time and a lot of steps. Here's an overview of my plan. First, the goal. The most basic purpose is to, as I said above, make a program that can characterize the motions of all parts of a video scene. The program should be able to fill an entire scene with "patches". Each patch will lock onto the content found in that frame and follow it throughout the video or until it can no longer be tracked. So if one patch is planted over the eye of a person walking through the scene, the patch should be able to follow that eye for at least as long as it's visible. Achieving this goal will be valuable because it will pro

Patch mapping in video

Over the weekend, I had one of them epiphany thingies. Sometime last week, I had started up a new vision project involving patch matching. In the past, I've explored this idea with stereo vision and discovering textures. Also, I opined a bit on motion-based segmentation here a couple of years ago. My goal in this new experiment was fairly modest: plant a point of interest (POI) on a video scene and see how well the program can track that POI from frame to frame. I took a snippet of a music video and captured 55 frames into separate JPEG files and made a simple engine with a Sequence class to cache the video frames in memory and a PointOfInterest class, of which the Sequence object would have a list, all busy following POIs. The algorithm for finding the same patch in the next frame is really simple and only involves summing up the red, green, and blue pixel value differences in candidate patches and accepting the candidate with the lowest difference total; trivial, really. When
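The matching step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the original engine's code. It assumes a frame is a list of rows of (red, green, blue) tuples, that "difference" means the absolute difference per channel, and that candidates are searched within a fixed radius of the patch's previous position. The names `patch_difference` and `track_patch` and the `size` and `radius` parameters are made up for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]  # frame[y][x] -> (red, green, blue)


def patch_difference(prev: Frame, nxt: Frame, px: int, py: int,
                     cx: int, cy: int, size: int) -> int:
    """Sum the absolute red, green, and blue differences between two patches.

    (px, py) is the patch's top-left corner in the previous frame and
    (cx, cy) is the candidate's top-left corner in the next frame.
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            a = prev[py + dy][px + dx]
            b = nxt[cy + dy][cx + dx]
            total += abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(a[2] - b[2])
    return total


def track_patch(prev: Frame, nxt: Frame, px: int, py: int,
                size: int, radius: int) -> Tuple[int, int]:
    """Return the top-left corner of the best matching patch in the next frame."""
    height, width = len(nxt), len(nxt[0])
    best_diff = None
    best_pos = (px, py)  # fall back to the old position if nothing is searched
    # Try every candidate within `radius` pixels that fits inside the frame.
    for cy in range(max(0, py - radius), min(height - size, py + radius) + 1):
        for cx in range(max(0, px - radius), min(width - size, px + radius) + 1):
            diff = patch_difference(prev, nxt, px, py, cx, cy, size)
            # Accept the candidate with the lowest difference total.
            if best_diff is None or diff < best_diff:
                best_diff, best_pos = diff, (cx, cy)
    return best_pos
```

Calling `track_patch` once per pair of consecutive frames, and feeding each result back in as the next starting position, would follow a point of interest through a sequence.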

Emotional and moral tagging of percepts and concepts

Back in April, I suffered head trauma that almost killed me and landed me in the hospital for, thankfully, only a day. My wife, the sweet prankster that she is, went to a newsstand and got me copies of Scientific American Mind and Discover Presents: The Brain, an Owner's Manual (a one-off, not a periodical). The former had a picture of a woman with the upper portion of her head as a hamburger and the latter a picture of a head with its skullcap removed revealing the brain. So I got a good laugh and some interesting reading. I'm reading an article now in The Brain titled "Conflict". The basic position author Carl Zimmer offers is encapsulated in the subtitle: morality may be hardwired into our brains by evolution. In my opinion, there is some merit to this idea, but I don't subscribe wholeheartedly to all of what the article promotes. Zimmer argues that the parts of our brains that respond emotionally to moral dilemmas are different from the parts that respo

A hypothetical blob-based vision system

As often happens, I was talking with my wife earlier this evening about AI. Given that she's a non-programmer, she's an incredible sport about it and really bright in her understanding of these often arcane ideas. Because of some questions she was asking, I thought it worthwhile to explain the basics of classifier systems. Without going into detail here, one way of summarizing them is to imagine representing knowledge of different kinds of things in terms of comparable features. She's a "foodie", so I gave the example of classifying cookies. As an engineer, you might come up with a long list of the things that define cookies; especially ones that can be compared among lots of cookies. Like "includes eggs" or a degree of homogeneity from 0 - 100%. Then, you describe each kind of cookie in terms of all these characteristics and measures. Some cookie types will have a "not applicable" or "don't care" value for some of these character
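Here is one rough way the cookie example might look in Python. It is only a sketch of the general idea of comparing things by shared features. The cookie types, their feature values, and the matching rule (average absolute difference over the features a type cares about) are all assumptions made for illustration, with `None` standing in for "don't care".

```python
from typing import Dict, Optional

# Feature values are scaled to 0.0-1.0; None means "don't care".
Features = Dict[str, Optional[float]]


def mismatch(prototype: Features, sample: Dict[str, float]) -> float:
    """Average absolute difference over the features the prototype cares about."""
    total = 0.0
    counted = 0
    for name, expected in prototype.items():
        if expected is None:
            continue  # "don't care": this feature cannot count against a match
        total += abs(expected - sample[name])
        counted += 1
    return total / counted if counted else 0.0


def classify(prototypes: Dict[str, Features], sample: Dict[str, float]) -> str:
    """Return the name of the prototype with the smallest mismatch."""
    return min(prototypes, key=lambda kind: mismatch(prototypes[kind], sample))


# Hypothetical cookie types, described by the same comparable features.
COOKIES: Dict[str, Features] = {
    "chocolate chip": {"includes_eggs": 1.0, "homogeneity": 0.4},
    "shortbread": {"includes_eggs": 0.0, "homogeneity": 0.95},
    "sugar": {"includes_eggs": None, "homogeneity": 0.9},
}
```

With these made-up values, a lumpy cookie made with eggs, such as `{"includes_eggs": 1.0, "homogeneity": 0.35}`, lands closest to the "chocolate chip" description.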

Abstraction in neuron banks

On an exhilarating walk with my wife, we discussed the subject of how to build on the lessons I learned from my Pattern Sniffer project and its "neuron bank", documented in my previous blog entry. There are loads of things to do and it was not obvious how to squeeze more value out of what little I've done so far. But it finally became apparent. One thing that I was not happy about with Pattern Sniffer is that the world it perceives is "pure". There is just one pattern to perceive at a time. The world we perceive is rarely like this. As I walk along, I hear a bird singing, a car, and a lawn mower at the same time and am aware of each, separately. Clearly, there is lots of raw information overlap, yet I'm able to filter these things out and be aware of all three at once. Pattern Sniffer could see two things going on in its tiny 5 x 5 pixel visual field, but it would see them as a single pattern. This is the kind of sterile world so m

Pattern Sniffer: a demonstration of neural learning

Table of contents: Introduction; Unsupervised learning; Finite resources; Competing to be useful; Confidence; The simulation; Learning in linear time; All at once learning; Learning while performing; Noisy data; Longevity; Working memory; Pattern invariance; More to explore; The nuts and bolts of the algorithm. Introduction For over a year, I've been nursing what I believe is a somewhat novel concept in AI that superficially resembles a neural network and is inspired by my read of Jeff Hawkins' On Intelligence. Recently, I finally got around to writing code to explore it. I was so surprised by how well it already works that I thought it worthwhile to write a blog entry introducing the concept and make public my source code and test program for independent review. For lack of putting any real thought into it, I just named the project / program "Pattern Sniffer". My regular readers will recognize my frequent disdain for traditional artificial neural networks