Showing posts from 2005

Neuron banks and learning

I've been thinking more about perceptual-level thinking and how to implement it in software. In doing so, I've started formulating a model of how cortical neural networks might work, at least in part. I'm sure it's not an entirely new idea, but I haven't run across it in quite this form, so far. One of the key questions I ask myself is: how does human neural tissue learn? And, building on Jeff Hawkins' memory-prediction model, I came up with at least one plausible answer. First, however, let me say that I use the term "neuron" here loosely. The mechanisms I ascribe to individual neurons may turn out to be more a function of groups of them working in concert. Let me start with the notion of a group of neurons in a "neural bank". A bank is simply a group of neurons that are all looking at the same inputs, as illustrated in the following figure: Perhaps it's a region of the input coming from the auditory nerves. Or perhaps

A standardized test of perceptual capability

I've been getting too lost in the idiosyncrasies of machine vision of late and missing my more important focus on intelligence, per se. I'm changing direction, now. My recent experiences have shown me that one area we haven't really done well in is perceptual-level intelligence. We have great sensors and cool algorithms for generating interesting but primitive information about the world. Edge detection, for example, can be used to generate a series of lines in a visual scene. But so what? Lines are just about as disconnected from intelligence as the raw pixel colors are. Where do primitive visual features become percepts? Naturally, we have plenty of systems designed to instantly translate visual (or other sensory) information into known percepts. Put little red dots around a room, for instance, and a visual system can easily cue in on them as being key markers for a controlled-environment system. This is the sort of thinking that is used in

Using your face and a webcam to control a computer

I don't normally do reviews of ordinary products. Still, I tried out an interesting one recently that makes practical use of a fairly straightforward machine vision technique that I thought worth describing. The product is called EyeTwig, and is billed as a "head mouse". That is, you put a camera near your computer monitor, aim it at your face, and run the program. Then, when you move your head left and right, up and down, the Windows cursor, typically controlled by your mouse, moves about the screen in a surprisingly intuitive and smooth fashion. Most people would recognize the implication that this could be used by the disabled. I thought about it, though, and realized that this application is limited mainly to those without mobility below the neck. And many of those in that situation have limited mobility of their heads. Even so, a niche market is still a market. I think the product's creator sees that the real potential lies in an

Stereo disparity edge maps

I've been experimenting further with stereo vision. Recently, I made a small breakthrough that I thought worth describing for the benefit of other researchers working toward the same end. One key goal of mine with respect to stereo vision has been the same as for most involved in the subject: being able to tell how far away things in a scene are from the camera or, at least, relative to one another. If you wear glasses or contact lenses, you've probably seen that test of your depth perception in which you look through polarizing glasses at a sheet of black rings and attempt to tell which one looks like it is "floating" above the others. It's astonishing to me just how little disparity there has to be between images in one's left and right eyes in order for one to tell which ring is different from the others. Other researchers have used a variety of techniques for getting a machine to have this sort of perception. I am currently using a combin

Some stereo vision illusions

While engaging in some stereo vision experiments, I found myself a little stuck. I stopped working for a while and started staring at a wall on the opposite side of the room, pondering how my own eyes deal with depth perception. I crossed my eyes to study certain facets of my visual system. I got especially interested when I crossed my eyes so that the curtains on either side of the doorway were overlapped. I wasn't surprised to find my eyes were only too happy to lock the two together, given how similar they looked. I was, however, surprised to see how well my visual system fused various differences between the two images together into a single end product. It even became difficult to tell which component of the combined scene came from which eye without closing one eye. I thought it worthwhile to create some visual illusions based on some of these observations. To view them, you'll need to cross your eyes so that your right eye looks at the left image and vi

Topics in machine vision

Once again, I've forgotten to announce a sub-site I created back on August 28th that I call Topics in Machine Vision. Unlike my earlier Introduction to Machine Vision, it does not set out to give a broad overview of the subject matter. Instead, it's geared toward the researcher with at least some familiarity with the subject. Also, whereas I intended the introduction to stand complete on its own, Topics is more organic, meaning that I'll continue to add content to it as time passes. Knowing that this could get to be difficult to read and manage, I've broken down Topics into separate sections and pages. The first section I've fleshed out is on the Patch Equivalence concept I introduced in an earlier blog entry here. In fact, once I introduced this topic in detail, I went back and ran some experiments in application of the PE concept to stereo vision and published the results, including tons of example images that demonstrat

Introduction to machine vision

Recently, I completely forgot to mention that I published a brief introduction to machine vision on August 14th. It's meant to be tailored to people who want to better understand the subject but haven't had much experience outside the popular media's thin portrayal of it. By contrast, much of what's written that gets into the nuts and bolts is often difficult to read because it requires complex math skills or otherwise expects you to have a fairly strong background in the subject, already. I'm especially fond of demystifying subjects that look exceptionally complex. Machine vision often seems like a perfect example of pinheads making the world seem too complicated and their work more impressive than it really is. Sometimes it comes down to pure hucksterism as cheap tricks and carefully groomed examples are employed in pursuit of funding or publicity. Then again, there's an awful lot of very good and creative work out there. It's

Bob Mottram, crafty fellow

I sometimes use my rickety platform here to review new technologies and web sites, but I haven't done enough to give kudos to the unusual people in AI who dot the world and sometimes find their way online. Bob Mottram is one such person who deserves mention. Who is Bob Mottram? He's a 33-ish-year-old British programmer who has found a keen interest in the field of Artificial Intelligence. He seems to be fairly well read on a variety of studies and technologies that are around. What starts to make him stand out is his active participation in the efforts. Like me, he finds that many of the documents out there that describe AI technologies sound tantalizingly detailed, but are actually very opaque when it comes to the details. Unlike most, however, he takes this simply as a challenge to surpass. He designs and codes and experiments until his results start to look like what is described in the literature. The next thing that sets Mottram apart is his willingne

Stereo vision: measuring object distance using pixel offset

I've had some scraps of time here and there to put toward progress on my stereo vision experiments. My previous blog entry described how to calibrate a stereo camera pair to find X and Y offsets that correspond in the right camera with the same position in the left camera when they are both looking at a far off "infinity point". Once I had that, I knew it was only a small step to use the same basic algorithm for dynamically getting the two cameras "looking" at the same thing even when the subject matter is close enough for the two cameras to actually register a difference. And since I have the vertical offset already calculated, I was happy to see that the algorithm, running along this single horizontal "rail", runs faster. The next logical step, then, was to see if I could figure out the formula for telling how far the thing the cameras are looking at is from the cameras. This is one of the core reasons for using a pair of cameras inste

Automatic alignment of stereo cameras

I'm currently working on developing a low-level stereo vision component tentatively called "Binoculus". It builds on the DualCameras component, which provides basic access to two attached cameras. To it, Binoculus already adds calibration and will hopefully add some basic ability to segment parts of the scene by perceived depth. For now, I've only worked on getting the images from the cameras to be calibrated so they both "point" in the same direction. The basic question here is: once the cameras point roughly in the same direction, how many horizontal and vertical pixels off is the left one from the right? I had previously pursued answering this using a somewhat complicated printed graphic and a somewhat annoying process, because I was expecting I would have to deal with spherical warping, differing camera sizes, differing colors, and so on. I've come to the conclusion that this probably won't be necessary, and that all that proba

DualCameras component

I have been getting more involved in stereo, or "binocular", vision research. So far, most of my actual development efforts have been on finding a pair of cameras that will work together on my computer, an annoying challenge, to be sure. Recently, I found a good pair, so I was able to move on to the next logical step: creating an API for dealing with two cameras. Using C#, I created a Windows control component that taps into the Windows Video Capture API and provides a very simple interface. Consumer code need only start capturing, tell it to grab frames from time to time when it's ready, and eventually (optionally) stop capturing. There's no question of synchronizing or worrying about a flood of events. I dubbed the component DualCameras and have made it freely available for download, including all source code and full documentation. I've already been using the component for a while now and have made some minor enhancements, but I'm ha

Patch equivalence

As I've been dodging about among areas of machine vision, I've been searching for similarities among the possible techniques they could employ. I think I've started to see at least one important similarity. For lack of a better term, I'm calling it "patch equivalence", or "PE". The concept begins with a deceptively simple assertion about human perception: that there are neurons (or tight groups of them) that do nothing but compare two separate "patches" of input to see if they are the same. A "patch", generally, is just a tight region of neural tissue that brings input information from a region of the total input. With one eye, for example, a patch might represent a very small region of the total image that that eye sees. For hearing, a patch might be a fraction of a second of time spent listening to sounds within a somewhat narrow band of frequencies, as another example. A "scene", here, is a contiguou

Machine vision: motion-based segmentation

I've been experimenting, with limited success, with different ways of finding objects in images using what some vision researchers would call "preattentive" techniques, meaning not involving special knowledge of the nature of the objects to be seen. The work is frustrating in large part because of how confounding real-world images can be to simple analyses and because it's hard to nail down exactly what the goals for a preattentive-level vision system should be. In machine vision circles, this is generally called "segmentation", and usually refers more specifically to segmentation of regions of color, texture, or depth. Jeff Hawkins (On Intelligence) would say that there's a general-purpose "cortical algorithm" that starts out naive and simply learns to predict how pixel patterns will change from moment to moment. Appealingly simple as that sounds, I find it nearly impossible to square with all I've been learning about

Machine vision: spindles

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. Maybe I'm just grasping at straws, but I recently realized one can separate out a new kind of primitive visual element. Every day, we're surrounded by thin, linear structures. Power lines, picture frames, pin-striped shirts, and trunks of tall trees are all great examples of what I mean. A line drawing is often nothing more than these thin, linear structures, and most written human languages are dominated by them. The first word that comes to mind when I think about these things is "spindles". On one hand, it seems hard to imagine that we have some built-in way to recognize and deal with spindles as a primitive kind of shape like we might with, say, basic geometric shapes (e.g., squares and circles) or features like edges or regions. But something about them seems tempting from the perspective of machine vision goals. Spindly structu

Machine vision: smoothing out textures

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. While reading up more on how texture analysis has been dealt with in recent years, I realized yesterday that there may be a straightforward way to do basic region segmentation based on textures. I consider texture-based segmentation to be one of two major obstacles to stepping above the trivial level of machine vision today toward general purpose machine vision. The other regards recognizing and ignoring illumination effects. Something struck me a few hours after reading how one researcher chose to define textures. Among other things, he made two interesting points. First, that a smooth texture must, when sampled at a sufficiently low resolution, cease to be a texture and instead become a homogeneous color field. Second, the size of a texture must be at least several times larger in dimensions (e.g., width and height) than a single textural unit. A rectangular

Machine vision: studying surface textures

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. It seems that one can't escape the complexities that come with texture. Previously, I had experimented with very low resolution images because they can blur a texture into a homogeneous color blob. There's a terrible tradeoff, though. The texture smooths out while the edges get blocky and less linear. Too much information is lost. What's more, a lower resolution image will likely have a more uneven distribution of similar but different colored pixels. A ball goes from having a texture with lots of local color similarity to a small number of pixels with unique colors. Moreover, it's a struggle for me with my own excellent visual capabilities to really understand what's in such low resolution images. It can't be that good of a technique if the source images aren't intelligible to human eyes. I think I will have to revisit the sub

Machine vision: pixel morphing

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. I'm entertaining the idea that our vision works using a sort of "pixel morphing" technique. To illustrate what I mean in the context of a computer program, imagine a black scene with a small white dot in it. We'll call this dot the "target". With each frame in time, the dot moves a little, smoothly tracing a square over the course of, say, forty frames. That means the target is on each of the four edges for ten time steps. The target starts at the top left corner and travels rightward. The agent watching this should be able to infer that the dot seen in the first frame is the same as in the second frame, even though it has moved, say, 50 pixels away. Let's take this "magic" step as a given. The agent hence infers that the target is moving at a rate of 50 pixels per step. In the third frame, it expects the