
Showing posts from June, 2005

Machine vision: spindles

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. Maybe I'm just grasping at straws, but I recently realized one can separate out a new kind of primitive visual element. Every day, we're surrounded by thin, linear structures. Power lines, picture frames, pin-striped shirts, and trunks of tall trees are all great examples of what I mean. A line drawing is often nothing more than these thin, linear structures, and they predominate in most written human languages. The first word that comes to mind when I think about these things is "spindles". On one hand, it seems hard to imagine that we have some built-in way to recognize and deal with spindles as a primitive kind of shape like we might with, say, basic geometric shapes (e.g., squares and circles) or features like edges or regions. But something about them seems tempting from the perspective of machine vision goals. Spindly structu

Machine vision: smoothing out textures

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. While reading up more on how texture analysis has been dealt with in recent years, I realized yesterday that there may be a straightforward way to do basic region segmentation based on textures. I consider texture-based segmentation to be one of two major obstacles to stepping above the trivial level of machine vision today toward general purpose machine vision. The other regards recognizing and ignoring illumination effects. Something struck me a few hours after reading how one researcher chose to define textures. Among other things, he made two interesting points. First, that a smooth texture must, when sampled at a sufficiently low resolution, cease to be a texture and instead become a homogeneous color field. Second, the size of a texture must be at least several times larger in dimensions (e.g., width and height) than a single textural unit. A rectangular
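To make that first point concrete, here's a minimal sketch in Python with NumPy (my own toy construction; the block size, colors, and spread measure are arbitrary illustration values, not anything from the researcher's definition): block-averaging a noisy texture until it collapses into a near-homogeneous color field that could then be segmented by simple color distance.

import numpy as np

def block_average(image, block):
    # Downsample by averaging non-overlapping block x block tiles.
    h, w = image.shape[:2]
    h2, w2 = h // block, w // block
    trimmed = image[:h2 * block, :w2 * block]
    return trimmed.reshape(h2, block, w2, block, -1).mean(axis=(1, 3))

# A noisy "grass" texture: green pixels with strong per-pixel variation.
rng = np.random.default_rng(0)
grass = np.clip(rng.normal([60, 160, 60], 40, size=(64, 64, 3)), 0, 255)

coarse = block_average(grass, 16)        # a 4x4 grid of averaged cells
spread = coarse.std(axis=(0, 1)).max()   # how homogeneous is it now?
print("per-channel spread after averaging:", spread)
# A small spread means the texture has collapsed into a color field,
# so the whole patch could be treated as one region at this resolution.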

Machine vision: studying surface textures

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. It seems that one can't escape the complexities that come with texture. Previously, I had experimented with very low resolution images because they can blur a texture into a homogeneous color blob. There's a terrible tradeoff, though. The texture smooths out while the edges get blocky and less linear. Too much information is lost. What's more, a lower resolution image will likely have a more uneven distribution of similar but differently colored pixels. A ball goes from having a texture with lots of local color similarity to a small number of pixels with unique colors. Moreover, even with my own excellent visual capabilities, it's a struggle for me to really understand what's in such low resolution images. It can't be that good of a technique if the source images aren't intelligible to human eyes. I think I will have to revisit the sub

Machine vision: pixel morphing

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. I'm entertaining the idea that our vision works using a sort of "pixel morphing" technique. To illustrate what I mean in the context of a computer program, imagine a black scene with a small white dot in it. We'll call this dot the "target". With each frame in time, the dot moves a little, smoothly tracing a square over the course of, say, forty frames. That means the target is on each of the four edges for ten time steps. The target starts at the top left corner and travels rightward. The agent watching this should be able to infer that the dot seen in the first frame is the same as in the second frame, even though it has moved, say, 50 pixels away. Let's take this "magic" step as a given. The agent hence infers that the target is moving at a rate of 50 pixels per step. In the third frame, it expects the
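Taking the "magic" correspondence step as given, the inference itself is tiny. Here's a sketch in Python (the function name and coordinates are mine, chosen to mirror the 50-pixels-per-step example above):

def predict_next(p1, p2):
    # Given the target's position in two consecutive frames, assume
    # constant velocity and predict where it will be in the next frame.
    vx, vy = p2[0] - p1[0], p2[1] - p1[1]
    return (p2[0] + vx, p2[1] + vy)

# The target moving rightward along the square's top edge at 50 px/step.
frame1, frame2 = (0, 0), (50, 0)
print(predict_next(frame1, frame2))   # expected position in frame 3: (100, 0)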

Machine vision: motion tracking

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. I was thinking earlier about something related to the head tilting issue. As I was walking around, I tracked stationary points in space. While my sense is of a stable view of the stationary point, I found that my eyes do actually saccade very rapidly and very subtly to keep the target in the center of my fovea. That is, my gaze is not stable. My gaze does appear to be predictive, though. It seems that as I move, my eyes come to predict where the stationary point will be in the next moment and keep moving to keep up. It's a little like shooting skeet. You see the clay pigeon emerge from the launcher and continuously adjust your muscles to keep it in your gun's sights. You could close your eyes and keep moving the gun along the predicted trajectory, but as time goes by, the gun will move farther and farther off target. As a side note, this may

Machine vision: tilting my head

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. I observed something very interesting today. When I look at a fixed position in a relatively static scene and tilt my head slowly left or right - rotate it, essentially - something unexpected happens. The scene seems to "click" into position at a rate of perhaps twice a second. The effect is similar to watching a poster rotate with a strobe light flashing every half-second, minus the blackness. And, funny enough, it feels as though my eyeballs are rotating and clicking with each step. I thought maybe this had something to do with the fact that I have two eyes, so I closed one eye and repeated the experiment. Same result. I tried this because I wanted to know how our eyes deal with changes in rotation. I was thinking about how to get software to deal with a change in point of view. When your gaze saccades around a scene, it somehow almost instantly o

Machine vision: layer-based models

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. It's challenging for MV software to figure out, when looking at a complex scene, how to segment it into distinct objects. The main reason is that there doesn't seem to be anything intrinsic in an image to suggest boundaries among objects. Perhaps expectations can play into it. I was experimenting with a simple sort of expectation system in which the video camera gazes at a static scene. In time, the output image dissolves into black. Only when an object passes into the field of view does it break out from black. The moving parts stand out. If they stand still for a while, they too fade to black to indicate that they are now part of the static scenery. The mechanism is pretty simple. There's an "ambient" image that is built with time. Each pixel is constantly being scanned and an expectation for what its color should be is built. Late
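The mechanism described maps closely onto a per-pixel running average. Here's a minimal sketch in Python with NumPy (the class name and the fade rate are mine, picked just for illustration): each pixel's expected color is accumulated over time, and the output shows only where the current frame departs from that expectation, so a stopped object gradually fades into the static scenery.

import numpy as np

class AmbientModel:
    # Per-pixel expectation built over time; the output is the surprise.
    def __init__(self, shape, fade=0.05):
        self.expected = np.zeros(shape, dtype=float)  # the "ambient" image
        self.fade = fade  # how quickly new evidence is absorbed

    def update(self, frame):
        frame = frame.astype(float)
        # Fold the current frame into the running expectation.
        self.expected += self.fade * (frame - self.expected)
        # Pixels matching expectation go black; movers stand out.
        return np.abs(frame - self.expected).astype(np.uint8)

model = AmbientModel((48, 64, 3))
static = np.full((48, 64, 3), 120, dtype=np.uint8)
for _ in range(100):    # a static scene dissolves into black
    out = model.update(static)
print(out.max())        # ~0: the scene has become part of the ambient image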

Machine vision: 2D collages

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. I've been nursing the idea that it's not necessary to have a detailed sense of how far away things in an image are. It's probably sufficient, in some basic contexts, to just know that one thing is in front of another and not care about absolute distances. It seems some MV researchers have gone ape over telling exactly how far away an apple on a table is, using lasers, stereo displacement, and all sorts of tricks. Maybe just knowing how big an apple typically is would be good enough for telling how far away it is. When I think about 3D vision in this context, I have been likening the visible world to a collage of 2D images. Take the scene seen by a stationary camera looking at a road as cars go by. One could take the unchanging background as one image. A car moving by would be the only object of interest. What's interesting is that the image of t
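Here's how the collage idea might look as a data structure (entirely my own toy representation; the entry doesn't specify one): the scene is just an ordered stack of 2D layers, and rendering paints them back-to-front, so "one thing is in front of another" is the only depth information kept.

import numpy as np

def composite(layers, shape):
    # Paint layers back-to-front; each layer is (image, mask, x, y).
    canvas = np.zeros(shape, dtype=np.uint8)
    for image, mask, x, y in layers:   # first layer = farthest (background)
        h, w = image.shape[:2]
        region = canvas[y:y + h, x:x + w]
        region[mask] = image[mask]     # opaque pixels overwrite what's behind
    return canvas

background = np.full((120, 200, 3), 90, dtype=np.uint8)  # unchanging road scene
car = np.zeros((20, 40, 3), dtype=np.uint8)
car[...] = (200, 30, 30)                                 # a red car layer
car_mask = np.ones((20, 40), dtype=bool)                 # fully opaque

scene = composite(
    [(background, np.ones((120, 200), dtype=bool), 0, 0),
     (car, car_mask, 80, 60)],                           # the car sits in front
    (120, 200, 3))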

Machine vision: hierarchy of regions

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. One thing that most of us in MV don't want to admit is that we arbitrarily set thresholds for distinguishing where one thing ends and another begins. I don't think we work that way, per se. I'd like to see edge- or color-blob-finding techniques with varying thresholds. One use would be in finding large regions with high thresholds, then using ever narrower thresholds to find the sub-regions within the broader ones. In a similar vein, I'm considering using low-res images to find homogeneous-color blobs in an image. Rich textures can disappear when the resolution is low, leaving just the overall color. A field of grass, for example, becomes a solid sheet of green. Once the field is isolated, it can be scrutinized in finer detail to see if there's something small that's of interest in it.
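To sketch the coarse-to-fine idea (my own construction; the thresholds and the tiny test image are arbitrary): segment with a loose color threshold first, then re-run the same segmentation inside each broad region with a tighter threshold to expose its sub-regions.

import numpy as np

def segment(image, mask, threshold):
    # Group connected pixels whose neighboring values differ by <= threshold.
    labels = np.full(image.shape, -1, dtype=int)
    next_label = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < image.shape[0] and 0 <= nx < image.shape[1]
                        and mask[ny, nx] and labels[ny, nx] == -1
                        and abs(int(image[ny, nx]) - int(image[y, x])) <= threshold):
                    labels[ny, nx] = next_label
                    stack.append((ny, nx))
        next_label += 1
    return labels

image = np.array([[10, 12, 50, 52],
                  [11, 13, 51, 53],
                  [90, 91, 92, 93],
                  [90, 91, 92, 93]], dtype=np.uint8)

coarse = segment(image, np.ones(image.shape, bool), 45)   # broad regions
for region in range(coarse.max() + 1):                    # narrow within each
    fine = segment(image, coarse == region, 5)
    print("region", region, "splits into", fine.max() + 1, "sub-regions")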

Machine vision: cost-effective action

[ Audio Version ] Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision. One thing that seems to dog many MV techniques is how slow or otherwise resource-hungry they are. I'm realizing that one thing that seems a must is a set of basic vision tools that allow for trading time for effectiveness. For example, given a whole image, the agent should be able to focus on a small portion - like your own fovea does - instead of trying to analyze the entire image. Also, the agent should be able to choose a lower quality image in order to reduce processing time. Ideally, an agent would be able to learn to estimate how much time each operation will take and to thus be able to choose which techniques to use and how intently to apply them based on how well they serve various goals. If, for example, the goal is to track the movement of one or more objects, a full-image, low-res approach might do. To study a stationary object in detail
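Here's a sketch of the two cheapest tools in that kit (the function names and the timing harness are mine, purely illustrative): a foveal crop that looks at a small window in full detail, and a decimating downsample that covers the whole frame at reduced quality. An agent could time each operation like this and choose between them based on the goal at hand.

import numpy as np
import time

def foveate(image, center, radius):
    # Full-detail crop around a point of interest, like a fovea.
    y, x = center
    return image[max(0, y - radius):y + radius, max(0, x - radius):x + radius]

def decimate(image, step):
    # Whole-frame view at reduced quality: keep every step-th pixel.
    return image[::step, ::step]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
for name, op in [("fovea", lambda: foveate(frame, (240, 320), 32)),
                 ("low-res", lambda: decimate(frame, 8))]:
    start = time.perf_counter()
    view = op()
    print(name, view.shape, f"{time.perf_counter() - start:.6f}s")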

Machine vision: overlooking shadow and light splotches on surfaces

[ Audio Version ] Following is another in the aforementioned series of ad hoc journal entries I've been keeping of my thoughts on machine vision. Shadows and light splotches that fall on even perfectly smooth surfaces really trip up systems designed to detect objects by finding contiguous surfaces. We don't seem to be fooled by such issues very often. We are fooled when there are ambiguities in what we see. Perhaps understanding what makes one situation ambiguous versus another will help isolate what differentiates the two for the benefit of codification. In a picture I took looking down a tree-lined sidewalk, I found a great example of the issue. Shadows of trees fall on the sidewalk, creating a fairly smooth, two-tone division between shadowed and non-shadowed portions. I see a continuous sidewalk.

Machine vision: blob growth

[ Audio Version ] Recently, I've been spending a lot of my free time thinking about machine vision. I've been running a variety of simple experiments into different techniques and trying somehow to formulate a cohesive theory and tool set for creating a general purpose vision system. I feel bad that I haven't been blogging lately, though. I guess I've just assumed I need something significant to blog about so it's not a waste of people's time. Ironically, I've been keeping a small, ad hoc journal of some ideas about the subject. I figured that perhaps it's worth sharing. The next few entries are simply extracts from it. They're far less formal than most of my already informal blog entries. I apologize for not putting them in sufficient context, which I usually try to do when I blog. So, without further ado, following is the first entry. I keep trying to figure out a way to isolate regions. My bubble growth algorithm isn't all that bad, but not gr
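The entry doesn't spell out the bubble growth algorithm itself, so here's a generic region-growing sketch in the same spirit (my own construction, not the actual experiment's code): grow a blob outward from a seed pixel, absorbing neighbors whose value stays within a tolerance of the blob's running mean.

import numpy as np

def grow_blob(image, seed, tolerance):
    # Grow a region from seed, absorbing neighbors close to the blob's mean.
    h, w = image.shape
    member = np.zeros((h, w), dtype=bool)
    member[seed] = True
    frontier, total, count = [seed], float(image[seed]), 1
    while frontier:
        y, x = frontier.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not member[ny, nx]:
                if abs(float(image[ny, nx]) - total / count) <= tolerance:
                    member[ny, nx] = True
                    total += float(image[ny, nx])
                    count += 1
                    frontier.append((ny, nx))
    return member

image = np.array([[20, 22, 24, 200],
                  [21, 23, 25, 201],
                  [22, 24, 26, 202]], dtype=np.uint8)
blob = grow_blob(image, (0, 0), 10)   # absorbs the ~20s, stops at the ~200s
print(blob.astype(int))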