
Monday, June 20, 2005

Machine vision: spindles

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

Maybe I'm just grasping at straws, but I recently realized one can separate out a new kind of primitive visual element. Every day, we're surrounded by thin, linear structures. Power lines, picture frames, pin-striped shirts, and trunks of tall trees are all great examples of what I mean. A line drawing is often nothing more than these thin, linear structures, and most written human languages are dominated by them.

The first word that comes to mind when I think about these things is "spindles".

On one hand, it seems hard to imagine that we have some built-in way to recognize and deal with spindles as a primitive kind of shape like we might with, say, basic geometric shapes (e.g., squares and circles) or features like edges or regions. But something about them seems tempting from the perspective of machine vision goals. Spindly structures in an image are obviously at least two dimensional, technically, yet ask a human to draw them and he'll most likely just draw thin lines. They're not just edges; not simply where one surface ends and a new one begins. They have their own colors and hence thickness.

Perhaps what makes spindles interesting to me is that it seems as though one could come up with a practical way of segregating spindles out of an image that may be easier than picking out, say, broad regions based on color blobs, texture spans, or edges. Finding blobs is hard in large part because it's hard to describe in simple terms what a given blob's shape is. A few blurry pixels along an otherwise sharp edge can bring a basic region growing technique to its knees and leave the researcher frustrated into hand adjusting cutoff thresholds to get the results he desires.

But spindles might be easier. Characterizing and recognizing a thin structure should be easier than an arbitrarily shaped blob. Even if the spindle is curved, branching, or somewhat jagged, it may still be easier than dealing with blobs. What's more, it's possible to compare the various spindles in an image to search for patterns that might give hints about 3D structures. Look down a brick wall and you might pick out the horizontal white mortar lines as spindles, note that they all have a common vanishing point, and thus hypothesize a 3D interpretation.

Spindles seem to come in two basic 3D flavors: colored edges and floating structures. The distinction, from a low level perspective, seems to be in whether what's on either side of a spindle is the same color or pattern or not. An overhead power line divides the sky, which is the same on both sides. A picture frame provides an enhancement of the boundary between a picture and the wall. Perhaps the similarity of the colors and textures on either side of a spindle also provide some basic suggestions about whether a given spindle is attached to one or both sides or is otherwise free-floating. The concept of "generic views" would say that it seems hard to imagine the frame around a picture might be floating in space in such a way that it would exactly line up with the picture, so the most plausible explanation is that it's no coincidence that the picture frame is actually in the same place as the picture. Whether it's attached to the wall or floating in space is a different question. So spindles can be helpful 3D cues.

I don't know whether to suggest that the human visual system sees spindles as a somehow separate sort of primitive, but it seems plausible. The very fact that printed characters in most all human languages are composed of spindles seems suggestive. Maybe it's because it's economical to write in strokes instead of blobs, but maybe it's more fundamental than that. It's also interesting that we have little trouble understanding technical "some assembly required" line drawings, even when they have no color, shading, or other 3D visual cues.

Perhaps spindles provide a way to explain how it is that a line drawing of a circle can be interpreted just as easily as a hollow hoop or a solid disk. That is, perhaps spindles are considered interchangeable with edges by our visual systems. Yet perhaps it's also that spindles stand out better than edges do.

Thursday, June 16, 2005

Machine vision: smoothing out textures

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

While reading up more on how texture analysis has been dealt with in recent years, I realized yesterday that there may be a straightforward way to do basic region segmentation based on textures. I consider texture-based segmentation to be one of two major obstacles to stepping above the trivial level of machine vision today toward general purpose machine vision. The other regards recognizing and ignoring illumination effects.

Something struck me a few hours after reading how one researcher chose to define textures. Among others, he made two interesting points. First, that a smooth texture must, when sampled at a sufficiently low resolution, cease to be a texture and instead become a homogeneous color field. Second, the size of a texture must be at least several times larger in dimensions (e.g., width and height) than a single textural unit. A rectangular texture composed of river rocks, for example, must be several rocks high and several rocks wide to actually be a texture.

Later, when I was trying to figure out what characteristics are worth consideration for texture-based segmentation that don't require me to engage in complicated mathematics, I remembered the concept I had been pursuing recently when I started playing with video image processing. I thought I could kill two birds with one stone by processing very low-res versions of source images: eliminating fine textures and reducing the number of pixels to process. I was disappointed, though, by the fact that low-res meant little information of value.

I realized that there was another way to get the benefit of blurring textures into smooth color fields without actually blurring (or lowering the resolution of - same thing) the images, per se. The principle is as follows.

Imagine an image that includes a sizable sampling of a texture. Perhaps it has a brick wall in the picture with no shadows, graffiti, or other significant confounding inclusions on the wall. The core principle is that there is a circle with a smallest radius CR (critical radius) that is large enough to be "representative" of at least one unit of that texture. In this case, what determines if it is representative is whether the circle can be placed anywhere within the bounds of the texture - the wall, in our example - and the average color of all the pixels within it will be almost exactly the same as if the circle were set anywhere else in that single-texture region.

If we want to identify the brick wall's texture as standing apart from the rest of the image, then, we have to do two things in this context. One, we need to find that critical radius (CR). Two, we need to populate the wall-textured region with enough CR circles so no part is left untested, yet no CR circle extends beyond the region. The enclosed region, then, is the candidate region.
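To make the critical-radius idea concrete, here is a minimal sketch in Python with NumPy. It assumes an image stored as a NumPy array (grayscale or RGB) and circle centers kept far enough from the borders that each circle fits inside the image; the names circle_average and is_representative and the tolerance parameter are illustrative choices, not anything fixed by the idea itself.

```python
import numpy as np

def circle_average(image, cx, cy, radius):
    """Average color of the pixels inside a circle centered at (cx, cy).
    Assumes the circle lies entirely inside the image."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    mask = x * x + y * y <= radius * radius
    patch = image[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return patch[mask].mean(axis=0)

def is_representative(image, centers, radius, tolerance=10.0):
    """The critical-radius test: circles of this radius, placed anywhere in
    the candidate region, should give almost exactly the same average color."""
    means = np.array([circle_average(image, cx, cy, radius)
                      for cx, cy in centers])
    spread = means.max(axis=0) - means.min(axis=0)
    return bool(np.all(spread < tolerance))
```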

I suppose this could work with squares, too. It doesn't have to be circles, but there may be some curious symmetric effects that come into play that I'm not aware of. Let's limit the discussion to circles, though.

So how does one determine the critical radius? A single random test won't do, because we don't know in advance that the test circle actually falls within a texture without some a priori knowledge. Our goal is to discover, not just validate.

I propose a dynamically varying grid of test circles that looks for local consistencies. Picture a grid in which at each node, there is centered a circle. The circles should overlap in such a way that there are no gaps. That is, the radius should be at least half the distance between one node and the node one unit down and across from it. In the first step, the CR (radius) chosen and hence the grid spacing would be small - two pixels, for example. As the test progresses, CR might grow by a simple doubling process or by some other multiplier. The grid would cover the entire image under consideration. The process would continue upward until the CR values chosen no longer allow for a sufficient number of sample circles to be created within the image.

The result of each pass of this process would be a new "image", with one pixel per grid node in the source image. That pixel's color would be the average of the colors of all the pixels within the test circle at that node. We would then search the new image for smooth color blobs using traditional techniques. Any significantly large blobs would be considered candidates for homogeneous textures.
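A rough sketch of how those passes might be generated, again in Python/NumPy. The doubling schedule, the spacing-equals-radius choice, and the stopping rule (min_nodes) are just one plausible reading of the description above, not the only way to do it:

```python
import numpy as np

def circle_average(image, cx, cy, radius):
    """Same helper as in the previous sketch."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    mask = x * x + y * y <= radius * radius
    patch = image[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return patch[mask].mean(axis=0)

def averaged_image(image, radius):
    """One pass of the grid: one output pixel per node, colored with the
    average of all source pixels inside that node's test circle."""
    spacing = radius              # spacing <= radius keeps the circles overlapping
    h, w = image.shape[:2]
    return np.array([[circle_average(image, x, y, radius)
                      for x in range(radius, w - radius, spacing)]
                     for y in range(radius, h - radius, spacing)])

def blob_tiers(image, start_radius=2, min_nodes=4):
    """Keep doubling the critical radius until the grid would be too sparse;
    each pass yields a smaller 'image' to search for smooth color blobs."""
    tiers, radius = [], start_radius
    h, w = image.shape[:2]
    while (h - 2 * radius) // radius >= min_nodes and \
          (w - 2 * radius) // radius >= min_nodes:
        tiers.append(averaged_image(image, radius))
        radius *= 2
    return tiers
```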

I'm not entirely sure exactly how to make use of this information, but there's something intuitively satisfying about it. I've been thinking for a while now that we note the average colors of things and that that seems to be an important part of our way of categorizing and recognizing things. A recent illustration of this for me is a billboard I see on my way to work. It has a large region of cloudy sky, but the image is washed to an orangish-tan. From a distance, it looks to me just like the surface of a cheese pizza. So even though I know better, my first impression whenever I see this billboard - before I think about it - is of a cheese pizza. The pattern is obviously of sky and bears only modest resemblance to a pizza, but the overall color is very right.

Perhaps one way to use the resulting tiers of color blobs is to break down and analyze textures. Let's say I have one uniform color blob at tier N. I can look at the pixels of the N - 1, higher resolution version of this same region. One question I might ask is whether those pixels too are consistent. If so, maybe the texture is really just a smooth color region. If not, then maybe I really did capture a rough but consistent texture. I might then try to see how much variation there is in that higher resolution level. Maybe I can identify the two or three most prominent colors. In my sky-as-cheese-pizza example, it's clear that I see the dusty orange and white blobs collectively as appearing pizza-like; it's not just the average of the two colors. I could also use other conventional texture analysis techniques like co-occurrence matrices. Once I have the smoothness point (resolution) for a given color blob, I can perhaps double or quadruple the resolution to get it sufficiently rough for single-pixel-distances common in such analysis instead of having resolutions so high that such techniques don't work well.
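For reference, the co-occurrence idea is cheap to sketch once a suitable resolution has been chosen. Below is a standard gray-level co-occurrence matrix for a single pixel offset, assuming a 2-D grayscale array with values in 0-255; the bin count and function name are my own choices:

```python
import numpy as np

def cooccurrence_matrix(gray, levels=16, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    q = (gray.astype(np.int32) * levels) // 256           # quantize to 'levels' bins
    h, w = q.shape
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(m, (a.ravel(), b.ravel()), 1)               # count (level, level) pairs
    return m / m.sum()                                     # normalize to probabilities
```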

Critics will be quick to point out that all I'm capturing in this algorithm is the ambient color of a texture. I might have a picture of oak trees tightly packed and adjacent to tightly packed pine trees. The ambient color of the two kinds of trees' foliage might be identical and so I would see them as a single grouping. To that I say the criticism is valid, but probably irrelevant. I think it's reasonable to hypothesize that our own eyes probably deal with ambient texture color "before" they get into details like discriminating patterns. Further, I think a system that can successfully discriminate purely based on ambient texture color would probably be much farther ahead than alternatives I've seen to date. That is, it seems very practical.

Besides, the math is very simple, which is a compelling reason to me for believing it's something like how human vision might work. I can imagine the co-occurrence concept playing a role, but the combinatorics for a neural network that doesn't regularly change its physical structure seem staggering. By contrast, it may take a long time for a linear processor to go through all these calculations, but the function is so simple and repetitive that it's pretty easy to imagine a few cortical layers implementing it all in parallel and getting results very quickly.

As a side note, I'm pretty well convinced that outside the fovea, our peripheral vision is doing most of its work using simple color blobs. Once we know what an object is, we just assume it's there as its color blobs move gradually around the periphery until the group of them moves out of view. It seems we track movement there, not details. The rest is just an internal model of what we assume is there. This strengthens my sense that within the fovea, there may be a more detail-oriented version of this same principle at work.

What I haven't figured out yet is how to deal with illumination effects. I suspect the same tricks that would be used for dealing with an untextured surface that has illumination effects on it would also be used on the lower resolution images generated by this technique. That is, the two problems would have to be processed in parallel. They could not be dealt with one before the other, I think.

Wednesday, June 15, 2005

Machine vision: studying surface textures

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

It seems that one can't escape the complexities that come with texture. Previously, I had experimented with very low resolution images because they can blur a texture into a homogeneous color blob. There's a terrible tradeoff, though. The texture smoothes out while the edges get blocky and less linear. Too much information is lost. What's more, a lower resolution image will likely have more uneven distribution of similar but different colored pixels. A ball goes from having a texture with lots of local color similarity to a small number of pixels with unique colors.

Moreover, it's a struggle for me with my own excellent visual capabilities to really understand what's in such low resolution images. It can't be that good of a technique if the resulting images aren't intelligible to human eyes.

I think I will have to revisit the subject of studying textures. An appropriate venue would be a scene with a simple white or black backdrop and uniform-texture objects moving around in close proximity to the video camera. Objects might include a tennis ball, various rocks, pieces of fabric, plastic sheets, and so on. The goal would be to get an agent to "understand" such textures. One critical aspect of understanding would be that it could later identify a texture it has studied. The moving around of an object with a given texture is important. It's not enough to use a still image of a texture to really understand it. Textured surfaces tend to have wide variation in their appearances as they are moved about and reshaped. To recognize a texture requires that it be abstracted in a way that can overcome such variations.

Friday, June 10, 2005

Machine vision: pixel morphing

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

I'm entertaining the idea that our vision works using a sort of "pixel morphing" technique. To illustrate what I mean in the context of a computer program, imagine a black scene with a small white dot in it. We'll call this dot the "target". With each frame in time, the target moves a little, smoothly tracing a square over the course of, say, forty frames. That means the target is on each of the four edges for ten time steps.

The target starts at the top left corner and travels rightward. The agent watching this should be able to infer that the dot seen in the first frame is the same as in the second frame, even though it has moved, say, 50 pixels away. Let's take this "magic" step as a given. The agent hence infers that the target is moving at a rate of 50 pixels per step. In the third frame, it expects the target to be 50 pixels further to the right and looks for it there.

Eventually, the target reaches the right edge of the square and starts traversing downward along that edge. Our agent is expecting the target to be 50 pixels to the right in the next step and so looks for it there. It doesn't find it. Using an assumption that things don't usually just appear and disappear from view, the agent looks around for the target until it finds it. It now has a new estimate of where it will be in the next frame: 50 pixels below its position in the current frame.
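A minimal sketch of that predict-then-search loop, assuming a sequence of grayscale frames containing a single bright target on black; the function names, the search radius, and the brightness threshold are all illustrative:

```python
import numpy as np

def find_target(frame, brightness=200):
    """Centroid (x, y) of the bright pixels, or None if nothing is found."""
    ys, xs = np.nonzero(frame > brightness)
    if len(xs) == 0:
        return None
    return np.array([xs.mean(), ys.mean()])

def track(frames, search_radius=60):
    """Constant-velocity tracking: predict from the last step, look near the
    prediction, and fall back to a whole-frame search when the target isn't
    where it was expected (e.g. at the corners of the square). Assumes the
    target is visible in the first frame."""
    positions, velocity = [], np.zeros(2)
    for frame in frames:
        if positions:
            predicted = positions[-1] + velocity
            x0, y0 = (predicted - search_radius).astype(int)
            x1, y1 = (predicted + search_radius).astype(int)
            window = frame[max(y0, 0):y1, max(x0, 0):x1]
            found = find_target(window)
            if found is not None:
                found = found + [max(x0, 0), max(y0, 0)]   # back to frame coordinates
            else:
                found = find_target(frame)                 # lost it: look everywhere
        else:
            found = find_target(frame)
        if found is not None and positions:
            velocity = found - positions[-1]               # new pixels-per-step estimate
        positions.append(found if found is not None else positions[-1])
    return positions
```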

Now, since the target is the only thing breaking up the black backdrop, it leaves something ambiguous. Is the target moving or is the scene moving, as might happen if a robot were falling over? We'll prefer to assume the entire scene is moving because there's nothing to suggest otherwise. So now let's draw a solid brown square around the invisible square the target traverses. The result looks like a white ball moving around inside a brown box. Starting from the first frame, again, we magically notice the target has moved 50 pixels to the right in the second frame. The brown square has not moved. We could interpret this as the ball moving in a stationary scene or as a moving scene with the brown box moving so as to perfectly offset the scene's motion. This latter interpretation seems absurd, so we conclude the target is what is moving. Incidentally, this helps explain why one prefers to think of the world as moving while the train he is on is "stationary". The train provides the main frame of reference in the scene, unless one presses his face against the window and so only sees the outside scene.

Now to explain the magic step. The target's position has changed from frame N to frame N + 1. It's now 50 pixels to the right. How can we programmatically infer that these two things are the same? For the second frame, we don't have any way to predict that the target will be 50 pixels to the right of its original position. What should happen is that the agent should assume that objects don't just disappear. Seeing the circle is missing, it should go looking for it. One approach might be to note that now there's a "new" blob in the scene that wasn't there before. The two are roughly the same size and color, so it seems reasonable to assume they are the same and to go from there. It becomes a collaboration between the interpretations of two separate frames.
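The "roughly the same size and color" matching could be sketched like this, using SciPy's connected-component labelling to pull out blobs from a grayscale frame; the thresholds and the greedy pairing are placeholders rather than a worked-out method:

```python
import numpy as np
from scipy import ndimage

def blobs(frame, brightness=50):
    """Connected bright regions with their sizes, mean intensities, and centroids."""
    labels, count = ndimage.label(frame > brightness)
    found = []
    for i in range(1, count + 1):
        mask = labels == i
        ys, xs = np.nonzero(mask)
        found.append({"size": int(mask.sum()),
                      "color": float(frame[mask].mean()),
                      "center": (xs.mean(), ys.mean())})
    return found

def match_blobs(before, after, size_ratio=0.5, color_tolerance=30):
    """Pair blobs across frames when they are roughly the same size and
    color -- the 'new blob is probably the old blob' assumption."""
    pairs = []
    for a in after:
        for b in before:
            sizes_close = min(a["size"], b["size"]) / max(a["size"], b["size"]) > size_ratio
            if sizes_close and abs(a["color"] - b["color"]) < color_tolerance:
                pairs.append((b, a))
                break
    return pairs
```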

But there's still some magic behind this approach. We simplified the world by just having flat-colored objects like the brown rectangle and the white circle. There are no shadows or other lighting effects and only a small number of objects to deal with. The magic part is that we were able to identify objects before we bothered to match them up. A vision system should probably be able to do matching up of parts of a picture before recognizing objects. But how?

One answer might be to treat a single frame as though it were made of rubber. Each pixel in it can be pushed in some direction, but one should expect the neighboring pixels will move in similar directions and distances, with that expectation falling off with distance from any given pixel that moves. Imagine a picture of a 3D cube with each side a different color rotating along an up/down axis, for example. You see the top of the cube and the sides nearest you. And the lighting is such that the front faces change color subtly as the cube rotates. Looking down the axis, the cube is rotating clockwise, which means you see the front faces moving from right to left.

Imagine the pixels around the top corner nearest you. You see color from three faces: the two front faces and the top face. Let's talk about frames 1 and 2, where frame 2 comes right after frame 1. In frame 1, the corner we're looking at is a little off to the right of the center of the frame, and in frame 2, it's a little left of the center. We want the agent considering these frames to intuit from frames 1 and 2 that the corner under consideration has moved and where it is. Now think of frame 1 as a picture made of rubber. Imagine stretching it so that it looks like frame 2. With your finger, you push the corner we're considering an inch to the left so it lines up with the same corner in frame 2. Other pixels nearby go with it. Now you do the same with the bottom corner just below it and it's starting to look a little more like frame 2. You do the same along the edge between these two corners until the edge is pretty straight and lines up with the same edge in frame 2. And you do the same with each of the edges and corners you find in the image.

Interestingly, you can do this with frame 3, too. You can keep doing this with each frame, but eventually things "break". The left front face eventually is rotated out of view. All those pixels in that face can't be pushed anywhere that will fit. They have to be cut out and the gap closed, somehow. Likewise, a new face eventually appears on the right, and there has to be a way to explain its appearance. Still much of the scene is pretty stable in this model. Most pixels are just being pushed around.

How would such a mechanism be implemented in code? When the color of a pixel changes, the agent can't just look randomly for another pixel in the image that is the same color and claim it's the same one. Even the closest match might not be the same one. But what if each pixel were treated as a free agent that has to bargain with its nearby neighbors to come up with a consistent explanation that would, collectively, result in morphing of the sort described above? Strength in numbers would matter. Those pixels whose colors don't change would largely be non-participants. Only those that change from one frame to another would. From frame 1 to 2, pixel P changes color. In frame 1, pixel P was in color blob B1. In frame 2, P searches all the color blobs for the one whose center is closest to P that is strongly similar in color. It tries to optimize its choice on closeness in both distance and color. In the meantime, every other P that changes from frame 1 to 2 is doing the same. When it's all done, every changed-pixel P is reconsidered by reference to its neighbors. What to do next is not clear, though.

One thing that should come out of the collaborative process, though, is a kind of optimization. Once some pixels in the image have been solidly matched by morphing, they should give helpful hints to nearby pixels as to where to begin their searches as well. If pixel P has moved 50 pixels to the right and 10 down, the pixel next to P has probably also moved about 50 pixels to the right and 10 down.
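A crude, block-level stand-in for this bargaining process is ordinary block matching with a hint passed along from the neighbor just solved. The sketch below assumes grayscale frames and is brute force and slow; a per-pixel version with proper neighborhood negotiation would follow the same shape:

```python
import numpy as np

def block_match(prev, curr, block=8, search=16):
    """For each block of the previous frame, find where it moved in the
    current frame, starting the search from the displacement already found
    for the block to its left (the 'hint')."""
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)   # (dx, dy) per block
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            patch = prev[y:y + block, x:x + block].astype(int)
            hint = flow[by, bx - 1] if bx > 0 else np.zeros(2, dtype=int)
            best, best_err = hint, np.inf
            for dy in range(hint[1] - search, hint[1] + search + 1):
                for dx in range(hint[0] - search, hint[0] + search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    err = np.abs(curr[yy:yy + block, xx:xx + block].astype(int)
                                 - patch).sum()
                    if err < best_err:
                        best, best_err = np.array([dx, dy]), err
            flow[by, bx] = best
    return flow
```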

In the case of the white circle moving around, it should be clear. But what if a white border were added around the brown square? The brown hole created as the white circle moves from that position to the next might result in all changed pixels P guessing that the white pixels in the border nearest the new brown hole are actually where the white circle went, but this doesn't make sense. Similarly, the new white circle in frame 2 could be thought to have come from out of the white border; again, this doesn't make sense.

One answer would be a sort of conservation of "mass" concept, where mass is the number of pixels in some color blob. The white circle in frame 2 could have come from the border, but that would require creating a bunch of new white pixels. And the white circle in frame 1 could have disappeared into the white border, but this would require a complete loss of those pixels. Perhaps the very fact that we have a mass of pixels in one place in frame 1 and the same mass of the same color of pixels in another place in frame 2 should lead us to conclude that they are the same.

There's a lot of ground to cover with this concept. I think there's value to the idea of bitmap morphing. I think a great illustration of how this could be used by our own vision is how we deal with driving. Looking forward, the whole scene is constantly changing, but only subtly. Only the occasional bird darting by or other fast-moving objects screw up the impression of a single image that's subtly morphing.

Machine vision: motion tracking

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

I was thinking earlier about something related to the head tilting issue. As I was walking around, I tracked stationary points in space. While my sense is of a stable view of the stationary point, I found that my eyes do actually saccade very rapidly and very subtly to keep the target in the center of my fovea. That is, my gaze is not stable.

My gaze does appear to be predictive, though. It seems that as I move, my eyes come to predict where the stationary point will be in the next moment and keep moving to keep up. It's a little like shooting skeet. You see the clay pigeon emerge from the launcher and continuously adjust your muscles to keep it in your gun's sights. You could close your eyes and keep moving the gun along the predicted trajectory, but as time goes by, the gun will move farther and farther off target.

As a side note, this may help explain why my eyes sometimes get tired when I rock in my seat while working at my computer. The screen's position relative to my eyes is constantly changing. Surely my eyes have to work harder to keep focused on the screen's contents.

Machine vision: tilting my head

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

I observed something very interesting today. When I look at a fixed position in a relatively static scene and tilt my head slowly left or right - rotate it, essentially - something unexpected happens. The scene seems to "click" into position at a rate of perhaps twice a second. The effect is similar to watching a poster rotate with a strobe light flashing every half-second, minus the blackness. And, funny enough, it feels as though my eyeballs are rotating and clicking with each step.

I thought maybe this had something to do with the fact that I have two eyes, so I closed one eye and repeated the experiment. Same result.

I tried this because I wanted to know how our eyes deal with changes in rotation. I was thinking about how to get software to deal with a change in point of view. When your eye saccades around a scene, your visual system somehow almost instantly orients itself to the new point of view. It occurred to me that maybe the brain somehow plans the saccade and predicts how much the scene will "shift" by. A computer should be able to do this, too. The hard part is predicting how far a camera's saccade will shift the scene. With a "soft fovea" inside a fixed view, this is easy.

But it seems the tilting-head case throws the brain for a loop. I believe what's happening is that the lower level visual processor doesn't know how to deal with the whole scene rotating and so calls for a "reset" of the image, as though you had blinked and, upon opening your eyes, found yourself in an entirely new scene.

I estimate it takes a little less than half a second to deal with the new orientation. It would be interesting to experiment with the brain's ability to learn to deal continuously with such rotations. I bet it would be like switching from contacts to glasses or vice-versa. At first, the world appears strangely bouncy as I move about. Within a few minutes, I find that bounciness goes away. I assume this is because my brain learns to make predictions about how the scene I see will respond to my movements.

Machine vision: layer-based models

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

It's challenging for MV software to figure out, when looking at a complex scene, how to segment it into distinct objects. The main reason is that there doesn't seem to be anything intrinsic in an image to suggest boundaries among objects.

Perhaps expectations can play into it. I was experimenting with a simple sort of expectation system in which the video camera gazes at a static scene. In time, the output image dissolves into black. Only when an object passes into the field of view does it break out from black. The moving parts stand out. If they stand still for a while, they too fade to black to indicate that they are now part of the static scenery. The mechanism is pretty simple. There's an "ambient" image that is built with time. Each pixel is constantly being scanned and an expectation for what its color should be is built. Later, a simple comparison of the current scene's image to the ambient image will only yield non-zero pixel differences wherever a pixel color has suddenly changed, typically because an object is moving through.
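One simple reading of that ambient-image mechanism is an exponential moving average per pixel, sketched below; the class name, the rate, and the threshold are arbitrary choices, not the exact mechanism used in the experiment:

```python
import numpy as np

class AmbientModel:
    """Running 'ambient' image: each pixel drifts toward what the camera
    usually sees there, so only recent changes show up in the difference."""
    def __init__(self, first_frame, rate=0.05):
        self.ambient = first_frame.astype(float)
        self.rate = rate          # how quickly still objects fade into the scenery

    def update(self, frame, threshold=25):
        frame = frame.astype(float)
        diff = np.abs(frame - self.ambient)
        moving = diff.max(axis=-1) > threshold if frame.ndim == 3 else diff > threshold
        self.ambient += self.rate * (frame - self.ambient)   # drift toward the current scene
        return moving             # boolean mask of pixels that 'broke out from black'
```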

That's a cool experiment, but not useful for much. Perhaps it could be used to help isolate objects long enough to build simple models of them. Add a little sophistication to the above. Instead of constantly morphing an ambient image over time, the agent pauses a few moments initially to determine that the entire scene is static, then takes a snapshot, perhaps averaged out over two or three frames to cancel out typical noise. Henceforth, so long as the agent knows it's looking at the same scene, it would cancel it out using the snapshot - the "model" - to see if anything new is there. A person might sit down in a chair in the scene for a few minutes, but he'd not disappear from the scene, even though he's stationary.

Thursday, June 9, 2005

Machine vision: 2D collages

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

I've been nursing the idea that it's not necessary to have a detailed sense of how far away things in an image are. It's probably sufficient, in some basic contexts, to just know that one thing is in front of another and not care about absolute distances. It seems some MV researchers have gone ape over telling exactly how far away an apple on a table is using lasers, stereo displacement, and all sorts of tricks. Maybe just knowing how big an apple typically is is good enough for telling how far away it is.

When I think about 3D vision in this context, I have been likening the visible world to a collage of 2D images. Take the scene seen by a stationary camera looking at a road as cars go by. One could take the unchanging background as one image. A car moving by would be the only object of interest. What's interesting is that the image of the car, from snapshot to snapshot, doesn't change much. It's as though one just took the previous image of the car and stretched and warped it a little in order to get the current image of the car. That "smooth morphing" idea is at the heart of this 2D collage analogy.

In the car example, it should be fairly easy to use the conventional technique of seeing pixel differences between a before and after image to isolate the car from the background. Not sure yet how to deal with the morphing. It seems fair, though, to assume that the car doesn't just disappear unless it's heading out of the scene. Instead, it should suffice to take the "before car" and place it in the "after car" space and then scale it to fit the blob. Then comes a comparison step to see how the two car images differ. Perhaps key points - edges or corners - can be found and their positions corresponded.
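A sketch of the differencing and stretch-to-fit steps, assuming two frames as NumPy arrays; the key-point comparison afterward is left out, and the helper names are illustrative:

```python
import numpy as np

def changed_box(before, after, threshold=30):
    """Bounding box (x0, y0, x1, y1) of the pixels that differ between two frames."""
    diff = np.abs(after.astype(int) - before.astype(int))
    if diff.ndim == 3:
        diff = diff.max(axis=-1)
    ys, xs = np.nonzero(diff > threshold)
    if len(xs) == 0:
        return None
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1

def fit_patch(patch, width, height):
    """Nearest-neighbour stretch of the 'before car' patch to the size of
    the 'after car' blob, so the two can be compared directly."""
    ph, pw = patch.shape[:2]
    rows = np.arange(height) * ph // height
    cols = np.arange(width) * pw // width
    return patch[rows][:, cols]
```

The "before car" patch cut out by one box can then be stretched with fit_patch to the size of the next frame's box and compared pixel by pixel.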

Machine vision: hierarchy of regions

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

One thing that most of us in MV don't want to admit is that we arbitrarily set thresholds for distinguishing where one thing ends and another begins. I don't think we work that way, per se. I'd like to see edge- or color-blob-finding techniques having varying thresholds. One use would be in finding large regions with high thresholds, then using ever narrower thresholds to find the sub-regions within the broader ones.
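One crude way to sketch that coarse-to-fine idea: quantize the image with a wide bin and label the connected pieces, then repeat with narrower bins inside them. This assumes a grayscale image and thresholds where each one divides the previous (so finer regions nest inside coarser ones); it stands in for whatever real region-finding technique is used:

```python
import numpy as np
from scipy import ndimage

def segment(gray, threshold):
    """Connected regions of pixels that fall into the same color bin of
    width 'threshold' -- a rough stand-in for blob finding."""
    bins = gray.astype(int) // threshold
    labels = np.zeros_like(bins)
    next_label = 1
    for value in np.unique(bins):
        lab, n = ndimage.label(bins == value)
        labels[lab > 0] = lab[lab > 0] + next_label - 1
        next_label += n
    return labels

def region_hierarchy(gray, thresholds=(64, 32, 16)):
    """Coarse-to-fine: find large regions with a loose threshold, then
    re-segment with ever narrower thresholds. Because each threshold divides
    the previous one, each fine region nests inside exactly one coarse region."""
    return [segment(gray, t) for t in thresholds]
```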

In a similar vein, I'm considering using low-res images to find homogeneous-color blobs in an image. Rich textures can disappear when the resolution is low, leaving just the overall color. A field of grass, for example, becomes a solid sheet of green. Once the field is isolated, it can be scrutinized in finer detail to see if there's something small that's of interest in it.

Machine vision: cost-effective action

[Audio Version]

Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

One thing that seems to dog many MV techniques is how slow or otherwise resource-hungry they are. I'm realizing that one thing that seems a must is a set of basic vision tools that allow for trading time for effectiveness. For example, given a whole image, the agent should be able to focus on a small portion - like your own fovea does - instead of trying to analyze the entire image. Also, the agent should be able to choose a lower quality image in order to reduce processing time.
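Those two cheap tools might look as simple as this, assuming a NumPy image; the crop size and downsampling factor are the knobs an agent could trade off against time:

```python
import numpy as np

def fovea(image, cx, cy, size=64):
    """Full-resolution crop around a point of interest -- the detailed look."""
    half = size // 2
    return image[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]

def low_res(image, factor=4):
    """Whole image at a fraction of the resolution -- the cheap, wide look.
    Simple block averaging; color keeps its channel axis, grayscale comes
    back with a trailing size-1 axis."""
    h, w = image.shape[:2]
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))
```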

Ideally, an agent would be able to learn to estimate how much time each operation will take and to thus be able to choose which techniques to use and how intently to apply them based on how well they serve various goals. If, for example, the goal is to track the movement of one or more objects, a full-image, low-res approach might do. To study a stationary object in detail, by contrast, might suggest a small-portion, high-res approach.

Machine vision: overlooking shadow and light splotches on surfaces

[Audio Version]

Following is another in the aforementioned series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

Shadows and light splotches that fall on even perfectly smooth surfaces really trip up systems designed to detect objects by finding contiguous surfaces. We don't seem to be fooled by such issues very often.

We are fooled when there are ambiguities in what we see. Perhaps understanding what makes one situation ambiguous versus another will help isolate what differentiates the two for the benefit of codification.

Looking at a picture I took looking down a tree-lined sidewalk, I found a great example of the issue. Shadows of trees fall on the sidewalk, creating a fairly smooth, two-tone division between shadowed and non-shadowed portions. I see a continuous sidewalk.

Machine vision: blob growth

[Audio Version]

Recently, I've been spending a lot of my free time thinking about machine vision. I've been running a variety of simple experiments into different techniques and trying somehow to formulate a cohesive theory and tool set for creating a general purpose vision system. I feel bad that I haven't been blogging lately, though. I guess I've just assumed I need something significant to blog about so it's not a waste of people's time.

Ironically, I've been keeping a small, ad hoc journal of some ideas about the subject. I figured that perhaps it's worth sharing. The next few entries are simply extracts from it. They're far less formal than most of my already informal blog entries. I apologize for not putting them in sufficient context, which I usually try to do when I blog. So, without further ado, following is the first entry.

I keep trying to figure out a way to isolate regions. My bubble growth algorithm isn't all that bad, but not great. There's a nasty problem with spill-over where edges are poorly defined.
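For reference, the basic bubble (region) growth being described might be sketched as a flood fill from a seed with a color tolerance. This is the version that suffers the spill-over problem; the concavity idea discussed below would add a shape test before each absorption. The names and the tolerance are illustrative:

```python
import numpy as np
from collections import deque

def grow_bubble(image, seed, tolerance=20):
    """Grow a region outward from the seed pixel, absorbing 4-connected
    neighbours whose color stays close to the seed's. Where an edge is
    blurry, the color test passes and the bubble spills over."""
    h, w = image.shape[:2]
    seed_color = image[seed].astype(float)
    inside = np.zeros((h, w), dtype=bool)
    inside[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not inside[ny, nx]:
                if np.abs(image[ny, nx].astype(float) - seed_color).max() < tolerance:
                    inside[ny, nx] = True
                    queue.append((ny, nx))
    return inside
```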

I'm reminded of my reading of Visual Intelligence. Hoffman addresses the concept of what I like to call "segmentation" of figures. A complex silhouette of a human, for example, might be segmented into a head, arms, legs, and a torso. The key to segmentation, in Hoffman's view, is finding the convex portions and starting cuts through the silhouette at them. To my thinking, the result tends to be smaller segments that generally don't have major concave corners or curves any more.

I've been trying to think of it from the other side, though. What if one took the bubble concept and added a certain "desire" of a bubble to keep from having small bulges? Perhaps just avoiding sharp concave corners would provide an interesting result. A bubble that begins growing in the center of the head in a silhouette might stop growing as it reaches the neck because further growth would create sharp concavities.