Review of "Visual Intelligence"

When I was in the store eyeing up On Intelligence, I also noticed an interesting-looking book titled Visual Intelligence, by Donald D. Hoffman, that I was pretty sure I'd have to get back to. I finally bought it a few days ago. Owing to a bit of recent travel, I found I had a bunch of time to read it. I'm a slow reader, but my inner geek found this book so gripping that I finished its roughly 200 pages in two days. (I suppose if it had more words and fewer of the pretty pictures, it might have taken a few more days and been less gripping.)

Given that I found Visual Intelligence a very cool book, I thought it worth writing a review. I'll begin by saying it's incredibly well written and that most of it should be easily reachable by the casual reader. It lends many cool insights into the curious nature of human vision and, by implication, all the other senses.

My reading of On Intelligence and its implications is still churning about in my mind. I feel in some ways like a fool for leaving out of my review a variety of observations I've had since writing it. Perhaps I was too hasty. Oh, well, I'll do it again now by beginning a review of Visual Intelligence not more than an hour after finishing reading it. Every point in time after reading a book seems to bring with it ups and downs when it comes to the context necessary for reviewing it. Years after reading a book, I'm more likely to have a nicely nuanced view of it, but I'm also likely to have ascribed to it all sorts of claims, characters, and other ideas that were never there. Oh, well.

I also mention On Intelligence here because I found Visual Intelligence an interesting companion read. Viewing VI through the lens of OI seems to fill in some gaps and perhaps answer some of the "why" questions that Hoffman brings out about the instructive optical illusions found throughout the book. Yet VI also seems to really challenge OI's simplistic reduction and view of the seemingly infinite flexibility of the neocortex in ways I'll highlight more below. I'll apologize to Mr. Hoffman for not reviewing Visual Intelligence completely on its own, but the OI connections and contrasts seem worthwhile.

If it's not already obvious, Visual Intelligence, subtitled "How We Create What We See", tackles the question of how human vision works. Hoffman's style of writing is easygoing and flattering to the reader. On the surface, he is continually pointing out the amazing ability of your brain to construct mental representations of the world from incredibly sparse and even ambiguous information, in a surprisingly wide variety of ways that computers have yet to match. At every turn, though, he's seeking to lay down clear, mechanistic rules, using visual examples you can play with and simple explanations of the hypotheses behind them. He could take a conventional approach of challenging your assumptions, but instead he uses this crafty, flattering method in the hope that you'll reach the same conclusions he and others have about various observations that might otherwise be hard to believe if you couldn't see them for yourself. I give Hoffman credit for this technique; I'm not very good at it, myself.

I'm going to spoil the plot of this murder mystery by revealing who the killer is at the outset. Hoffman's basic assertion in Visual Intelligence is that the brain "constructs" its perception of the world, rather than simply observing the world. Disappointingly, he dabbles a bit at the end of VI in relativistic epistemological views of the world that seem mostly a waste of paper. You're better off starting with the assumption that Hoffman acknowledges that the world you perceive is the real world, and ignoring the classic philosophically skeptical divorce he plants between what others might call "things as we perceive them" and "things as they really are". The thinly veiled "brain in a vat" stuff at the end of the book distracts, however, from the clean, mechanistic view the rest of the book takes of vision and of the crafty ways researchers have been using for centuries to tease those mechanisms out of the brain's obscurity.

The compelling conceit Hoffman uses throughout Visual Intelligence is that of you, the "creative genius". He makes and continually introduces support for the very strong assertion that everything you perceive is really your own mind's creation. Most of the illustrations he includes are very simplistic. They're stripped of the enormous richness you expect from the natural world you see daily. They are simplified not to save ink, but to boil down what's going on to its bare essentials. Following are some modified examples of illustrations found in VI:

Some illustrations from Visual Intelligence featuring squares and cubes

I don't want to spend too much time explaining what each of these is meant to illustrate, but let me give an example to give a flavor of what the book explains far better than I can. Following is a modified quote related to the "flat wheel" and "3D cube" illustration above:

    Here's why it's hard to see [the "wheel"] as a cube. Note that it has three lines running through its middle, joining the six vertices. According to [one rule], each line must be interpreted as a straight line, without any corners, in space. This precludes interpreting the figure as a cube.

    If we change our view of this [wheel-looking] cube ever so slightly [like in the 3D cube one], then we obtain a generic view and find it easy, once again, to see a cube.

Throughout Visual Intelligence, Hoffman presents, explains, and accumulates a list of rules that he considers pretty solidly agreed upon by the community that's been seriously studying vision for the past few centuries. Without elaboration, let me list them here to summarize:

  • Rule 1. Always interpret a straight line in an image as a straight line in 3D.
  • Rule 2. If the tips of two lines coincide in an image, then always interpret them as coinciding in 3D.
  • Rule 3. Always interpret lines collinear in an image as collinear in 3D.
  • Rule 4. Interpret elements nearby in an image as nearby in 3D.
  • Rule 5. Always interpret a curve that is smooth in an image as smooth in 3D.
  • Rule 6. Where possible, interpret a curve in an image as the rim of a surface in 3D.
  • Rule 7. Where possible, interpret a T-junction in an image as a point where the full rim conceals itself: the cap conceals the stem.
  • Rule 8. Interpret each convex point on a bound as a convex point on a rim.
  • Rule 9. Interpret each concave point on a bound as a saddle point on a rim.
  • Rule 10. Construct surfaces in 3D that are as smooth as possible.
  • Rule 11. Construct subjective figures that occlude only if there are convex cusps.
  • Rule 12. If two visual structures have a non-accidental relation, group them and assign them to a common origin.
  • Rule 13. If three or more curves intersect at a common point in an image, interpret them as intersecting at a common point in space.
  • Rule 14. Rule of concave creases: Divide shapes into parts along concave creases.
  • Rule 15. Minima rule: Divide shapes into parts at negative minima, along lines of curvature, of the principal curvatures.
  • Rule 16. Minima rule for silhouettes: Divide silhouettes into parts at concave cusps and negative minima of curvature.
  • Rule 17. The salience of a cusp boundary increases with increasing sharpness of the angle at the cusp.
  • Rule 18. The salience of a smooth boundary increases with the magnitude of (normalized) curvature at the boundary.
  • Rule 19. Salient boundaries: Choose figure and ground so that figure has the more salient part boundaries.
  • Rule 20. Salient parts: Choose figure and ground so that figure has the more salient parts.
  • Rule 21. Interpret gradual changes of hue, saturation, and brightness in an image as changes in illumination.
  • Rule 22. Interpret abrupt changes of hue, saturation, and brightness in an image as changes in surfaces.
  • Rule 23. Construct as few light sources as possible.
  • Rule 24. Put light sources overhead.
  • Rule 25. Filters don't invert lightness.
  • Rule 26. Filters decrease lightness differences.
  • Rule 27. Choose the fair pick that's most stable.
  • Rule 28. Interpret the highest luminance in the visual field as white, fluorescent, or self-luminous.
  • Rule 29. Create the simplest possible motions.
  • Rule 30. When making motion, construct as few objects as possible, and conserve them as much as possible.
  • Rule 31. Construct motion to be as uniform over space as possible.
  • Rule 32. Construct the smoothest velocity field.
  • Rule 33. If possible, and if other rules permit, interpret image motions as projections of rigid motions in three dimensions.
  • Rule 34. If possible, and if other rules permit, interpret image motions as projections of 3D motions that are rigid and planar.
  • Rule 35. Light sources move slowly.

Much as I would love to explain all the out-of-context terms like "filters", "salient boundaries", and "subjective figures", I suppose that wouldn't be fair to Hoffman and would make this a very long review, indeed. Suffice it to say that these rules clearly cover a pretty wide swath of topics, including edges, object segmenting, light and color, translucency, and even motion. This last part (motion) is interesting because it stands in contrast to Jeff Hawkins' assertion, in On Intelligence, that most AI researchers don't account for time in their models and thinking. He may be right to some degree, but Hoffman cites examples of research into perception of motion going back to the nineteenth century and shows that time is alive and well to this day in at least some quarters.

I'd like to stay with the comparison of Hawkins' assertions in On Intelligence with the ones here in Visual Intelligence for another point. It's very clear that Hoffman's view is that almost all humans are endowed with the same bag of tricks that underlie the above rules and many more waiting to be discovered. Hawkins, by contrast, seems to strongly assert that the brain -- the neocortex, more precisely -- doesn't come prepackaged with such tricks. Instead, it starts out pretty much a blank slate, with each part looking for predictable patterns in input. We are able to point to roughly the same places on the brain where certain functions, like speech or your thumb's touch sense, can be found in most people. Hawkins would attribute this to how "wiring" from outside the cortex, including that from the various sense organs, is hooked into the cortex in the same basic ways for most people. Beyond that expected coincidence, the rest of what goes on beyond the hook-ups is, to Hawkins, learned. In stark contrast, Hoffman shows no timidity in claiming that the optical illusions he sees and demonstrates in the book are exactly how your brain will most likely see them, too, clearly implying that we use the same mechanisms of perception, and so must be predisposed to having them. One would be hard pressed to conclude from Hoffman's esoteric examples that we all just happen upon the same incredible mechanisms by accident. So in this context, Hawkins and Hoffman stand at opposite poles on the question of the role of learning versus inborn skills, at least when it comes to how we directly perceive the phenomenal world.

But don't be too quick to conclude that they are at opposite poles on all things. I was delighted to apply Hawkins' memory-prediction framework to each of the rules and visual puzzles Hoffman put forth and found a great meshing between the two. Most of the rules listed above are really about constraints on the interpretation of information. Consider rule 1, for example: "Always interpret a straight line in an image as a straight line in 3D". Looking at a straight line on a printed page, you could say it actually represents a circle that's turned 90° into the page, so all you're seeing is the edge of it. Or you could interpret it as two separate lines that happen to overlap from your current perspective. Yet Hoffman asserts that your brain will most likely settle on the interpretation that what looks like a straight line actually is a straight line. In fact, under the right circumstances, you should even be able to interpret an interrupted straight line as a continuous one that's obscured by some other object, even though there's no direct evidence to support it. Consider the following illustration of this:

Broken lines partly obscured by an opaque object.
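The ambiguity behind rule 1 is easy to check numerically. What follows is a minimal Python sketch of my own, not anything taken from the book: it assumes a simple orthographic projection (just dropping the depth coordinate), and the helper names project, circle_points, and is_collinear are made up for illustration. It shows that a circle seen edge-on lands on the very same image line as a genuinely straight segment, while a circle facing the viewer does not.

```python
import math

def project(point3d):
    """Orthographic projection: keep x and y, discard depth z."""
    x, y, _z = point3d
    return (x, y)

def circle_points(radius, samples, edge_on):
    """Sample a circle centered at the origin.

    edge_on=True puts it in the x-z plane (seen edge-on by the viewer);
    edge_on=False puts it in the x-y plane (facing the viewer).
    """
    points = []
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        a, b = radius * math.cos(angle), radius * math.sin(angle)
        points.append((a, 0.0, b) if edge_on else (a, b, 0.0))
    return points

def is_collinear(points2d, tol=1e-9):
    """True if every 2D point lies on one line (within tol)."""
    x0, y0 = points2d[0]
    # Use the point farthest from the first one to define the line direction.
    fx, fy = max(points2d, key=lambda p: (p[0] - x0) ** 2 + (p[1] - y0) ** 2)
    dx, dy = fx - x0, fy - y0
    return all(abs(dx * (y - y0) - dy * (x - x0)) <= tol for x, y in points2d)

# Three different 3D objects (all hypothetical test data).
segment = [(-1.0 + 0.2 * k, 0.0, 0.0) for k in range(11)]
edge_on_circle = circle_points(1.0, 12, edge_on=True)
facing_circle = circle_points(1.0, 12, edge_on=False)

print(is_collinear([project(p) for p in segment]))         # True
print(is_collinear([project(p) for p in edge_on_circle]))  # True
print(is_collinear([project(p) for p in facing_circle]))   # False
```

The image alone cannot tell the first two objects apart, which is why some rule like "a straight line in the image is a straight line in 3D" is needed to pick one interpretation.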

Ever since I read about the memory-prediction framework, I've been thinking in these terms about how to deal with "implied" information like the hidden portions of the lines in the figure above. Hoffman would call this a "subjective" line. A typical man-made device would not make the leap your brain does to conclude that the lines passing "behind" the obstacle are actually continuous, so it must be your own subjective interpretation of the information given, along with the "prediction" that the missing piece is really back there, somewhere. The classic example I keep pondering is how, in my bubble vision experiments, my bubbles "leak" out of one well-defined surface into others, even at times when they find a small but sufficiently blurry edge. It frustrates me, of course, because I "know" the edge is there, even though it's blurry. The memory-prediction model and the subjective-edge concept seem to go hand in hand in explaining why my own brain's "bubbles" don't suffer the "leak" problem that the bubbles I created in code do.
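To make the "leak" problem concrete, here is a small Python sketch. It is not my actual bubble code; it is a plain region-growing flood fill standing in for it, and the function name grow_region, the max_step threshold, and the tiny test images are all assumptions chosen for illustration. The fill crosses from one pixel to a neighbor only when their intensities differ by no more than max_step.

```python
from collections import deque

def grow_region(image, seed, max_step):
    """Flood-fill from seed over 4-connected neighbors.

    A neighbor is added only if its intensity differs from the current
    pixel's intensity by at most max_step.
    """
    rows, cols = len(image), len(image[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                if abs(image[nr][nc] - image[r][c]) <= max_step:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen

# Two made-up 3-row images: dark surface on the left, bright on the right.
sharp = [[0, 0, 0, 100, 100, 100]] * 3           # abrupt edge
blurry = [[0, 0, 20, 40, 60, 80, 100, 100]] * 3  # same edge, smeared out

print(len(grow_region(sharp, (1, 0), max_step=25)))   # 9: stays on the dark side
print(len(grow_region(blurry, (1, 0), max_step=25)))  # 24: leaks across everything
```

With the sharp edge, the region stops at the boundary. With the blurry one, no single step exceeds the threshold, so the region creeps across the whole image even though a viewer would have no trouble seeing two surfaces.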

One of the topics that Visual Intelligence addresses that was eye-opening for me is in the area of "dividing shapes". I've taken for granted for a long time that the brain subdivides one's visual field into smaller and smaller parts and uses these basic parts to help describe and to identify what one sees. I like to use the word "segmentation" to identify this concept. I found Hoffman's explanation of how he and many others interpret how the brain does this to be so incredibly, simply mechanistic that it's very easy to imagine it's true. If segmentation occurs, the natural question is, "by what rules does the brain subdivide the objects it sees into segments?" That is, where does it draw the lines between neighboring segments? What surprised me is that Hoffman is again able to deal with such a complicated question by reference even to very simple line drawings like the following:

Segmentation of an object into smaller parts.

To Hoffman's thinking, the places you see dotted lines in the figure above are places where, if all you saw was the curved rim, you might likely divide up the solid object implied by the rim into smaller pieces. These lines all at least begin at one sharply concave dent in the rim. And while you will favor connecting such dents, you'll also favor shortest-line divisions from one dent to the other side of the assumed object over seeking out opposing dents that are farther away or cut less "deeply". What's usually left, then, are segments that are composed entirely of convex (outward) curves and corners and no prominent dents. Put another way, I'd say what's left can be approximated using nothing more than deformed circles. The result in the lower illustration above is still strikingly similar to the object it approximates above.
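The first step of that process, finding the dents, can be sketched in a few lines of Python. This is my own simplification, not Hoffman's formulation: it assumes the silhouette is a simple polygon given as a list of (x, y) vertices, so it only finds sharp concave corners and ignores the smooth negative minima of curvature that rule 16 also covers. The function names signed_area and concave_vertices are made up for this example.

```python
def signed_area(polygon):
    """Shoelace formula: positive if vertices run counter-clockwise."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0

def concave_vertices(polygon):
    """Return the vertices where the outline dents inward.

    At each vertex, the cross product of the incoming and outgoing edges
    tells which way the outline turns. A turn against the polygon's
    overall winding direction marks a concave corner.
    """
    orientation = 1.0 if signed_area(polygon) > 0 else -1.0
    n = len(polygon)
    dents = []
    for i in range(n):
        px, py = polygon[i - 1]
        cx, cy = polygon[i]
        nx, ny = polygon[(i + 1) % n]
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross * orientation < 0:
            dents.append((cx, cy))
    return dents

# A made-up silhouette: a rectangle with a V-shaped notch cut into its top.
notched = [(0, 0), (4, 0), (4, 3), (2, 1), (0, 3)]
print(concave_vertices(notched))  # [(2, 1)]
```

A fuller version would then draw cuts from each dent, preferring short cuts and pairs of opposing dents as described above. That part is left out here.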

The sort of descriptive power that can come from simplified models composed of deformed circles and the like is exactly what I was trying to get at with the bubble concept. In order to emphasize the importance of segmentation of this sort in the way we think and speak about the world, Hoffman considers the human face. Here's a brief excerpt:

    What's striking ... is that parts [of a face] picked out by the minima rule are ones that you name with single words [like "forehead", "nose", "lips", and "chin"]. To name regions of a shape other than these parts, you often require more complex descriptions [such as "the lower half of the forehead and the upper half of the nose"].

I could write at much greater length about all the interesting things Hoffman talks about in Visual Intelligence, but you're much better off just reading it for yourself. I'd recommend it to anyone interested in the subject of machine vision. The idea that you will perceive all sorts of phantoms in the images you see may seem irrelevant or even discouraging to the programmer, but I believe these phantoms only serve to tease out valuable rules like the ones listed above, which Hoffman explains superbly.

Visual Intelligence: How We Create What We See, by Donald D. Hoffman, was first copyrighted in 1998 and is currently published by W. W. Norton and Company, Inc. I bought my soft-cover copy from a local Barnes and Noble for about $19.00.

