A standardized test of perceptual capability


I've been getting too lost in the idiosyncrasies of machine vision lately and missing my more important focus: intelligence per se. I'm changing direction now.

My recent experiences have shown me that one area we haven't really done well in is perceptual-level intelligence. We have great sensors and cool algorithms for generating interesting but primitive information about the world. Edge detection, for example, can be used to generate a series of lines in a visual scene. But so what? Lines are just about as disconnected from intelligence as the raw pixel colors are.

Where do primitive visual features become percepts? Naturally, we have plenty of systems designed to instantly translate visual (or other sensory) information into known percepts. Put little red dots around a room, for instance, and a visual system can easily key in on them as markers for a controlled-environment system. This is the sort of thinking used in vision-based quality control systems, too.

But what we don't have yet is a way for a machine to learn to recognize new percepts and learn to characterize and predict their behavior. I've spent many years thinking about this problem. While I can't say I have a complete answer yet, I do have some ideas, and I want to try them out. Recently, while thinking about the problem, I formulated an interesting way to test a perceptual-level machine's ability to learn and make predictions. I think it can be readily reproduced on many other systems and extended for ever more capable systems.

The test involves a very simplified, visual world composed of a black rectangular "planet" and populated by a white "ball". The ball, a small circle whose size never changes, moves around this 2D world in a variety of ways that, for the most part, are formulaic. One way, for example, might be thought of as a ball in a box in space. Another can be thought of as a ball in a box standing upright on Earth, meaning it bounces around in parabolic paths as though in the presence of a gravitational field. Other variants might involve random adjustments to velocity, just to make prediction more difficult.
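To make the behaviors concrete, here is a minimal sketch of two of them: the ball drifting in a box in space, and the ball bouncing in a gravitational field. The function names and the simple reflect-and-clamp wall handling are my own assumptions, not part of any formal specification.

```python
def step_free(x, y, vx, vy, w, h):
    """Ball in a box in space: constant velocity, elastic bounces off walls."""
    x, y = x + vx, y + vy
    if x < 0 or x > w:          # hit a vertical wall: reflect and clamp
        vx = -vx
        x = max(0.0, min(x, w))
    if y < 0 or y > h:          # hit a horizontal wall: reflect and clamp
        vy = -vy
        y = max(0.0, min(y, h))
    return x, y, vx, vy

def step_gravity(x, y, vx, vy, w, h, g=0.5):
    """Ball in a box on Earth: gravity accelerates it each moment,
    producing the parabolic bouncing paths described above."""
    vy += g                     # hypothetical per-moment gravitational pull
    return step_free(x, y, vx, vy, w, h)
```

A "random adjustment" variant would simply perturb `vx` and `vy` with small random values before each step.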

The test "organism" would be able to see the whole of this world. It would have a "pointer". Its goal would be to move this pointer to wherever it believes the ball will be in the next moment. It would be able to tell where the pointer currently points using a direct sense separate from its vision.

Predicting where the ball will be in the future is a very interesting test of an organism's ability to learn to understand the nature of a percept. Measuring the competency of a test organism would be very easy, too. For each moment, there is a prediction, in the form of the pointer pointing to where the organism believes the ball will be in the next moment. When that moment comes, the distance between the predicted and actual positions of the ball is calculated. For any given series of moments, the average distance would be the organism's score in that context.

It would be easy for different researchers to compare their test organisms against others, but it would require a little care to put each test in a clear context. The context would be defined by a few variables. First is the ball behavior algorithm used. Each such behavior should be given a unique name and a formal description that can easily be implemented in just about any programming language. Second is the number of moments used to "warm up", which we'll call the "warm-up period"; it should take a while for an organism to learn about the ball's behavior before it can be any good at making predictions. Third is the "test period", i.e., the number of moments after the warm-up period during which test measurements are taken. The final score in this context, then, would be the average of all the distances measured between predicted and actual positions.
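The warm-up-then-measure protocol could be sketched as a small harness like the one below. The `organism.predict` interface and the shape of the behavior function are my own assumptions for illustration; any concrete test would substitute its own.

```python
import math

def run_test(organism, behavior_step, state, warmup=500, test=1000):
    """Run one scoring context: let the organism observe and predict for
    `warmup` moments, then average its prediction error over `test` moments.
    `behavior_step` advances the ball; `state` is (x, y, vx, vy)."""
    x, y = state[0], state[1]
    errors = []
    for t in range(warmup + test):
        # The organism moves its "pointer" to where it thinks the ball
        # will be in the next moment.
        pred_x, pred_y = organism.predict(x, y)
        state = behavior_step(*state)
        x, y = state[0], state[1]
        if t >= warmup:   # only the test period counts toward the score
            errors.append(math.hypot(pred_x - x, pred_y - y))
    return sum(errors) / len(errors)
```

A perfect predictor would score 0 in any context; anything else scores the mean Euclidean error over the test period.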

Two standard figures should be disclosed with any given test results. One is the best possible score, which is 0, meaning the predictions are always correct. The second is the score of a "lazy" organism, one that always guesses that the ball will be in the same place in the next moment that it is now. Naturally, a naive organism would do worse than this cheap approximation, but a competent organism should do better. The "lazy score" for a specific test run would be calculated as the average of all distances from each moment's ball position to the next moment's position. A weighted score for the organism could then be calculated as the ratio of its actual score to the lazy score. A value of zero would be the best possible; a value of one would indicate that the predictions are no better than the lazy baseline; a value greater than one would indicate that the predictions are actually worse than the lazy algorithm.
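The lazy baseline and the weighted ratio are simple enough to state directly in code. This is a sketch under the definitions above; the function names are mine.

```python
import math

def lazy_score(positions):
    """Average distance from each moment's ball position to the next:
    the score of an organism that always guesses the ball stays put."""
    dists = [math.hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(positions, positions[1:])]
    return sum(dists) / len(dists)

def weighted_score(actual_score, lazy):
    """0 is perfect, 1 matches the lazy baseline, >1 is worse than lazy."""
    return actual_score / lazy
```

Reporting the weighted score alongside the raw score makes runs on different ball behaviors roughly comparable, since each is normalized against the same cheap baseline.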

Some might quip that I'm just proposing a "blocks world" type experiment and that an "organism" competent to play this game wouldn't have to be very smart. I disagree. Yes, a programmer could preprogram an organism with all the knowledge it needs to solve the problem and even get a perfect score. A proper disclosure of the algorithm used would let fellow researchers quickly disqualify such trickery. So would testing that single program against novel ball behaviors. What's more, I think a sincere attempt to develop organisms that can solve this sort of problem in a generalizable way will result in algorithms that can be generalized to more sophisticated problems like vision in natural settings.

Naturally, this test can also be extended in sophistication. Perhaps there could be a series of levels defined for the test. This might be Level I. Level II might involve multiple balls of different colors. And so on.

I probably will draft a formal specification for this test soon. I welcome input from others interested in the idea.

