Discovering English syntax
I've started a new project. My goal is to write a program that discovers enough of the syntax rules of written English to serve as a passable lexical tokenizer. I've made some progress in my approach thus far. But I can tell that my approach requires some serious rethinking. I'll describe the experimental design here and comment on my current progress. If you wish to see the code I'm actively experimenting with you can find it on GitHub . English syntax Anyone familiar with programming language will recognize that there is a process involved in translating human-readable code into the more cryptic representation used by a computer to execute that code. And that there is a very precise syntax governing what your code in that language must look like to be considered syntactically valid. In the JavaScript statement "var x = someFunction(y * 12);" you'll realize that there are well-defined roles for each part. The "var" keyword indicates that the &qu