Virtual lexicon vs Brown corpus
Having completed my blocker, I decided to take a break before tackling syntax analysis to study more facets of English. I also realized I should beef up the lexicon underlying my virtual lexicon (VL). I had collected only about 1,500 words, most of which I had simply hand-entered by way of theft from the CGEL's chapters on morphology; mostly compound words, at that. That was enough to test and demonstrate the VL's capacity to deal half-decently with morphological parsing, but nowhere near enough to represent the tens of thousands of words, at least, that a typical high school graduate who speaks English natively will know. A virtual lexicon's core premise is that being able to recognize novel word forms by recognizing their parts is more valuable than having a large list of exact word forms. In essence, a relatively small number of lexical entries should be able to represent a much larger set of practical words found "in the wild".
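
To make that premise concrete, here is a minimal Python sketch of recognizing a word by its parts. It is not the VL's actual implementation: the names `STEMS`, `PREFIXES`, `SUFFIXES`, and `parse` are hypothetical, the tiny word lists are made up for illustration, and the sketch assumes a word is simply zero or more prefixes, one stem, and zero or more suffixes, with no spelling changes at the joins.

```python
from typing import List, Optional

# Hypothetical toy lexicon: a few stems and affixes, chosen only for illustration.
STEMS = {"break", "read", "kind", "do"}
PREFIXES = ("un", "re")
SUFFIXES = ("ness", "able", "ing", "er", "s")


def _parse_suffixes(rest: str) -> Optional[List[str]]:
    """Split `rest` into known suffixes, or return None if that is impossible."""
    if not rest:
        return []
    for suffix in SUFFIXES:
        if rest.startswith(suffix):
            tail = _parse_suffixes(rest[len(suffix):])
            if tail is not None:
                return [suffix] + tail
    return None


def _parse_from(rest: str) -> Optional[List[str]]:
    """Try `stem suffix*` first, then peel off one prefix and recurse."""
    for end in range(1, len(rest) + 1):
        if rest[:end] in STEMS:
            tail = _parse_suffixes(rest[end:])
            if tail is not None:
                return [rest[:end]] + tail
    for prefix in PREFIXES:
        if rest.startswith(prefix):
            tail = _parse_from(rest[len(prefix):])
            if tail is not None:
                return [prefix] + tail
    return None


def parse(word: str) -> Optional[List[str]]:
    """Return one segmentation of `word` into known parts, or None."""
    return _parse_from(word.lower())


if __name__ == "__main__":
    for w in ("unbreakable", "rereading", "readers", "xylophone"):
        print(w, parse(w))
```

With only four stems and seven affixes, this toy version already accepts forms such as "unbreakable", "rereading", and "readers" that appear nowhere in its lists, while rejecting "xylophone" outright. A real lexicon would also need to handle spelling changes (for example "happy" plus "-ness" giving "happiness") and to rule out combinations that parse but are not real words.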