On the way to lunch today, Bob asked where the word ``cantilever'' comes from[*]. Jerry and I didn't know. So, we tried to think of other ``canti-'' words to compare. The only one we could come up with was ``canticle'', which didn't seem related.
So, we backed off to looking for ``cant-'' words. We came up with:
cant, cantaloupe, cantankerous, canteen, cantina, cantor[**]
This seemed like a pretty slender set. But it got me wondering. There are a few really dominant four-letter prefixen like ``over-'', ``unde-'', ``anti-'', and ``semi-''. I wondered how the other four-letter starts to words lined up.
A few quick tests with /usr/share/dict/words under Mac OS X show that there are 18244 different combinations of the opening four letters of non-capitalized words of at least four letters. This is out of the 456976 (26^4) possible four-letter combinations.
The /usr/share/dict/words file has some things that may be considered repeats, like ``cantankerous'' and ``cantankerously''. Similarly, it has only one entry per homograph. But, if you're not too picky about dimpled chads, it will give a pretty decent idea of the big picture.
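For anyone who wants to try this at home, here is a rough sketch of the kind of count described above. It is not the exact test I ran. The function names are made up for illustration, and the filter (the first four characters must all be lowercase letters) is just one reasonable reading of ``non-capitalized'', so the totals may not match mine exactly.

```python
from collections import Counter


def opener_counts(words, n=4):
    """Count how many words begin with each n-letter opener.

    Assumption: a word qualifies if it has at least n characters and
    its first n characters are all lowercase letters.
    """
    counts = Counter()
    for word in words:
        word = word.strip()
        opener = word[:n]
        if len(word) >= n and opener.isalpha() and opener.islower():
            counts[opener] += 1
    return counts


def load_words(path="/usr/share/dict/words"):
    """Read a whitespace-separated word list; the default path is the Mac OS X one."""
    with open(path) as f:
        return f.read().split()
```

With those in hand, len(opener_counts(load_words())) gives the number of distinct openers, and opener_counts(load_words())["cant"] gives the number of ``cant-'' words.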
According to /usr/share/dict/words, there are 78 non-capitalized words which begin with ``cant-''. Compared to the other 18243 four-letter combinations that start words, this is quite a lot. Starting 78 words puts it in the 97th percentile. A full 6142 (more than 1/3) of the 18244 four-letter openers opened only one word.
The top 10 four-letter openers accounted for almost 5% of the non-capitalized words of at least four letters in /usr/share/dict/words[***].
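The percentile, the one-word openers, and the top-ten share can all be read off a table of opener counts. The sketch below assumes such a table is already available as a dict or Counter mapping each opener to its word count. The function names are hypothetical, and ``percentile'' here means the percentage of openers that start strictly fewer words, which is only one of several common definitions. As a sanity check on the numbers above, 6142/18244 works out to about 33.7%, and the ten counts in the footnote add up to 10426.

```python
from collections import Counter


def percentile_rank(counts, opener):
    """Percentage of openers that start strictly fewer words than `opener`."""
    target = counts[opener]
    below = sum(1 for c in counts.values() if c < target)
    return 100.0 * below / len(counts)


def singleton_fraction(counts):
    """Fraction of openers that start exactly one word."""
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def top_share(counts, k=10):
    """Fraction of all counted words covered by the k most common openers."""
    top = Counter(counts).most_common(k)
    return sum(c for _, c in top) / sum(counts.values())
```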
I wonder if other languages have similar histograms.
[*] Webster says that ``cant'' refers to an external angle and ``lever'' refers to a roof support.
[**] look(1) on Mac OS X says:
cant, cantabank, cantabile, cantala, cantalite, cantaloupe, cantankerous, cantankerously, cantankerousness, cantar, cantara, cantaro, cantata, cantation, cantative, cantatory, cantboard, canted, canteen, cantefable, canter, canterer, canthal, cantharidal, cantharidate, cantharides, cantharidian, cantharidin, cantharidism, cantharidize, cantharis, cantharophilous, cantharus, canthectomy, canthitis, cantholysis, canthoplasty, canthorrhaphy, canthotomy, canthus, cantic, canticle, cantico, cantilena, cantilene, cantilever, cantilevered, cantillate, cantillation, cantily, cantina, cantiness, canting, cantingly, cantingness, cantion, cantish, cantle, cantlet, canto, canton, cantonal, cantonalism, cantoned, cantoner, cantonment, cantoon, cantor, cantoral, cantoris, cantorous, cantorship, cantred, cantref, cantrip, cantus, cantwise, canty
[***] The top ten openers were:
2043 over
1334 unde
1323 inte
1078 anti
1000 supe
 951 semi
 731 unco
 700 poly
 648 para
 618 peri