Patrick (patrickwonders) wrote,

Prefixen frequencies

On the way to lunch today, Bob asked where the word ``cantilever'' comes from[*]. Jerry and I didn't know. So, we tried to think of other ``canti-'' words to compare. The only one we could come up with was ``canticle'', which didn't seem related.

So, we backed off to looking for ``cant-'' words. We came up with:

  • cant, cantaloupe, cantankerous, canteen, cantina, cantor[**]

This seemed like a pretty slender set. But it got me wondering: there are a few really dominant four-letter prefixen like ``over-'' and ``unde-'' and ``anti-'' and ``semi-''. How do the other four-letter starts to words line up?

A few quick tests with /usr/share/dict/words under Mac OS X show that there are 18244 different combinations of the opening four letters of non-capitalized words of at least four letters. This is out of the 456976 (that is, 26^4) possible four-letter combinations.
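The tally described above could be reproduced along these lines. This is only a sketch, not the tests actually run for this post: the names `OPENER`, `opener_counts` and `opener_counts_from_file` are made up for illustration, and it assumes that ``non-capitalized'' means the first four characters are all lowercase a-z.

```python
import re
from collections import Counter

# Assumption: a word qualifies only if its first four characters are all
# lowercase a-z. This also enforces the "at least four letters" rule.
OPENER = re.compile(r"[a-z]{4}")


def opener_counts(words):
    """Count how many words start with each four-letter opener."""
    counts = Counter()
    for word in words:
        match = OPENER.match(word.strip())
        if match:
            counts[match.group()] += 1
    return counts


def opener_counts_from_file(path="/usr/share/dict/words"):
    """Same tally, reading one word per line from a word list."""
    with open(path, encoding="utf-8") as handle:
        return opener_counts(handle)


# Tiny hand-made sample; "Canterbury" and "can" are skipped by the filter.
sample = ["cant", "canteen", "cantor", "Canterbury", "can", "overdo", "overt"]
counts = opener_counts(sample)
print(len(counts))  # distinct openers in the sample
print(26 ** 4)      # possible four-letter combinations
```

Calling `opener_counts_from_file()` on a machine that has the word list would give the real tally; the exact totals depend on the version of the list and on how strictly ``non-capitalized'' is read.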

The /usr/share/dict/words file has some entries that may be considered repeats, like ``cantankerous'' and ``cantankerously''. Similarly, it only has one entry per homograph. But, if you're not too picky about dimpled chads, it will give a pretty decent idea of the big picture.

According to /usr/share/dict/words there are 78 non-capitalized words which begin with ``cant-''. Compared with the other 18243 four-letter combinations that start words, this is quite a lot. Starting 78 words puts it in the 97th percentile. A full 6142 (just over 1/3) of the 18244 four-letter openers opened only one word.
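For anyone who wants to check figures like these, here is a rough sketch of the two calculations. The helper names and the toy tally are hypothetical, and ``percentile'' is taken to mean the percentage of openers that start strictly fewer words, which may not be exactly the definition used for the number above.

```python
from collections import Counter


def percentile_rank(counts, opener):
    """Percentage of openers that start strictly fewer words than `opener`."""
    target = counts[opener]
    below = sum(1 for n in counts.values() if n < target)
    return 100.0 * below / len(counts)


def singleton_openers(counts):
    """Number of openers that start exactly one word."""
    return sum(1 for n in counts.values() if n == 1)


# Hypothetical toy tally, not taken from the real word list.
toy = Counter({"over": 5, "cant": 3, "semi": 2, "zyme": 1, "abac": 1})
print(percentile_rank(toy, "cant"))  # 3 of 5 openers rank below "cant"
print(singleton_openers(toy))        # "zyme" and "abac"

# The post's own figures: singletons as a fraction of all openers.
print(6142 / 18244 > 1 / 3)
```

With a real tally in place of `toy`, `percentile_rank(counts, "cant")` and `singleton_openers(counts)` would give the corresponding numbers for the word list.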

The top 10 four-letter openers accounted for almost 5% of the non-capitalized words of at least four letters in /usr/share/dict/words[***].

I wonder if other languages have similar histograms.

[*] Webster says that ``cant'' refers to an external angle and ``lever'' refers to a roof support.

[**] look(1) on Mac OS X says:

  • cant, cantabank, cantabile, cantala, cantalite, cantaloupe, cantankerous, cantankerously, cantankerousness, cantar, cantara, cantaro, cantata, cantation, cantative, cantatory, cantboard, canted, canteen, cantefable, canter, canterer, canthal, cantharidal, cantharidate, cantharides, cantharidian, cantharidin, cantharidism, cantharidize, cantharis, cantharophilous, cantharus, canthectomy, canthitis, cantholysis, canthoplasty, canthorrhaphy, canthotomy, canthus, cantic, canticle, cantico, cantilena, cantilene, cantilever, cantilevered, cantillate, cantillation, cantily, cantina, cantiness, canting, cantingly, cantingness, cantion, cantish, cantle, cantlet, canto, canton, cantonal, cantonalism, cantoned, cantoner, cantonment, cantoon, cantor, cantoral, cantoris, cantorous, cantorship, cantred, cantref, cantrip, cantus, cantwise, canty

[***] The top ten openers were:

 2043 over
 1334 unde
 1323 inte
 1078 anti
 1000 supe
  951 semi
  731 unco
  700 poly
  648 para
  618 peri
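
Adding up the ten counts above gives the number of words the top openers cover. The sketch below does that sum and shows one way to compute a top-k share from a tally. The function name `top_share` is made up for illustration, and since the post never states the total number of qualifying words, the 5% figure itself is not recomputed here.

```python
from collections import Counter


def top_share(counts, k=10):
    """Fraction of all tallied words covered by the k most common openers."""
    top = sum(n for _, n in counts.most_common(k))
    return top / sum(counts.values())


# The ten counts listed above, copied from the post.
top_ten = Counter({
    "over": 2043, "unde": 1334, "inte": 1323, "anti": 1078, "supe": 1000,
    "semi": 951, "unco": 731, "poly": 700, "para": 648, "peri": 618,
})
print(sum(top_ten.values()))  # words covered by the top ten openers
```

Given a full tally of every opener, `top_share(counts)` would return the fraction that the ``almost 5%'' refers to.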