There’s a great activity at the beginning of Workshop Statistics where kids write a couple of sentences about why they’re taking the course, and then construct the distribution of word lengths. The main point is to ask, “are all word lengths the same?” Answer: no, duh. Right: they vary. It’s not that an individual word changes its length, but that the quantity “word length” varies from word to word. So it’s a variable, in a way that’s a little different from the variables they’re used to from algebra.
But what does the distribution look like? Rather than look it up, I found Hamlet’s “to be or not to be” soliloquy online, pasted it into my favorite text editor (TextMate) and did a bunch of global substitutions so that every word was on its own line. (I also stripped out hyphens, apostrophes, and other punctuation, which may not always be appropriate, but never mind. I think of ’tis as a three-letter word, not a four-letter one.) Then a quick dump into a Fathom collection, and a new attribute (or variable) with a formula like stringLength(WORDS), and you’re all set. This process takes enough fluency with the tools that it’s an inappropriate activity for the kids in my class, at least, but the results are interesting enough to share, as in the illustration at right.
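If you don’t have Fathom handy, the same steps (strip the punctuation, split into words, measure each one) can be roughed out in a few lines of Python. This is only a sketch of the idea, not what I actually did: the function names `word_lengths` and `length_distribution` are made up for illustration, and the sample is just the opening of the soliloquy rather than the whole thing. It assumes plain English letters and treats both straight and curly apostrophes as punctuation to drop.

```python
import re
from collections import Counter


def word_lengths(text):
    """Return the length of each word, ignoring punctuation (hypothetical helper)."""
    # Delete apostrophes (straight and curly) and hyphens outright,
    # so that ’tis counts as three letters and hyphenated words are joined.
    stripped = re.sub(r"['’‘-]", "", text)
    # Every remaining run of letters counts as one word.
    words = re.findall(r"[A-Za-z]+", stripped)
    return [len(w) for w in words]


def length_distribution(text):
    """Tally how many words there are of each length (hypothetical helper)."""
    return Counter(word_lengths(text))


# Only the first line and a half of the soliloquy, as a small test case.
sample = (
    "To be, or not to be, that is the question: "
    "Whether ’tis nobler in the mind to suffer"
)

dist = length_distribution(sample)

# A crude text dot plot: one mark per word at each length.
for length in sorted(dist):
    print(length, "#" * dist[length])
```

Even in this tiny sample the two-letter words dominate, which is the kind of thing the kids can argue about: is that Shakespeare, or is that English?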