Text, Subtext, and Claude Shannon

Early on after I started writing again, I joined an online critique group, Critters. I got a lot of good feedback from it, but only one critique was so good that I still remember it more than a year later. It was from a pro writer who correctly pointed out that the story wasn’t really *about* anything. There was a plot, to be sure (a complicated one about a murder on a new colony world) but he was right — it was all text and no subtext. A bunch of stuff happened, and one thing led to another, but it just happened. Nothing beyond internal logic drove the plot. It was basically a Seinfeld episode, not that there’s… wait, no, there is something wrong with that. If someone were to have asked me what the story was about, I could have given a superficial description (and did in parentheses above) but I would have had a frustrating trade-off between being brief and being accurate.

How do you want to respond when someone asks what your story’s about? That is, if you’re asked to compress your story into a short bite. You want to feel like you’ve gotten across what’s important, without basically retelling the whole story, right?

In information theory, there’s a concept called entropy. It’s a measure, among other things, of how much information is in a piece of text. When using a program like Zip to compress something without loss, the entropy of that file determines just how small it can get. Let’s say you have two pieces of text that you want to reproduce exactly from a description of limited length. The first one consists of the letter a, repeated 4,000 times. The second one is the result of me banging on my keyboard, typing out a random string of 2,000 letters.

The latter text has very high entropy: each new letter is a surprise unrelated to the letters that came before it, so the only way to exactly reproduce it is to type it out in full. Cost: 2,000 letters. The former text has very low entropy: even if you’d never seen it before, you’d pick up on the pattern pretty fast. My description above already tells you exactly how to reproduce it: “The letter a, repeated 4,000 times.” Cost: ~35 letters. (Or even “4000 a’s”: 8.) In the parlance of the field, the former text could be said to have less information than the latter text, even though it is twice as long.
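If you want to see this for yourself, here is a rough Python sketch. It assumes that zlib (which implements DEFLATE, the method Zip files commonly use) is a fair stand-in for "a program like Zip", and that a seeded random generator is a fair stand-in for me banging on the keyboard. The function names are my own, made up for illustration. The entropy function only looks at how often each letter appears, which is a simplification of the full concept.

```python
import math
import random
import string
import zlib
from collections import Counter


def entropy_bits_per_letter(text: str) -> float:
    """Shannon entropy of the letter frequencies, in bits per letter."""
    counts = Counter(text)
    total = len(text)
    # Each letter with probability p = n / total contributes p * log2(1 / p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the text at its highest effort level."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Fixed seed so the "keyboard banging" comes out the same on every run.
rng = random.Random(0)
repeated = "a" * 4000
banged = "".join(rng.choice(string.ascii_lowercase) for _ in range(2000))

for label, text in (("4,000 a's", repeated), ("2,000 random letters", banged)):
    print(
        f"{label}: {len(text)} letters, "
        f"{compressed_size(text)} bytes compressed, "
        f"{entropy_bits_per_letter(text):.2f} bits per letter"
    )
```

The exact byte counts depend on the zlib version, so I won't quote them, but the run of a's should shrink to a tiny fraction of its length while the random letters barely shrink at all.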

For any given real text, about the best a machine can do is to examine it at the byte level and look for repeated sequences that it can refer back to instead of spelling them out again. That is for an exact copy, though, at a mechanical level. If all we care about is getting the gist of it, then neither of the above examples has any information as we usually understand it: it takes about as long to say “4,000 a’s” as “2,000 random letters”. Whether we reproduce the original text exactly or only approximately, we take the same meaning from the result as from the original… maybe. (I am cautious here because I have been chastised before that Jackson Pollock paintings are not just random paint drips; they are designed with a purpose, and so no, I cannot sell my dropcloth.)

Stories work in much the same way, I think. Have you ever heard a young child describe a story they heard? It tends to be just a sequence of events, all the stuff that happened (sometimes in no particular order). Sometimes the summary is nearly as long as the story itself — it is basically a retelling of the story, heavily filtered: “There was a little kid, and the kid went to a cave and there was a cat and the cat’s name was Magersfontein and they went to a cave and there was a spider in a big circle and they squashed the spider and that summoned a demon who ate them both!”

I’m going to go out on a limb and say that if the only way to describe your story is to tell it, then it’s not really about anything. It is high entropy. I’ll make fun of little kids again: when they tell stories, those stories tend to be random, pulling in all sorts of elements from their life and meandering as things occur to them. If you’ve heard a couple such stories you can get the gist of them pretty easily: “Susie told me another of her stories. This one had monsters and a goldfish, and, um. You need to hide your ‘toys’ better, dear.” You’ll never be able to reproduce the exact story, but you’ll be able to produce one that anyone would believe was told by Susie.

On the other end of the scale is the “high concept” story: “Snakes on a Plane,” for example. Or “The Seven Samurai in the wild west,” which goes a long way toward describing “The Magnificent Seven”. Which is a good film, don’t get me wrong. But it’s pretty low entropy: you can get the gist of it in just a few words. If pressed, you could recreate a story from that description that would bear a passing resemblance to the one on the screen. If there were surprises (the snakes turn out to be robots, the cowboys turn out to be the bad guys, whatever) then you need to add words to get the same level of understanding across.

One is frequently asked to distill a story down to a very short description: an elevator pitch, an agent query letter, a synopsis, or just a friend asking, “I heard that was good, what’s it about?” The shorter the distillation, the more it needs to get to the core of the story. Here’s an example:

“Transported to a surreal landscape, a young girl kills the first person she meets and then teams up with three strangers to kill again”

This of course is Rick Polito’s famous summary of The Wizard of Oz. (John Scalzi put together a whole list of that sort of thing over at his film column; they’re awesome.) This is a technically true description of the text, but at the same time, it’s not really the same story, is it? If you try to reproduce the story just from that description, the result will bear only a passing resemblance to The Wizard of Oz. So what’s missing?

I would argue (he says, going further out on the limb) that the summary is missing 1) the main conflict: “Young girl believes she needs a wizard’s help to go home, but the wizard won’t help her until she kills a witch.” That gets you most of the way there, and if you add in the minor conflicts of the tin man, cowardly lion, and scarecrow wanting sex, drugs, and rock ‘n’ roll, you get a lot of the rest of the way there. And that’s the text there: a thin outline of the plot and a summary of the major conflict.

But that doesn’t get you all the way there, I think. If you write a story to that description, I don’t think it’ll feel the same as The Wizard of Oz movie. To get that, you need 2) the subtext, which makes two points: first, the importance of friends and family (Dorothy starts her journey by abandoning her family, and ends it embracing her family by seeing the faces of her new friends in them), and second, the somewhat opposed idea that you don’t need to look outside yourself for what you want (it starts with Dorothy seeking not-Kansas and coming upon a bunch of funny-looking short people who need an outsider to solve their problems; it ends with everyone getting lame symbols of their personal growth and Dorothy not shanking the wizard over the whole secret shoe thing). That’s what makes this story distinct from others with the same plot. Even if the details are totally different, if your recreation includes those points, you’ll wind up with something that is recognizably The Wizard of Oz.

So what does that mean when writing a story? I think that without a subtext, the story gets hard to pin down. It’ll either compress too far (“Kinda like Finnegans Wake” vs “Snakes on a plane.”) and get lost in the crowd, or it won’t compress far enough (“This happened, then this happened, then…”) and nobody will remember the damn thing. I think that if a story doesn’t compress, it gets lost. It goes in one eye and, um, out the … wrong metaphor. It gets forgotten, is what I’m trying to say; it doesn’t make an impression. Even if the prose is fiendishly clever and bits of it are quoted everywhere, I think stories need to really compress to make an impact and need to be de-compressible to be remembered: that is, a story needs to be at its heart simple enough to internalize, and distinctive enough to be its own story and nothing else.

What do you think? Is subtext really that important? Am I missing the point of why a story needs it?

3 thoughts on “Text, Subtext, and Claude Shannon”

jpfed says:

3 August, 2011 at 3:04 am

This post is approaching the same issue (achieving the “right” level of information content in a story) as your June 1 post (that used the moonshiner analogy). It appears on the surface to be about a different issue than the moonshiner post because the information encoding strategies used by the two posts are so different.

One such family of strategies uses the frequency of occurrence of symbols to either encode more frequent symbols with fewer bits (Huffman encoding) or encode longer patterns with the same number of bits (LZW compression used in GIF image files). Shared knowledge of the probability of any given symbol appearing in a particular context allows for efficient encoding and decoding.

The moonshiner post is, to me, really about how the pre-existing knowledge of story tropes and human psychology, shared between storyteller and audience, implicitly forms a probability distribution over individual story elements that we use to encode and decode story information more efficiently. Cliches are encoded very efficiently by this scheme, so they contribute little information content (the head). Idiosyncratic brain-dumps are not encoded efficiently, so they contribute too much information content (the tail). The storyteller wants to aim for something between these extremes.

A different kind of encoding strategy decomposes the signal into a “predicted” or “estimated” part on one hand and a “residual” part on the other. The estimated part often captures the coarse-grained “essence” of the signal without requiring too much information; the residual part may have a lot of fine-grained detail, but it might be of such small magnitude that it’s okay to ignore it (this is what’s going on in JPEGs and MP3s). This is analogous to decomposing a story into “subtext” and “details”.

Okay, sorry that turned out as long as it did. I apologize if all of this is obvious, or obviously inapplicable.

1. John P. Murphy says:
  
  3 August, 2011 at 9:30 am
  
  Huh! I hadn’t thought of the moonshiner post like that, but you’re absolutely right — the two extremes I talked about there encode very differently.
  
  I wish I’d thought of the JPEG analogy myself, because your description reminded me of a writing technique that fits in very neatly with this discussion, called the snowflake method, where one deliberately starts with a very coarse-grained description of a story, and then iteratively refines it.
  
  Thanks for stopping by, and for a fascinating comment!
  
Pingback: Greatest Hits « Murphy's Blog

John P. Murphy

Award-nominated author of science fiction and fantasy
