Conceptual Chunking in Perl

Larry Wall wrote:
... Typically, Perl programmers feel that they can write code that is more readable in Perl than in other languages, because they have freedom of choice in how to "chunk" the problem (to borrow a term from cognitive science).
ozan s. yigit wrote:
you have posted on this topic extensively, and i think there is much to discuss in there. but i find this little bit bewildering: what on earth are you talking about? does [eg] python offer less "freedom of choice" to "chunk" a problem? how so?

Okay, people have asked for such examples before, and I haven't responded, because the idea of chunking is pervasive in Perl and I couldn't figure out where to start. :-)

But here goes...

Psychologically speaking, "chunking" is the ability to reduce the complexity of a problem by making foreground/background or inside/outside distinctions and concentrating on one or the other. As such, the main enabler is the ability to define and recognize boundaries between foreground and background, between inside and outside. Classically, languages provide relatively few ways to make boundaries, ranging from the highly abstract object, down through modules and functions, clear down to loop abstractions, formatting conventions, statement delimitation, parenthesizing and quoting.

Perl is roughly equivalent to Python on more abstract levels, with some differences. Perl provides closures, while Python goes more deeply into some of the metaclass stuff (both of which I think are benign but relatively useless to mere mortals). I think the Perl module mechanism is a little more flexible than Python's--it's sufficiently general that I also use it for the pragma mechanism, because the semantics of importation are under the module's control, and the normal importation is merely a matter of reusing the standard export implementation. The user has a lot of flexibility in deciding which parts of a module's definitions should be defined how (in C or Perl) and when (immediately or lazily). There's flexibility in choosing between lexical and dynamic scoping. There's flexibility in choosing early or late binding. You can change inheritance on the fly, if you like. You can use objects where they make sense, and avoid them where they don't. All of this affects how you decompose your problem, and that in turn gives you flexibility in chunking.
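
The choice between lexical and dynamic scoping mentioned above can be sketched roughly as follows. The variable and subroutine names are invented for illustration and do not come from the original post:

```perl
use strict;
use warnings;

our $level = 'global';          # package variable, so local() can apply to it

sub show { return $level }      # reads the package variable at call time

sub dynamic {
    local $level = 'dynamic';   # temporary value, visible to anything called from here
    return show();
}

sub lexical {
    my $level = 'lexical';      # private variable, invisible to show()
    return show();
}

my @seen = (dynamic(), lexical(), show());
print "@seen\n";                # prints "dynamic global global"
```

With local the override travels down the call chain; with my it stays inside the enclosing block, which is usually the easier boundary to reason about.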

On a less abstract level, Perl lets you choose the psychological boundaries of loops, for instance. You can name a loop according to what it is processing. A name is a high-powered way of hiding an abstraction, mentally speaking:

    LINE: while (<>) {
      next LINE if /^#/;  # Discard comments.
      ...
    }

In my mind, I can now pigeonhole that as the LINE loop, and reduce it to a single little lump of cybercrud, even if the loop is 582 lines long.
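
A label earns its keep most visibly when loops nest. The following is only a sketch: it walks an in-memory list instead of reading <>, and the sample data and the STOP marker are made up for illustration:

```perl
use strict;
use warnings;

my @input = ("# header\n", "alpha beta\n", "gamma STOP delta\n", "epsilon\n");
my @kept;

LINE: for my $line (@input) {
    next LINE if $line =~ /^#/;          # Discard comments.
    for my $word (split ' ', $line) {
        next LINE if $word eq 'STOP';    # give up on the rest of this line
        push @kept, $word;
    }
}
print "@kept\n";                         # prints "alpha beta gamma epsilon"
```

The inner loop can say what it means in terms of the outer chunk ("next LINE") without any flag variable.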

Alternately, you can go with a more customary loop, which gives a different psychological "feel":

    while (<>) {
      next if /^#/;   # Discard comments.
      ...
    }

Since it's an anonymous loop, I now rely psychologically more on how it looks on the screen visually. It has an easily seen beginning and end. Things don't just "peter out" as they do in languages that use indentation as syntax. (Editorial opinion: the indentation scheme of Python is okay in small examples, but doesn't scale very well. It rapidly breaks down, visually and psychologically, as soon as you get any construct larger than a screen. It's all very well to argue, as some have argued, that you should never write a construct larger than a screen in Python, but then I'll respond that my point about flexibility in chunking is thereby proven. What if the user wants a chunk that is larger than the screen? Dangling, open-ended syntax is pretty useless at the discourse level. I'll go with Aristotle on that one.)

You can reduce a loop to one line to reduce its "significance" even further:

    while (<>) { print unless /^#/ }

You can even pretend there isn't a loop there:

    print grep !/^#/, <>;

You can delegate the loop to someone else:

    print `grep -v '^#'`;
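
As a rough check that the explicit and the hidden loop really do the same work, here is a sketch that runs both over the same text. It reads from an in-memory filehandle rather than <>, and the sample text is invented:

```perl
use strict;
use warnings;

my $text = "# comment\nfirst\n# another\nsecond\n";

# Explicit loop with a statement modifier.
my $explicit = '';
open my $in, '<', \$text or die "cannot open in-memory file: $!";
while (<$in>) { $explicit .= $_ unless /^#/ }
close $in;

# The same filter with the loop hidden inside grep.
open $in, '<', \$text or die "cannot open in-memory file: $!";
my $implicit = join '', grep !/^#/, <$in>;
close $in;

print $explicit eq $implicit ? "same output\n" : "different output\n";
```

Both forms keep only the two non-comment lines; the difference is purely in how much of the loop the reader has to look at.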

Well, that's probably enough about "while" loops, though we could certainly go into the psychological difference between "while" loops, C-style "for" loops and "foreach" loops. Linguistically, a foreach loop is functioning as a topicalizer for the interior of the loop.

    foreach $line (@lines) { ... }

For mental flexibility, Perl gives you an anonymous form:

    foreach (@lines) { print }

Since "for" is a synonym for "foreach" in Perl, you sometimes even see it used strictly as a topicalizer for a single value!

    for ($slurped_file) {
        tr [abc] [xyz];
        ...
    }
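
A complete version of that idiom might look like the sketch below. The sample string stands in for real file contents, and the two trimming substitutions are assumptions added for illustration:

```perl
use strict;
use warnings;

my $slurped_file = "  a cab and a bad back  \n";   # stand-in for real file contents

for ($slurped_file) {     # $_ is now an alias for $slurped_file
    tr [abc] [xyz];       # transliterate in place
    s/^\s+//;             # trim leading whitespace
    s/\s+$//;             # trim trailing whitespace
}
print "$slurped_file\n";  # prints "x zxy xnd x yxd yxzk"
```

Because $_ is aliased to the variable, every operation inside the block modifies $slurped_file directly without naming it again.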

Moving on down the abstraction level, there is psychological value in having a single way to delimit statements, and making all whitespace equivalent. This gives the user freedom in how to line things up vertically within a statement to enhance readability.

The notion of statement modifiers allows people to relegate unwanted psychological facts to the right side of the screen where they can be ignored.
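
A small sketch of what that looks like in practice, with made-up configuration lines as input:

```perl
use strict;
use warnings;

my @lines = ("# config\n", "\n", "name = perl\n", "mode = fast\n");
my %setting;

for (@lines) {
    next if /^#/;                  # the action leads, the condition trails
    next unless /\S/;              # skip blank lines
    chomp;
    my ($key, $value) = split /\s*=\s*/, $_, 2;
    $setting{$key} = $value if defined $value;
}
print "$_=$setting{$_}\n" for sort keys %setting;
```

Reading down the left edge gives the gist (next, next, chomp, split, assign); the conditions sit to the right for whoever needs them.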

Within statements, the whole notion of context in Perl is built around the concept that various operations are semantically "governed" by their surroundings. The choice of whether to parenthesize says a lot about how the programmer thinks of it. If the programmer wants to use the rest of the line as the scope, so to speak, you might see

    return print reverse sort bynum values %hash;

Someone who doesn't like line scopes might write something more like

    return print(reverse(sort bynum values(%hash)));

Again, this is psychological flexibility. Another person will choose the (presumably) equivalent

    return print sort {$b <=> $a} values(%hash);

To this person, the sort subroutine isn't even a subroutine.
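
To see the three spellings side by side, here is a sketch that assumes bynum is an ascending numeric comparator (the post does not show its definition) and uses an invented hash:

```perl
use strict;
use warnings;

sub bynum { $a <=> $b }            # assumed definition: ascending numeric order

my %hash = (x => 10, y => 9, z => 100);

my @line_scope    = reverse sort bynum values %hash;
my @parenthesized = reverse(sort bynum values(%hash));
my @inline_block  = sort { $b <=> $a } values(%hash);

print "@line_scope\n";             # prints "100 10 9"
print "@parenthesized\n";          # prints "100 10 9"
print "@inline_block\n";           # prints "100 10 9"
```

With distinct values all three produce the same list; what changes is where the reader is asked to draw the boundaries.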

Interpolative contexts are important in Perl. List operators do automatic list interpolation on their arguments. Double-quoted strings (and related contexts) provide a very convenient chunking mechanism for hiding a lot of concatenation. Variables in this context look just the same as they do in the rest of Perl--that's one reason I put $ and @ on variables in the first place. (The other is that noun markers like $ and @ allow quick visual figure/ground distinctions, enhancing readability. A Perl variable is also a kind of "chunk".)
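
A brief sketch of both kinds of interpolation, using invented variable names and data:

```perl
use strict;
use warnings;

my $user  = 'larry';
my @tools = ('awk', 'sed', 'sh');
my %count = (files => 3);

# Concatenation spells out every seam between the pieces.
my $joined = $user . ' used ' . join(' ', @tools) . ' on ' . $count{files} . ' files';

# Interpolation hides the seams; an array is joined with spaces by default.
my $interpolated = "$user used @tools on $count{files} files";

# List operators flatten their arguments into one list.
my @all = ('perl', @tools);

print "$interpolated\n";           # prints "larry used awk sed sh on 3 files"
print scalar(@all), "\n";          # prints "4"
```

The sigils do double duty here: they mark where interpolation happens inside the string and they look the same as they do outside it.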

One could also write reams about the different ways to write a pattern match in Perl. What other languages let you break up your regular expression chunks with both horizontal and vertical whitespace, and even comment each chunk, if you so desire? Or you can do as is traditionally done and visually encapsulate the whole unspeakable mess on a single line.
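
For instance, the /x modifier allows something like the following sketch. The date format is just an example chosen for illustration:

```perl
use strict;
use warnings;

# Chunked form: whitespace is ignored and each piece can carry a comment.
my $date_re = qr{
    ^ \s*
    (\d{4})  -      # year
    (\d{2})  -      # month
    (\d{2})         # day
    \s* $
}x;

# Traditional form: the same pattern encapsulated on one line.
my $terse_re = qr/^\s*(\d{4})-(\d{2})-(\d{2})\s*$/;

my ($year, $month, $day) = '1999-12-31' =~ $date_re;
print "$year/$month/$day\n";       # prints "1999/12/31"
```

Both patterns match the same strings; the choice is about how much of the structure the reader should see at a glance.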

Finally, quote delimiters. Forcing people to use just a few quote characters forces a lot of noise into a lot of programming languages. Many UNIX languages suffer from backslashitis and leaning-toothpick syndrome. Letting people pick their quote characters makes things a little harder for emacs, to be sure, but lets people encapsulate things visually the way they may be used to. Why force someone to say

    tr("abcdef\"", "ABCDEF'");

when

    tr [abcdef"] [ABCDEF'];

is clearer, or even

    tr [abcdef"]
       [ABCDEF'];

And note how this interplays well with the free statement formatting.
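
As a quick sanity check that the delimiter choice is purely visual, this sketch applies the same transliteration with three layouts to an invented sample string:

```perl
use strict;
use warnings;

my $text = 'a "bad" cafe';

(my $slashes  = $text) =~ tr/abcdef"/ABCDEF'/;
(my $brackets = $text) =~ tr [abcdef"] [ABCDEF'];
(my $stacked  = $text) =~ tr [abcdef"]
                             [ABCDEF'];

print "$brackets\n";               # prints "A 'BAD' CAFE"
```

All three variables end up with the same value; only the shape on the screen differs.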

On multi-line quotes, why force someone to use triple quote (ugh)? Why not make it easier for the person and harder for the computer, and let the user pick the trailing delimiter? At least the shell's got this right.

Here's a convenient mental trick. If I know that the text I'm dealing with contains no blank lines, I often use a blank line as my final delimiter. So instead of saying

    print <<"END";

I just say my delimiter is nothing

    print <<"";

and make sure the next line is blank. It works very well as a form of visual chunking. Python folks in particular should appreciate the idea of using the absence of something as the final delimiter.
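
A minimal sketch of the two forms side by side, with placeholder text. Note that the terminating line has to be completely empty, with no stray spaces or tabs:

```perl
use strict;
use warnings;

my $name = 'world';

# Named delimiter.
my $named = <<"END";
Hello, $name.
Goodbye, $name.
END

# Empty delimiter: the text runs up to the first blank line.
my $blank = <<"";
Hello, $name.
Goodbye, $name.

print $named eq $blank ? "identical\n" : "different\n";   # prints "identical"
```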