
<h1>Conceptual Chunking in Perl</h1>

Larry Wall wrote:
<blockquote>
... Typically, Perl programmers feel that they
can write code that is <i>more</i> readable in Perl than in other languages,
because they have freedom of choice in how to "chunk" the problem (to
borrow a term from cognitive science). 
</blockquote>

ozan s. yigit wrote:
<blockquote>
you have posted on this topic extensively, and i think there is much
to discuss in there. but i find this little bit bewildering: what on
earth are you talking about? does [eg] python offer less "freedom of
choice" to "chunk" a problem? how so?
</blockquote>

<p>
  Okay, people have asked for such examples before, and I haven't responded,
  because the idea of chunking is pervasive in Perl and I couldn't figure
  out where to start.  :-)
</p>

<p>
  But here goes...
</p>

<p>
  Psychologically speaking, "chunking" is the ability to reduce the
  complexity of a problem by making foreground/background or
  inside/outside distinctions and concentrate on one or the other.  As
  such, the main enabler is the ability to define and recognize boundaries
  between foreground and background, between inside and outside.
  Classically, languages provide relatively few ways to make boundaries,
  ranging from the highly abstract object, down through modules and
  functions, clear down to loop abstractions, formatting conventions,
  statement delimitation, parenthesizing and quoting.
</p>

<p>
  Perl is roughly equivalent to Python on more abstract levels, with some
  differences.  Perl provides closures, while Python goes more deeply into
  some of the metaclass stuff (both of which I think are benign but
  relatively useless to mere mortals).  I think the Perl module mechanism
  is a little more flexible than Python's--it's sufficiently general that
  I also use it for the pragma mechanism, because the semantics of
  importation are under the module's control, and the normal importation
  is merely a matter of reusing the standard export implementation.  The
  user has a lot of flexibility in deciding which parts of a module's
  definitions should be defined how (in C or Perl) and when (immediately
  or lazily).  There's flexibility in choosing between lexical and dynamic
  scoping.  There's flexibility in choosing early or late binding.  You
  can change inheritance on the fly, if you like.  You can use objects
  where they make sense, and avoid them where they don't.  All of this
  affects how you decompose your problem, and that in turn gives you
  flexibility in chunking.
</p>
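
<p>
  To make the lexical-versus-dynamic choice concrete, here is a minimal
  sketch.  The names <code>$level</code>, <code>show_level</code>,
  <code>with_dynamic</code> and <code>with_lexical</code> are invented for
  this illustration and are not from the original discussion:
</p>

<pre>
    use strict;
    use warnings;

    our $level = "global";            # package variable, eligible for local()

    sub show_level { return $level }  # reads the package variable at call time

    sub with_dynamic {
        local $level = "dynamic";     # temporary value, visible to callees
        return show_level();
    }

    sub with_lexical {
        my $level = "lexical";        # private variable, invisible to callees
        return show_level();
    }

    print with_dynamic(), "\n";       # prints "dynamic"
    print with_lexical(), "\n";       # prints "global"
</pre>

<p>
  With <code>local</code> the boundary is drawn in time (everything called
  until the sub returns), while with <code>my</code> it is drawn in the
  text of the program.
</p>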

<p>
  On a less abstract level, Perl lets you choose the psychological boundaries
  of loops, for instance.  You can name a loop according to what it is
  processing.  A name is a high-powered way of hiding an abstraction,
  mentally speaking:
</p>

<pre>
  LINE: 
    while (&lt;&gt;) {
      next LINE if /^#/;  # Discard comments.
      print;
    }
</pre>

<p>
  In my mind, I can now pigeonhole that as the LINE loop, and reduce it to
  a single little lump of cybercrud, even if the loop is 582 lines long.
</p>

<p>
  Alternatively, you can go with a more customary loop, which gives a
  different psychological "feel":
</p>

<pre>
    while (&lt;&gt;) {
      next if /^#/;   # Discard comments.
      print;
    }
</pre>

<p>
  Since it's an anonymous loop, I now rely psychologically more on how it
  looks on the screen.  It has an easily seen beginning and end.
  Things don't just "peter out" as they do in languages that use
  indentation as syntax.  (Editorial opinion:  the indentation scheme of
  Python is okay in small examples, but doesn't scale very well.  It
  rapidly breaks down, visually and psychologically, as soon as you get
  any construct larger than a screen.  It's all very well to argue,
  as some have argued, that you should never write a construct larger than
  a screen in Python, but then I'll respond that my point about
  flexibility in chunking is thereby proven.  What if the user <i>wants</i> a
  chunk that is larger than the screen?  Dangling, open-ended syntax is
  pretty useless at the discourse level.  I'll go with Aristotle on
  that one.)
</p>

<p>
  You can reduce a loop to one line to reduce its "significance" even further:
</p>

<pre>
    while (&lt;&gt;) { print unless /^#/ }
</pre>

<p>
  You can even pretend there isn't a loop there:
</p>

<pre>
    print grep !/^#/, &lt;&gt;;
</pre>

<p>
  You can delegate the loop to someone else:
</p>

<pre>
    print `grep -v '^#'`;
</pre>

<p>
  Well, that's probably enough about "while" loops, though we could
  certainly go into the psychological difference between "while" loops,
  C-style "for" loops and "foreach" loops.  Linguistically, a foreach loop
  is functioning as a topicalizer for the interior of the loop.
</p>

<pre>
    foreach $line (@lines) {
        print $line;
    }
</pre>

<p>
  For mental flexibility, Perl gives you an anonymous form:
</p>

<pre>
    foreach (@lines) { print }
</pre>

<p>
  Since "for" is a synonym for "foreach" in Perl, you sometimes even see it
  used strictly as a topicalizer for a single value!
</p>

<pre>
    for ($slurped_file) {
        s/5/6/g;
        s/4/5/g;
        s/3/4/g;
        s/2/3/g;
        s/1/2/g;
        tr [abc] [xyz];
        print;
    }
</pre>

<p>
  Moving on down the abstraction level, there is psychological value in
  having a single way to delimit statements, and making all whitespace
  equivalent.  This gives the user freedom in how to line things up
  vertically within a statement to enhance readability.
</p>
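
<p>
  As a rough illustration of that freedom, the sketch below spreads one
  statement over several lines and aligns its parts by hand.  The
  <code>%color</code> table and <code>$report</code> are made-up names
  for the example:
</p>

<pre>
    use strict;
    use warnings;

    my %color = (
        red   =&gt; 0xff0000,
        green =&gt; 0x00ff00,
        blue  =&gt; 0x0000ff,
    );

    # One statement; the line breaks and alignment are purely for the reader.
    my $report = join "",
                 map  { sprintf "%-6s #%06x\n", $_, $color{$_} }
                 sort keys %color;

    print $report;
</pre>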

<p>
  The notion of statement modifiers allows people to relegate unwanted
  psychological facts to the right side of the screen where they can
  be ignored.
</p>
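
<p>
  A small sketch of statement modifiers at work, assuming a hypothetical
  helper called <code>classify</code>.  The action stays on the left and
  the condition trails off to the right:
</p>

<pre>
    use strict;
    use warnings;

    # classify() is an invented example, not a standard function.
    sub classify {
        my ($n) = @_;
        return "not a number" unless defined $n and $n =~ /^-?\d+$/;
        return "negative"     if $n &lt; 0;
        return "zero"         if $n == 0;
        return "positive";
    }

    # The "for" modifier works the same way: action first, list last.
    print classify($_), "\n" for 7, 0, -3;
</pre>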

<p>
  Within statements, the whole notion of context in Perl is built around
  the concept that various operations are semantically "governed" by their
  surroundings.  The choice of whether to parenthesize says a lot about
  how the programmer thinks of it.  If the programmer wants to use
  the rest of the line as the scope, so to speak, you might see
</p>

<pre>
    return print reverse sort bynum values %hash;
</pre>

<p>
  Someone who doesn't like line scopes might write something more like
</p>

<pre>
    return print(reverse(sort bynum values(%hash)));
</pre>

<p>
  Again, this is psychological flexibility.  Another person will choose
  the (presumably) equivalent
</p>

<pre>
    return print sort {$b &lt;=&gt; $a} values(%hash);
</pre>

<p>
  To this person, the sort subroutine isn't even a subroutine.
</p>

<p>
  Interpolative contexts are important in Perl.  List operators do
  automatic list interpolation on their arguments.  Double-quoted strings
  (and related contexts) provide a very convenient chunking mechanism for
  hiding a lot of concatenation.  Variables in this context look just the
  same as they do in the rest of Perl--that's one reason I put $ and @ on
  variables in the first place.  (The other is that noun markers like $
  and @ allow quick visual figure/ground distinctions, enhancing
  readability.  A Perl variable is also a kind of "chunk".)
</p>
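
<p>
  The difference can be sketched side by side.  The variable names and
  the sample data below are assumptions made up for this example:
</p>

<pre>
    use strict;
    use warnings;

    my $user  = "larry";
    my @files = ("a.txt", "b.txt", "c.txt");
    my $count = @files;                      # array in scalar context: 3

    # Explicit concatenation: every boundary is spelled out.
    my $concat = "User " . $user . " has " . $count
               . " files: " . join(" ", @files) . "\n";

    # Interpolation: the $ and @ markers pick the variables out of the text.
    my $interp = "User $user has $count files: @files\n";

    # List interpolation: the inner array flattens into the outer list.
    my @all = ("first", @files, "last");     # five elements

    print $interp;
    print "same string\n" if $concat eq $interp;
</pre>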

<p>
  One could also write reams about the different ways to write a pattern
  match in Perl.  What other languages let you break up your regular
  expression chunks with both horizontal and vertical whitespace, and even
  comment each chunk, if you so desire?  Or you can do as is traditionally
  done and visually encapsulate the whole unspeakable mess on a single
  line.
</p>
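
<p>
  For instance, here is one possible pattern written both ways, using the
  <code>/x</code> modifier for the spread-out form.  The "key = value"
  line format and the names <code>$pair</code> and <code>$terse</code>
  are hypothetical, chosen only for this sketch:
</p>

<pre>
    use strict;
    use warnings;

    # Under /x, whitespace in the pattern is ignored and # starts a comment.
    my $pair = qr{
        ^ \s*
        ( \w+ )      # key: one or more word characters
        \s* = \s*    # equals sign, with optional whitespace around it
        ( .*? )      # value: everything up to any trailing whitespace
        \s* $
    }x;

    # The same pattern in the traditional single-line form.
    my $terse = qr/^\s*(\w+)\s*=\s*(.*?)\s*$/;

    for my $line ("name = Larry", "  lang=Perl  ") {
        my ($key, $value) = $line =~ $pair or next;
        print "$key is $value\n";
    }
</pre>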

<p>
  Finally, quote delimiters.  Forcing people to use just a few quote
  characters forces a lot of noise into a lot of programming languages.
  Many UNIX languages suffer from backslashitis and leaning-toothpick
  syndrome.  Letting people pick their quote characters makes things a
  little harder for emacs, to be sure, but lets people encapsulate things
  visually the way they may be used to.  Why force someone to say
</p>

<pre>
    tr("abcdef\"", "ABCDEF'");
</pre>

<p>
  when
</p>

<pre>
    tr [abcdef"] [ABCDEF'];
</pre>

<p>
  is clearer, or even
</p>

<pre>
    tr [abcdef"]
       [ABCDEF'];
</pre>

<p> 
  And note how this interplays well with the free statement formatting.
</p>

<p>
  On multi-line quotes, why force someone to use triple quote (ugh)?
  Why not make it easier for the person and harder for the computer,
  and let the user pick the trailing delimiter?  At least the shell's
  got this right.
</p>

<p>
  Here's a convenient mental trick.  If I know that the text I'm dealing
  with contains no blank lines, I often use a blank line as my
  final delimiter.  So instead of saying
</p>

<pre>
    print &lt;&lt;"END";
    $UM
    K00l
    $TUPH
    END
</pre>

<p>
  I just say my delimiter is nothing
</p>

<pre>
    print &lt;&lt;"";
    $UM
    K00l
    $TUPH

</pre>

<p>
  and make sure the next line is blank.  It works very well as a form of
  visual chunking.  Python folks in particular should appreciate the idea
  of using the absence of something as the final delimiter.
</p>