count-fold-lines: Emacs hack to fold duplicate lines and count them.

I just wrote a thing in Emacs that others might find useful:

(defun count-fold-lines (start end)
  "Replace lines from START to END with descending counts of unique lines.
It's easiest to just give an example to explain this.  These lines...

  foo
  foo
  bar
  foo
  baz
  bar
  foo

...would become this:

   4   foo
   2   bar
   1   baz

(modulo some left-padding differences that we don't care about)."
  (interactive "r")
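  ;; Sort first so that identical lines are adjacent, as uniq requires.
  (sort-lines nil start end)
  (save-mark-and-excursion
    ;; Hah!  You thought I was going to implement this in Elisp, didn't you?
    ;; With REPLACE non-nil, the output replaces the region, and
    ;; point and mark end up around the new text.
    (shell-command-on-region start end "uniq --count" nil t)
    ;; GNU uniq right-aligns each count in a fixed-width field, so a
    ;; reverse textual sort puts the largest counts first.
    (sort-lines t (point) (mark))))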

Although the code’s documentation gives a short example, it might be clearer if I give a long example here.

Imagine that you’ve marked a region like this in an Emacs buffer:

This is one of the lines we might see; there are twelve of them.
Meanwhile, other lines look like this (there are five of them).
This is one of the lines we might see; there are twelve of them.
And still others are like this -- there are ten of these.
This is one of the lines we might see; there are twelve of them.
And still others are like this -- there are ten of these.
And still others are like this -- there are ten of these.
Some lines only appear once, so why did I even use the plural?
This is one of the lines we might see; there are twelve of them.
And still others are like this -- there are ten of these.
And still others are like this -- there are ten of these.
Meanwhile, other lines look like this (there are five of them).
There's one line that appears twice.
And still others are like this -- there are ten of these.
And still others are like this -- there are ten of these.
And still others are like this -- there are ten of these.
This is one of the lines we might see; there are twelve of them.
Meanwhile, other lines look like this (there are five of them).
Meanwhile, other lines look like this (there are five of them).
This is one of the lines we might see; there are twelve of them.
This is one of the lines we might see; there are twelve of them.
This is one of the lines we might see; there are twelve of them.
And still others are like this -- there are ten of these.
There's one line that appears twice.
This is one of the lines we might see; there are twelve of them.
Meanwhile, other lines look like this (there are five of them).
This is one of the lines we might see; there are twelve of them.
And still others are like this -- there are ten of these.
This is one of the lines we might see; there are twelve of them.
This is one of the lines we might see; there are twelve of them.

Running `M-x count-fold-lines' will transform the region into this:

     12 This is one of the lines we might see; there are twelve of them.
     10 And still others are like this -- there are ten of these.
      5 Meanwhile, other lines look like this (there are five of them).
      2 There's one line that appears twice.
      1 Some lines only appear once, so why did I even use the plural?

Now you can see that there were five distinct lines, and how many times each one appeared in the original region.

This code depends on the availability of the standard Unix utility uniq — I just took the shortest path because I needed this command on the fly for some research I was doing. But it could easily be rewritten in pure Elisp, with no dependency on external uniq; if someone really wants to use this on a system where uniq is unavailable, please let me know.
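For the curious, here is a minimal sketch of what a pure-Elisp version might look like. It is not the author's code, and the name `count-fold-lines-elisp` is hypothetical. Instead of sort, uniq, and a second sort, it counts each distinct line in a hash table, then sorts by descending count, breaking ties in reverse string order. Under the assumptions that GNU uniq's seven-wide count field is in play and `sort-fold-case` is nil, that should reproduce the original command's output.

```elisp
(defun count-fold-lines-elisp (start end)
  "Replace lines from START to END with descending counts of unique lines.
Hypothetical pure-Elisp sketch of `count-fold-lines' that needs no
external uniq."
  (interactive "r")
  (let* ((lines (split-string (buffer-substring-no-properties start end) "\n"))
         (counts (make-hash-table :test #'equal))
         (distinct '()))
    ;; A region ending in a newline yields a trailing "" from
    ;; `split-string'; drop it so it is not counted as a blank line.
    (when (equal (car (last lines)) "")
      (setq lines (butlast lines)))
    ;; Count each line, remembering every distinct line once.
    (dolist (line lines)
      (unless (gethash line counts)
        (push line distinct))
      (puthash line (1+ (gethash line counts 0)) counts))
    ;; Highest count first; equal counts in reverse string order, the
    ;; order the original's final reverse `sort-lines' gives them.
    (setq distinct
          (sort distinct
                (lambda (a b)
                  (let ((ca (gethash a counts))
                        (cb (gethash b counts)))
                    (or (> ca cb)
                        (and (= ca cb) (string> a b)))))))
    ;; Replace the region with right-aligned counts, like GNU uniq -c.
    (delete-region start end)
    (goto-char start)
    (dolist (line distinct)
      (insert (format "%7d %s\n" (gethash line counts) line)))))
```

To try it, mark a region and run `M-x count-fold-lines-elisp'. A hash table keeps the counting step linear in the number of lines; only the distinct lines need sorting afterwards.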

(And if you’re looking for this in my .emacs, it’s under the name kf-count-fold-lines, because I use that namespace prefix to avoid clobbering symbols from Emacs and from third-party packages.)
