2007-12-13

Naturally Sorted

Well, Jeff Atwood has weighed in the discussion, with a clear explanation for "the rest of them". Seeing Batchelder's Python code, I realized I didn't do as well as I could with my OCaml version. Specifically, I didn't look hard enough for a simple way to split the string between decimal and non-decimal parts. The idea is not to delimit parts by looking for transitions between one kind and the other, but to use one of the kinds as the delimiter itself:

let split =
  let delim = Str.regexp "[0-9]+" in
    List.map (function Str.Delim s -> s | Str.Text s -> s)
  % Str.full_split delim

As is to be expected with OCaml, the result of full_split is rather odd, in that it uses an ADT to distinguish between text and delimiters. I use a forgetful mapping to coalesce all the parts into a plain string list.

Another point where the comparison could be more discriminating is when comparing numbers with leading zeros. While it is obvious that "010" > "08", it is also to be expected that "010" > "10". The fix for this is to compare first by value, then by length:

let cmp_alnum x y =
  try
    let m, n = int_of_string x, int_of_string y in
    let c = Pervasives.compare m n in
    if c != 0 then c else
    Pervasives.compare (String.length x) (String.length y)
  with Failure _ -> Pervasives.compare x y

There's always opportunity for polish…

No comments: