Jon Bentley has scarred a couple of generations of programmers, or he was and continues to be right:
I've assigned this problem in courses at Bell Labs and IBM. […] In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn't always convinced of the correctness of the code in which no bugs were found).
Jon Bentley, Programming Pearls, p. 36
This disjunction is not meant to be exclusive. The question whether programmers know how to write a binary search or not crops again in connection with this quote (but the topic is perennial), and reddit jumps to try its collective hand at the task and fail with numbers echoing Bentley's. The remedy (namely, closely study Dijkstra's A Discipline of Programming, or at least the van Gasteren and Feijen's paper) is as effective as it is bitter. The alternative is to learn the algorithm by heart:
let binsearch v e = let n = Array.length v in if n = 0 then raise Not_found else let i = ref 0 and j = ref n in while !i <> !j - 1 do let m = (!i + !j) / 2 in if v.(m) <= e then i := m else j := m done; if v.(!i) = e then !i else raise Not_found
(note: the midpoint calculation cannot overflow in OCaml). No ifs, no buts, not a single thing changed: the algorithm as presented above is proved correct. Alas, we live in an age of practical assurances, and like Thomas we trust more a green-lighted battery of tests than a formal derivation (who understands those anyway?). There is a nice, 4096-case test set available on-line; a parser and a test harness are simple enough to build for it. Each case consists of a label, a test value, a list of values and an expected outcome:
type test = Test of string * int * int array * bool
The file format is line-oriented; this makes easy to use a regexp-based parser. To ease the use of regular expressions, I build a test function out of a pattern:
let matching patt = let re = Str.regexp patt in fun str -> Str.string_match re str 0
(I close over the compiled regexp for efficiency). The required regexps are:
let is_label = matching "^Problem [0-9]+$" let is_start = matching "^in \\[$" let is_value = matching "^-?[0-9]+$" let is_endyes = matching "^\\]\\? yes$" let is_endno = matching "^\\]\\? no$" let is_sep = matching "^$"
Syntax errors in the test cases file are signalled by an exception:
exception SyntaxError of int * string
I keep track of the lines read for purposes of error reporting:
let lineno = ref 0 let syntax_error str = raise (SyntaxError (!lineno, str)) let read inch = try let l = input_line inch in incr lineno; Some l with End_of_file -> None
The parser reads a file and returns a list of test cases; it uses mutually-recursive functions as a traditional recursive-descent parser with synthetic attributes. Each function corresponds to a terminal in the production case ::= label value start value* end sep
:
let rec parse_tests inch = List.rev (parse_case inch []) and parse_case inch tests = match read inch with | Some line when is_label line -> parse_member inch tests line | None -> tests | _ -> syntax_error "PROBLEM expected" and parse_member inch tests label = match read inch with | Some line when is_value line -> parse_start inch tests label (int_of_string line) | _ -> syntax_error "MEMBER expected" and parse_start inch tests label cand = match read inch with | Some line when is_start line -> parse_value inch tests label cand [] | _ -> syntax_error "START expected" and parse_value inch tests label cand vals = match read inch with | Some line when is_endyes line -> parse_finish inch tests label cand vals true | Some line when is_endno line -> parse_finish inch tests label cand vals false | Some line when is_value line -> parse_value inch tests label cand (int_of_string line :: vals) | _ -> syntax_error "VALUE or END expected" and parse_finish inch tests label cand vals test = match read inch with | Some line when is_sep line -> parse_case inch (Test (label, cand, Array.of_list (List.rev vals), test) :: tests) | _ -> syntax_error "SEPARATOR expected"
The test harness iterates a test function over the resulting list, recording the number of successful tests along the total:
let test_binsearch inch = let count = ref 0 and pass = ref 0 in List.iter (fun (Test (label, cand, vals, test)) -> let result = try ignore (binsearch vals cand); test with Not_found -> not test in incr count; if result then begin Printf.printf "TEST %s: PASS\n" label; incr pass end else Printf.printf "TEST %s: FAIL\n" label ) (parse_tests inch); Printf.printf "\nSUMMARY: Passed %d of %d tests (%.2f %%)\n" !pass !count (100. *. float !pass /. float !pass)
In order to be tidy about using files, I use a helper handler:
let unwind ~protect f x = try let y = f x in let () = protect x in y with e -> let () = protect x in raise e let with_input_channel f inch = unwind ~protect:close_in f inch
The main expression turns this little (less than 90 lines of code) program into an executable suitable for command-line execution:
let () = let inch = try let fname = Array.get Sys.argv 1 in open_in_bin fname with Invalid_argument _ -> stdin in with_input_channel test_binsearch inch
That's it. Compile and execute it with:
$ ocamlopt.opt -o binsrch.exe str.cmxa binsrch.ml $ ./binsrch.exe tests.txt > tests.out
(I'm using Cygwin here, don't hate me). The result is as expected:
TEST Problem 1: PASS TEST Problem 2: PASS TEST Problem 3: PASS [snip…] TEST Problem 4094: PASS TEST Problem 4095: PASS TEST Problem 4096: PASS SUMMARY: Passed 4096 of 4096 tests (100.00 %)
Now do yourself a favor and go read Dijkstra.
2 comments:
Could you be more specific on `the midpoint calculation cannot overflow in OCaml'?
Anonymous, OCaml defines Sys.max_array_size << max_int. It is possible for integer arithmetic to overflow in OCaml (e.g., (max_int + max_int)/2 is -1), but never for the midpoint calculation with valid array indices.
Post a Comment