This is just a quick note to document a concrete variation of the technique for transposing a bit matrix via parallel bit operations in Hacker's Delight 7–3. In this case I needed to specialize it to 4×4 matrices represented by 16-bit integers. The underlying method is to recursively transpose each of the four 2x2-bit submatrices, and then block-transpose the 2x2-block matrix. Numbering the bits from 0 to A from LSB to MSB, the procedure looks like this:
In terms of 16-bit words, each step looks like this:
F E D C | B A 9 8 | 7 6 5 4 | 3 2 1 0 |
↓ |
F B D 9 | E A C 8 | 7 3 5 1 | 6 2 4 0 |
↓ |
F B 7 3 | E A 6 2 | D 9 5 1 | C 8 4 0 |
where I have highlighted the bits that remain fixed at each step. The technique amounts to swapping by xor
-ing, as in x ^= y ^= x ^= y
but bit by bit. The bits that remain fixed are masked out in order to avoid being swapped. In what follows, I use -
for the zero bit and x
for the one bit, and reserve hexadecimal digits as variable names for each bit position. I denote xor
-ing bits i and j by juxtaposition as ij. Suppose the matrix to be transposed is stored in the 16-bit variable m. Using t as a temporary, the following operations carry out the transposition as shown:
m | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
m>>3 | - | - | - | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 |
m^(m>>3) | F | E | D | FC | EB | DA | C9 | B8 | A7 | 96 | 85 | 74 | 63 | 52 | 41 | 30 |
0x0A0A | - | - | - | - | x | - | x | - | - | - | - | - | x | - | x | - |
t=(m^(m>>3))&0x0A0A | - | - | - | - | EB | - | C9 | - | - | - | - | - | 63 | - | 41 | - |
t<<3 | - | EB | - | C9 | - | - | - | - | - | 63 | - | 41 | - | - | - | - |
t^(t<<3) | - | EB | - | C9 | EB | - | C9 | - | - | 63 | - | 41 | 63 | - | 41 | - |
m^=t^(t<<3) | F | B | D | 9 | E | A | C | 8 | 7 | 3 | 5 | 1 | 6 | 2 | 4 | 0 |
m>>6 | - | - | - | - | - | - | F | B | D | 9 | E | A | C | 8 | 7 | 3 |
m^(m>>6) | F | B | D | 9 | E | A | FC | B8 | D7 | 93 | E5 | A1 | C6 | 82 | 74 | 30 |
0x00CC | - | - | - | - | - | - | - | - | x | x | - | - | x | x | - | - |
t=(m^(m>>6))&0x00CC | - | - | - | - | - | - | - | - | D7 | 93 | - | - | C6 | 82 | - | - |
t<<6 | - | - | D7 | 93 | - | - | C6 | 82 | - | - | - | - | - | - | - | - |
t^(t<<6) | - | - | D7 | 93 | - | - | C6 | 82 | D7 | 93 | - | - | C6 | 82 | - | - |
m^=t^(t<<6) | F | B | 7 | 3 | E | A | 6 | 2 | D | 9 | 5 | 1 | C | 8 | 4 | 0 |
Each xor
-assignment to m completes the corresponding recursive step as outlined above. OCaml code for this is straightforward:
let btrans4x4 m =
let t = (m lxor (m lsr 3)) land 0x0a0a in
let m = m lxor t lxor (t lsl 3) in
let t = (m lxor (m lsr 6)) land 0x00cc in
m lxor t lxor (t lsl 6)
1 comment:
Thanks for the useful page! I just implemented a somewhat generalized version of your routine, and thought I'd share it:
https://github.com/girving/pentago/blob/3e64c60854a181131cc201fa679582305fcb6349/symmetry.cpp#L228
The code takes an element of the semidirect product group Z_4^4 \rtimes D_4 and applies it as a certain conjugation map to a subset of Z_4^4 represented as 4x4x4x4 bit array packed into two __m128i's. Conjugation by the D_4 part of the map requires transposing several different pairs of axes.
Post a Comment