2011-09-09

4×4-Bit Matrix Transposition

This is just a quick note to document a concrete variation of the technique for transposing a bit matrix via parallel bit operations in Hacker's Delight 7–3. In this case I needed to specialize it to 4×4 matrices represented by 16-bit integers. The underlying method is to recursively transpose each of the four 2x2-bit submatrices, and then block-transpose the 2x2-block matrix. Numbering the bits from 0 to A from LSB to MSB, the procedure looks like this:



3210
7654
BA98
FEDC

6240
7351
EAC8
FBD9

C840
D951
EA62
FB73

In terms of 16-bit words, each step looks like this:


F E D CB A 9 87 6 5 43 2 1 0
F B D 9E A C 87 3 5 16 2 4 0
F B 7 3E A 6 2D 9 5 1C 8 4 0

where I have highlighted the bits that remain fixed at each step. The technique amounts to swapping by xor-ing, as in x ^= y ^= x ^= y but bit by bit. The bits that remain fixed are masked out in order to avoid being swapped. In what follows, I use - for the zero bit and x for the one bit, and reserve hexadecimal digits as variable names for each bit position. I denote xor-ing bits i and j by juxtaposition as ij. Suppose the matrix to be transposed is stored in the 16-bit variable m. Using t as a temporary, the following operations carry out the transposition as shown:


m F E D C B A 9 8 7 6 5 4 3 2 1 0
m>>3 - - - F E D C B A 9 8 7 6 5 4 3
m^(m>>3) F E DFC EBDAC9B8 A7968574 63524130
0x0A0A - - - - x - x - - - - - x - x -
t=(m^(m>>3))&0x0A0A - - - - EB -C9 - - - - - 63 -41 -
t<<3 -EB -C9 - - - - -63 -41 - - - -
t^(t<<3) -EB -C9 EB -C9 - -63 -41 63 -41 -
m^=t^(t<<3) F B D 9 E A C 8 7 3 5 1 6 2 4 0
m>>6 - - - - - - F B D 9 E A C 8 7 3
m^(m>>6) F B D 9 E AFCB8 D793E5A1 C6827430
0x00CC - - - - - - - - x x - - x x - -
t=(m^(m>>6))&0x00CC - - - - - - - - D793 - - C682 - -
t<<6 - -D793 - -C682 - - - - - - - -
t^(t<<6) - -D793 - -C682 D793 - - C682 - -
m^=t^(t<<6) F B 7 3 E A 6 2 D 9 5 1 C 8 4 0

Each xor-assignment to m completes the corresponding recursive step as outlined above. OCaml code for this is straightforward:


let btrans4x4 m =
  let t = (m lxor (m lsr 3)) land 0x0a0a in
  let m = m lxor t lxor (t lsl 3) in
  let t = (m lxor (m lsr 6)) land 0x00cc in
  m lxor t lxor (t lsl 6)

1 comment:

Geoffrey Irving said...

Thanks for the useful page! I just implemented a somewhat generalized version of your routine, and thought I'd share it:

https://github.com/girving/pentago/blob/3e64c60854a181131cc201fa679582305fcb6349/symmetry.cpp#L228

The code takes an element of the semidirect product group Z_4^4 \rtimes D_4 and applies it as a certain conjugation map to a subset of Z_4^4 represented as 4x4x4x4 bit array packed into two __m128i's. Conjugation by the D_4 part of the map requires transposing several different pairs of axes.