## 2011-09-09

### 4×4-Bit Matrix Transposition

This is just a quick note to document a concrete variation of the technique for transposing a bit matrix via parallel bit operations in Hacker's Delight 7–3. In this case I needed to specialize it to 4×4 matrices represented by 16-bit integers. The underlying method is to recursively transpose each of the four 2x2-bit submatrices, and then block-transpose the 2x2-block matrix. Numbering the bits from 0 to A from LSB to MSB, the procedure looks like this:

 3 2 1 0 7 6 5 4 B A 9 8 F E D C

 6 2 4 0 7 3 5 1 E A C 8 F B D 9

 C 8 4 0 D 9 5 1 E A 6 2 F B 7 3

In terms of 16-bit words, each step looks like this:

 `F E D C` `B A 9 8` `7 6 5 4` `3 2 1 0` ↓ `F B D 9` `E A C 8` `7 3 5 1` `6 2 4 0` ↓ `F B 7 3` `E A 6 2` `D 9 5 1` `C 8 4 0`

where I have highlighted the bits that remain fixed at each step. The technique amounts to swapping by `xor`-ing, as in `x ^= y ^= x ^= y` but bit by bit. The bits that remain fixed are masked out in order to avoid being swapped. In what follows, I use `-` for the zero bit and `x` for the one bit, and reserve hexadecimal digits as variable names for each bit position. I denote `xor`-ing bits i and j by juxtaposition as ij. Suppose the matrix to be transposed is stored in the 16-bit variable m. Using t as a temporary, the following operations carry out the transposition as shown:

 `m` `m>>3` `m^(m>>3)` `0x0A0A` `t=(m^(m>>3))&0x0A0A` `t<<3` `t^(t<<3)` `m^=t^(t<<3)` `m>>6` `m^(m>>6)` `0x00CC` `t=(m^(m>>6))&0x00CC` `t<<6` `t^(t<<6)` `m^=t^(t<<6)` F E D C B A 9 8 7 6 5 4 3 2 1 0 - - - F E D C B A 9 8 7 6 5 4 3 F E D FC EB DA C9 B8 A7 96 85 74 63 52 41 30 - - - - x - x - - - - - x - x - - - - - EB - C9 - - - - - 63 - 41 - - EB - C9 - - - - - 63 - 41 - - - - - EB - C9 EB - C9 - - 63 - 41 63 - 41 - F B D 9 E A C 8 7 3 5 1 6 2 4 0 - - - - - - F B D 9 E A C 8 7 3 F B D 9 E A FC B8 D7 93 E5 A1 C6 82 74 30 - - - - - - - - x x - - x x - - - - - - - - - - D7 93 - - C6 82 - - - - D7 93 - - C6 82 - - - - - - - - - - D7 93 - - C6 82 D7 93 - - C6 82 - - F B 7 3 E A 6 2 D 9 5 1 C 8 4 0

Each `xor`-assignment to m completes the corresponding recursive step as outlined above. OCaml code for this is straightforward:

```let btrans4x4 m =
let t = (m lxor (m lsr 3)) land 0x0a0a in
let m = m lxor t lxor (t lsl 3) in
let t = (m lxor (m lsr 6)) land 0x00cc in
m lxor t lxor (t lsl 6)
```

#### 1 comment:

Geoffrey Irving said...

Thanks for the useful page! I just implemented a somewhat generalized version of your routine, and thought I'd share it:

https://github.com/girving/pentago/blob/3e64c60854a181131cc201fa679582305fcb6349/symmetry.cpp#L228

The code takes an element of the semidirect product group Z_4^4 \rtimes D_4 and applies it as a certain conjugation map to a subset of Z_4^4 represented as 4x4x4x4 bit array packed into two __m128i's. Conjugation by the D_4 part of the map requires transposing several different pairs of axes.