After linearizing all array accesses, interpolating ARGB values is easy from the algorithmic side of things; so easy things first. I'll handle the pixel interpolation proper by abstracting them away in their own little functions. Processing an ARGB source array is just a matter of changing the variable declarations:

final int[] srcpix = { 0xff0051ff, 0xff00fffb, 0xff9cff00, 0xff0051ff, 0xffff0000, 0xff00ff08, 0xffffdb00, 0xff00fffb, 0xff9cff00, 0xff00fffb, 0xff0051ff, 0xffffdb00, 0xffffdb00, 0xff9cff00, 0xff00fffb, 0xff00ff08, }; final int nrows = 4; final int ncols = 4; void interp0() { final int[] p = new int[4*ncols+8]; p[1*ncols+2] = srcpix[0*ncols]; arrayCopy(srcpix, 0*ncols, p, 1*ncols+3, ncols); p[2*ncols+3] = srcpix[1*ncols-1]; p[2*ncols+4] = srcpix[1*ncols]; arrayCopy(srcpix, 1*ncols, p, 2*ncols+5, ncols); p[3*ncols+5] = srcpix[2*ncols-1]; p[3*ncols+6] = srcpix[2*ncols]; arrayCopy(srcpix, 2*ncols, p, 3*ncols+7, ncols); p[4*ncols+7] = srcpix[3*ncols-1]; arrayCopy(p, 1*ncols+2, p, 0, ncols+2); int yr = 0; int yi = 2; int offset = 0; for (int i = 0; i < height; i++) { final float yf = (float) yr / (float) height; int c00 = p[0*ncols+0], c01 = p[0*ncols+1], c02 = p[0*ncols+2], c03 = p[0*ncols+3]; int c10 = p[1*ncols+2], c11 = p[1*ncols+3], c12 = p[1*ncols+4], c13 = p[1*ncols+5]; int c20 = p[2*ncols+4], c21 = p[2*ncols+5], c22 = p[2*ncols+6], c23 = p[2*ncols+7]; int c30 = p[3*ncols+6], c31 = p[3*ncols+7], c32 = p[3*ncols+8], c33 = p[3*ncols+9]; int xr = 0; int xi = 3; for (int j = 0; j < width; j++) { final float xf = (float) xr / (float) width; int c = 0; switch (method) { case NEAREST: c = yf < 0.5 ? xf < 0.5 ? c11 : c12 : xf < 0.5 ? c21 : c22; break; case BILINEAR: c = bilinear(xf, yf, c11, c12, c21, c22); break; case BICUBIC: c = bicubic(xf, yf, c00, c01, c02, c03, c10, c11, c12, c13, c20, c21, c22, c23, c30, c31, c32, c33); break; } pixels[offset++] = c; xr += ncols - 1; if (xr >= width) { xr -= width; xi += 1; if (xi < ncols+2) { c00 = c01; c01 = c02; c02 = c03; c03 = p[0*ncols+xi+0]; c10 = c11; c11 = c12; c12 = c13; c13 = p[1*ncols+xi+2]; c20 = c21; c21 = c22; c22 = c23; c23 = p[2*ncols+xi+4]; c30 = c31; c31 = c32; c32 = c33; c33 = p[3*ncols+xi+6]; } } } yr += nrows - 1; if (yr >= height) { yr -= height; yi += 1; if (yi <= nrows) { arrayCopy(p, (ncols+2), p, 0, 3*(ncols+2)); if (yi < nrows) { p[3*ncols+6] = srcpix[yi*ncols]; arrayCopy(srcpix, yi*ncols, p, 3*ncols+7, ncols); p[4*ncols+7] = srcpix[(yi+1)*ncols-1]; } } } } }

The pixel array is the result of obtaining the `hue()`

for each value in the original `float`

array `data`. As you can see, I simplified the inner `switch`

by extracting the interpolation schemes off to named routines. These routines will have to unpack the four components in each pixel, interpolate each component and pack back the result pixel. The bilinear interpolator is straightforward:

static int bilinear(final float t, final float u, final int c00, final int c01, final int c10, final int c11) { final int a00=(c00>>24)&0xff, r00=(c00>>16)&0xff, g00=(c00>>8)&0xff, b00=c00&0xff; final int a01=(c01>>24)&0xff, r01=(c01>>16)&0xff, g01=(c01>>8)&0xff, b01=c01&0xff; final int a10=(c10>>24)&0xff, r10=(c10>>16)&0xff, g10=(c10>>8)&0xff, b10=c10&0xff; final int a11=(c11>>24)&0xff, r11=(c11>>16)&0xff, g11=(c11>>8)&0xff, b11=c11&0xff; final float a = linear(u, linear(t, a00, a01), linear(t, a10, a11)); final float r = linear(u, linear(t, r00, r01), linear(t, r10, r11)); final float g = linear(u, linear(t, g00, g01), linear(t, g10, g11)); final float b = linear(u, linear(t, b00, b01), linear(t, b10, b11)); return (int) a << 24 | (int) r << 16 | (int) g << 8 | (int) b;

Note that the pixels are unpacked in "row" (ARGB) order but interpolated in "column" order; these transpositions are typical of vectorized and GPU code. The result is very different to the previous version of the linear interpolator, since I'm interpolating *vectors* here, not scalars mapped to a color ramp:

You can see very noticeable Mach banding in the form of color ridges and creases. `hue()`

mitigated this effect by smootherstep'ping the parameter, and I can do the same here, by converting a `float`

channel into a saturated (that is, range limited) `int`

channel:

static int bsat(float x) { if (x < 0) return 0; else if (x > 255) return 255; x /= 255f; return (int) ((10 - (15 - 6 * x) * x) * x * x * x * 255); }

Changing the last line in `bilinear()`

to:

return bsat(a) << 24 | bsat(r) << 16 | bsat(g) << 8 | bsat(b);

results in an image with the high-frequency components smoothed out or filtered:

*Much* nicer. In fact I now understand why there are so many interpolation options to resize an image in Adobe Photoshop®. In the case of the bicubic interpolation I quickly learned that some sort of clipping and/or rescaling is vital, since the "tension" given by the derivatives can make the interpolated values overshoot:

This is easily seen by masking out every channel but one:

In fact, a little experimentation shows that interpolated values can go as low as -72 or as high as 327! The worst outliers are, of course, the result of interpolating high-frequency matrices of the form `[[0 1 1 0] [1 0 0 1] [1 0 0 1] [0 1 1 0]]`

. Local rescaling is not a good choice, in my opinion, since it would wash out all channels equally; in particular, it would make images more transparent than they really are. The obvious choice is clipping, although doing that without filtering introduces artifacts:

Using the `bsat()`

above produces the best results:

The final code for the ARGB bicubic interpolator is:

static int bicubic(final float t, final float u, final int c00, final int c01, final int c02, final int c03, final int c10, final int c11, final int c12, final int c13, final int c20, final int c21, final int c22, final int c23, final int c30, final int c31, final int c32, final int c33) { final int a00=(c00>>24)&0xff, r00=(c00>>16)&0xff, g00=(c00>>8)&0xff, b00=c00&0xff; final int a01=(c01>>24)&0xff, r01=(c01>>16)&0xff, g01=(c01>>8)&0xff, b01=c01&0xff; final int a02=(c02>>24)&0xff, r02=(c02>>16)&0xff, g02=(c02>>8)&0xff, b02=c02&0xff; final int a03=(c03>>24)&0xff, r03=(c03>>16)&0xff, g03=(c03>>8)&0xff, b03=c03&0xff; final int a10=(c10>>24)&0xff, r10=(c10>>16)&0xff, g10=(c10>>8)&0xff, b10=c10&0xff; final int a11=(c11>>24)&0xff, r11=(c11>>16)&0xff, g11=(c11>>8)&0xff, b11=c11&0xff; final int a12=(c12>>24)&0xff, r12=(c12>>16)&0xff, g12=(c12>>8)&0xff, b12=c12&0xff; final int a13=(c13>>24)&0xff, r13=(c13>>16)&0xff, g13=(c13>>8)&0xff, b13=c13&0xff; final int a20=(c20>>24)&0xff, r20=(c20>>16)&0xff, g20=(c20>>8)&0xff, b20=c20&0xff; final int a21=(c21>>24)&0xff, r21=(c21>>16)&0xff, g21=(c21>>8)&0xff, b21=c21&0xff; final int a22=(c22>>24)&0xff, r22=(c22>>16)&0xff, g22=(c22>>8)&0xff, b22=c22&0xff; final int a23=(c23>>24)&0xff, r23=(c23>>16)&0xff, g23=(c23>>8)&0xff, b23=c23&0xff; final int a30=(c30>>24)&0xff, r30=(c30>>16)&0xff, g30=(c30>>8)&0xff, b30=c30&0xff; final int a31=(c31>>24)&0xff, r31=(c31>>16)&0xff, g31=(c31>>8)&0xff, b31=c31&0xff; final int a32=(c32>>24)&0xff, r32=(c32>>16)&0xff, g32=(c32>>8)&0xff, b32=c32&0xff; final int a33=(c33>>24)&0xff, r33=(c33>>16)&0xff, g33=(c33>>8)&0xff, b33=c33&0xff; final float a = cubic(u, cubic(t, a00, a01, a02, a03), cubic(t, a10, a11, a12, a13), cubic(t, a20, a21, a22, a23), cubic(t, a30, a31, a32, a33)); final float r = cubic(u, cubic(t, r00, r01, r02, r03), cubic(t, r10, r11, r12, r13), cubic(t, r20, r21, r22, r23), cubic(t, r30, r31, r32, r33)); final float g = cubic(u, cubic(t, g00, g01, g02, g03), cubic(t, g10, g11, g12, g13), cubic(t, g20, g21, g22, g23), cubic(t, g30, g31, g32, g33)); final float b = cubic(u, cubic(t, b00, b01, b02, b03), cubic(t, b10, b11, b12, b13), cubic(t, b20, b21, b22, b23), cubic(t, b30, b31, b32, b33)); return bsat(a) << 24 | bsat(r) << 16 | bsat(g) << 8 | bsat(b); }

Quite a mouthful, but I arrived at it by methodic and meticulous columnar editing. Now everything is ready to tackle the problem of hoisting the conditional in the inner loop and deriving three different, fully streamlined interpolators. Until next time.

## No comments:

Post a Comment