2012-01-12

File Sharing on the Spot

(now with highlighting!) Last time I complained that I couldn't find an easy way to share a source code archive that didn't involve signing up for a service I didn't care for. Blogging platforms only make it easy to attach images to posts, so why not pack a file as a PNG? Enter PNGPack. I tried finding OCaml bindings to libpng; after a disheartening exploration I realized that Java is (for me at least) the ideal platform to make this possible, since it has a rich standard library to draw on:

import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.imageio.ImageIO;

PNGPack is the simplest command-line utility I could get away with. It processes its arguments one by one, checking that they actually are regular files and determining by their extension whether to pack them as PNGs or to extract them:

public class PNGPack {
  public static void main(String[] args) {
    if (args.length == 0) {
      System.err.println("usage - java PNGPack <file>...");
      System.exit(2);
    }
    for (int i = 0; i != args.length; i++) {
      final File file = new File(args[i]);
      if (!file.isFile()) {
        System.err.printf("File \"%s\" is not a regular file\n", file.toString());
        continue;
      }
      final String fileName = file.getName();
      final int index = fileName.lastIndexOf('.');
      final String baseName, extension;
      if (index < 1) {
        baseName  = fileName;
        extension = null;
      } else {
        baseName  = fileName.substring(0, index);
        extension = fileName.substring(index + 1).toLowerCase();
      }
      try {
        if ("png".equals(extension)) {
          final File out = new File(file.getParentFile(), baseName);
          decode(file, out);
        } else {
          final File out = new File(file.getParentFile(), fileName + ".png");
          encode(file, out);
        }
      } catch (IOException e) {
        System.err.printf("Can't read \"%s\"\n", file.toString());
        e.printStackTrace(System.err);
      }
    }
  }

The image data starts with a 24-byte header composed of:

Offset  Length  Field
0       4       Signature 'PNGP'
4       4       File length (big endian)
8       16      MD5 digest

  private static final int PNGP_SIG  = 0x504e4750; // 'PNGP'
  private static final int HEADER_SZ = 24;
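
As a rough illustration of that layout, here is a standalone sketch that builds and parses such a header with java.nio.ByteBuffer, which is big endian by default. The class HeaderSketch and its methods are hypothetical names made up for this example and are not part of PNGPack, which writes the fields by hand with the helpers shown at the end of the post:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
class HeaderSketch {
  static final int PNGP_SIG  = 0x504e4750; // 'PNGP'
  static final int HEADER_SZ = 24;

  // Lays out signature, length and digest at offsets 0, 4 and 8.
  static byte[] buildHeader(int length, byte[] md5) {
    if (md5.length != 16)
      throw new IllegalArgumentException("MD5 digests are 16 bytes long");
    final ByteBuffer buf = ByteBuffer.allocate(HEADER_SZ); // big endian by default
    buf.putInt(PNGP_SIG).putInt(length).put(md5);
    return buf.array();
  }

  // Checks the signature and returns the stored file length.
  static int readLength(byte[] header) {
    final ByteBuffer buf = ByteBuffer.wrap(header);
    if (buf.getInt(0) != PNGP_SIG)
      throw new IllegalArgumentException("bad signature");
    return buf.getInt(4);
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    final byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
    // Simplification: PNGPack digests the payload plus its padding.
    final byte[] md5 = MessageDigest.getInstance("MD5").digest(payload);
    final byte[] header = buildHeader(payload.length, md5);
    System.out.println(new String(header, 0, 4, StandardCharsets.US_ASCII)); // PNGP
    System.out.println(readLength(header)); // 5
  }
}
```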

The image dimensions are chosen so that the result is as square as possible. The image uses 4-byte ABGR pixels for maximum compactness and round-trip fidelity. The image is padded to size with 0 bytes, and the MD5 digest is computed over the entire contents, padding included, to avoid opening a steganographic channel:

  private static void encode(File inp, File out) throws IOException {
    final long length = inp.length();
    // The length field is a signed 32-bit int, and the header must fit too.
    if (length > Integer.MAX_VALUE - HEADER_SZ)
      throw new IOException("Overlong file");
    // Round up to whole 4-byte pixels so that no trailing bytes are dropped.
    final int pixels = (int) ((length + HEADER_SZ + 3) >> 2);
    final int height = (int) Math.floor(Math.sqrt((double) pixels));
    final int width  = (int) Math.ceil((double) pixels / (double) height);
    final BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_4BYTE_ABGR);
    final byte[] frame = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    final MessageDigest md5 = getMD5Digest();
    final InputStream is = new DigestInputStream(new FileInputStream(inp), md5);
    final int end = HEADER_SZ + (int) length;
    int index = HEADER_SZ;
    try {
      int nread;
      // Stop at the expected length: a zero-length read returns 0, not -1,
      // so reading into a full frame would otherwise loop forever.
      while (index < end && (nread = is.read(frame, index, end - index)) != -1)
        index += nread;
    } finally {
      is.close();
    }
    if (index != end)
      throw new IOException("File changed size while reading");
    // The frame starts out zeroed, so the padding only needs to be digested.
    md5.update(frame, index, frame.length - index);
    final byte[] digest = md5.digest();
    assert (digest.length == 16);
    intToBytes(frame, 0, PNGP_SIG);
    intToBytes(frame, 4, (int) length);
    System.arraycopy(digest, 0, frame, 8, digest.length);
    final OutputStream os = new BufferedOutputStream(new FileOutputStream(out));
    try {
      ImageIO.write(image, "PNG", os);
    } finally {
      os.close();
    }
    System.out.printf("<img width=\"%d\" height=\"%d\" src=\"%s\" alt=\"%s\" />\n",
      width, height, out.getName(), inp.getName());
  }

The encoding ends by outputting a handy <img> tag. Decoding an image is equally straightforward, except for a number of safety checks that try to ensure that only proper PNGPack images are decoded:

  private static void decode(File inp, File out) throws IOException {
    final InputStream is = new BufferedInputStream(new FileInputStream(inp));
    final BufferedImage image;
    try {
      image = ImageIO.read(is);
    } finally {
      is.close();
    }
    // ImageIO.read returns null when no registered reader can decode the stream.
    if (image == null)
      throw new IOException("Invalid PNGPack image (not a readable image)");
    if (image.getType() != BufferedImage.TYPE_4BYTE_ABGR)
      throw new IOException("Invalid PNGPack image (bad image type)");
    final byte[] frame = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    if (frame.length < HEADER_SZ || bytesToInt(frame, 0) != PNGP_SIG)
      throw new IOException("Invalid PNGPack image (bad signature)");
    final int length = bytesToInt(frame, 4);
    if (length < 0 || length > frame.length - HEADER_SZ)
      throw new IOException("Invalid PNGPack image (bad length)");
    // Must match the computation in encode().
    final int pixels = (int) (((long) length + HEADER_SZ + 3) >> 2);
    final int height = (int) Math.floor(Math.sqrt((double) pixels));
    final int width  = (int) Math.ceil((double) pixels / (double) height);
    if (!(height == image.getHeight() && width == image.getWidth()))
      throw new IOException("Invalid PNGPack image (bad dimensions)");
    final MessageDigest md5 = getMD5Digest();
    md5.update(frame, HEADER_SZ, frame.length - HEADER_SZ);
    final byte[] digest = md5.digest();
    assert (digest.length == 16);
    // Compare every byte of the digest, not just every fourth one.
    for (int i = 0; i != digest.length; i++)
      if (frame[8 + i] != digest[i])
        throw new IOException("Invalid PNGPack image (bad MD5 digest)");
    final OutputStream os = new FileOutputStream(out);
    try {
      os.write(frame, HEADER_SZ, length);
    } finally {
      os.close();
    }
  }
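
To get a feel for how the near-square sizing behaves, here is a small standalone sketch of the dimension computation. DimensionSketch is a hypothetical class written for this example only. It assumes that the byte count (payload plus 24-byte header) is rounded up to a whole number of 4-byte pixels, so that every payload byte has a slot in the image:

```java
// Hypothetical helper class, for illustration only.
class DimensionSketch {
  static final int HEADER_SZ = 24; // header size in bytes, as in PNGPack

  // Returns {width, height} for a payload of the given length in bytes.
  static int[] dimensions(long length) {
    // Assumption: round up to whole 4-byte pixels.
    final int pixels = (int) ((length + HEADER_SZ + 3) >> 2);
    final int height = (int) Math.floor(Math.sqrt((double) pixels));
    final int width  = (int) Math.ceil((double) pixels / (double) height);
    return new int[] { width, height };
  }

  public static void main(String[] args) {
    for (long length : new long[] { 0, 13, 1000 }) {
      final int[] d = dimensions(length);
      // Whatever the payload and header do not fill is zero padding.
      final long padding = 4L * d[0] * d[1] - HEADER_SZ - length;
      System.out.printf("%d bytes -> %dx%d, %d padding bytes%n",
        length, d[0], d[1], padding);
    }
  }
}
```

An empty file fits the header alone in a 3x2 image, a 13-byte file needs 4x3 pixels with 11 bytes of padding, and a 1000-byte file fills a 16x16 image exactly.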

The code is hyper-compact in order to minimize the number of lines shown in this post. Feel free to add whitespace and comments to suit taste. The only thing left is a couple of helper functions:

  private static void intToBytes(byte[] buf, int off, int n) {
    buf[off + 0] = (byte) ((n >> 24) & 255);
    buf[off + 1] = (byte) ((n >> 16) & 255);
    buf[off + 2] = (byte) ((n >>  8) & 255);
    buf[off + 3] = (byte) ( n        & 255);
  }

  private static int bytesToInt(byte[] buf, int off) {
    return (((int) buf[off + 0] & 255) << 24)
      |  (((int) buf[off + 1] & 255) << 16)
      |  (((int) buf[off + 2] & 255) <<  8)
      |   ((int) buf[off + 3] & 255);
  }

  private static MessageDigest getMD5Digest() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) { // a bare _ is rejected by Java 9 through 21
      System.err.println("No instance of MD5 digest algorithm!");
      System.exit(1);
      return null; // not reached
    }
  }
}
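
The whole scheme depends on the raw ABGR bytes surviving a trip through the PNG encoder and decoder. A minimal way to check that on a given JVM, without touching the file system, might look like the following. RoundTripSketch is a hypothetical class written for this example, and the byte pattern is arbitrary:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical helper class, for illustration only.
class RoundTripSketch {
  // Fills an ABGR image with a byte pattern, writes it as a PNG in memory,
  // reads it back, and reports whether the raw bytes came back unchanged.
  static boolean roundTrips(int width, int height) {
    ImageIO.setUseCache(false); // keep ImageIO from using temporary files
    final BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_4BYTE_ABGR);
    final byte[] frame = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    for (int i = 0; i != frame.length; i++)
      frame[i] = (byte) (i * 31); // arbitrary pattern, includes zero alpha bytes
    final byte[] original = frame.clone();
    try {
      final ByteArrayOutputStream os = new ByteArrayOutputStream();
      if (!ImageIO.write(image, "PNG", os))
        throw new IOException("No PNG writer available");
      final BufferedImage copy = ImageIO.read(new ByteArrayInputStream(os.toByteArray()));
      if (copy == null || copy.getType() != BufferedImage.TYPE_4BYTE_ABGR)
        return false;
      final byte[] back = ((DataBufferByte) copy.getRaster().getDataBuffer()).getData();
      return Arrays.equals(original, back);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("round trip preserved bytes: " + roundTrips(16, 16));
  }
}
```

If this reports false on some platform, images packed there should not be trusted, although the MD5 check in the decoder would flag corrupted contents anyway.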

I've updated the last post to include the PNGPacked code archive. Use it responsibly, and enjoy!

6 comments:

Paolo said...

Wow, this is smart :-)

Matías Giovannini said...

@Paolo: you're too kind! I didn't do any due diligence to see if something like this already existed; I'm sure that the idea gets reinvented periodically.

Just A. Developer said...

This is a great idea.

What kinds of problems did you run into with the OCaml bindings to libpng? Do you think they're fixable or would it be better to start over?

Matías Giovannini said...

@Phil: the problem with libpng bindings is that there aren't any! Libpng itself is rather bizarre: it uses setjmp/longjmp as a poor-man's C exception system, for instance. Nothing that can be cooked up in three hours like I did in Java.

gasche said...

Maybe you could have used Camlimages?

It has PNG support, among other image formats, but it doesn't reuse libpng.

Emmanuel said...

Hello,

with gist.github.com you can upload code anonymously (without login) and without using git either. All you need is a tool like curl.

I made a little script to juggle the curl parameters so I don't have to do that by hand:


https://gist.github.com/1616651


If you don't like curl or ruby you can do that with wget and bash or whatever tool you like.

Gist provides an embed code for blogs with nice syntax highlighting, like this:


<script src="https://gist.github.com/1616651.js"></script>


But you can just put a direct download link like this one:

https://gist.github.com/gists/1616651/download

if you'd rather do that.

The reason I like gist over the million other "paste" tools out there is that it allows you to submit multiple files, which is sometimes convenient.

Greetings!