
Primitive Captcha Solving / OCR in Javascript

Blizzard’s recently concluded Diablo 3 beta key giveaway contest on Twitter provides an opportunity to demonstrate primitive Captcha solving / OCR in Javascript, along with some basic use of the HTML5 canvas element, a Javascript OAuth library, and artificial neural networks.

Here is the background story. Blizzard, the company behind the hugely anticipated Diablo 3 game, recently concluded a giveaway contest where they posted one or two Diablo 3 beta keys to Twitter each day; all you had to do was be the first to see the key and enter it into your account to claim it. It was pretty obvious that anyone could easily write a script to constantly refresh the Blizzard Twitter feed, identify a key string, and submit it to a Battle.net account. The fans protested the injustice of this system, and Blizzard responded by posting images of the keys rather than the keys themselves, supposedly thwarting any automated scripts.

While I suppose Blizzard gets props for trying, it is still embarrassingly easy to scrape the keys even from the images. Before I go any further, let me say that I did not use this to grab any keys. Since I already have one, there was no reason for me to do this other than to satisfy my curiosity, so I executed everything up to, but not including, submitting the beta key itself. I did not want to take the opportunity away from anyone else, although in retrospect I doubt that mattered, since all the keys probably went to people who did something similar to what I’m about to describe. Before continuing, here’s a screenshot of the code in action:

First let’s get the boring stuff out of the way. Refreshing the Blizzard Twitter feed is easily done through the Twitter API with the following request, which returns JSON:

https://api.twitter.com/1/statuses/user_timeline.json?screen_name=BlizzardCS&include_rts=true&count=16&include_entities=true&1334348815562=cachebust

Note that the cachebust value is simply the output of “+new Date”. This can then be read into a Javascript object:

var tweets = JSON.parse(xmlhttp.responseText);
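The cache-busting request URL can be sketched as a small helper (the function name and layout are mine; only the endpoint and query fields come from the request above). The parameter name is arbitrary — its sole purpose is to make each URL unique so no cached response is served:

```javascript
// Build the timeline URL with a millisecond-timestamp cache buster
function buildTimelineUrl(screenName) {
	var cachebust = +new Date; // e.g. 1334348815562
	return 'https://api.twitter.com/1/statuses/user_timeline.json' +
		'?screen_name=' + encodeURIComponent(screenName) +
		'&include_rts=true&count=16&include_entities=true' +
		'&' + cachebust + '=cachebust';
}
```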

You will soon notice that Twitter places rate limits on its API requests. Public queries are limited to 150 requests per hour per IP address; authenticated requests (Twitter uses OAuth) are limited to 350 per hour per application. To be the first to grab a beta key, you want to refresh the Twitter feed as often as possible, and a single OAuth token can access the API only about once every 12 seconds. Although it is against their ToS to round-robin multiple tokens, they don’t seem to enforce this, at least not within the few days I experimented with it. Since OAuth requests are not as simple as your everyday xmlhttp requests, it may help to use a Javascript OAuth library.
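Rotating through several tokens could be sketched roughly like this (the names are hypothetical, and again, round-robining tokens is against the ToS, so this is for illustration only):

```javascript
// Hypothetical pool that hands out OAuth tokens in round-robin order,
// so each individual token stays under its hourly request limit
function TokenPool(tokens) {
	this.tokens = tokens;
	this.next = 0;
}
TokenPool.prototype.acquire = function() {
	var token = this.tokens[this.next];
	this.next = (this.next + 1) % this.tokens.length;
	return token;
};
```

With n tokens, each refresh uses the least-recently-used token, multiplying your effective request rate by n.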

Once you have the JSON feed, all you have to do is identify the tweet that has the link to the beta key image.

// Parse each tweet, looking for the key image
var url = null;
search:
for (var i = 0; i < tweets.length; i++) {
	// Check for a beta key image attachment
	if (tweets[i].entities.media) {
		for (var j = 0; j < tweets[i].entities.media.length; j++) {
			if (tweets[i].entities.media[j].type == 'photo') {
				// Grab image and submit for decoding
				url = tweets[i].entities.media[j].media_url_https;
				break search;
			}
		}
	}
}
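The media-scanning logic can be packaged as a testable helper and exercised against a mock feed (the helper name and mock data are mine; the field names follow the entities.media structure used above):

```javascript
// Return the first attached photo URL in a timeline, or null if none
function findPhotoUrl(tweets) {
	for (var i = 0; i < tweets.length; i++) {
		var media = tweets[i].entities && tweets[i].entities.media;
		if (!media) continue;
		for (var j = 0; j < media.length; j++) {
			if (media[j].type == 'photo') {
				return media[j].media_url_https;
			}
		}
	}
	return null;
}

// Mock feed: one tweet without media, one with a photo attachment
var mock = [
	{entities: {}},
	{entities: {media: [{type: 'photo', media_url_https: 'https://example.com/key.png'}]}}
];
// findPhotoUrl(mock) → 'https://example.com/key.png'
```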

With the URL of the image, you can now get to the fun stuff: decoding the image into a string of characters. To begin, create an HTML5 canvas element and add the image to it.

// Add image to canvas
var img = new Image();
img.onload = function() {
	canvas = document.createElement('canvas');
	canvas.width = img.width;
	canvas.height = img.height;
	canvas.getContext('2d').drawImage(img,0,0);
	decode();
};
img.src = url;

The decode function is where you break the image into individual characters and identify each one. This will be the focus of the rest of this article. First, read the pixel data from the image:

// Read in pixel data
var image = canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height);

The pixel data includes 4 values for each pixel: one for each of red, green, and blue (RGB), and one for alpha transparency. To make things easier, convert the image to greyscale.

// Convert to greyscale
for (var x = 0; x < image.width; x++){
	for (var y = 0; y < image.height; y++){
		var i = x*4+y*4*image.width;
		var luma = Math.floor(
			image.data[i] * 299/1000 +
			image.data[i+1] * 587/1000 +
			image.data[i+2] * 114/1000
		);
		image.data[i] = luma;
		image.data[i+1] = luma;
		image.data[i+2] = luma;
		image.data[i+3] = 255;
	}
}

Next, you want to cut the image into blocks, one for each character. Identifying blocks can be a bit tricky. You could probably write a more sophisticated algorithm involving some sort of line tracing to identify contiguous shapes, but for our simple case, identifying vertical strips of mostly white space does the trick.

// Cut into blocks
var blocks = new Array();
var block_start = 0;
var block_end = 0;
var before_white = true;
for (var x = 0; x < image.width; x++) {
	var white = true;
	for (var y = 0; y < image.height; y++) {
		var i = x*4 + y*4*image.width
		var c = image.data[i];
		if (c < 140) {
			white = false;
			break;
		}
	}
	if (before_white == true && white == false) {
		block_start = x;
	}
	if (before_white == false && white == true) {
		block_end = x - 1;
		var block = {start: block_start, end: block_end, image: {}, canvas: {}};
		blocks.push(block);
	}
	before_white = white;
}
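The column scan can be sanity-checked on a tiny synthetic image; this is the same algorithm repackaged as a function. Like the original, it drops a block that runs into the right edge, which is fine here since the key images have white margins:

```javascript
// Find blocks of non-white columns in a greyscale RGBA buffer
function findBlocks(image) {
	var blocks = [];
	var block_start = 0;
	var before_white = true;
	for (var x = 0; x < image.width; x++) {
		var white = true;
		for (var y = 0; y < image.height; y++) {
			if (image.data[x*4 + y*4*image.width] < 140) {
				white = false;
				break;
			}
		}
		if (before_white && !white) block_start = x;
		if (!before_white && white) blocks.push({start: block_start, end: x - 1});
		before_white = white;
	}
	return blocks;
}

// Synthetic 7 x 1 image: white everywhere, dark columns at x = 1-2 and 4-5
var w = 7, h = 1;
var data = new Uint8ClampedArray(w * h * 4).fill(255);
[1, 2, 4, 5].forEach(function(x) { data[x*4] = 0; });
// findBlocks({width: w, height: h, data: data})
//   → [{start: 1, end: 2}, {start: 4, end: 5}]
```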

At this point we’ve identified the start and end x-pixel of each block, but we haven’t actually created the individual images yet. Cloning an array in Javascript is not as simple as saying arr1 = arr2; that only copies a reference to arr2, not the data. This is further complicated by the fact that the image data from a canvas element is stored as a Uint8ClampedArray, which doesn’t behave the same as a simple array. Therefore, cloning the original image to each block requires a little extra effort.

// Clone each block
for (var w = 0; w < blocks.length; w++) {
	blocks[w].image.width = image.width;
	blocks[w].image.height = image.height;
	blocks[w].image.data = new Uint8ClampedArray(image.data.length);
	for (var i = 0; i < image.data.length; i++) {
		blocks[w].image.data[i] = image.data[i];
	}
}
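As an aside, the element-by-element copy works, but typed arrays can also be cloned in a single call, since the Uint8ClampedArray constructor copies the contents of another typed array:

```javascript
// Cloning a typed array via its constructor produces an independent copy
var original = new Uint8ClampedArray([10, 20, 30, 255]);
var copy = new Uint8ClampedArray(original);
copy[0] = 99;
// original[0] is still 10; copy[0] is 99
```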

// Whiteout all other characters from each block
for (var w = 0; w < blocks.length; w++) {
	for (var x = 0; x < image.width; x++) {
		if (x < blocks[w].start || x > blocks[w].end) {
			for (var y = 0; y < image.height; y++) {
				var i = x*4 + y*4*image.width
				blocks[w].image.data[i] = 255;
				blocks[w].image.data[i+1] = 255;
				blocks[w].image.data[i+2] = 255;
			}
		}
	}
}

Now we have multiple copies of the original image, one for each character. To standardize things, we will crop each character and resize them to dimensions that were found to work well with this particular character set. If you use different or multiple character sets, you may want to change these dimensions to fit your situation.

// Crop each block, pad with whitespace to appropriate ratio, and resize to 60 x 50
for (var w = 0; w < blocks.length; w++) {
	// We already have the x-boundaries, just need to find y-boundaries
	var y_min = 0;
	findmin:
	for (var y = 0; y < blocks[w].image.height; y++) {
		for (var x = 0; x < blocks[w].image.width; x++) {
			var i = x*4 + y*4*image.width
			if (blocks[w].image.data[i] < 200) {
				y_min = y;
				break findmin;
			}
		}
	}
	var y_max = 0;
	findmax:
	for (var y = blocks[w].image.height - 1; y >= 0; y--) {
		for (var x = 0; x < blocks[w].image.width; x++) {
			var i = x*4 + y*4*image.width
			if (blocks[w].image.data[i] < 200) {
				y_max = y;
				break findmax;
			}
		}
	}

	// Pad and resize
	var cwidth = blocks[w].end - blocks[w].start + 1;
	var cheight = y_max - y_min + 1;
	var cratio = cwidth / cheight;

	var sx = blocks[w].start;
	var sy = y_min;
	var sw = blocks[w].end - blocks[w].start + 1;
	var sh = y_max - y_min + 1;

	var dimx = 60;
	var dimy = 50;
	var dimr = dimx / dimy;
	if (cratio < dimr) {
		var dh = dimy;
		var dw = Math.round(cwidth * dimy / cheight);
		var dy = 0;
		var dx = Math.round((dimx - dw) / 2);
	}
	else if (cratio > dimr) {
		var dw = dimx;
		var dh = Math.round(cheight * dimx / cwidth);
		var dx = 0;
		var dy = Math.round((dimy - dh) / 2);
	}
	else {
		var dh = dimy;
		var dw = dimx;
		var dy = 0;
		var dx = 0;
	}
	blocks[w].canvas = document.createElement('canvas');
	blocks[w].canvas.width = dimx;
	blocks[w].canvas.height = dimy;
	blocks[w].canvas.style.margin = "0 1px 0 0";
	blocks[w].canvas.getContext('2d').fillStyle="#ffffff";
	blocks[w].canvas.getContext('2d').fillRect(0,0,dimx,dimy);
	blocks[w].canvas.getContext('2d').drawImage(canvas, sx, sy, sw, sh, dx, dy, dw, dh);
}

Now we can finally get to the interesting part of identifying each character. The way we do this resembles a neural network. We can create a weighted pixel profile of each character based on existing fonts. For example, take the letter “A”. We can create multiple 60 x 50 images of the letter A in various fonts (Arial, Verdana, Helvetica, Times New Roman, etc.), grab the pixel data for each one, and create an array corresponding to how often an individual pixel in the 60 x 50 image is “on” for the letter A. Certain pixels may be “on” for the letter A in all fonts, while other pixels may be “on” for the letter A in Arial, but not for the letter A in Helvetica. This results in an array of weighted values corresponding to how “important” each pixel is to the letter A. Repeat this for each letter of the alphabet, each number, etc., and you end up with an artificial neural network that can “learn” as you incorporate more fonts.
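The weighted profile for one character might be built from several fonts like this (the function and the toy 4-pixel vectors are mine, purely to illustrate the averaging step):

```javascript
// Average 0/1 "on"-pixel vectors from several fonts into a weight per
// pixel: the fraction of fonts in which that pixel is on
function buildProfile(fontVectors) {
	var len = fontVectors[0].length;
	var profile = new Array(len).fill(0);
	for (var f = 0; f < fontVectors.length; f++) {
		for (var p = 0; p < len; p++) {
			profile[p] += fontVectors[f][p];
		}
	}
	for (var p = 0; p < len; p++) {
		profile[p] /= fontVectors.length;
	}
	return profile;
}

// Toy example: the same character rendered in three hypothetical fonts,
// reduced to 4 pixels each
var profile = buildProfile([
	[1, 1, 0, 0],
	[1, 0, 1, 0],
	[1, 1, 0, 0]
]);
// profile → [1, 2/3, 1/3, 0]: the first pixel is "on" in every font,
// the last in none
```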

Generating this neural net can be a bit tedious. Here is some basic code to get you started. It generates a net for all characters in a single font; you can then repeat this for other fonts and average the results (or weight them, if you value certain fonts over others).

// Create an image with all letters and numbers - "ABCDEFGHIJKLMNOPQRSTUVWXYZ-1234567890"
// Feed that image to the previous code thus far to get blocks of each character
for (var w = 0; w < blocks.length; w++) {
	var cimage = blocks[w].canvas.getContext('2d').getImageData(0, 0, blocks[w].canvas.width, blocks[w].canvas.height);
	var code = 'net[' + w + '] = [';
	// Check receptors
	for (var x = 0; x < cimage.width; x += 2) {
		for (var y = 0; y < cimage.height; y += 2) {
			var i = x*4 + y*4*cimage.width
			var c = cimage.data[i];
			// Test if a pixel is "on"
			if (c < 160) {
				code += '1,';
			}
			else {
				code += '0,'
			}
		}
	}
	code += '];<br/>';
	document.getElementById('debug').innerHTML += code;
}

Once you have established your neural net, you can feed it the characters from the beta key image. For simplicity (and because Blizzard uses the same font for all their images), we’re using only one font, so our neural net contains only the values 0 and 1.

// Output map for neural net
var output_map = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '-',
                  'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
                  'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'];

// Guess the character
var code = '';
for (var w = 0; w < blocks.length; w++) {
	// Reset activation data
	var activation = new Array();
	for (var i = 0; i < net.length; i++) {
		activation[i] = 0;
	}
	var n = 0;
	// Run the net
	var cimage = blocks[w].canvas.getContext('2d').getImageData(0, 0, blocks[w].canvas.width, blocks[w].canvas.height);
	for (var x = 0; x < cimage.width; x += 2) {
		for (var y = 0; y < cimage.height; y += 2) {
			var i = x*4 + y*4*cimage.width;
			for (var j = 0; j < net.length; j++) {
				if (cimage.data[i] < 160 && net[j][n] == 1) {
					activation[j]++;
				}
				else if (cimage.data[i] >= 160 && net[j][n] == 0) {
					activation[j]++;
				}
			}
			n++;
		}
	}
	// Evaluate results
	var character = 0;
	var confidence = 0;
	for (var i = 0; i < activation.length; i++) {
		if (activation[i] > confidence) {
			character = i;
			confidence = activation[i];
		}
	}
	code += output_map[character];
}
// Display the result
document.getElementById('result').innerHTML = code;

That’s it: you now have a way to decode an image into a string of characters using only a few lines of Javascript. To complete the original objective of obtaining a Diablo 3 beta key, you would just have to submit the key to the Battle.net game activation page, which is trivial. When testing this, I was able to keep the Twitter JSON updated every 2 seconds, and once a tweet with a link to a beta key was identified, grabbing and decoding the image happened almost instantaneously.

As a final note, let me reiterate that I never actually submitted any of the keys (I doubt it would have let me register a second beta key anyway). However, I hope this shows that any such Twitter competition is foolish, and that companies like Blizzard will refrain from them in the future.
