Navigation
Powered by Squarespace
Sidebar
« DARPA Shredder Challenge - Part Deux | Main | Paramiko: easy, native ssh in Python... »
Monday
Oct312011

DARPA Shredder Challenge - Preliminary (quarter-baked) thoughts...

DARPA opened its Shredder Challenge late last week, and in typical DARPA style threw down a fantastic, seemingly impossible gauntlet:  extract meaningful information (intel) from a series of damaged and destroyed documents.

I am struck by two things about this challenge:

  1. DARPA is asking "How would you...", and not "Can you...".  That is, DARPA believes it is wholly possible to complete the challenge, and it is only a matter of your ingenuity to find any particular path to a solution.  This is a striking position considering that the simplest of the puzzles looks like this:  (reproduced unwisely without any permission from DARPA. )And it's worth 2 points!
  2. Anyone can win this challenge.  I mean that it occurs to me that there are not (likely) a cadre of paper shredding experts with degrees in advanced document reconstruction that are shoe-ins for this contest.  Anyone with a methodical mind should be able to design a solution approach, test it and stand a chance to win the purse.

I have often admired DARPA's innovative challenges (like their Network Challenge -or "Red Balloon"- problem), and I decided that I would enjoy at least participating in this one.  I am thoroughly confident that I won't score even a point, but I hope to share some ideas that may prove helpful to others.

Some thoughts on tools...

I am interested in machine vision and image processing, so I decided to take this opportunity to learn more about OpenCV and apply its capabilities to Puzzle 1.   OpenCV is a very popular image-processing library with several interfaces including Python and Processing.  I develop primarily on a (aging) Mac and I found getting started with OpenCV on OSX something of a Sisyphean ordeal.  Here are some key notes I learned through a couple of discouraging hours of headaches:

  • Use homebrew to install OpenCV, rather than Cmake and compiling the library from source.  Surely more awesome ninjas can solve the myriad dependency and python version compatibility issues that crop up in compiling the OpenCV libraries from source, but for the restofus: 
    • brew install git 
    • brew update 
    • brew install opencv
  • You'll need to update your environment's path variables to allow python to locate the appropriate bindings.  This site does an excellent job walking you through the necessary shell magic.
  • Forget OpenCV entirely and use SimpleCV.  I found that in the interest of getting productive quickly (especially on a mac), SimpleCV has nailed down the process of setting up your environment easily and correctly.

SimpleCV:  Computer Vision for the Restofus.

I will be using SimpleCV for the bulk of my exploration of the DARPA challenge, and I would encourage anyone interested in trying their hand at machine vision to do the same.  In addition to providing a simple, one-and-done installation package for most environments, SimpleCV also simplifies the native OpenCV python interface.  You can access Camera streams and Kinect devices, implement common image processing algos etc. in fewer, more readable lines of code.

I love the snip on their homepage that captures a frame from your webcam in 3 lines of code.

A few notes I learned that may help you:

  • Use the appropriate single "super" package installer to get yourself the required libraries and the SimpleCV stable code base
  • Then download the latest Git commit and re-install SimpleCV.  The latest Git revision has Blob detection algorithms baked into the SimpleCV interface, and allows you to by-pass dealing with the installation of the cvblob-python project.
  • Give the SimpleCV Interactive Shell a try.  It might get in the way of more experienced developers, but for newbies its a great way to quickly experiment with key features.
  • Browse the source.  The documentation has gaps, and I found myself understanding how to operate the image processing capabilities much more clearly as I started looking at the classes directly.
  • Read the OpenCV docs.  Remember that SimpleCV is a wrapper to OpenCV (and some other libs as well).  So it will help you to understand the native libraries, while taking advantage of the scaffolding that SimpleCV can provide.

It took me mere minutes to get my SimpleCV environment setup to a point that I could quickly and intuitively load in an image and begin to run some basic operations.  

The following is my line of thinking on the first puzzle.

Puzzle 1:  "What is the appropriate title being referenced?":

The challenge image looks basically like a jigsaw puzzle (albeit one where the edges on the pieces are more subtle).  My main objective is to achieve just enough reconstruction to extract the intel.  No extra points for the completeness of the reconstruction, so I am not concerned about the tiny little scraps at the bottom right.

My first thought about this puzzle is to get a hold of my pieces.  That is, I'd like to be able to run match algorithms against a particular scrap, so I need to be able to identify, extract and manipulate each bit of the paper.  Enter Blob Detection.  The complicated mathematical description aside, I am interested in clumping pixels together that look like they are part of the same entity.  Given that the background is a nice distinctive Pepto pink, this ought to be a doable excerise.  And thanks to SimpleCV's findBlobs(), you get the following image pretty quickly:

 Detected blobs are highlighted here in white. Note that the source had the bits of scrap in yellow, and thie image shows that we can locate them all.

Neato.  Well, now that we have the blobs, what about running some matches.  I was nearly floored to learn that SimpleCV has a terrific implementation (or rather re-interpretation) of OpenCV's matchShapes().  I am struggling to understand the technical details, but from what I can gather, this method compares two contours and reports back a value that indicates the stregth of the match.  The method is based on comparing Hu Moments which express information about a contour and is not sensitive to rotation, scale or reflection!!!!  

And so that's my approach to Puzzle 1:  loop through each "puzzle shred" and compare it to every other shred in order to find its match.  Then perform the same loop with each pair and so on until we reduce a couple of hundred shreds into a handful.

Of course this is easier said than done, but at least I can break down the approach.

  1. Extract the shreds as distinct, contiguous blobs
    • So, something you will notice if you simply run findBlobs() using SimpleCV is that it is fairly indiscriminate.  That is, while the image above looks like each shred is highlighted individually, in fact, each is fact composed of several "sub blobs".
    • We (I) first need to clean the image with a few passes of a "morphological operation".  Probably a morphClose() which should basically (hopefully) give us more robust blobs.
  2. Adapt the blob match to our needs
    • In its current format, simply running the blob match between two contours would really only tell you whether (or rather to what degree) they are identical.  We are after something close, but not quite an exact match.  I'll need a more clever math wiz than I to adapt the algorithm to make that happen.
  3. Profit!!!

 

So with that.  Back to more thinking.

PrintView Printer Friendly Version

EmailEmail Article to Friend

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>