Finding Waldo, a flag on the moon and multiple choice tests, with R
By arthur charpentier on Tuesday, May 15 2012, 12:58 - Nice graphs - Permalink
I have to admit, first, that finding Waldo has been a difficult task. And I did not succeed. Neither could I correctly spot his shirt (because actually, it was what I was looking for). You know, that red-and-white striped shirt. I guess it should have been possible to look for Waldo's face (assuming that his face does not change) but I still have problems with size factor (and resolution issues too). The problem is not that simple. At the http://mlsp2009.conwiz.dk/ conference, a price was offered for writing an algorithm in Matlab. And one can even find Mathematica codes online. But most of the those algorithms are based on the idea that we look for similarities with Waldo's face, as described in problem 3 on http://www1.cs.columbia.edu/~blake/'s webpage. You can find papers on that problem, e.g. Friencly & Kwan (2009) (based on statistical techniques, but Waldo is here a pretext to discuss other issues actually), or more recently (but more complex) Garg et al. (2011) on matching people in images of crowds.
What about codes in R ? On http://stackoverflow.com/, some ideas can be found (and thank Robert Hijmans for his help on his package). So let us try here to do something, on our own. Consider the following picture,

> library(raster) > waldo=brick(system.file("DepartmentStoreW.grd", + package="raster")) > waldo class : RasterBrick dimensions : 768, 1024, 786432, 3 (nrow,ncol,ncell,nlayer) resolution : 1, 1 (x, y) extent : 0, 1024, 0, 768 (xmin, xmax, ymin, ymax) coord. ref. : NA values : C:\R\win-library\raster\DepartmentStoreW.grd min values : 0 0 0 max values : 255 255 255
> plot(waldo,useRaster=FALSE)
| |
# white component white = min(waldo[[1]] , waldo[[2]] , waldo[[3]])>220 focalswhite = focal(white, w=3, fun=mean) plot(focalswhite,useRaster=FALSE) # red component red = (waldo[[1]]>150)&(max( waldo[[2]] , waldo[[3]])<90) focalsred = focal(red, w=3, fun=mean) plot(focalsred,useRaster=FALSE)i.e. here we have the graphs below, with the white regions, and the red ones,
|
|
# striped component striped = red; n=length(values(striped)); h=5 values(striped)=0 values(striped)[(h+1):(n-h)]=(values(red)[1:(n-2*h)]== TRUE)&(values(red)[(2*h+1):n]==TRUE) focalsstriped = focal(striped, w=3, fun=mean) plot(focalsstriped,useRaster=FALSE)So here, we can easily spot Waldo, i.e. the guy with the red-white stripes (with two different sets of thresholds for the RGB decomposition)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
exam = stack("C:\\Users\\exam-blank.png") red = (exam[[1]]>150)&(max( exam[[2]] , exam[[3]])<150) focalsred = focal(red, w=3, fun=mean) plot(focalsred,useRaster=FALSE) exam = stack("C:\\Users\\exam-filled.png") red = (exam[[1]]>150)&(max( exam[[2]] , exam[[3]])<150) focalsred = focal(red, w=3, fun=mean) plot(focalsred,useRaster=FALSE)
|
|
First, we have to identify areas where students have to fill the blanks. So in the template, identify black boxes, and get the coordinates (here manually)
exam = stack("C:\\Users\\exam-blank.png") black = max( exam[[1]] ,exam[[2]] , exam[[3]])<50 focalsblack = focal(black, w=3, fun=mean) plot(focalsblack,useRaster=FALSE) correct=locator(20) coordinates=locator(20) X1=c(73,115,156,199,239) X2=c(386,428.9,471,510,554) Y=c(601,536,470,405,341,276,210,145,79,15) LISTX=c(rep(X1,each=10),rep(X2,each=10)) LISTY=rep(Y,10) points(LISTX,LISTY,pch=16,col="blue")
|
|
CORRECTX=c(X1[c(2,4,1,3,1,1,4,5,2,2)], X2[c(2,3,4,2,1,1,1,2,5,5)]) CORRECTY=c(Y,Y) points(CORRECTX, CORRECTY,pch=16,col="red",cex=1.3) UNCORRECTX=c(X1[rep(1:5,10)[-(c(2,4,1,3,1,1,4,5,2,2) +seq(0,length=10,by=5))]], X2[rep(1:5,10)[-(c(2,3,4,2,1,1,1,2,5,5) +seq(0,length=10,by=5))]]) UNCORRECTY=c(rep(Y,each=4),rep(Y,each=4))Now, let us get back on red areas in the form filled by the student, identified earlier,
exam = stack("C:\\Users\\exam-filled.png") red = (exam[[1]]>150)&(max( exam[[2]] , exam[[3]])<150) focalsred = focal(red, w=5, fun=mean)
|
|
ind=which(values(focalsred)>.3) yind=750-trunc(ind/610) xind=ind-trunc(ind/610)*610 points(xind,yind,pch=19,cex=.4,col="blue") points(CORRECTX, CORRECTY,pch=1, col="red",cex=1.5,lwd=1.5)Crosses on the graph on the right below are the answers identified as correct (here 13),
> icorrect=values(red)[(750-CORRECTY)* + 610+(CORRECTX)] > points(CORRECTX[icorrect], CORRECTY[icorrect], + pch=4,col="black",cex=1.5,lwd=1.5) > sum(icorrect) [1] 13
|
|
|
> iuncorrect=values(red)[(750-UNCORRECTY)*610+ + (UNCORRECTX)] > sum(iuncorrect) [1] 4So I have not been able to find Waldo, but I least, that will probably save me hours next time I have to mark exams...






































Comments
Unfortunately R is not very good at image processing. I wish it were, so I could finally get rid of matlab. There's no reason, I believe, but the fact that people have not been working enough on the signal processing aspects for R.
Using matlab and R, I eventually get mixed up....
Nice post, Arthur!
Fantastique post, j'aimerais essayer ces techniques pour compter les têtes aux manifestation.
What about getting several images of Waldo, and running a classifier on it?
I can't believe it ! Tonight, my daughter came with Waldo's book, asking for help. The first one was easy, it took us a few seconds to spot Waldo ! Then, came another page: the one above, that I know by heart since I have spent hours on it, with my computer ! Except that Waldo was not where he was supposed to be... as I remember, he was supposed to be in the upper corner, on the left, next to the table, and the girl with red tights
Well, belive me, in my daughter's book, he was not !
Waldo was gone ! I just can't belive it !