The file iris_data.csv contains some data on iris photographs from unsplash.org. For each of 28 photos we have the mean and the standard deviation of luminance, chroma, and hue, based on converting from RGB to polar Lab coordinates with the colorspace package. I haven’t included the actual photo files because they’re a bit big, but here are a couple of the photos at lower resolution.
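For the curious, here is a minimal sketch of how summaries like these could be computed. It is not the code that produced iris_data.csv: `lch_summary` is a hypothetical helper, the input is assumed to be a matrix of sRGB values in [0, 1] with one row per pixel, and the random pixels only stand in for a real photo.

```r
library(colorspace)

# Hypothetical helper (not from the original analysis): summarise one image,
# given as an n x 3 matrix of sRGB values in [0, 1], one row per pixel.
lch_summary <- function(rgb) {
  # Convert sRGB to polar Lab; coords() returns a matrix with columns L, C, H
  lch <- coords(as(sRGB(rgb[, 1], rgb[, 2], rgb[, 3]), "polarLAB"))
  # Mean and standard deviation of luminance, chroma, and hue
  c(mean = colMeans(lch), sd = apply(lch, 2, sd))
}

set.seed(1)
fake_pixels <- matrix(runif(300), ncol = 3)  # 100 random pixels, not a real photo
feat <- lch_summary(fake_pixels)
feat
```

Each photo then contributes one row of six numbers to the data set.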
We want to classify these images of irises into two groups. As a task, this matches what we do in the classic data set; it isn’t what the photographs are for, and it is entirely pointless.
So, first, let’s look at a pairs plot of the variables:
iris_data<-read.csv("~/iris_data.csv",row.names=1)
pairs(iris_data[,1:6],pch=19)
The clusters aren’t all that obvious; let’s add the group variable:
iris_data<-read.csv("~/iris_data.csv",row.names=1)
pairs(iris_data[,1:6],pch=19,col=ifelse(iris_data[,7],"blue","orange"))
We’re getting some separation, but not perfect separation with any pair of variables. Try principal components:
pc<-princomp(iris_data[,1:6])
pairs(pc$scores[,1:4],pch=19,col=ifelse(iris_data[,7],"blue","orange"))
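As an aside on the call above: princomp works from the covariance matrix by default, so a variable with a large numeric range can dominate the first component. The sketch below compares that with `cor=TRUE`, which standardises the variables first. It uses simulated data with made-up column names and scales, not the iris photo data, so it says nothing about which choice is better here.

```r
set.seed(2)
# Simulated stand-in for six summaries on very different scales (NOT the real data)
sim <- data.frame(
  mean_L = rnorm(28, 50, 10), mean_C = rnorm(28, 30, 8), mean_H = rnorm(28, 180, 90),
  sd_L   = rnorm(28, 15, 3),  sd_C   = rnorm(28, 10, 2), sd_H   = rnorm(28, 60, 20)
)

pc_cov <- princomp(sim)              # covariance matrix (the default)
pc_cor <- princomp(sim, cor = TRUE)  # correlation matrix: variables standardised

# Proportion of total variance carried by each component
prop_cov <- pc_cov$sdev^2 / sum(pc_cov$sdev^2)
prop_cor <- pc_cor$sdev^2 / sum(pc_cor$sdev^2)
round(rbind(covariance = prop_cov, correlation = prop_cor), 2)
```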
It looks like component 1 is good for most of the separation, but the next three aren’t much use. Of course, principal components are unsupervised, so we can use advanced supervised learning techniques instead:
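# Linear discriminant analysis of eye (the group variable) on all the other columns
l<-MASS::lda(eye~.,data=iris_data)
# Posterior class probabilities for the same photos; column 1 is the first class (eye FALSE)
p<-predict(l)$posterior
hist(p[,1])
# pred is TRUE when the posterior probability of the first class exceeds 0.5
table(pred=p[,1]>0.5,true=iris_data$eye)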
##        true
## pred    FALSE TRUE
##   FALSE     1   12
##   TRUE     15    0
And we get nearly perfect classification (at least in terms of apparent error).
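Apparent error is computed on the same photos used to fit the classifier, so it tends to be optimistic. One common check is leave-one-out cross-validation, which MASS::lda supports through its `CV=TRUE` argument. The sketch below shows the idea on simulated two-group data; the data frame `sim` and its columns are invented for illustration and are not the iris photo data.

```r
library(MASS)

set.seed(3)
# Simulated two-group data standing in for iris_data (NOT the real photos)
n   <- 28
eye <- rep(c(FALSE, TRUE), c(16, 12))
sim <- data.frame(
  x1  = rnorm(n, mean = ifelse(eye, 2, 0)),
  x2  = rnorm(n, mean = ifelse(eye, 1, 0)),
  x3  = rnorm(n),
  eye = eye
)

# Apparent error: fit and predict on the same observations
fit      <- lda(eye ~ ., data = sim)
apparent <- mean((predict(fit)$class == "TRUE") != sim$eye)

# Leave-one-out error: each observation is classified by a fit that omits it
loo      <- lda(eye ~ ., data = sim, CV = TRUE)
cv_error <- mean((loo$class == "TRUE") != sim$eye)

c(apparent = apparent, leave_one_out = cv_error)
table(pred = loo$class, true = sim$eye)
```

With `CV=TRUE` the function returns the cross-validated classes and posterior probabilities directly instead of a fitted model object, so there is no separate predict step.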
You might be curious about which picture is the one error. Here it is:
On this special day, I want to finish by thanking the people who sent their iris photos to Unsplash:
t(sapply(strsplit(rownames(iris_data),"-"),"[",1:2))
##       [,1]         [,2]
##  [1,] "aline"      "coill"
##  [2,] "amanda"     "dalbjorn"
##  [3,] "anastasiya" "badun"
##  [4,] "ashlee"     "marie"
##  [5,] "colin"      "watts"
##  [6,] "cynthia"    "westbrook"
##  [7,] "geronimo"   "giqueaux"
##  [8,] "gopinath"   "mohanta"
##  [9,] "ian"        "talmacs"
## [10,] "james"      "morden"
## [11,] "james"      "morden"
## [12,] "james"      "morden"
## [13,] "javier"     "vinals"
## [14,] "jeffrey"    "hamilton"
## [15,] "jordan"     "whitfield"
## [16,] "kalea"      "jerielle"
## [17,] "kalea"      "jerielle"
## [18,] "lana"       "svet"
## [19,] "lucy"       "mason"
## [20,] "olesya"     "blinskaya"
## [21,] "ria"        "truter"
## [22,] "roman"      "petrov"
## [23,] "sardar"     "faizan"
## [24,] "sheila"     "swayze"
## [25,] "v2osk"      "In4XVKhYaiI"
## [26,] "veronika"   "scherbik"
## [27,] "victor"     "freitas"
## [28,] "yoksel"     "zok"