I rarely use this blog to rant. Generally because there isn't much to rant about. So consider this more of a quiet diatribe.
I've been really disappointed by the way the Android Developer Challenge has unfolded. I don't blame anyone in particular for this... Google is free to run it however they want, and I'm sure all the judges they've tapped have very busy lives without this being added to the list. Still, it's a shame to see what could have been such a great kick-start to the developer community generate so much ill-will.
This whole post is VERY inside-baseball, so if the preceding paragraph doesn't make sense, save yourself and skip the rest.
Any developer will tell you that they've gained something from the challenge, regardless of whether they've won or not. Speaking personally, it focused and incented me to spend a lot of time learning the APIs, grokking their quirks, and getting valuable experience writing apps that behave well. All of this has made me a better programmer, and possibly helped build a stronger resume. So, in a very grade-school sense, I have "won". (In every other sense, not so much.)
That said, the motivation at the front of people's minds has been to, y'know, really win the contest, meaning selection as one of the 50 finalists. While I would have downloaded the SDK and played around with it, I probably would not have invested as many hours as I did without the added motivation of the contest.
While the details of the contest have always been a bit hazy, Google clearly established four criteria by which applications would be judged: originality, polish, indispensability, and demonstration of Android features. These were announced early enough that developers wrote their apps with those criteria in mind, focusing on them when making design and implementation decisions.
When the deadline grew closer (after being extended), Google announced more details about how judging would work. There would be about 100 judges, drawn from OHA companies and "industry experts". Each application would be evaluated by four randomly chosen judges. Each judge would give the apps a score of 1-10 on each of the four criteria. The 100 submissions with the highest scores would then go on to a second round of judging, and the 50 of those 100 with the highest scores would be declared the winners.
On its face, this is an excellent system.
The problem is that there seems to have been a great deal of confusion over the meaning of "evaluate." Specifically, developers had assumed that the judges would follow the instructions in their documentation, use the app, and examine its features. The judges, it seems, assumed they merely needed to launch the app. As the May 5th "deadline" grew closer, more and more developers began to gripe on the official Android Challenge group. Among those who had networked applications and tracked usage, a disturbing number reported seeing judges who did little or no evaluation at all.
At first, a Google team member explained that some of these abortive accesses were probably just "spot checks," designed to see if an application was functional but not an actual evaluation per se. That quieted people for a little while, but as the deadline grew closer and rumors swirled that the 100 semifinalists had been selected, it grew into an outcry.
Speaking personally, this exact thing happened to me. Two judges logged into the application and did nothing. Another two used the app for about two minutes each, never accessing any of the significant features of the app.
It's hard to convey how dispiriting this is. After spending months of work and making sacrifices to pull a product together, learning that people aren't even really looking at it feels profoundly deflating. I think back over all the time I spent fixing bugs, polishing the user interface, thinking carefully through workflows, doing every little thing I could think of to make the app shine just a little bit more. Having all of that go to waste is a pretty awful feeling.
Again, I don't really blame anyone for this. It's Google's contest and Google's money, and they're free to do whatever they want with it.
It's a little hard for me to blame the judges, too. Let's do some math. 1,788 entries were submitted. With each needing four evaluations, that's 7,152 test runs required. Assuming about 100 judges, each one would need to run about 72 applications.
Now: The deadline was April 14th. Google spent some time making sure that they identified all the working submissions, packaged them up, identified the judges, and shipped laptops with the proper software installed. From what people online were seeing, apps didn't start being evaluated until about the 21st. If you believe the rumors, the final 100 were chosen around the 30th. That gives a grand total of 10 calendar days for testing; more realistically, 8 workdays.
Doing more math... each judge would need to test 9 applications every single day. That sounds pretty painful to begin with. Keeping in mind who these people are, though, it's different from having a dedicated QA team testing. These are all people with regular full-time jobs, taking on a volunteer effort that will not bring them any more revenue. All their work will be anonymous, and any glory will attach to Google.
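The arithmetic above can be sketched as a quick back-of-the-envelope calculation. (The judge count and the eight workdays are the estimates from this post, not official figures from Google.)

```python
# Back-of-the-envelope math for the ADC first-round judging load.
entries = 1788        # submissions received
evals_per_app = 4     # each app evaluated by four judges
judges = 100          # approximate number of judges (estimate)
workdays = 8          # realistic testing window (estimate)

total_evals = entries * evals_per_app      # 7,152 test runs required
per_judge = total_evals / judges           # ~72 apps per judge
per_day = per_judge / workdays             # ~9 apps per judge, per day

print(f"{total_evals} total evaluations")
print(f"~{round(per_judge)} apps per judge")
print(f"~{round(per_day)} apps per judge per day")
```

Even with generous rounding, every judge would have to clear roughly nine applications a day on top of a full-time job, which goes a long way toward explaining the cursory evaluations.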
So, while I remain extremely disappointed, I at least understand why, say, someone would blow through 5 submissions over their lunch break, doing a quick Blink-style gut check of how good it is.
And yet, and yet... there's an undeniable wail out there. "But I worked for MONTHS on my application! And you can only spare two minutes?"
Who knows how widespread this was, but between my own experience and those being recounted online, it appears to have been common.
I doubt we'll ever learn more, but the more the situation deteriorated, the more curious I became about how it actually worked. I would have loved to have seen Google's instructions to the judges; did they say, "If you read the documentation and think it's a bad app, don't bother judging it"? Were there some judges out there who did spend a long time with each app? And, if so, were their marks higher or lower than those who spent very little time? Did people just run out of time and rush their final evaluations? Did Google have any second thoughts about how the contest was being run?
I'm sure I'll never know, but at least it keeps my mind busy.
Wrapping back around to the beginning... it is disappointing that it ended like this. The Challenge was and remains a great idea - it got people excited about Android, established several thousand nascent experts prior to launch, brought in content to help move handsets, and helped keep Android in the public eye after Apple announced their SDK. While the full fallout of this may not be known for a while, I'm guessing some bitter individuals will turn against Android or Google as a whole, discourage others from competing in the second Challenge, and remove themselves from the gene pool of Android early adopters. Again, it's a shame, because the Challenge was such a great idea, and I can imagine that in the future the powers that be will point to the ADC as a reason not to hold another one.
Oh, and by the way: congratulations to all the winners! I really do look forward to seeing the applications.
UPDATE 5/13/08: Dan Morrill explained the ADC judging process in a blog post today. I think this is a great step towards addressing some of the concerns expressed by members of the Android community. It casts a bit more light on some of the more mysterious aspects of the process, and should reassure contestants that they were not unfairly penalized by the actions of individual judges.
Unfortunately, Google's process can only deal with the data it receives, and without having spied on every judge, the most burning questions will never be answered. The post does a great job of explaining why you shouldn't panic if only three judges reviewed your application; it doesn't give much solace if zero judges did.
Again, this is a learning process for everyone. I'm not sure what the solution to this problem is... as long as judging continues on a volunteer basis, we will need to prepare for rushed or incomplete evaluations.