I came across a study not long ago where someone used census data to try to come up with the “Most American City”.
The data scientist put real effort into the calculation: carefully weighted factors that contribute to how much ‘Americanism’ each town has… The towns were ranked and then put into a slick interactive graphic… you can see it here:
The ‘Most American’ city should be easy to spot, right? Wait… where is it?
As you’ve probably noticed, because the size of each town’s bubble was determined by its rank value, the very highest ranked towns receive the tiniest bubble icons. Kind of illogical to make the MOST American town the smallest bubble… Actually, it’s downright un-American, don’t you think?
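The bug is easy to reproduce. Here is a minimal sketch (with made-up town names and a hypothetical sizing function, not the study’s actual code) of how mapping bubble size directly to rank value inverts the intended emphasis, and how inverting the rank fixes it:

```python
# Hypothetical example: (town, rank), where rank 1 = "Most American".
towns = [("Townville", 1), ("Cityburg", 2), ("Hamleton", 50)]

def bubble_size_buggy(rank, scale=2.0):
    # Size grows with the rank *value*, so rank 1 gets the SMALLEST bubble.
    return rank * scale

def bubble_size_fixed(rank, max_rank=50, scale=2.0):
    # Invert the rank so the top-ranked town gets the LARGEST bubble.
    return (max_rank - rank + 1) * scale

for name, rank in towns:
    print(name, bubble_size_buggy(rank), bubble_size_fixed(rank))
```

Walking the output in reverse, as a reader would, makes the inversion obvious: with the buggy mapping the #1 town draws at size 2.0 while #50 draws at size 100.0.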
Now, I don’t bring this up to pick on this study, but rather because it’s a perfect example of a very common pitfall in the data modeling process. We as analysts spend so much time marshaling the data forward towards the end result that it becomes easy to forget that every other person who interacts with our work will be experiencing it in the opposite direction. By this I mean they will be starting with the end visual, then working back through our calculation methods.
This should serve as a reminder to anyone in data analysis of how incredibly important it is to walk your algorithm and visual in reverse several times during development. It should make all of us think carefully about how our visualization represents our data.