R tutorial

R, the statistical programming environment, is an invaluable tool for any analyst or data-visualization specialist.  It’s not often that we can get our hands on the toys the professionals use (e.g., at the New York Times), but we can with R.  It’s open source (free as in beer and as in speech).  The main drawback is that R can be difficult to learn.  The folks at statsguys.wordpress.com are working to remove this barrier by creating truly newbie-friendly tutorials that can be completed in as little as 20 minutes.  Check them out!

Another Example of Why You Should Never Trust a Data Visualization Blindly

The Council on Foreign Relations is getting a lot of social media attention for their interactive map, Vaccine Preventable Outbreaks.  It’s a super slick example of why it’s so important to look at the raw data when considering the persuasiveness of data visualizations.  The CSV download published with this map appears to draw its numbers from a variety of news headlines (read “10 diagnosed with polio”).  I can’t discern any attempt to check for overlap between reports, which would lead to overreporting.  The map even includes an option to report a data point, in case the kid at the grocery store appears to have a touch of rubella.  Surveillance is a serious business, and it’s sad to see established methodologies overlooked.
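To make the overlap concern concrete, here is a minimal Python sketch.  It is not the Council’s method, and none of the rows come from their download: the records, field names, and matching rule (same disease, location, and month) are all hypothetical stand-ins for the far more careful case definitions that real surveillance uses.

```python
# Hypothetical headline-derived rows; none of these come from the CFR download.
reports = [
    {"disease": "polio", "location": "Region A", "month": "2013-05", "cases": 10, "source": "Outlet 1"},
    {"disease": "polio", "location": "Region A", "month": "2013-05", "cases": 10, "source": "Outlet 2"},
    {"disease": "measles", "location": "Region B", "month": "2013-06", "cases": 4, "source": "Outlet 1"},
]


def naive_total(rows):
    """Sum every row, as if no two headlines described the same outbreak."""
    return sum(row["cases"] for row in rows)


def deduplicated_total(rows):
    """Keep the largest count per (disease, location, month) key before summing.

    The key is an assumed, deliberately crude stand-in for real overlap checks.
    """
    largest = {}
    for row in rows:
        key = (row["disease"], row["location"], row["month"])
        largest[key] = max(largest.get(key, 0), row["cases"])
    return sum(largest.values())


print(naive_total(reports))         # counts the polio cluster twice
print(deduplicated_total(reports))  # counts it once
```

Two outlets covering the same cluster inflate the naive total, which is exactly the kind of gap worth checking for before trusting the map.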

Prominent data scientists have raised concerns about consumers’ tendency to blindly accept the message portrayed in charts and graphs.  Just remember, garbage in…


For the record, I am not commenting on vaccine policy (also a serious business).  My only point is that data consumers should use critical thinking when examining visualizations.

More Data, Less Experience

I wanted to break the blog-silence with a PBS clip on generative art (art created with an autonomous system, like a computer, a program, or a data set).  Enjoy:


The brief introduction raised some interesting questions for me.  Generative composer Luke DuBois wrote his score, Hard Data, using casualty statistics from the war in Iraq.  When discussing his piece, he states, “[this is] the first conflict in which we have more data than knowledge… more numbers from the war than experience of it” [emphasis is mine].  While he’s speaking of the near-universal availability of data about the war and the individual experience of the war by an all-volunteer army, I wonder if his statement also applies to our generation more broadly.

  • Are we interacting with the present in the same way that we engage with the past, through recording/documentation/data?
  • Do the speed and availability of data alter our perceptions of the passing of time?
  • Is every moment becoming hindsight before we experience it?
  • Does this shine a light on our tendency to document our lives via social media and other online platforms?

Whew, way too meta for a Thursday…

Small Multiples for Qualitative Data

A small multiple is a series of miniature, similar graphs displayed together for easy comparison.  The term was popularized by Edward Tufte in reference to the display of quantitative information.  

However, small multiples can also be used to visualize qualitative data, allowing users to quickly perceive differences in objects or changes over time.  I’ll explore two popular forms of visualization in qualitative research: the word cloud and the social network diagram, demonstrating how these methods can benefit from a small multiple approach.

Word Clouds / Word Storms

Word cloud creators, like Wordle and Tagxedo, generate visualizations from text, giving frequently used words greater size and prominence in the swarm.  While word clouds are tools for generating high-level summaries of single documents, word storms* are useful for creating comparisons between multiple documents.  Presented together, word storms compose a powerful visualization.  This display might be used to demonstrate differences or similarities in how disparate groups discuss a given topic or how different news outlets present the same news story.
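For the curious, the counting step behind a set of comparable word clouds can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the two snippets, the tiny stop list, and the scaling rule are hypothetical, and it says nothing about how Wordle, Tagxedo, or the Edinburgh word storm work actually size or lay out words.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for two documents on the same topic.
documents = {
    "outlet_a": "Vaccines protect children. Vaccines save lives.",
    "outlet_b": "Parents question vaccines and ask about risks, risks, risks.",
}

# Tiny illustrative stop list; real tools ship much longer ones.
STOP_WORDS = {"and", "about", "the", "a", "of"}


def word_frequencies(text):
    """Lowercase the text, split it into words, and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


def relative_sizes(counts):
    """Scale counts so the most frequent word in a document has size 1.0."""
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}


# One set of relative word sizes per document: the raw material for a storm.
storm = {name: relative_sizes(word_frequencies(text)) for name, text in documents.items()}
for name, sizes in storm.items():
    print(name, sorted(sizes.items(), key=lambda item: -item[1])[:3])
```

Scaling within each document keeps a long text from visually drowning out a short one, which matters once the clouds sit side by side.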

Social Network Diagram

Social network diagrams are often presented in a pre/post format to show how an intervention strengthened and expanded collaborative ties.  The concept of small multiples takes the display to the next level, allowing users to discern differences over time.  Here, I use MIT’s Immersion tool** to visualize how people are connected through emails.  The small multiples below use metadata from 9 years of my personal emails to visualize my social network as I graduate college, join the workforce, enter a career field, plow through graduate school, and move across the country.***  Displaying the social network diagrams together in a small multiple formation allows viewers to consider each year individually while making comparisons across years.
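The data preparation behind a year-by-year set of network panels might look something like the Python sketch below.  It is not Immersion’s implementation: the message tuples, the names, and the choice to size nodes by emails sent or received are my own hypothetical assumptions, shown only to illustrate splitting one network into one small network per year.

```python
from collections import Counter, defaultdict

# Hypothetical email metadata: (year, sender, recipient). No message content.
emails = [
    (2005, "me", "roommate"),
    (2005, "roommate", "me"),
    (2005, "me", "professor"),
    (2006, "me", "manager"),
    (2006, "manager", "me"),
    (2006, "me", "manager"),
]


def yearly_networks(messages):
    """Group messages by year into one small network per year.

    For each year, return node sizes (emails sent or received per person)
    and undirected tie weights (emails exchanged per pair).
    """
    networks = defaultdict(lambda: {"nodes": Counter(), "ties": Counter()})
    for year, sender, recipient in messages:
        net = networks[year]
        net["nodes"][sender] += 1
        net["nodes"][recipient] += 1
        # frozenset makes the tie undirected: (a, b) and (b, a) share a key.
        net["ties"][frozenset((sender, recipient))] += 1
    return dict(networks)


panels = yearly_networks(emails)
for year in sorted(panels):
    print(year, dict(panels[year]["nodes"]))
```

Each entry in `panels` would feed one panel of the small multiple, drawn with the same layout rules so that the years stay comparable.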

In conclusion, small multiples are effective for presenting qualitative data as well as quantitative.

* Faculty with the University of Edinburgh are publishing exciting work on word clouds.  To read more about their methods, visit: http://groups.inf.ed.ac.uk/cup/wordstorm/wordstorm.html

** Check out a video on MIT’s Immersion program here: http://vimeo.com/69464265

*** Each node represents a person in my “email life” and each tie denotes a piece of communication.  The larger the node, the greater the number of emails.

A simple diagonal line adds clarity to this scatter plot.


Public perceptions of US death rates for selected causes: Comparison of public opinion survey estimates versus empirical data shows how people overestimate rare causes of death
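Why does the diagonal help?  Assuming the plot puts the empirical value on the x-axis and the survey estimate on the y-axis (my assumption, not something stated in the caption), the line y = x marks a perfect guess, so a point’s side of the line tells you the direction of the error at a glance.  A minimal Python sketch with made-up numbers, not the survey’s figures:

```python
# Hypothetical (actual, estimated) values per cause; not the survey's figures.
points = {
    "cause_a": (5.0, 50.0),      # rare, guessed far too high
    "cause_b": (800.0, 400.0),   # common, guessed too low
    "cause_c": (100.0, 100.0),   # guessed exactly right
}


def diagonal_endpoints(values):
    """Return the two endpoints of the y = x line spanning all plotted values."""
    flat = [v for pair in values.values() for v in pair]
    low, high = min(flat), max(flat)
    return (low, low), (high, high)


def position(actual, estimated):
    """Say where a point falls relative to the y = x line."""
    if estimated > actual:
        return "above (overestimated)"
    if estimated < actual:
        return "below (underestimated)"
    return "on the line (accurate)"


print(diagonal_endpoints(points))
for cause, (actual, estimated) in points.items():
    print(cause, position(actual, estimated))
```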