An Interview with Nathan Yau
Nathan Yau is the creator of flowingdata.com, a blog that “explores how designers, statisticians, and computer scientists are using data to understand ourselves better” through data visualization. If you didn’t get an A+ in high school algebra, or if you barely scraped by in the Intro to Math class in college, don’t worry. Yau is out to prove that otherwise intimidating information is actually understandable and interesting. It can even be beautiful.
Yau is also a statistician for the urban sensing group at the Center for Embedded Networked Sensing (CENS) at UCLA, a crew of smarty pants who are currently developing the Personal Environmental Impact Report (PEIR), an “online tool that allows you to use your mobile phone to explore and share how you impact the environment and how the environment impacts you.” You can prove you’re a nerd by adding it to your Facebook page.
Yau is an important part of a growing trend to make relevant data available to people who would otherwise avoid thumbing through pages of scientific results. I interviewed him via email (of course), after I had a chance to interact with PEIR during the Wired Festival in Chicago’s Millennium Park last fall.
Natalie Edwards: Why do you think data visualization is gaining in popularity?
Nathan Yau: I think the main reason is that there’s just more data available, and as a result, people want to understand it. Not everyone knows how to interpret all those numbers though. Visualization lets people see (and hopefully understand) their data without any formal statistical training. You don’t have to know mean, median, and standard deviation to see a trend.
It’s hard to extract useful information from a table of numbers, but it’s easy to spot patterns in a picture. So it’s much quicker to get your point across with a visualization. People can see the pattern, non-pattern, clustering, or whatever instead of needing you to explain everything.
NE: What are the key elements to creating effective data visualization?
NY: Great data visualization tells a story. I just wrote a post on this.
NE: What is the most interesting thing that can be done with data visualization, in your opinion?
NY: There’s a lot of cool analytical stuff in the statistical visualization area, but I’m most excited about the merging of science and art. Going back to story-telling with data visualization, I really like the work of Jonathan Harris, Stamen Design, and some of the stuff by Fernanda Viegas and Martin Wattenberg from IBM. They create a lot of things that are outside the typical charts and graphs and represent the personality in the data. Data can be very dry (and boring) even though there’s a lot of information in it, but when we bring elements of design into visualization, there’s this new dimension to the data. We can make it human, and all of a sudden people listen and understand that this data is about them.
NE: Why did you start FlowingData? What made you think anyone would be interested in it?
NY: In the early going, FlowingData was just a way for me to keep track of cool things that might be related to my dissertation. I’m a UCLA student, but I came to Buffalo with my wife, so it was sort of my way of keeping my mind moving. I wasn’t getting the same interaction with my classmates and professors that I was used to.
FlowingData has grown since then of course, but to be honest, I try not to worry too much about whether or not anyone else is interested since I’d probably still be writing even if there were no audience. It is nice to have readers though. I learn just as much from them as I do from writing.
NE: Tell me about PIER, and what you hope to achieve with it.
NY: It’s actually PEIR. The Personal Environmental Impact Report. Ultimately, we want people to understand that the little choices they make every day, like how to get to work, can have a major effect on their surroundings. What do people do with that information? Hopefully, PEIR users are driven to make more eco-friendly choices. I think we’re past one challenge, which is inferring environmental impact and exposure from location traces. The next major challenge is to effectively communicate the data to a PEIR user, which we’ve done to some extent with the online profile and maps, but how can we compel people to action? Again, I think it comes back to telling a (green) story with visualization.
NE: What are some of the roadblocks or risks of using this kind of personal data visualization? Why do you think people would want to use it?
NY: Since this data is really personal and private, we do our best to keep it that way. We make sure the only person who can view your data is you. For PEIR in particular, the toughest part is communicating this very scientific data to non-scientists. What does it mean to emit 20 kilograms of carbon? Is it a lot? Is it a little? This is one of the reasons we provide the social aspect of PEIR that lets you compare your impact/exposure to that of your friends. It provides context.
PEIR sort of fits into this niche of self-surveillance where people collect data about themselves, and use it as a way to monitor their habits and then change for the better. For example, there are lots of web apps out there that let people track their weight/what they eat/exercise with the main goal of losing weight. Similarly, PEIR users might try to reduce their environmental impact.
NE: When do you suspect it will be available to a broad range of users?
NY: Well, we’re doing a test run with 300-400 students up in San Francisco in addition to running PEIR in Southern California. I can’t say we have a specific target date for country-wide usage though. One step at a time for the moment.
NE: What made you so smitten with data visualization? Was there an event or a particular piece of visualization that sparked your interest?
NY: Ah, I remember the exact moment. My professor, and now thesis adviser, Mark Hansen, was giving a guest lecture in one of my classes. It was my very first class in graduate school on using statistics to analyze the Web. His lecture had very little to do with the course topic, so I sat there wondering why he was even talking. Then towards the middle of his lecture, he showed us data visualization examples (outside statistics). That’s when my ears perked up. Two pieces in particular – (1) Listening Post (by my adviser and Ben Rubin) and (2) Bradford Paley’s TextArc. It was like seeing data in a completely different light. It was beautiful. I immediately Googled “data visualization” when I got home, subscribed to Andrew Vande Moere’s blog, Information Aesthetics, and I’ve been loving visualization ever since.