Data Science: “It Is Getting Easier for the Novice To Do This Kind of Analysis”

How would data journalists benefit from text mining skills? A brief conversation with Rebecca J. Weiss, Data Analyst at the Mozilla Foundation and a PhD candidate at Stanford.

Note: This is part of a series of interviews and short profiles, collected at the Mozilla Festival 2013.

At this moment, media organizations around the world are competing for talented professionals with a new type of know-how: data analysis and data visualization. Amid buzzwords like "big data" and "data-driven journalism", the ability to search, structure and mine large heaps of data or documents is much sought after.

We spoke to Rebecca J. Weiss, who ran a workshop at the Mozilla Festival in London on how to use software and techniques to find relevant information ("How to work with text as data", London, October 2013).

Please tell us who you are and what you do.

"My name is Rebecca Weiss, I am a data analyst for the Mozilla Foundation and I am also a PhD candidate at Stanford. What I do is text analysis, which covers the aggregation of texts, machine learning and natural language processing, applied to text files to understand the context. There is a specific set of skills for data journalists to learn, as it is becoming more and more commonplace to find information in text files, ranging from material published by governments to material published by corporations. And if you can learn those skills you can start to find meaningful patterns in these documents."

Can you provide an example?

"One example where these methods have been applied really successfully: a professor from Stanford examined congressional press releases. He applied models and text mining analysis to understand the expressed agenda of the congressmen, whether they spoke to their constituencies or the public at large. And the power of this approach is that one person can examine this massive collection of content and tell the story by finding meaning in this material."

How accessible are the tools you work with for journalists?

"I think the other power of text analysis is that a lot of these technologies are really mature. And the software that is out there is relatively easy to use. It is becoming easier and easier for a relative code novice to be able to do this kind of analysis. And I think that these techniques should be used for data journalism."

More information:
Contact page for Rebecca J. Weiss (Stanford University)
Rebecca Weiss (Scientific work and papers overview)

Interview/Video by Cosmin Cabulea (@pushthings4ward on Twitter)

DW Innovation