Treat data as carefully as any source
Data journalists Chris Giles (left) and Simon Rogers (right), with LFB chair Dave Rotchelle (centre).
GOOD DATA journalism is "not really about maths," says Guardian Datablog editor Simon Rogers; it's about "treating data as another journalistic source" and making data part of the news process. Financial Times economics editor Chris Giles agrees. He believes that "data helps you to get to the truth," but in dealing with data there are "exactly the same issues as dealing with sources," such as quoting sources out of context.
Chris and Simon were speaking at the London Freelance Branch March meeting on data journalism. Simon's involvement in the field goes back to the Guardian's early "news unlimited" online operation in 2001. He started work there on September 10, and the events of the following day meant there "was a sudden need for graphics of a quarter page in a way we hadn't done so before in UK," which is how he "got into the data." The Datablog has since "gone from an eccentric part of newspaper to part of the editorial process; [we] now produce two or three pieces a day hooked to the news agenda," mostly "visualisations".
Data journalism is now so vital to the Guardian that Simon works "close to news desk" and the data team - Simon and two trainees on casual shifts - "will suggest ideas for stories" to the news team. Chris adds that the line between data journalism and news journalism is now "much less demarcated".
Recent Datablog stories have included a piece (on the day Simon spoke to LFB) on antibiotics use in the UK compared to other countries, and a highly detailed look at the demographics of the 4000 or so people sentenced in connection with the 2011 riots.
After half the courts refused to release this data, and the other half wanted to charge £15 a name, the Guardian "went to the Ministry of Justice," who "gave an instruction" to release a "data deluge in the form of pdfs." Hard-to-process pdf-format documents are "where data goes to die... governments able to appear to publish info while making it difficult to work with" by releasing it as pdfs. From the MoJ data it became clear that 2011 riot defendants were disproportionately "from poor parts of country... treated more harshly" than usual by the courts, and included "an unusual number of minors."
Why has data journalism taken off? Free online tools to visualise data have made it much easier and cheaper. Says Simon, "barriers to entry are very low, there are lots of free tools out there" to "refine" datasets, and "you don't have to be techie to use them." Chris adds that it's "much easier to manipulate data than it was, to hold people to account... the internet gives you access to data in a way that you didn't have before."
Chris Giles came from the world of statistics and economics into journalism - "telling the story I actually found harder than doing the numbers." He won his 2012 Royal Statistical Society prize after spotting a £12 billion hole in the government's finances, having uncovered flaws in the "model" that the Office for Budget Responsibility uses for financial forecasting. Chris "replicated" the OBR model and "didn't quite get their results," then "went to them privately, they were helpful... quite good about it". Chris had shown, six weeks ahead of the Chancellor's autumn statement, that "things were in a much worse state than they thought." The OBR "ditched that model this year... In a way we did George Osborne a favour."
Data is now, says Chris, "quite cool; a while ago it was lonely... like being a librarian." Data journalists "don't need a degree in stats," more important is an ability to find "things that other people want to know, (to) communicate it to people." You need to produce "great storytelling out of the manipulation of data," and "good data doesn't necessarily mean good journalism."
We now have "tools we didn't have, but treat it as carefully as ever," warns Chris. There's a temptation to get so "wound up in the visualisation" that you lose sight of common sense. Chris gave the example of Datablog's analysis showing that Twitter comments on NatWest's handling of its recent cash machine failure were generally positive - but the data-manipulating software couldn't factor in the irony and sarcasm in those comments.
Can freelances get in on the data action? Chris says the FT's responses to a freelance data journalism pitch "wouldn't be different from any other story: we'd want to check it."