The UK media landscape is divided between print and digital, but this schism rarely affects content in any meaningful sense. For many publishers, articles appear in both print and online editions with minimal changes, if any. However, certain publications have made a point of separating print editorial from their digital operations in order to capitalise on two very different markets. The Daily Mail and its digital counterpart Mail Online are perhaps the best examples of this.
To explore the scale of the changes between the Mail’s print and online editions, Signal AI performed a statistical comparison of the language used. To get a representative sample of data, we pulled 100 stories per day from each of the online and print sources throughout the month of June, giving us a corpus of 6,000 articles to analyse. We then used chi-squared (\(\chi^2\)) analysis to see just how different the online and print language is. (The table of top-scoring terms for each source is below.)
For the uninitiated, chi-squared is a way of determining the relationship between two groups by highlighting the differences between them. When analysing text, one group’s high-scoring entries will be the terms which are most different from the other group: a high score indicates both a high presence in one group and a relatively low presence in the other. More formally, Pearson’s chi-squared statistic compares the observed and expected frequencies of an event; in our analysis, we treat word frequencies in one source as the ‘observed’ values and those in the other as the ‘expected’ values.
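As a rough illustration of this kind of scoring, the Python sketch below ranks terms by a Pearson chi-squared statistic. It assumes the common 2×2 contingency-table formulation, \(\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}\), where \(a\) and \(b\) are a term's counts in the two sources, \(c\) and \(d\) are the counts of all other tokens in each source, and \(N\) is the total number of tokens. This is not necessarily the exact calculation behind the figures in this article, and the function name `chi_squared_terms` and the toy token lists are purely hypothetical.

```python
from collections import Counter


def chi_squared_terms(tokens_a, tokens_b):
    """Score every term with a 2x2 Pearson chi-squared statistic.

    tokens_a, tokens_b: lists of word tokens from the two sources.
    Returns {term: (chi2, side)}, where side is "a" or "b" depending on
    which source uses the term relatively more often (ties go to "b").
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    grand_total = total_a + total_b

    scores = {}
    for term in counts_a.keys() | counts_b.keys():
        a = counts_a[term]   # occurrences of the term in source A
        b = counts_b[term]   # occurrences of the term in source B
        c = total_a - a      # all other tokens in source A
        d = total_b - b      # all other tokens in source B
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue         # degenerate table, statistic undefined
        chi2 = grand_total * (a * d - b * c) ** 2 / denom
        # Compare relative frequencies a/total_a and b/total_b
        # without dividing.
        side = "a" if a * total_b > b * total_a else "b"
        scores[term] = (chi2, side)
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print_tokens = ["yesterday"] * 8 + ["said"] * 12
    online_tokens = ["wednesday"] * 8 + ["said"] * 12
    ranked = sorted(chi_squared_terms(print_tokens, online_tokens).items(),
                    key=lambda item: item[1][0], reverse=True)
    for term, (chi2, side) in ranked:
        print(f"{term}\t{chi2:.1f}\t{side}")
```

In this toy example, a term used equally often in both sources (here, "said") scores zero and so would not appear in either ranked list, while terms confined to one source score highly for that source.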
Different terms
The differences between the two groups are stark. Interestingly, the top entry for print is the word yesterday. Here, the 24-hour news cycle is particularly visible: online, the relative term yesterday becomes incorrect if the article is viewed more than a day after being written, so absolute terms like Wednesday, Tuesday and Friday are far more present online than in print. Print newspapers, it seems, spend a lot of time referring to the previous day, with clear context provided by the current date at the top of every page. (The word today is notable for its absence from both lists, which indicates that it is used in roughly equivalent amounts in print and online.)
The subject matter of print and digital is also very different. High-scoring print terms are redolent of the establishment, including Tory, MPs, Sir, royal and Commons. Mail Online’s top-scoring terms, by contrast, include brunette, model, blonde, beauty and wore, which puts its strategy into sharp focus. The different emphases of the Mail’s online and print publications are laid out plainly here, with Mail Online showing a strong tendency towards clickbait to maximise online ad revenue.
This is perhaps most relevant when we look at how the media itself is represented in the two datasets. For the Daily Mail, the BBC, BBC1 and BBC2 are high on the list; meanwhile, high-scoring terms for Mail Online include Instagram, shared, video, captioned, posted and image. The presence and significance of social networks and multimedia runs throughout the language of Mail Online, and Signal AI’s chi-squared data quantifies the extent to which content like this is absent from the printed Daily Mail.
Different audiences
Newspaper proprietors are becoming increasingly reliant on digital, both as a revenue stream and, more generally, to remain relevant to audiences in the 21st century. However, retaining a ‘traditional’ print brand is also an important part of many organisations’ business models. This poses its own problem: how far should publications distinguish between the content they provide to increasingly different audiences? As news presentation becomes ever more personalised, will every reader become an audience of one, shown their own filtered content? The Mail shows us how far one publisher will go in separating out print and digital, a difference quantified by our chi-squared analysis. Will other newspapers look enviously at the Mail’s readership figures and follow suit with this approach? Or will the Mail’s strategy remain an outlier?
Top 50 terms for each source
Daily Mail (print) | \(\chi^2\) | Mail Online | \(\chi^2\) |
---|---|---|---|
yesterday | 168.0 | June | 222.4 |
Scotland | 38.5 | editing | 185.2 |
pc | 37.6 | video | 182.4 |
Scottish | 33.6 | told | 147.9 |
Tory | 27.6 | Wednesday | 143.0 |
Glasgow | 26.8 | reporting | 135.8 |
9:00 PM | 26.4 | | 131.1 |
UK | 24.0 | scroll | 128.3 |
EU | 23.2 | pictured | 128.1 |
MP | 21.9 | Thursday | 123.4 |
co | 19.6 | pair | 122.6 |
BBC | 19.4 | Tuesday | 122.0 |
Scotland’s | 19.2 | shared | 121.9 |
MPs | 18.4 | Friday | 110.0 |
Wimbledon | 17.8 | star | 105.3 |
bbc2 | 17.8 | percent | 103.5 |
Edinburgh | 17.3 | according | 99.6 |
sir | 16.2 | photo | 90.7 |
miss | 15.8 | added | 90.2 |
bbc1 | 15.0 | beauty | 87.9 |
Cameron | 14.6 | posted | 84.3 |
10:00 PM | 14.4 | earlier | 81.7 |
ch4 | 14.1 | afp | 81.1 |
England | 14.0 | wore | 79.4 |
SNP | 13.5 | reported | 77.3 |
royal | 13.4 | black | 76.5 |
British | 13.4 | media | 75.0 |
commons | 12.2 | Monday | 74.4 |
ch5 | 12.0 | white | 72.9 |
tennis | 11.8 | looked | 72.7 |
Jeremy | 11.7 | Sunday | 71.7 |
Scots | 11.7 | appeared | 69.7 |
bbc4 | 11.5 | locks | 69.5 |
pensions | 11.4 | blonde | 67.3 |
Labour | 11.1 | seen | 66.9 |
Britain | 11.1 | Sydney | 64.8 |
firm | 10.9 | wrote | 64.5 |
BHS | 10.8 | snap | 63.2 |
8:00 PM | 10.8 | following | 63.1 |
eighties | 10.6 | statement | 61.9 |
effective | 10.5 | image | 60.6 |
pages | 10.4 | incident | 60.5 |
ie | 10.3 | captioned | 60.4 |
pension | 10.3 | brunette | 60.4 |
patients | 10.0 | 2015 | 59.7 |
Cambridge | 9.9 | model | 57.7 |
masterclass | 9.8 | Australia | 56.0 |
Labour’s | 9.8 | file | 55.7 |
Donovan | 9.6 | morning | 55.5 |