jhwhistory

Critique of Google’s Ngram Viewer:

Google’s Ngram Viewer lets its users create a graph showing the frequency of words in a corpus of Google’s digitised books. It allows the user to search around 5 million books for a particular word or phrase. The books date from 1500 to 2008, and the search can be narrowed to any range of years within that period. The Viewer then produces a graph showing the frequency of each word or phrase over the period searched. Up to five words or phrases can be entered at once so that their frequencies can be compared. The user can also choose which corpus to search, with options including English, Chinese, German and English Fiction. The website is very easy to use, as it takes you straight to the search box without the need to scroll through or search for anything. A detailed page labelled ‘About Google Books Ngram Viewer’ gives a good insight into how the website works and also sets out some of its limitations.

There has been some debate as to how far these limitations affect the potential of the tool. One of the first and most important is that the Viewer can only give us an idea of frequency, and “there is no way of telling how words were used, in what context or in what form.”[1] For the historian this makes it very difficult to build a thesis on the data alone, because we have no way of knowing how the words were used. One of the biggest problems is that language changes over time, so the same word may carry different meanings in different periods, and this could lead to incorrect interpretations of the data produced.

In the same vein, the Optical Character Recognition (OCR) system behind the Ngram Viewer has been shown to have its own problems, mainly with the medial or long ‘s’. This has been shown through analysis of the words ‘fuck’ and ‘suck’: the graphs lead us to believe there are language trends in the use of these words, when in fact they reflect only the inability of the OCR to tell a medial ‘s’ from an ‘f’. Dan Cohen has highlighted a further limitation: “the biggest problem with the viewer and the data is that you cannot seamlessly move from distant reading to close reading”[2]. In other words, we cannot move from the graph and the ‘big’ data to the individual uses of a word in individual books, which gives us even less ability to consider the usage or context of words.

The tool has the potential to be used in a wider context, and it will become more powerful still as even larger datasets are produced, with the greatest benefits likely to be seen in the study of language. However, the information produced by the Viewer can only really be put into context and used in a historical academic sense when paired with wider study and reading. Until it is combined with some means of measuring how words are used, and until the OCR is improved, it will remain a supporting tool in the study of History.
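To make the long ‘s’ problem concrete, here is a small Python sketch. It is not Google’s code: the three-sentence corpus, the two stand-in ‘OCR’ functions and the `relative_frequency` helper are all invented for illustration. It simply counts a word as a share of all the words printed in a given year, first with an OCR that misreads the long ‘s’ (ſ) as an ‘f’ and then with one that reads it correctly.

```python
from collections import Counter

# Toy corpus of (year, text) pairs, invented for illustration only.
# The 1700 texts use the long s (ſ), as early printed books often did.
corpus = [
    (1700, "the ſun did ſuck the dew"),
    (1700, "bees ſuck honey"),
    (1900, "bees suck honey from the flower"),
]

def naive_ocr(text):
    """Stand-in for an OCR engine that misreads the long s as an f."""
    return text.replace("ſ", "f")

def correct_ocr(text):
    """Stand-in for an OCR engine that reads the long s as an ordinary s."""
    return text.replace("ſ", "s")

def relative_frequency(corpus, word, ocr):
    """For each year, the share of all words that match `word` after OCR."""
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        tokens = ocr(text).split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}

if __name__ == "__main__":
    for word in ("suck", "fuck"):
        print(word, "misread:", relative_frequency(corpus, word, naive_ocr))
        print(word, "correct:", relative_frequency(corpus, word, correct_ocr))
```

With the misreading OCR, ‘suck’ does not appear at all in 1700 and both of its occurrences are counted under the other word, so the graph would show a ‘trend’ that exists only in the scanning. Note also that the function only ever returns numbers, which is the point made above about frequency telling us nothing of context.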

 

Bibliography 

Diski, J., ‘Short cuts’, http://www.lrb.co.uk/v33/n02/jenny-diski/short-cuts, consulted on 14/03/2012

Cohen, D., ‘Initial thoughts on the Google Books Ngram Viewer and datasets’, http://www.dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/, consulted on 14/03/2012


[1] J. Diski, ‘Short cuts’, http://www.lrb.co.uk/v33/n02/jenny-diski/short-cuts, consulted on 14/03/2012

[2] D. Cohen, ‘Initial thoughts on the Google Books Ngram Viewer and datasets’, http://www.dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/, consulted on 14/03/2012
