Google Ngram: Useful or Useless?

One of the newest additions to the Google brand is the word-counting tool Ngram, which searches Google’s database of books and tells us how frequently words appear. It can tell us how many times per thousand a single word appears. This all seems like hocus pocus, and it most probably is, but what interests historians is how it can be used in academic research. The program is very entertaining; it provides you with that “hmm, interesting” moment. It can also process huge amounts of data that would take a human a lifetime to gather, but how useful is this really?

I can see how such programs could be used by marketeers and similar professions, who would be able to gauge how words may be perceived, or how recognisable or popular certain words are. As a historian, though, I am unable to see how it can be considered academically useful in its current form. Most of the time, when I search terms related to current historical projects, I only end up reaffirming what I could predict from a few basic facts. For instance, we can see that the Great War only became the First World War in the 1940s, which helps us understand when people began to see the second conflict as a world war, but this is nothing new. We do not need to count the words in a book to see changes in people’s perceptions; we can gather this information from more reliable sources.

World War, Great War search
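For anyone curious about what sits behind a graph like this, here is a rough sketch of the counting idea in Python. It is only my own illustration, not how Google actually does it: the two-sentence "corpus", the `tokenize` helper and the `phrase_frequency` function are all made up for the example, and the per-thousand scaling simply follows the description above.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs.
# This is invented for illustration; it is not real Ngram data.
TOY_CORPUS = [
    (1916, "The Great War goes on and the great war is everywhere"),
    (1946, "After the First World War came the Second World War"),
]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_frequency(phrase, corpus):
    """Return {year: occurrences of the phrase per 1,000 n-grams of its length}."""
    target = tuple(tokenize(phrase))
    n = len(target)
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        tokens = tokenize(text)
        # Slide a window of n words across the text to build its n-grams.
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        totals[year] += len(ngrams)
        hits[year] += sum(1 for gram in ngrams if gram == target)
    # Skip years with no n-grams at all to avoid dividing by zero.
    return {year: 1000 * hits[year] / totals[year] for year in totals if totals[year]}


if __name__ == "__main__":
    print(phrase_frequency("Great War", TOY_CORPUS))
    print(phrase_frequency("First World War", TOY_CORPUS))
```

The point of dividing by the yearly total is that more books are printed in some years than others, so raw counts alone would mostly show how much was published rather than how popular a phrase was.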

Google Ngram also has drawbacks in its reliability. The OCR system it relies on has some flaws which cannot be ignored. In the lecture covering the new tool, the lecturer used the example of “Fuck” and “Suck”, showing us a graph which appears to show the word “Fuck” being used relatively frequently in pre-1900s literature, before dying out in 1850 or so and coming back into use in 1960. What we are really shown, however, is the OCR system misreading the long “s” (ſ) of older typefaces as an “f”. This highlights how the OCR can in fact misinform us and make us see a pattern or shift in language whose real cause is completely different from the one we might presuppose.

Another drawback is that it only uses the written word, so when looking at 19th century literature we are only seeing what an elite part of the population read. Literacy rates were still low, and so it would be difficult to generalise about shifts in ideas or changes in language across the total population.

For these reasons I feel that Google Ngram in its current form has minimal uses for historians. Until the OCR is reformed, and until we can refine searching or create a phrase-searching system, I see it being nothing more than another form of procrastination for academic historians. Don’t get me wrong, the system is amazing. Every good idea has to start somewhere, and this could be the transition phase for a project which changes the way we look at books and literature. However, until we find an essential use for this tool it remains a plaything for Google, who seem to create things just because they have the money and brain power to do it. They are even in the process of producing a driverless car, believe it or not, which highlights the diverse nature of their business; it seems they are becoming a multinational with a proverbial ‘finger’ in every pie.

I am yet to be convinced of the success of this project. I feel a more in-depth look at how it works, and at what others have been able to deduce from these graphs and figures, may help change my mind.
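To make the OCR problem a little more concrete, here is a small Python sketch of how a single misread letter can skew a word count. It is a toy of my own and not Google’s OCR: the sample sentence, the `naive_ocr` and `corrected_ocr` functions and the use of “same” and “fame” as a tamer stand-in for the lecturer’s example are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical line of old print that uses the long s character.
PRINTED = "the ſame men ſought fame and the ſame fortune"


def naive_ocr(text):
    """Assumed faulty behaviour: every long s is read as the letter f."""
    return text.replace("ſ", "f")


def corrected_ocr(text):
    """What a long-s-aware reader would produce: long s becomes s."""
    return text.replace("ſ", "s")


def count_word(word, text):
    """Count whole-word occurrences in a whitespace-separated text."""
    return Counter(text.split())[word]


if __name__ == "__main__":
    for label, reader in (("naive", naive_ocr), ("corrected", corrected_ocr)):
        text = reader(PRINTED)
        print(label, "fame:", count_word("fame", text), "same:", count_word("same", text))
```

In the naive reading “fame” appears three times and “same” never, while the corrected reading gives one “fame” and two “same”. Nothing about the language changed between the two runs, only the way the page was read.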


