In Monday’s Digital History course we explored the various ways that text analysis and topic modelling are being used to understand and interpret trends in history in a new way. Programs like Google’s Ngram let us enter simple words and see how their usage has changed throughout time in 4% of all published works. It’s very intriguing to see the results! However, there are a number of drawbacks to this type of work: it is restricted to published books (no newspapers, letters, diaries, etc.), and the graphs of word usage cannot really help us draw many conclusions beyond the literal usage of the words. Still, many people try to make educated guesses from the data, including things like when censorship was imposed, and I have to say many were convincing! The data is not always accurate, though, as you can see in the famous example that Ian Milligan (a guest speaker in a seminar) showed us with the words “fuck” and “suck”. The Google Ngram shows the word “fuck” appearing far earlier than we would expect, but with some thought one can fairly safely guess that this is because the letter s in old script writing looks like an f. These are the neat things, but also the drawbacks, that you can learn from a program like this!
As a blog topic for this week, we were asked to look deeper into topic modelling with two tools, Tagxedo and Voyant Tools. You can paste a piece of your own text into either one and a word cloud will be created automatically. From this we are to evaluate how these tools could be used in a professional setting; I chose museums as an example.
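For a rough sense of what a word cloud is built on, here is a minimal Python sketch that counts how often each word appears in a piece of text. It is only an illustration: the function name `word_frequencies`, the tiny `STOP_WORDS` list and the sample sentence are all made up for this example, and nothing here is claimed to be how Tagxedo or Voyant Tools actually work internally.

```python
import re
from collections import Counter

# A tiny, hand-picked list of filler words, for illustration only.
# Real tools presumably use much longer lists; this one is an assumption.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "on", "for", "it"}

def word_frequencies(text, remove_stop_words=True):
    """Count how often each word appears, optionally skipping filler words."""
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return Counter(words)

# Hypothetical sample text, not taken from the actual blog post.
sample = "The tobacco board met in Tillsonburg. The board set the tobacco price."

# The most frequent words are the ones a word cloud would draw largest.
print(word_frequencies(sample).most_common(3))
```

Calling `word_frequencies(sample, remove_stop_words=False)` instead keeps the filler words in the count, so a word like "the" can end up as the most frequent one.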
I pasted my blog post on the Tillsonburg tobacco industry into each, and both created a really cool-looking word cloud, although I couldn’t help but notice that Voyant Tools placed the most emphasis on the word ‘the’, something that Tagxedo filters out by default. However, I was able to change the settings in Voyant to omit these pretty useless words. Below you can see what came from each.
So at first I drew a total blank: besides serving as a visual for exhibits, how could these word clouds possibly be useful in a museum? Another way could be to focus exhibits quickly and efficiently. By entering pieces of text about a historic site and using either tool to show which words are used most frequently, one may be able to pick out important parts of the historic site that come up time and again and base an exhibit around them. Or, if an exhibit idea has already been created, text on that specific topic can be input and what is important will be drawn out. For example, if a museum was putting on an exhibit about the history of tobacco, they could use a word cloud to figure out important places influenced by the tobacco industry, important tobacco boards and important positions, like chairmen, on said boards. In this way, word clouds can be used as a focusing tool.
Right now the possibilities for text analysis seem limited to me. It’s hard to draw any real, hard conclusions from it, but it is interesting and slightly useful to see the patterns and trends that appear in text. With that said, I think these tools will become more useful as context is taken into consideration. Although Voyant Tools has begun this by showing the three words before and after a key word, I think context consideration will need to go beyond this to really add value to diginomics.
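To make the idea of "three words before and after a key word" a little more concrete, here is a minimal Python sketch of that kind of context search. It is an assumption-laden illustration, not Voyant Tools' actual method: the function name `keyword_in_context`, the `window` parameter and the sample sentence are all hypothetical.

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return (before, word, after) for each place the keyword appears."""
    # Split the text into words, dropping punctuation.
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            # Take up to `window` words on each side of the match.
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            hits.append((" ".join(before), word, " ".join(after)))
    return hits

# Hypothetical sample text, not taken from the actual blog post.
sample = "Farmers in Tillsonburg sold tobacco to the local board every autumn."

for before, word, after in keyword_in_context(sample, "tobacco"):
    print(f"{before} [{word}] {after}")
```

Raising `window` shows more of the surrounding sentence, which is one simple way of taking more context into account.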