
Text Visualization: HathiTrust Catalog and Analytics

Welcome! This guide explains how and when to use text visualization tools in your research.

What is HathiTrust?

HathiTrust is a large-scale collaborative repository that includes materials digitized by Google Books and the Internet Archive, along with digitized materials from other libraries.

HathiTrust Analytics are computational tools for working with the HathiTrust Digital Library.

Link to HathiTrust Digital Library

HathiTrust Digital Library Instructions

1. You will need to create an account with HathiTrust Analytics. Use your DU email account and create a 15-character password.

2. At the HathiTrust website, we can search for a title to create a corpus.
i. The book must be out of copyright for this exercise to work.
ii. Let’s search for something we all “should” read, but haven’t. Let’s try Crime and Punishment.

3. Click on the ‘Advanced Search’ option right below the large search bar. 
i. In the first search bar, type Crime and Punishment and choose ‘Title’ from the associated drop-down menu.
ii. In the search box below, type ‘Fyodor Dostoevsky’ and choose ‘Author.’

4. Make sure you choose a copy that says ‘Full View.’ Checking the ‘Full View’ box near the search bar will simplify this process. Select the copies you want and then create your collection.

5. Create a Collection:
i. Select the book, then find the grey box at the top of the results that says ‘START COLLECTION.’
ii. Click on the ‘Add Selected’ button. This will produce a pop-up window that allows you to name this collection. Let’s call it Crime and Punishment.
iii. Now, at the top of the page, a green box will appear that says ‘Go to your Collection.’ Click on that.

6. Your Collection Page:
i. Now that you are on the collection page, there is a drop-down menu under Share that says ‘Download Metadata.’
ii. Click the first option, for the TSV file. This will prompt a download.
iii. Open the file and rename it without any dashes. Once that is done, head over to the HathiTrust Digital Analytics page.
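If you prefer to script the rename and take a quick look at the metadata, the step above can be sketched in Python. This is only a sketch under assumptions: it assumes the dashes are in the filename, the function names `remove_dashes` and `read_metadata` are made up for illustration, and the filename in the usage comment is hypothetical. The code does not assume any particular column names; it simply reports whatever header row the file contains.

```python
import csv
from pathlib import Path


def remove_dashes(path):
    """Rename the file so its name contains no dashes; return the new path."""
    path = Path(path)
    new_path = path.with_name(path.name.replace("-", ""))
    if new_path != path:
        path.rename(new_path)
    return new_path


def read_metadata(path):
    """Return (column names, rows) from a tab-separated metadata file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return reader.fieldnames, rows


# Example usage (hypothetical filename; substitute the file you downloaded):
# tsv_path = remove_dashes("crime-and-punishment.tsv")
# columns, rows = read_metadata(tsv_path)
# print(columns)
# print(len(rows), "volume(s) in the collection")
```

Printing the column names first is a simple way to confirm the download is a tab-separated file before you upload it as a workset.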

HathiTrust Analytics/Research Center Instructions

Using Analytics:

1. From the Homepage, in the global navigation, click ‘Worksets.’ This will take you to your Worksets page.

2. Click on ‘Create a Workset’ in the top right corner. 

i. On this page, fill out the workset name without any spaces. You can add a description if you want, and choose the file you just saved. This workset can be made available to other HathiTrust users, or you can click on the ‘Private Workset’ button. Once these are done, click ‘Create Workset.’

ii. This will take you back to your workset page once it is done. This shouldn’t take long because it’s only one book. Once it is complete, click on ‘Algorithms.’

Using the Algorithms: 

Now you will see that there are many options. If we want to distantly read a novel, we want to know its themes. Often we learn these themes only by reading the book closely and asking ourselves what the big picture of the novel is. Distant reading offers another route: looking at the topics of the novel. Again, this is not a way to ace an English exam, but a way to familiarize ourselves with resources so we can look at them differently when we read them closely.

1. Choose the 'Topic Meandering' algorithm.
2. Name the job and select your Crime and Punishment workset.
3. Choose how many words you would like to be included and how many topics you want to see. Click Submit.
4. This step often takes a couple of minutes. First it will go through a ‘Staging’ process, then a ‘Queued’ process, and finally it will be ready. Do not leave this page while this is happening by going back or looking at other worksets. Just let it do its thing.
5. Once the algorithm is done, we can look at the different topics of the novel and see the associated words. 
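To get a feel for what "how many topics" and "how many words" control in the steps above, here is a toy sketch of one common topic modeling approach, Latent Dirichlet Allocation fitted with collapsed Gibbs sampling. This guide does not say how the HathiTrust algorithm works internally, so treat this as an illustrative assumption and not a description of that tool. The function name `toy_topics`, its parameters, and the sample word lists are all made up for the example.

```python
import random
from collections import Counter


def toy_topics(docs, num_topics, num_words, iterations=200,
               alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model by collapsed Gibbs sampling.

    docs is a list of documents, each a list of word strings.
    Returns num_topics lists holding up to num_words top words each.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assignments = []                                     # topic of every word token

    # Start by assigning every word token to a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all the other tokens.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1

                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    # Report the most frequent words currently assigned to each topic.
    return [
        [w for w, count in topic_word[k].most_common(num_words) if count > 0]
        for k in range(num_topics)
    ]


# Made-up word lists standing in for chunks of a novel.
sample_docs = [
    ["crime", "guilt", "murder", "guilt", "confession"],
    ["city", "street", "poverty", "city", "room"],
    ["crime", "murder", "city", "street", "guilt"],
]
for number, words in enumerate(toy_topics(sample_docs, num_topics=2, num_words=3)):
    print("Topic", number, ":", ", ".join(words))
```

Raising `num_topics` splits the vocabulary into more, narrower groups, while `num_words` only changes how many words are displayed per topic. Real tools also remove common words such as "the" and "and" before modeling, which this sketch skips.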

From this