Digital Studies: Text Analysis

What is text analysis?

"Text Analysis is about parsing texts in order to extract machine-readable facts from them. The purpose of Text Analysis is to create structured data out of free text content. The process can be thought of as slicing and dicing heaps of unstructured, heterogeneous documents into easy-to-manage and interpret data pieces" (Ontotext). For a deeper look, see the Ontotext website.

DataBasic

A light introduction to what you can do with different text analysis tools.

"A suite of easy-to-use web tools for beginners that introduce concepts of working with data. WordCounter analyzes your text and tells you the most common words and phrases. WTFcsv tells you WTF is going on with your .csv file. SameDiff compares two or more text files and tells you how similar or different they are" (DataBasic). You can paste text, upload a file, or paste a link to work with.
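To give a feel for what a word-counting tool reports, here is a minimal Python sketch of counting the most common words and two-word phrases in a text. It is an illustration only, not how WordCounter itself is implemented; the function names (`tokenize`, `top_words`, `top_bigrams`) and the sample sentence are made up for this example, and common words such as "the" are not filtered out.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(text, n=3):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)

def top_bigrams(text, n=3):
    """Return the n most frequent two-word phrases with their counts."""
    tokens = tokenize(text)
    # Pair each token with the one that follows it.
    return Counter(zip(tokens, tokens[1:])).most_common(n)

sample = "The cat sat on the mat. The cat slept."
print(top_words(sample, 2))    # [('the', 3), ('cat', 2)]
print(top_bigrams(sample, 1))  # [(('the', 'cat'), 2)]
```

Real tools usually add steps this sketch leaves out, such as removing very common "stop words" before counting.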

Lexos {Loader}

An "online tool ... to 'scrub' (clean) your text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms [tree diagrams], making graphs of rolling averages of word frequencies or ratios of words or letters, and playing with visualizations of word frequencies including word clouds and bubble visualizations" (Lexos).
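The idea of a "rolling average" of a word's frequency can be sketched in a few lines of Python: slide a fixed-size window across the text one word at a time and record what share of each window is the word of interest. This is a simplified illustration under assumed names (`rolling_frequency`, `window`), not a description of how Lexos computes its graphs.

```python
import re

def rolling_frequency(text, word, window=4):
    """Share of each sliding window of tokens that equals `word`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Mark each token 1 if it matches the word, otherwise 0.
    hits = [1 if t == word.lower() else 0 for t in tokens]
    # Average the marks over every run of `window` consecutive tokens.
    return [
        sum(hits[i:i + window]) / window
        for i in range(len(hits) - window + 1)
    ]

print(rolling_frequency("a b a a b b b a", "a", window=4))
# [0.75, 0.5, 0.5, 0.25, 0.25]
```

Plotting these values in order shows where in a text a word becomes more or less frequent.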

Overview

An "open-source web-based tool designed originally for journalists needing to sort large numbers of stories automatically and cluster them by subject/topic; includes visualization and reading interface; allows for import of documents in PDF, Word, HTML and text" (DH Toychest).

Bookworm

"HathiTrust+Bookworm (HT+BW) visualizes word trends in 13.7 million works held by HathiTrust. It enables scholars to discover new textual use patterns across the entire corpus, including in-copyright and public domain volumes" (HTRC docs).

Also, take a look at this related tool:

Google Ngram Viewer

"Search for and visualize trends of words and phrases in the Google Books corpus; includes ability to focus on parts of the corpus [e.g., 'American English,' 'English Fiction'] and to use a variety of Boolean and other search operators" (DH Toychest).

AntConc

AntConc supports more in-depth text analysis and is best suited to users who already have some experience with text analysis.

A general-purpose program for analyzing electronic texts (corpus linguistics) to find and reveal patterns in language. It creates word lists, concordances, clusters, n-grams, keyword lists, and collocations.
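A concordance lists every occurrence of a word together with the words around it, often called "keyword in context." The Python sketch below shows the basic idea; it is a hypothetical illustration rather than AntConc's own method, and the function name `concordance` and its `width` parameter are assumptions made for this example.

```python
import re

def concordance(text, keyword, width=2):
    """List (left context, keyword, right context) for each match."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            # Take up to `width` words on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

for left, word, right in concordance(
    "The whale swam. A white whale is rare.", "whale"
):
    print(f"{left:>10} | {word} | {right}")
```

Reading the matches side by side makes it easier to spot recurring patterns, such as which words tend to appear next to the keyword.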

Subjects: Digital Studies
  • Last Updated: Oct 26, 2023 10:44 AM
  • URL: https://libguides.gvsu.edu/DS