<?xml version='1.0' encoding='UTF-8'?><metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns="http://dublincore.org/documents/dcmi-terms/"><dcterms:title>Mary Wollstonecraft and Influence - Preprocessed Data</dcterms:title><dcterms:identifier>https://doi.org/10.48349/ASU/GPDIR5</dcterms:identifier><dcterms:creator>Caddy, Scott</dcterms:creator><dcterms:creator>Simeone, Michael</dcterms:creator><dcterms:publisher>ASU Library Research Data Repository</dcterms:publisher><dcterms:issued>2020-11-05</dcterms:issued><dcterms:modified>2020-11-06T22:35:43Z</dcterms:modified><dcterms:description>This dataset lays out the preprocessing steps for this topic modeling project. It contains a stopword list, the additions made to that list, and the steps taken to clean each text in the corpus.

The original stopword list was obtained from GitHub: &lt;a href="https://github.com/fozziethebeat/S-Space/blob/master/data/english-stop-words-large.txt">https://github.com/fozziethebeat/S-Space/blob/master/data/english-stop-words-large.txt&lt;/a>

For this project, preprocessing was much less straightforward than it was for the collocate/concordance project from Fall 2017. I [Scott Caddy] did most of the preprocessing in Sublime Text. Segmenting the items in the corpus was handled through the TopicModelingTool settings. What follows is the step-by-step process I worked through over several weeks as I learned more about the TopicModelingTool, about topic modeling, and about how to interpret the results.

A large part of my preprocessing came from doing topic modeling "runs" and modifying my stopword list based on the results. I hope this component shows how I continued to refine my preprocessing based on feedback from my peers and instructor and on the papers I read about topic modeling.</dcterms:description><dcterms:subject>Arts and Humanities</dcterms:subject><dcterms:subject>Computer and Information Science</dcterms:subject><dcterms:isReferencedBy>Caddy, S., &amp; Simeone, M. (2020, October 31). Preprocessed Data. Retrieved from https://osf.io/brx2q/</dcterms:isReferencedBy><dcterms:date>2018-04-19</dcterms:date><dcterms:contributor>Harp, Matthew</dcterms:contributor><dcterms:dateSubmitted>2020-11-05</dcterms:dateSubmitted><dcterms:type>textual data</dcterms:type><dcterms:license>CC0 1.0</dcterms:license></metadata>