WallStreetBets Subreddit dataset (doi:10.48349/ASU/WLV8JA)

Document Description

Citation

Title:

WallStreetBets Subreddit dataset

Identification Number:

doi:10.48349/ASU/WLV8JA

Distributor:

ASU Library Research Data Repository

Date of Distribution:

2022-06-20

Version:

1

Bibliographic Citation:

Little, David, 2022, "WallStreetBets Subreddit dataset", https://doi.org/10.48349/ASU/WLV8JA, ASU Library Research Data Repository, V1, UNF:6:9xDozy1E0i7ZAlZqh9EXEQ== [fileUNF]

Study Description

Citation

Title:

WallStreetBets Subreddit dataset

Identification Number:

doi:10.48349/ASU/WLV8JA

Authoring Entity:

Little, David (Arizona State University)

Distributor:

ASU Library Research Data Repository

Access Authority:

Unit for Data Science & Analytics

Depositor:

Little, David

Date of Deposit:

2022-06-20

Holdings Information:

https://doi.org/10.48349/ASU/WLV8JA

Study Scope

Keywords:

Computer and Information Science, Natural language processing (Computer science), Social Networks

Topic Classification:

Social media and society

Abstract:

This dataset is an extract of the subreddit r/wallstreetbets (https://www.reddit.com/r/wallstreetbets/) from the website Reddit.com (https://www.reddit.com). It contains all of the non-deleted posts from January and February 2021. The dataset is suited to many kinds of natural language processing (NLP) tasks.

Time Period:

2021-01-01 to 2021-02-28

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

File Description--f10139

File: reddit_data.tab

  • Number of cases: 774

  • No. of variables per record: 11

  • Type of File: text/tab-separated-values

Notes:

UNF:6:9xDozy1E0i7ZAlZqh9EXEQ==
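
The file-level figures above (774 cases, 11 variables, tab-separated) can be checked after download. The following is a minimal sketch, assuming the file has a single header row and parses cleanly with Python's standard csv module. The function name check_shape, the local path, and the in-memory sample rows are illustrative assumptions, not part of the dataset.

```python
import csv
import io

EXPECTED_CASES = 774      # "Number of cases" from the file description
EXPECTED_VARIABLES = 11   # "No. of variables per record"


def check_shape(handle):
    """Return (n_rows, n_columns) for a tab-separated file with a header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)              # first record is assumed to be the header
    n_rows = sum(1 for _ in reader)    # remaining records are data rows
    return n_rows, len(header)


# Tiny in-memory stand-in for the real file (hypothetical rows).
sample = io.StringIO("id\ttitle\tscore\nabc\tHello\t5\ndef\tWorld\t7\n")
sample_shape = check_shape(sample)
print(sample_shape)

# With the real download (the path is an assumption):
# with open("reddit_data.tab", newline="", encoding="utf-8") as fh:
#     assert check_shape(fh) == (EXPECTED_CASES, EXPECTED_VARIABLES)
```

Opening the real file with newline="" lets the csv module handle any line breaks embedded in quoted body text, so each post should still count as one record.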

Variable Description

List of Variables:

Variables

(unnamed variable)

f10139 Location:

Summary Statistics: Valid 774.0; Min. 2.0; Max. 2242.0; Mean 899.9082687338482; StDev 640.8169495692304

Variable Format: numeric

Notes: UNF:6:PlmbnzfGLjKfKBuBBRsRuw==

title

f10139 Location:

Variable Format: character

Notes: UNF:6:W1W7jb9+78Z5bHhN0rszcA==

flair

f10139 Location:

Variable Format: character

Notes: UNF:6:R26QCD17QKBUVKiZMA8ulA==

score

f10139 Location:

Summary Statistics: Valid 774.0; Min. 2.0; Max. 59080.0; Mean 877.1421188630474; StDev 4522.596428012204

Variable Format: numeric

Notes: UNF:6:m/pJC7MGcqV+aEEwe6hP5w==
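
The summary fields reported for numeric variables such as score (Valid, Min., Max., Mean, StDev) can be recomputed from the raw column as a sanity check. The sketch below assumes StDev is the sample (n - 1) standard deviation, which the codebook does not state. The helper name summarize and the demo values are hypothetical.

```python
import statistics


def summarize(values):
    """Compute codebook-style summary fields for a numeric column."""
    return {
        "Valid": float(len(values)),
        "Min.": float(min(values)),
        "Max.": float(max(values)),
        "Mean": statistics.fmean(values),
        # Sample (n - 1) standard deviation; this choice is an assumption.
        "StDev": statistics.stdev(values),
    }


# Hypothetical values, not taken from the dataset.
demo = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(demo)
```

If the recomputed StDev differs slightly from the codebook value, trying statistics.pstdev (the population form) is a reasonable next step.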

upvote_ratio

f10139 Location:

Summary Statistics: Valid 774.0; Min. 0.52; Max. 1.0; Mean 0.8930878552971576; StDev 0.10418984579198459

Variable Format: numeric

Notes: UNF:6:H4myxeIOdqMMir0cMzvDdg==

id

f10139 Location:

Variable Format: character

Notes: UNF:6:KRrxFHg5LoDryHD9iX+/AA==

subreddit

f10139 Location:

Variable Format: character

Notes: UNF:6:5olekrLUz6hP+08XVRU8DA==

url

f10139 Location:

Variable Format: character

Notes: UNF:6:/SwWoipgQL/RrReSQtSGmA==

num_comments

f10139 Location:

Summary Statistics: Valid 774.0; Min. 0.0; Max. 74853.0; Mean 531.574935400517; StDev 4419.635799965718

Variable Format: numeric

Notes: UNF:6:zzMCgnvZCbzsLEUpOaue8Q==

body

f10139 Location:

Variable Format: character

Notes: UNF:6:M2cEirAoJ5dK/gbC74XJxQ==

created

f10139 Location:

Variable Format: character

Notes: UNF:6:IZFL98nQEd+1+YOSyLj9bw==
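
Because created is stored as a character variable, it has to be parsed before any time-based filtering. The codebook does not document the string format, so the sketch below is only an assumption: it tries Unix epoch seconds first and falls back to ISO 8601 text, then checks the result against the stated time period. The names parse_created and in_period are hypothetical.

```python
from datetime import date, datetime, timezone

PERIOD_START = date(2021, 1, 1)   # from "Time Period"
PERIOD_END = date(2021, 2, 28)


def parse_created(raw):
    """Parse a `created` string; both accepted formats are assumptions."""
    try:
        # Assumed format 1: Unix epoch seconds, interpreted as UTC.
        return datetime.fromtimestamp(float(raw), tz=timezone.utc)
    except ValueError:
        # Assumed format 2: ISO 8601 text such as "2021-01-28 21:37:41".
        return datetime.fromisoformat(raw)


def in_period(raw):
    """True if the parsed date falls inside the documented time period."""
    return PERIOD_START <= parse_created(raw).date() <= PERIOD_END


print(in_period("2021-01-28 21:37:41"))
print(in_period("2021-03-01 00:00:00"))
```

If the real values use another layout, only parse_created needs to change; the period check stays the same.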