"$d_webRawe""economics, markets/neuro-linguistic/0_neuro-linguistic notes.txt"
www.BillHowell.ca 25Apr2021

https://finance.yahoo.com
"$d_webRawe""economics, markets/neuro-linguistic/210425 yahoo finance news.txt"
>> vastly superior to MarketWatch, Barrons, but a chunk of news is from Barrons
>> how do I wget the page? text-copy it?


24************************24

08********08
25Apr2021 https://finance.yahoo.com/news/

+-----+
ngram analysis of stock markets

Dataset is the big challenge - for now, just use Yahoo Finance
I need to use current news - MarketWatch I'm subscribed to
>> too general, noise
/media/bill/Dell2/Website - raw/economics, markets/neuro-linguistic/210425 MarketWatch news.txt

https://finance.yahoo.com
/media/bill/Dell2/Website - raw/economics, markets/neuro-linguistic/210425 yahoo finance news.txt
>> vastly superior to MarketWatch, Barrons

what about [Barrons, Wall Street Journal]?
/media/bill/Dell2/Website - raw/economics, markets/neuro-linguistic/210425 Barrons news.txt

+-----+
https://www.researchgate.net/publication/346352752_Stock_Market_Prediction_using_Daily_News_Headlines
Stock Market Prediction using Daily News Headlines
January 2020, SSRN Electronic Journal
DOI:10.2139/ssrn.3685530
Ali Hassanzadeh Kalshani, University of California, Irvine
Ahmad Razavi
Reza Asadi

We use a dataset containing news headlines and stock market index available in Kaggle.
Kaggle: Daily News for Stock Market Predictions
https://www.researchgate.net/deref/https%3A%2F%2Fwww.kaggle.com%2Faaron7sun%2Fstocknews%2Fdata
https://www.kaggle.com/aaron7sun/stocknews/data

+-----+
https://www.kaggle.com/aaron7sun/stocknews/data

Actually, I prepare this dataset for students on my Deep Learning and NLP course.
But I am also very happy to see kagglers play around with it.
Have fun!

Description:
There are two channels of data provided in this dataset:

News data: I crawled historical news headlines from Reddit WorldNews Channel (/r/worldnews).
They are ranked by reddit users' votes, and only the top 25 headlines are considered for a single date.
(Range: 2008-06-08 to 2016-07-01)

Stock data: Dow Jones Industrial Average (DJIA) is used to "prove the concept".
(Range: 2008-08-08 to 2016-07-01)

I provided three data files in .csv format:

RedditNews.csv: two columns
The first column is the "date", and second column is the "news headlines".
All news are ranked from top to bottom based on how hot they are.
Hence, there are 25 lines for each date.

DJIA_table.csv:
Downloaded directly from Yahoo Finance: check out the web page for more info.

CombinedNewsDJIA.csv:
To make things easier for my students, I provide this combined dataset with 27 columns.
The first column is "Date", the second is "Label", and the following ones are news headlines ranging from "Top1" to "Top25".

To my students:
I made this a binary classification task. Hence, there are only two labels:
"1" when DJIA Adj Close value rose or stayed as the same;
"0" when DJIA Adj Close value decreased.

For task evaluation, please use data from 2008-08-08 to 2014-12-31 as Training Set, and Test Set is then the following two years data (from 2015-01-02 to 2016-07-01). This is roughly a 80%/20% split.
And, of course, use AUC as the evaluation metric.

To all kagglers:
Please upvote this dataset if you like this idea for market prediction.
If you think you coded an amazing trading algorithm, friendly advice do play safe with your own money :)
Feel free to contact me if there is any question~
And, remember me when you become a millionaire :P

Note: If you'd like to cite this dataset in your publications, please use:
Sun, J. (2016, August). Daily News for Stock Market Prediction, Version 1. Retrieved [Date You Retrieved This Data] from https://www.kaggle.com/aaron7sun/stocknews.
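
>> sketch of the task setup described on the Kaggle page (labels, date split, AUC)
A minimal sketch in plain Python (stdlib only), not the dataset author's code. Assumptions: the label compares each day's Adj Close with the previous trading day's value (the page only says "rose or stayed as the same"); the function names label_days, split_by_date and auc are hypothetical; the prices and scores in the demo are made up for illustration and are not DJIA data.

```python
from datetime import date


def label_days(adj_close):
    """Label each day after the first: 1 if Adj Close rose or stayed the same
    versus the previous day (assumed reference point), else 0."""
    return [1 if today >= prev else 0
            for prev, today in zip(adj_close, adj_close[1:])]


def split_by_date(rows, train_end=date(2014, 12, 31)):
    """Split (date, label, headlines) rows into train and test sets.
    Rows dated up to and including train_end go to the training set."""
    train = [r for r in rows if r[0] <= train_end]
    test = [r for r in rows if r[0] > train_end]
    return train, test


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up closing prices, for illustration only.
demo_close = [100.0, 101.5, 101.5, 99.0, 100.2]
demo_labels = label_days(demo_close)

# Made-up classifier scores for the four labelled days.
demo_scores = [0.9, 0.4, 0.5, 0.8]
print(demo_labels, auc(demo_labels, demo_scores))
```

The pairwise AUC is quadratic in the number of rows, which is fine for a dataset of roughly two thousand trading days. A library routine would be the usual choice for anything larger.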

/media/bill/Dell2/Website - raw/economics, markets/neuro-linguistic/Aaron7sun Kaggle daily news for stock market prediction/[Combined_News_DJIA.csv, RedditNews.csv, upload_DJIA_table.csv]
>> only goes to 2016


# enddoc