Great idea, I actually tried this before. I was feeding the articles into Google's Natural Language API to get a sentiment index from -1 to 1 (-1 negative, 0 neutral, +1 positive). The issue I was having was that it wasn't really that accurate. Google NLP has trouble analysing contextual news articles, e.g. if a headline says "Bubble in cryptocurrency markets", it doesn't recognise that "bubble" is negative in news contexts. I have been looking for different NLP libraries that might be better, including AWS Comprehend. I'm not a data scientist, so I don't think I'd do very well building my own machine learning model. Any suggestions of good libraries people have used for news sentiment would be greatly appreciated.
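For what it's worth, one stopgap I've considered is layering a small news-specific lexicon on top of a general word list, so that domain terms like "bubble" score negative even when a general-purpose model reads them as neutral. A rough sketch (all word lists and scores here are made-up examples, not vetted lexicons):

```python
# Sketch: override a general-purpose sentiment lexicon with a small
# news/finance lexicon, so words like "bubble" read as negative.
# Every entry below is illustrative, not a curated word list.

GENERAL_LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}
NEWS_OVERRIDES = {"bubble": -0.8, "crash": -1.0, "rally": 0.6, "fraud": -1.0}

def score_headline(headline: str) -> float:
    """Average per-word sentiment; news overrides win over the general list."""
    scores = []
    for w in headline.lower().split():
        if w in NEWS_OVERRIDES:
            scores.append(NEWS_OVERRIDES[w])
        elif w in GENERAL_LEXICON:
            scores.append(GENERAL_LEXICON[w])
    return sum(scores) / len(scores) if scores else 0.0

print(score_headline("Bubble in cryptocurrency markets"))  # -0.8
```

Obviously this doesn't scale beyond a few obvious domain terms, but it's cheap to bolt onto any API score as a sanity check.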
Could you elaborate on news & NLP in regards to stocks?
We tried sentiment analysis in uni a few years ago and had no good results:
The idea was essentially: news says: 'stock A is great' -> it goes up shortly thereafter
We tested our algorithms by classifying Amazon reviews and tweets by sentiment. Those are filled with sentiment, and it's easy to detect whether something is a 5-star review or a 1-star review. The news articles we parsed all had near-neutral sentiment. We ended up building a classifier that could detect the news category of an article quite easily instead.
My initial idea was sparked by the Gulf spill and the subsequent dip in BP: I wanted to detect and capitalise on big events like that, but the news sources we parsed always seemed to significantly lag behind the stock movement, too.
A bias detector in Python for NYT articles using VADER sentiment analysis and the textacy library. It rates articles' positive, negative, and neutral sentiment and how intense each one is. Although the majority of the heavy lifting is covered by the libraries, it's open source on GitHub, and reading the code (the actual main operations are less than 500 lines) is teaching me some cool NLP techniques, especially when it comes to rule-based analysis.
An API like that is good for an introduction, but I think you'll get better results with a machine learning approach (my pet project http://www.sentimentview.com). When I was running tests against a baseline algorithm (just matching against positive and negative keywords), I saw accuracy improve from 62% with the baseline to 80% with an SVM http://blog.sentimentview.com/post/59031004797/learning-curv....
Reweighting sentiment by looking at the number of occurrences of positive and negative words in my assumed neutral corpus is a great idea :)
Will implement and report back
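The reweighting idea above can be sketched in a few lines. The word lists and the damping constant are illustrative assumptions, not part of anyone's actual implementation: words that show up often in the assumed-neutral corpus get their sentiment weight damped toward zero.

```python
from collections import Counter

# Sketch: downweight sentiment words that occur frequently in a corpus
# assumed to be neutral, on the theory that they carry little sentiment
# in this domain. Word lists and the damping constant are illustrative.

POSITIVE = {"gain", "strong", "win"}
NEGATIVE = {"loss", "weak", "fail"}

def reweight(neutral_corpus: list[str]) -> dict[str, float]:
    """Return per-word weights in (0, 1]; common-in-neutral words get less."""
    counts = Counter(w for doc in neutral_corpus for w in doc.lower().split())
    total = sum(counts.values()) or 1
    weights = {}
    for w in POSITIVE | NEGATIVE:
        freq = counts[w] / total
        weights[w] = 1.0 / (1.0 + 100.0 * freq)  # arbitrary damping constant
    return weights

# "strong" is common in this toy neutral corpus, so it gets damped;
# "win" never appears, so it keeps full weight.
weights = reweight(["strong demand strong supply", "markets strong today"])
print(weights["strong"], weights["win"])
```

The damping curve is a free choice; anything monotone-decreasing in the neutral frequency expresses the same idea.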
I've looked into using Naive Bayes, but my understanding is that you need labeled training documents, and then I face the problem of scoring documents, which introduces subjectivity compared to just counting the 'sentiment words'.
I understand complexity is needed to deal with negation ('not bad' != 'bad'), but I'd imagine the sentiment-scoring process would be the same regardless of algorithm, which brings us back to the problem of how to correct bias in 'word list' asymmetries.
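A minimal sketch of the negation handling being discussed: flip the polarity of a sentiment word when a negator immediately precedes it. The word lists are illustrative placeholders, and real systems use longer negation windows and dependency parses, but the core rule is this simple:

```python
# Sketch: flip polarity when a sentiment word is immediately preceded
# by a negator, so "not bad" scores positive. Word lists are toy examples.

LEXICON = {"bad": -1.0, "good": 1.0, "great": 1.0, "terrible": -1.0}
NEGATORS = {"not", "no", "never", "hardly"}

def score(text: str) -> float:
    words = text.lower().split()
    total = 0.0
    for i, w in enumerate(words):
        if w in LEXICON:
            polarity = LEXICON[w]
            if i > 0 and words[i - 1] in NEGATORS:
                polarity = -polarity  # negation flips the sign
            total += polarity
    return total

print(score("not bad"))  # 1.0
print(score("bad"))      # -1.0
```

Note that this is independent of the classifier choice, which is the point above: whether you count words or train Naive Bayes, the negation rule sits in the feature-extraction step.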
The app uses sentiment analysis algorithms https://en.wikipedia.org/wiki/Sentiment_analysis All news is classified by its positive/negative features. The feature set was created with the help of deep learning techniques. A lot of news sources in different languages were analysed to automatically extract the typical "good" or "bad" patterns for classification.
My first reaction was: "well, if it's sentiment analysis, it doesn't know anything about whether the news is bad or good, only about the mood of the writer". But I then realised this is actually even better. I don't want to filter out news that isn't good news; that would be plain denial. For the same story, an article can be written in a positive and analytical way, or in a way that tries to incite hate or bad feelings. It is the latter I want to filter, and sentiment analysis is probably the perfect tool for that.
I would love to know how you built your training dataset (how the good and bad labels were decided), because that's ultimately the choice that shapes the whole decision process. Maybe this should be a standard kind of page for products offering ML-based filtering.
Also, thanks a lot for providing an "all stories" tab in addition to "good stories" and "bad stories"; this is something automatically curated content misses too often. I really love the "stories to read" mode of Google Now, which shows me stories based on my interests; it's basically the first thing I check every morning. But I always wonder, when I read news from there: "is this a thing for the whole world or just for me?". We need a point of reference, the possibility to see the whole picture and to switch easily between "content for me" and "content for the world", to take advantage of the bubble without being harmed by it.
2. Sentiment-tagging (positive/negative) for financial news.
Personally, I don't believe there's a lot of alpha to extract from this, because news usually lags market information. But A LOT of people believe differently, and this article shot up on relevant Google searches, way ahead of academic papers or other sources of authority.
https://towardsdatascience.com/a-new-way-to-sentiment-tag-fi...
Hey guys, my startup (Wingify) has exposed an API for determining sentiment and context from any URL or piece of text. I'll be glad if you could review it.
ContextSense was made to demonstrate our contextual targeting capabilities -- and the sentiment aspect was added to avoid the traditional blunder of displaying ads on pages/news about catastrophes. It's best to try ContextSense with text-heavy URLs. Also try it with news items (both positive and negative).
I will be happy to provide API access if any of you are interested in trying it out.
Google Cloud Natural Language API Documentation
Natural Language API Basics
Contents
Natural Language features
Basic Natural Language requests
Specifying text content
Sentiment analysis
This document provides a guide to the basics of using the Google Cloud Natural Language API. This conceptual guide covers the types of requests you can make to the Natural Language API, how to construct those requests, and how to handle their responses. We recommend that all users of the Natural Language API read this guide and one of the associated tutorials before diving into the API itself.
Natural Language features
The Natural Language API has several methods for performing analysis and annotation on your text. Each level of analysis provides valuable information for language understanding. These methods are listed below:
Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. Sentiment analysis is performed through the analyzeSentiment method. Currently, only English is supported for sentiment analysis.
Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities. Entity analysis is performed with the analyzeEntities method.
Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens. Syntactic Analysis is performed with the analyzeSyntax method.
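For example, an analyzeSentiment exchange is plain JSON. The sketch below constructs a request body and parses a response in the documented shape; the sentiment values in the sample response are illustrative, not real API output, and sending the request would additionally require an API key or OAuth credentials.

```python
import json

# Shape of an analyzeSentiment request body (REST API).
request_body = {
    "document": {"type": "PLAIN_TEXT", "content": "Enjoy your vacation!"},
    "encodingType": "UTF8",
}

# A response in the documented shape; the numbers are made-up
# illustrations, not real API output. score is in [-1, 1] (negative
# to positive); magnitude is a non-negative measure of strength.
sample_response = json.loads("""
{
  "documentSentiment": {"score": 0.8, "magnitude": 0.8},
  "language": "en",
  "sentences": [
    {"text": {"content": "Enjoy your vacation!", "beginOffset": 0},
     "sentiment": {"score": 0.8, "magnitude": 0.8}}
  ]
}
""")

doc = sample_response["documentSentiment"]
print(doc["score"], doc["magnitude"])
```

The document-level `score` averages over sentences, so a long article with mixed positive and negative passages can come out near 0 while still having a large `magnitude`; checking both fields distinguishes "neutral" from "mixed".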
This is the result of an 8 day final project for DBC Chicago. Our team scraped over 140,000 headlines of several news agencies stretching back several years. We then took those headlines and fed them through the AlchemyAPI sentiment analysis engine to assign each one a score. They were then plotted in a couple different ways using D3.
This is far from perfect and even farther from scientific. It was done in 8 days by some passionate amateur developers. It was however a lot of fun and very interesting.
You can read about it and the team in more detail on the repo page here:
That is cool. Most of the sentiment analysis APIs I've seen just give a -1 to +1 score based on positive or negative feeling, but it looks like the Watson APIs are much more elaborate.
The two sentences in the article that tell me this will be a failure:
“Once we have a list of related headlines, we then use GPT-3 to generate a list of facts, not opinions, that are present within each of those headline’s respective articles.”
“We then run the article through a “bias checker,” which uses sentiment analysis to rate the sentiment polarity of the generated article.”
Very cool - would you mind elaborating on the tools you used to analyze the sentiment? Any open source ones / is the source available for reuse? Would love to apply this to other areas.