
Great idea, I actually had this before. I was feeding the articles into Google's Natural Language Processing API to get a sentiment index from -1 to 1 (-1 negative, 0 neutral, +1 positive). The issue I was having was that it wasn't really that accurate. Google NLP has trouble analysing contextual news articles, e.g. if a headline says "Bubble in cryptocurrency markets", it doesn't recognise that "bubble" is negative in a news context. I have been looking for different NLP libraries that might be better, including AWS Rekognition. I'm not a data scientist, so I don't think I'll do very well building my own machine learning algorithm. Any suggestions of good libraries people have used for news sentiment would be greatly appreciated.
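
For context, this is roughly the kind of call I was making, in case it helps anyone reproduce it. A minimal sketch with the google-cloud-language Python client; exact details depend on the client version:

    # Rough sketch of scoring a headline with the Cloud Natural Language API
    # (google-cloud-language client; details vary by client version).
    from google.cloud import language_v1

    def headline_sentiment(text):
        client = language_v1.LanguageServiceClient()
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
        response = client.analyze_sentiment(request={"document": document})
        # score is in [-1, 1]; magnitude reflects how much emotion is present overall
        return (response.document_sentiment.score,
                response.document_sentiment.magnitude)

    print(headline_sentiment("Bubble in cryptocurrency markets"))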



Could you elaborate on news & NLP with regard to stocks?

We tried sentiment analysis in uni a few years ago and had no good results:

The idea was essentially: news says: 'stock A is great' -> it goes up shortly thereafter

We tested our algorithms on classifying Amazon reviews & tweets by sentiment. Those are filled with sentiment, and it's easy to detect whether something is a 5-star review or a 1-star review. The news articles we parsed all had near-neutral sentiment. We ended up building a classifier that could detect the news category of an article quite easily instead.

My initial idea was sparked by the Gulf spill and the subsequent dip in BP; I wanted to detect and capitalise on big events like that, but the news sources we parsed always seemed to significantly lag behind the stock movement, too.
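
For what it's worth, the category classifier was nothing exotic; roughly this shape (a simplified sketch with scikit-learn, not our actual code):

    # Simplified sketch of a bag-of-words news-category classifier (scikit-learn),
    # not the code we actually used, just the general shape of it.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative training set; the real thing needs many labelled articles.
    articles = ["Oil spill spreads across the Gulf coast",
                "Quarterly earnings beat analyst estimates"]
    categories = ["environment", "business"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
                          MultinomialNB())
    model.fit(articles, categories)
    print(model.predict(["Regulators fine bank over trading losses"]))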


Bias detector in Python for NYT articles using VADER sentiment analysis and the textacy library. It rates articles' positive, negative, and neutral sentiment and how intense each one is. Although the majority of the heavy lifting is covered by the library, it's open source on GitHub, and reading the code (the actual main operations are less than 500 lines) is teaching me some cool NLP techniques, especially when it comes to rule-based analysis.
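
The VADER part boils down to something like this (a minimal sketch, not the project's actual code):

    # Minimal VADER sketch (vaderSentiment package), not the project's actual code:
    # polarity_scores returns the proportions of positive/neutral/negative text plus
    # a compound intensity score in [-1, 1].
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    headline = "Markets tumble as banking fears spread"
    print(analyzer.polarity_scores(headline))
    # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}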

An API like that is good for an introduction, but I think you'll get better results with a machine learning approach (my pet project: http://www.sentimentview.com). When I was running tests against a baseline algorithm (just matching against positive and negative keywords), I saw accuracy go from 62% with the baseline to 80% with an SVM: http://blog.sentimentview.com/post/59031004797/learning-curv...
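
To make the comparison concrete: the baseline is just keyword counting, while the SVM is a standard bag-of-words classifier trained on labelled examples. Roughly like this (a simplified sketch, not the production code):

    # Rough shape of the comparison (not the production code): keyword-counting
    # baseline vs. a bag-of-words SVM trained on labelled texts.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    POSITIVE = {"gain", "growth", "beat", "record"}
    NEGATIVE = {"loss", "fraud", "crash", "bubble"}

    def keyword_baseline(text):
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "pos" if score >= 0 else "neg"

    # Tiny illustrative training set; accuracy only climbs with real labelled data.
    texts = ["Record profits and strong growth", "Fraud probe triggers market crash"]
    labels = ["pos", "neg"]
    svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    svm.fit(texts, labels)

    query = "Bubble fears trigger crash"
    print(keyword_baseline(query), svm.predict([query]))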

Reweighting sentiment by looking at the number of occurrences of positive and negative words in my assumed neutral corpus is a great idea :)

Will implement and report back

I've looked into using Naive Bayes, but my understanding is that you need labeled training documents, and then I face the problem of scoring documents, which introduces subjectivity compared to just counting the 'sentiment words'.

I understand complexity is needed to deal with negation ('not bad' != 'bad'), but I'd imagine the sentiment scoring process would be the same regardless of algorithm, which brings us back to the problem of how to correct bias in 'word list' asymmetries.
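
For the record, this is roughly what I have in mind for the reweighting (a sketch built on my own assumptions about how to discount words that are also common in the neutral corpus):

    # Sketch of the reweighting idea (my own assumptions, not a standard recipe):
    # a sentiment word that also shows up a lot in the assumed-neutral corpus gets
    # its contribution scaled down, which should soften word-list asymmetries.
    import math
    from collections import Counter

    POSITIVE = {"gain", "strong", "record"}
    NEGATIVE = {"bubble", "crash", "fraud"}

    neutral_corpus = ["the market posted a modest gain today",
                      "analysts expect a strong quarter"]
    neutral_counts = Counter(w for doc in neutral_corpus for w in doc.lower().split())

    def weight(word):
        # More occurrences in the neutral corpus -> smaller weight (1.0 if unseen).
        return 1.0 / (1.0 + math.log1p(neutral_counts[word]))

    def score(text):
        total = 0.0
        for w in text.lower().split():
            if w in POSITIVE:
                total += weight(w)
            elif w in NEGATIVE:
                total -= weight(w)
        return total

    print(score("bubble fears erase yesterday's gain"))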


The app uses sentiment analysis algorithms (https://en.wikipedia.org/wiki/Sentiment_analysis). All news is classified by its positive/negative features. A lot of news sources in different languages were analyzed to automatically extract the typical "good" or "bad" patterns for classification.

Great job, congratulations!

My first reaction was: "well, if it's sentiment analysis, it doesn't know anything about whether the news is bad or good, only about the mood of the writer". But I then realized this is actually even better. I don't want to filter out news that isn't good news; that would be plain denial. For the same piece of news, an article can be written in a positive and analytical way, or in a way that tries to incite hate or bad feelings. It's the latter I want to filter, and sentiment analysis is probably the perfect tool for that.

I would love to know how you built your training dataset (how the good and bad labels were decided), because that's ultimately the choice that shapes the whole decision process. Maybe this should be a standard kind of page for products offering ML-based filtering.

Also, thanks a lot for providing an "all stories" tab in addition to "good stories" and "bad stories"; this is something automatically curated content misses too often. I really love the "stories to read" mode of Google Now, which gives me stories based on my interests; it's basically the first thing I check every morning. But whenever I read news from there I wonder: "is this a thing for the whole world, or just for me?". We need a frame of reference, the ability to see the whole picture and to switch easily between "content for me" and "content for the world", so we can take advantage of the bubble without being harmed by it.


The app uses sentiment analysis algorithms (https://en.wikipedia.org/wiki/Sentiment_analysis). All news is classified by its positive/negative features. The set of features was created with the help of deep learning techniques. A lot of news sources in different languages were analyzed to automatically extract the typical "good" or "bad" patterns for classification.

I built some NLP infrastructure as part of a larger project I'm working on. Two bits I was fairly pleased with were:

1. Something that takes a line chart and turns it into a word narrative of what the graph is describing (a toy sketch of the idea follows after this list). https://towardsdatascience.com/financial-storytelling-using-...

2. Sentiment-tagging (positive/negative) for financial news. Personally, I don't believe there's a lot of alpha to extract from this, because news usually lags market information. But A LOT of people believe differently, and this article shot up on relevant Google searches, way ahead of academic papers or other sources of authority. https://towardsdatascience.com/a-new-way-to-sentiment-tag-fi...
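
To give a flavour of the chart-to-narrative idea, here's a toy sketch (nothing like the code behind the article, just the general shape):

    # Toy sketch of turning a price series into a one-sentence narrative
    # (not the code behind the article, just the general idea).
    def describe_series(name, values):
        start, end = values[0], values[-1]
        change = (end - start) / start * 100
        direction = "rose" if change >= 0 else "fell"
        peak, trough = max(values), min(values)
        return (f"{name} {direction} {abs(change):.1f}% over the period, "
                f"peaking at {peak:.2f} and bottoming out at {trough:.2f}.")

    print(describe_series("BP", [455.0, 430.2, 391.5, 305.1, 342.8]))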


There already is a project trying to use NLP to detect the sentiment of scientific citations: https://scite.ai/

This is a really cool idea. Would be really interesting to run it through a sentiment analysis engine as well.

Hey guys, my startup (Wingify) has exposed an API for determining sentiment and context from any URL or piece of text. I'd be glad if you could review it.

ContextSense was made to demonstrate our contextual targeting capabilities, and the sentiment aspect was added to avoid the traditional blunder of displaying ads on pages/news about catastrophes. It's best to try ContextSense with text-heavy URLs. Also try it with news items (both positive and negative).

I will be happy to provide API access in case any of you is interested in trying it out.


Which one are you trying to accomplish?

- Sentiment analysis

- Syntactic analysis

- Entity analysis

From the Google Cloud Natural Language API documentation, "Natural Language API Basics":

This document provides a guide to the basics of using the Google Cloud Natural Language API. This conceptual guide covers the types of requests you can make to the Natural Language API, how to construct those requests, and how to handle their responses. We recommend that all users of the Natural Language API read this guide and one of the associated tutorials before diving into the API itself.

Natural Language features

The Natural Language API has several methods for performing analysis and annotation on your text. Each level of analysis provides valuable information for language understanding. These methods are listed below:

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. Sentiment analysis is performed through the analyzeSentiment method. Currently, only English is supported for sentiment analysis.

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities. Entity analysis is performed with the analyzeEntities method.

Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries) and providing further analysis on those tokens. Syntactic analysis is performed with the analyzeSyntax method.
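
Concretely, each of those methods maps to a single client call. A rough sketch with the Python client (details vary by client version; analyzeSentiment follows the same pattern):

    # Rough sketch of entity and syntax analysis with the google-cloud-language
    # Python client (details vary by client version).
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content="Tim Cook announced record iPhone sales in Cupertino.",
        type_=language_v1.Document.Type.PLAIN_TEXT)

    entities = client.analyze_entities(request={"document": document})
    for entity in entities.entities:
        print(entity.name, entity.salience)  # which proper nouns matter most

    syntax = client.analyze_syntax(request={"document": document})
    for token in syntax.tokens:
        print(token.text.content, token.part_of_speech.tag)  # token + part of speech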


This is the result of an 8-day final project for DBC Chicago. Our team scraped over 140,000 headlines from several news agencies, stretching back several years. We then fed those headlines through the AlchemyAPI sentiment analysis engine to assign each one a score. They were then plotted in a couple of different ways using D3.

This is far from perfect and even farther from scientific. It was done in 8 days by some passionate amateur developers. It was, however, a lot of fun and very interesting.
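
The shape of the pipeline was simple: score each headline, then aggregate per source and month for the charts. Roughly like this (a sketch in Python with VADER standing in for AlchemyAPI, not our actual Rails code):

    # Sketch of the pipeline shape only; VADER stands in for AlchemyAPI here and
    # this is Python rather than our actual Rails code.
    import pandas as pd
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    headlines = pd.DataFrame([
        {"source": "NYT", "date": "2014-06-01",
         "headline": "Economy adds jobs at record pace"},
        {"source": "NYT", "date": "2014-06-15",
         "headline": "Violence escalates in border region"},
    ])
    headlines["score"] = headlines["headline"].map(
        lambda h: analyzer.polarity_scores(h)["compound"])
    headlines["month"] = pd.to_datetime(headlines["date"]).dt.to_period("M")

    # One average score per source per month; this is what gets plotted with D3.
    monthly = headlines.groupby(["source", "month"])["score"].mean().reset_index()
    monthly.to_csv("monthly_sentiment.csv", index=False)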

You can read about it and the team in more detail on the repo page here:

https://github.com/kelmerp/headline_sentiment_rating

and see some slightly more technical slides here:

https://speakerdeck.com/luizneves77/sentimental-headlines

This was written in Ruby on Rails, Postgres (with Memcached), and JavaScript + D3.

I'm also the creator of:

onionornot.com, reddesigned.com, and http://luiz-n.github.io/route-search/

and am interviewing for web dev (and data visualization) positions in the Chicago area if you would like to reach out to me. @hey_luiz


That is cool. Most of the sentiment analysis APIs I've seen just give a -1 to +1 score based on positive or negative feeling, but it looks like the Watson APIs are much more elaborate.
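
For example (a rough sketch with the ibm-watson Python SDK; treat the exact parameter names as approximate):

    # Rough sketch with the ibm-watson Python SDK (treat details as approximate):
    # Watson NLU can return sentiment plus emotion scores, not just a single number.
    from ibm_watson import NaturalLanguageUnderstandingV1
    from ibm_watson.natural_language_understanding_v1 import (
        Features, SentimentOptions, EmotionOptions)
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    nlu = NaturalLanguageUnderstandingV1(
        version="2021-08-01",
        authenticator=IAMAuthenticator("YOUR_API_KEY"))  # placeholder credentials
    nlu.set_service_url("YOUR_SERVICE_URL")              # placeholder URL

    result = nlu.analyze(
        text="Bubble fears wipe billions off cryptocurrency markets",
        features=Features(sentiment=SentimentOptions(), emotion=EmotionOptions()),
    ).get_result()
    print(result["sentiment"], result["emotion"])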

The two sentences in the article that tell me this will be a failure:

“Once we have a list of related headlines, we then use GPT-3 to generate a list of facts, not opinions, that are present within each of those headline's respective articles.”

“We then run the article through a “bias checker,” which uses sentiment analysis to rate the sentiment polarity of the generated article.”


Very cool - would you mind elaborating on the tools you used to analyze the sentiment? Any open source ones / is the source available for reuse? Would love to apply this to other areas.



Hey guys, we (Wingify) have exposed an API for determining sentiment and context from any URL or piece of text. I'd be glad if you could review it.

ContextSense was made to demonstrate our contextual targeting capabilities, and the sentiment aspect was added to avoid the traditional blunder of displaying ads on pages/news about catastrophes. It's best to try it with text-heavy URLs. Also try it with news items (both positive and negative).

I will be happy to provide API access in case any of you is interested in trying it out.


I found this paper useful for a side project I worked on a few months ago, one that made use of n-grams in a naive Bayes classifier:

http://arxiv.org/pdf/1305.6143v2.pdf

and the lead author's GitHub repos are:

https://github.com/vivekn/sentiment https://github.com/vivekn/sentiment-web

He's implemented 'negative bi-gram detection' (my phrasing, not his) with this function:

https://github.com/vivekn/sentiment/blob/master/info.py#L26-...

...which I found useful as a jumping off point. Good luck!
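
The general trick, by the way (again my phrasing, and a common approach rather than necessarily what that function does), is to fold the negation into the tokens before counting n-grams:

    # Common negation-handling trick (not necessarily what the linked repo does):
    # prefix the tokens that follow a negation word, so 'not bad' contributes a
    # 'not_bad' feature instead of the same counts as plain 'bad'.
    NEGATIONS = {"not", "no", "never", "n't"}

    def mark_negation(tokens, scope=3):
        out, remaining = [], 0
        for tok in tokens:
            if tok.lower() in NEGATIONS:
                out.append(tok)
                remaining = scope          # negate the next few tokens
            elif remaining > 0:
                out.append("not_" + tok)
                remaining -= 1
            else:
                out.append(tok)
        return out

    print(mark_negation("the movie was not bad at all".split()))
    # ['the', 'movie', 'was', 'not', 'not_bad', 'not_at', 'not_all']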

