Although simplicity and intuition work in its favor, there are many challenges with this approach.
- Define a relevant list of words and phrases. Journalists, analysts, and market participants use specific language when talking about stocks and expressing their views or recommendations, such as “better than expected”, “missed earnings target”, or “downgraded from equal-weight to underweight”. Many scholars and researchers in the field have used generic lists of positive and negative terms, like Harvard’s General Inquirer, which often fail to capture key financial language and the meaning words take on in a financial context. For instance, words like “liability” or “tax” count as negative in such general-purpose lists, yet in financial text they are usually neutral.
- Define a point of view (i.e., identify sentiment by company). Identifying key financial language is one challenge; associating it with a specific company is another. First, you need an accurate named entity recognition approach to detect mentions of, or references to, companies in text. For example, Bank of America Corp. also appears in the news under its ticker (BAC), an ISIN such as US0605051046 (which embeds its CUSIP, 060505104), and common aliases like “BofA” or, more recently, “Bank of America Merrill Lynch”. Companies regularly change their exchange listings and names, merge, or get acquired, so a sense of time and knowledge of corporate actions are key to company detection. That way, you only match companies using names and identifiers that were accurate when the story was published. Dealing with false positives and disambiguation is also essential. For example, the French energy company Total S.A. can easily be confused with the word “total”, which is very common in news stories.
- Associate emotionally charged language with companies. Once you’ve identified a company and some key words or phrases, the challenge is to understand the context in which they are used. If a story mentions Bank of America, Lehman Brothers, and the word “bankruptcy”, is that enough to say the story is negative for both companies? Suppose the headline reads “Lehman Bros files for bankruptcy”, and the body of the story spends several paragraphs on the fourth-largest US investment bank filing for bankruptcy protection, dealing a huge blow to the fragile global financial system. Only the very last paragraph might say “In other news, BofA is up 1.5% for the day”. Without context and proper company identification, any sentiment assignment is flawed: this story is negative for Lehman but, if anything, positive for Bank of America.
- Map companies to traded securities. News stories normally refer to companies, not securities. Therefore, one must have company information that is correct at each point in time, with ISINs and ticker symbols that were accurate when the story was published. References to companies in text must then be mapped to the proper traded-security identifiers.
- Associate security-mapped, company-specific language with stock price movements. Perhaps the most challenging step of all, it is imperative to model and back-test so-called positive and negative keywords, phrases, and terms against subsequent positive and negative abnormal stock returns (or other directional benchmark metrics). I did some work in this area and documented the results here.
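To make the first four challenges more concrete, here is a minimal Python sketch (not a production system) of point-in-time company detection combined with a finance-specific lexicon. Everything in it is an assumption for illustration: the alias table (only the Bank of America ISIN comes from the text; the other identifiers and all validity dates are hypothetical placeholders), the tiny lexicon, and the choice to attribute each paragraph’s score only to the companies named in that same paragraph.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Alias:
    """One name a company is known by, valid from valid_from to valid_to (inclusive)."""
    text: str
    security_id: str  # identifier of the traded security this name maps to
    valid_from: date
    valid_to: date
    case_sensitive: bool = False


# Hypothetical point-in-time alias table. Only the Bank of America ISIN comes
# from the text; the other identifiers and every validity date are placeholders.
ALIASES = [
    Alias("Bank of America", "US0605051046", date(1998, 1, 1), date.max),
    Alias("BofA", "US0605051046", date(1998, 1, 1), date.max),
    Alias("BAC", "US0605051046", date(1998, 1, 1), date.max, case_sensitive=True),
    Alias("Lehman Brothers", "HYPOTHETICAL-LEHMAN", date(1994, 1, 1), date(2012, 12, 31)),
    Alias("Lehman Bros", "HYPOTHETICAL-LEHMAN", date(1994, 1, 1), date(2012, 12, 31)),
    # Full legal name, case-sensitive, so the everyday word "total" never matches.
    Alias("Total S.A.", "HYPOTHETICAL-TOTAL", date(1991, 1, 1), date.max, case_sensitive=True),
]

# Hypothetical finance-specific lexicon: phrase -> polarity.
LEXICON = {
    "better than expected": 1,
    "upgraded": 1,
    "is up": 1,
    "missed earnings target": -1,
    "downgraded": -1,
    "bankruptcy": -1,
}


def _pattern(phrase: str, case_sensitive: bool) -> re.Pattern:
    # (?<!\w) and (?!\w) behave like \b but also work for phrases ending in ".".
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", flags)


def companies_in(paragraph: str, published: date) -> set[str]:
    """Security ids mentioned in the paragraph, using only aliases valid on `published`."""
    return {
        a.security_id
        for a in ALIASES
        if a.valid_from <= published <= a.valid_to
        and _pattern(a.text, a.case_sensitive).search(paragraph)
    }


def paragraph_score(paragraph: str) -> int:
    """Sum of lexicon polarities over all case-insensitive phrase matches."""
    return sum(
        polarity * len(_pattern(phrase, False).findall(paragraph))
        for phrase, polarity in LEXICON.items()
    )


def sentiment_by_security(story: str, published: date) -> dict[str, int]:
    """Attribute each paragraph's score only to the companies named in that paragraph."""
    scores: dict[str, int] = {}
    for paragraph in re.split(r"\n\s*\n", story):
        score = paragraph_score(paragraph)
        for security_id in companies_in(paragraph, published):
            scores[security_id] = scores.get(security_id, 0) + score
    return scores


story = (
    "Lehman Bros files for bankruptcy\n\n"
    "Lehman Brothers, the fourth-largest US investment bank, filed for bankruptcy protection.\n\n"
    "In other news, BofA is up 1.5% for the day."
)
print(sentiment_by_security(story, date(2008, 9, 15)))
# {'HYPOTHETICAL-LEHMAN': -2, 'US0605051046': 1}
```

Scoring by paragraph is a crude stand-in for real context modeling, but it is enough to keep the Lehman bankruptcy language from leaking onto BofA in the example story. The case-sensitive, full-name alias for Total S.A. is likewise just one simple disambiguation rule among many possible ones.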
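For the last challenge, a simple starting point is to measure the average abnormal return that follows stories containing each term. The sketch below assumes a market-adjusted model (stock return minus market return, with no estimated beta), a fixed window of trading days starting at the first trading day after publication, and toy return series; the function names and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cumulative_abnormal_return(stock, market, start, window):
    """Market-adjusted CAR: sum of (stock - market) daily returns over
    trading days [start, start + window)."""
    return sum(s - m for s, m in zip(stock[start:start + window], market[start:start + window]))


def term_backtest(stories, stock_returns, market_returns, window=1):
    """stories: iterable of (day, security_id, terms), where `day` is the index of the
    first trading day after publication. Returns {term: (mean CAR, number of stories)}."""
    cars_by_term = defaultdict(list)
    for day, security_id, terms in stories:
        car = cumulative_abnormal_return(stock_returns[security_id], market_returns, day, window)
        for term in set(terms):  # count each term at most once per story
            cars_by_term[term].append(car)
    return {term: (mean(cars), len(cars)) for term, cars in cars_by_term.items()}


# Toy daily returns (hypothetical), indexed by trading day.
market_returns = [0.005, -0.01, 0.01, 0.0]
stock_returns = {
    "AAA": [0.01, -0.03, 0.02, 0.0],
    "BBB": [0.0, -0.02, 0.01, 0.0],
}
stories = [
    (1, "AAA", ["downgraded"]),
    (2, "AAA", ["better than expected"]),
    (1, "BBB", ["downgraded"]),
]
for term, (avg_car, n) in term_backtest(stories, stock_returns, market_returns).items():
    print(f"{term}: mean CAR {avg_car:+.3%} over {n} stories")
```

A real back test would also need significance testing, a proper model of expected returns, and controls for overlapping events; this only shows the basic bookkeeping.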
The work by Paul Tetlock and co-authors in “More Than Words: Quantifying Language to Measure Firms’ Fundamentals” addresses some of these challenges.
Other, perhaps more powerful, news sentiment techniques include Expert Consensus and Market Response.
The Expert Consensus methodology entails having financial experts manually tag thousands of stories and then training classification algorithms on the results, the so-called “training sets”. Experts tag each story as likely to have a positive, negative, or neutral effect on a given company’s stock price in the hours ahead or over a given trading window. Stories on which a majority of the experts agree serve as consensus training sets. Classification systems can then be built using Bayesian networks or other probabilistic models to automate the process.
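Here is a minimal sketch of the Expert Consensus idea, assuming three hypothetical experts, made-up headlines, and a strict-majority rule for consensus. For the classifier it swaps in multinomial naive Bayes, the simplest Bayesian-network classifier, rather than any specific production model; the function and class names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Optional


def consensus_label(votes, min_share=0.5) -> Optional[str]:
    """Majority label if more than `min_share` of the experts agree, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_share else None


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def log_posterior(self, doc, c):
        # Unnormalized log P(c | doc); words never seen in training are ignored.
        return self.log_prior[c] + sum(
            math.log((self.counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
            for w in tokenize(doc)
            if w in self.vocab
        )

    def predict(self, doc):
        return max(self.labels, key=lambda c: self.log_posterior(doc, c))


# Hypothetical expert tags (pos / neg / neu) for a handful of headlines.
tagged = [
    ("Company beats estimates, results better than expected", ["pos", "pos", "neu"]),
    ("Profit warning: company missed earnings target", ["neg", "neg", "neg"]),
    ("Analyst downgraded the stock to underweight", ["neg", "neu", "neg"]),
    ("Shares rally after upgrade and strong guidance", ["pos", "pos", "pos"]),
    ("Company announces date of annual meeting", ["neu", "pos", "neg"]),  # no majority
]

# Keep only stories where a majority of experts agree: the consensus training set.
consensus = [(text, consensus_label(votes)) for text, votes in tagged]
training = [(text, label) for text, label in consensus if label is not None]

model = NaiveBayes().fit([t for t, _ in training], [lab for _, lab in training])
print(model.predict("Results better than expected again"))   # pos
print(model.predict("Analyst says company missed target"))   # neg
```

Dropping the no-consensus story is the point of the method: ambiguous tags never reach the classifier.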
The Market Response methodology uses historical market data to measure the degree of impact a news item has on a specific market or security. A classifier can be trained on several years of news archives related to a set of global companies. It learns how markets tend to respond to news and assigns positive or negative ratings based on statistical analysis rather than human judgment or input. Scholars from the University of Massachusetts have done related work on this technique, documented in the paper “Language Models for Financial News Recommendation”.
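A minimal sketch of Market Response labeling, under stated assumptions: each story is labeled by the abnormal return (stock minus market) on the first trading day after publication, standardized by the stock’s abnormal-return volatility over a hypothetical 20-day estimation window, with a hypothetical threshold of two standard deviations. The function names and toy data are illustrative only.

```python
from statistics import stdev


def label_by_market_response(stock, market, day, estimation=20, threshold=2.0):
    """Label a story by the stock's abnormal return on `day`, the first trading
    day after publication. Assumptions: market-adjusted model (stock minus
    market) and a z-score threshold; both are illustrative choices."""
    if day < estimation:
        raise ValueError("need `estimation` trading days of history before `day`")
    # Abnormal returns over the estimation window immediately before the story.
    history = [s - m for s, m in zip(stock[day - estimation:day], market[day - estimation:day])]
    sigma = stdev(history)
    if sigma == 0:
        raise ValueError("abnormal returns have zero volatility in the estimation window")
    z = (stock[day] - market[day]) / sigma
    if z > threshold:
        return "positive"
    if z < -threshold:
        return "negative"
    return "neutral"


def market_response_training_set(stories, stock_returns, market_returns):
    """stories: iterable of (text, security_id, day). Returns (text, label) pairs."""
    return [
        (text, label_by_market_response(stock_returns[sec], market_returns, day))
        for text, sec, day in stories
    ]


# Toy data (hypothetical): a flat market and stocks alternating +/-1% for 20 days.
market_returns = [0.0] * 21
calm = [0.01 if i % 2 == 0 else -0.01 for i in range(20)]
stock_returns = {"AAA": calm + [0.05], "BBB": calm + [-0.05], "CCC": calm + [0.005]}
stories = [
    ("AAA beats estimates", "AAA", 20),
    ("BBB misses target", "BBB", 20),
    ("CCC holds annual meeting", "CCC", 20),
]
print(market_response_training_set(stories, stock_returns, market_returns))
```

No human tags are involved: the labels come entirely from how prices moved, and the resulting (text, label) pairs could feed any text classifier in place of expert-tagged training sets.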
In future postings, I plan to discuss these methodologies in more detail.

