Wednesday, July 22, 2009

Equity portfolio risk (volatility) estimation using market information and sentiment

In previous postings, I have presented findings on how news sentiment can be used for alpha generation in both short-term trading and longer-term investing. This posting focuses on how news sentiment can supplement option-implied volatility in constructing forward-looking co-variance matrices that update quickly as market conditions change. The adjusted co-variance estimate can then be used, for instance, in the multifactor models often employed to describe equity portfolio risk.

Here are the key findings of a joint study by Northfield Information Services and Mitra et al. from CARISMA at Brunel University:
  • News sentiment improves reaction times. A co-variance matrix adjusted with news sentiment reacts more quickly to changes in volatility than one adjusted purely with option-implied volatility.
  • News sentiment is available when option-implied volatility is not. A news sentiment adjusted co-variance matrix can be constructed in cases where an option-implied volatility adjustment is impossible because there are no exchange-traded options.
According to Mitra et al., option-implied volatility can be used as a measure of the extent to which market participants believe that current conditions affecting volatility differ from their typical state. Because option prices depend directly on expectations of future volatility, traders are likely to respond quickly to new information that changes those expectations. As a result, including option-implied volatility in the construction of the co-variance matrix should help provide more sensible estimates of future volatility.
An alternative and largely untapped source for picking up changes in market conditions that manifest themselves as time-varying volatility is quantified news. Mitra et al. provide the following example:

“If on a typical trading day there are ten to fifteen news wire service stories about Firm X, and today there are two hundred news wire service stories about Firm X, we can assert that there is a significantly greater than usual amount of information being imparted to investors about this firm. As such, more substantial share price movements may result than would be typical. We might even be able to analyze whether the content of the news stories would be considered broadly negative or positive with respect to the operations or valuation of Firm X. In essence, the volume and nature of textual news can be used like option-implied volatility to very rapidly adjust our expectations of future volatility for a particular firm or an entire market.”
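The volume side of this idea can be sketched in a few lines. The helper below is hypothetical and only illustrates the reasoning in the quote: it compares today's story count for a firm with the median daily count over a trailing window. Using the median and a simple ratio are assumptions made here, not Mitra et al.'s method.

```python
from statistics import median


def news_volume_ratio(daily_counts: list[int], today: int) -> float:
    """Ratio of today's story count to the typical (median) daily count.

    Hypothetical helper: values well above 1 flag an unusual flow of
    information about a firm, which may precede larger price moves.
    """
    if not daily_counts:
        raise ValueError("need a history of daily story counts")
    typical = median(daily_counts)
    # Guard against firms with (almost) no coverage on a typical day.
    return today / max(typical, 1)


# Firm X: ten to fifteen stories on a typical day, two hundred today.
history = [10, 12, 15, 11, 14, 13, 10, 15, 12, 11]
print(news_volume_ratio(history, 200))
```

With a typical day of about a dozen stories, two hundred stories gives a ratio of roughly 17, the kind of outlier the quote describes. A sentiment-based adjustment would additionally look at the tone of those stories, not just their number.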

To estimate the co-variance matrix, Mitra et al. use a statistical factor model that applies principal component analysis to extract orthogonal factors. This model is referred to as the “basic model”. The model is then updated by applying a scaling factor based on the relative change in option-implied volatility compared with the relative change in volatility under the basic model. A similar update is made using a news sentiment scaling factor. Note that a news sentiment scaling factor can be constructed for a larger set of companies than is covered by exchange-traded options.
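The two steps can be sketched as follows, assuming daily returns in a T x N NumPy array, k retained principal components, and a single scalar multiplier applied to the whole matrix. The scaling rule below (relative change in the external signal divided by the relative change in the basic model's volatility) is one plausible reading of the description above, not the exact Northfield/CARISMA specification, and the function names are hypothetical.

```python
import numpy as np


def pca_factor_covariance(returns: np.ndarray, k: int) -> np.ndarray:
    """Co-variance from a k-factor statistical (PCA) model.

    returns: T x N array of asset returns (rows are days, columns are stocks).
    """
    sample_cov = np.cov(returns, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(sample_cov)
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    loadings = eigvecs[:, top]                 # N x k, orthogonal factors
    common = loadings @ np.diag(eigvals[top]) @ loadings.T
    # Specific variance: the part of each stock's variance the factors miss.
    specific = np.clip(np.diag(sample_cov - common), 0.0, None)
    return common + np.diag(specific)


def scale_covariance(cov, signal_now, signal_ref, model_vol_now, model_vol_ref):
    """Rescale a co-variance matrix using an external volatility signal.

    The volatility multiplier compares the relative change in the signal
    (option-implied volatility or a news-sentiment proxy) with the relative
    change in the basic model's own volatility; variances scale with its square.
    """
    multiplier = (signal_now / signal_ref) / (model_vol_now / model_vol_ref)
    return cov * multiplier ** 2


# Illustrative run on simulated returns (not data from the study).
rng = np.random.default_rng(0)
rets = rng.normal(0.0, 0.01, size=(250, 5))
basic = pca_factor_covariance(rets, k=2)
adjusted = scale_covariance(basic, signal_now=0.30, signal_ref=0.20,
                            model_vol_now=0.21, model_vol_ref=0.20)
```

In this made-up example the signal rose 50% while the basic model's volatility rose only 5%, so every entry of the matrix is scaled up by about (1.5/1.05)² ≈ 2.04. The same function can take a news-sentiment proxy as `signal_now`/`signal_ref` for stocks without listed options.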

For experimentation purposes, Mitra et al. consider two portfolios: the constituents of the Eurostoxx 50 and of the Dow 30.

The first example focuses on the period 17 to 23 January 2008, when sentiment worsened and option-implied volatility surged following several significant events, including President Bush's announcement of an economic stimulus plan and a 75 basis point Fed interest rate cut, the largest since October 1984. In Europe, Societe Generale was hit by the fraud scandal involving Jerome Kerviel. For this period, Mitra et al. find a clear indication that the sentiment-based model picks up increased volatility earlier than the model using only option-implied volatility (see Table 1). Both models react more quickly than the “basic model”. Note that on 21 January 2008 there was a sharp decline in non-US stock markets (the US market was closed); hence Mitra et al. argue that it is reasonable to assume that stock volatility rose on that date.



In the second example, Mitra et al. focus on the period 18 to 24 September 2008, during which several significant events took place, including Lehman's filing for bankruptcy, Bank of America's announcement of its intention to purchase Merrill Lynch, the Fed's announcement of the AIG rescue, Lloyds' takeover of HBOS and, on 19 September, the imposition of restrictions on short selling of financial stocks. Table 2 shows the volatility of an equally weighted portfolio of three financial stocks: Bank of America, Citigroup and J.P. Morgan Chase. Similarly, Table 3 shows the figures for a portfolio of three non-financial stocks: Johnson & Johnson, Kraft Foods and Coca-Cola.

Mitra et al. argue that, in most cases, the estimated volatility of the financial portfolio is higher when the estimate is updated using option-implied data, and that it likewise increases when news sentiment data is incorporated. Comparing the estimates for the financial and non-financial companies, they find that the volatility of the financial stocks has risen significantly more than that of the non-financial stocks. This seems a sensible result for the period, given the market conditions and the news announcements.
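Portfolio volatilities like those in Tables 2 and 3 follow from a co-variance matrix Σ and a weight vector w as σ_p = √(wᵀΣw), here with equal weights w = (1/3, 1/3, 1/3). A minimal sketch; the co-variance numbers are made-up placeholders, not values from the study.

```python
import math


def portfolio_volatility(weights, cov):
    """sqrt(w' Σ w) for weights w and co-variance matrix Σ (lists of lists)."""
    n = len(weights)
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical annualised co-variance for three stocks (placeholder numbers).
cov = [
    [0.16, 0.10, 0.09],
    [0.10, 0.25, 0.12],
    [0.09, 0.12, 0.20],
]
weights = [1 / 3] * 3
print(round(portfolio_volatility(weights, cov), 4))
```

Because the weights are fixed, any upward scaling of the co-variance matrix (from option-implied or news sentiment data) feeds straight through to a higher portfolio volatility estimate.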





The results of the above study indicate that news sentiment could add value in the construction of forward-looking co-variance matrices, which could be a useful input to multifactor models. I find it interesting that such adjustments can be made for a large set of stocks for which exchange-traded options are not available.

Wednesday, July 15, 2009

How correlated is news sentiment with other quant factors?

In the previous blog posting, I presented some of the key findings of a quant factor analysis based on news sentiment conducted by Macquarie Research. This posting focuses on the same report (the May edition of Factorial!, under the title “Breaking News: How to use news sentiment to pick stocks”), but presents their findings on the correlations between news sentiment-based quant factors and more traditional quant factors. As previously stated, this matters because quants are interested not only in a positive Information Coefficient, but also in how a factor correlates with other factors; otherwise there is little benefit in adding it to a multifactor model.

Focusing on the news sentiment factors themselves, Macquarie finds that different types of news scores are somewhat uncorrelated, at least enough for diversification effects to be available. More important for quants is the correlation between news sentiment and other common quant factors. In general, the performance of news sentiment is relatively uncorrelated with value, but shows higher correlations with price momentum, earnings revisions and earnings surprises.

In addition, Macquarie looks at the average correlation of its composite news sentiment factor with every factor in each factor bucket. News sentiment performance has the highest average correlation with the momentum and analyst sentiment factors but, perhaps surprisingly, this correlation is not as high as expected: it is below 40% against both (Fig. 37).
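A bucket-level figure of this kind can be computed as the mean correlation between the composite news factor's performance series and the series of each factor in a bucket. A minimal sketch, assuming the performance series (for example, periodic ICs) are already aligned in time; the bucket and factor names and all numbers are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def average_bucket_correlation(news, buckets):
    """Mean correlation of `news` with every factor in each bucket.

    news: performance series of the composite news sentiment factor.
    buckets: {bucket_name: {factor_name: performance series}}.
    """
    return {
        name: mean([pearson(news, series) for series in factors.values()])
        for name, factors in buckets.items()
    }


# Made-up performance series, for illustration only.
news = [0.010, -0.004, 0.006, 0.002, -0.001]
buckets = {
    "momentum": {"mom_12m": [0.008, -0.002, 0.001, 0.004, 0.000],
                 "mom_6m": [0.005, 0.001, 0.004, -0.002, 0.001]},
    "value": {"book_to_price": [-0.003, 0.005, 0.002, -0.001, 0.004]},
}
print(average_bucket_correlation(news, buckets))
```

A low average correlation with a bucket suggests that the news factor carries information that bucket does not, which is the argument for adding it to a multifactor model.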

Based on these findings, Macquarie concludes that news sentiment has the potential to add value as an additional factor in a multifactor model.

Wednesday, July 8, 2009

When does News become Noise?

On average, over 100,000 unique English-language stories about finance are published daily by reputable sources such as Dow Jones, Bloomberg, Reuters and other newswire services (source: RavenPack). Stories are considered unique in that they discuss a specific company, topic or theme and reveal new information. Within a few seconds, these stories are fed into thousands of news outlets, content aggregators, websites and end-user applications such as trading terminals. Within minutes, news is reproduced, redistributed and duplicated into millions of news items. As more time goes by (usually a few days), the original stories turn into an uncontrollable stream of news items found just about anywhere, from corporate or proprietary intranets to the Web. So, when does news become noise?

For starters, most original financial news comes from newswires and professional paid news services, not from blogs or other social media. News vendors usually sell their information through a subscription model, with investors paying a regular fee to receive the content. Accessing content in real time involves a fee, whereas consuming the information online typically costs less (or nothing) because it is delayed. Investors will pay a premium to receive timely content such as economic indicators, corporate news and sentiment data in a machine-readable format as a low-latency feed, sometimes even co-locating servers with the content provider to gain just a few milliseconds on delivery. The faster investors can access the content, and the more efficient its format (pre-analyzed for sentiment, relevance and potential market impact), the higher the premium they are willing to pay.

One of the reasons investors pay for content is to be closer to the original source and further away from the noise. Having access to direct feeds of original content helps remove noise, but doesn't solve the problem. Even reputable, original sources may cover the same story and produce separate unique news items. The challenge is then to sift through the content of original publishers and find the “news” at the first instance it occurs. This is where the burden falls more on the format of the stories (rich tags and analytics) than on speed.

The following factors play an important role in eliminating noise:
  1. Direct Access: Just because it's not online when you looked doesn't mean the content isn't already out there being consumed and traded on by investors. Most original breaking news in finance can take minutes to hours to appear online (and longer to show up on a search engine or a news aggregator service like Google News). Don't be fooled by the “real-time” web argument. Let's face it, most news online is not real-time.
  2. News Dissemination: Understand how news and information flow from the original content producer to the end consumer (the investor). Without this fundamental knowledge, you are probably trading on old news as an investor, even if it appears new to you.
  3. Identify the Source: Distinguish between sources, authors, publishers, aggregators and even trading terminals (e.g., not everything on the Bloomberg terminal is Bloomberg News). For example, most financial sites like Google Finance, CNN Money and Forbes get their news from an original publisher (and someone else has likely subscribed to a direct feed from that source).
  4. Novelty: Again, just because it's news to you (or to your news reader or aggregator) doesn't mean it's new and unique. A novelty criterion suited to your trading strategy and horizon (intraday, daily, monthly, quarterly, etc.) is essential.
  5. Relevance: You must be able to describe (preferably in numerical terms) how pertinent, connected or applicable a news story is to a given entity (e.g., a company). In Fig. 1, I have included a summary of the distribution of relevance (according to RavenPack) across news stories with sentiment for the constituents of the Dow Jones Industrial Average over the years 2005 through 2008. The large number of low-relevance stories indicates that significant levels of noise may be present if relevance is not taken into account. In particular, only about 22% of stories have a high degree of relevance.


    Fig. 1: Distribution of Relevance on news stories with sentiment for the Dow 30. Relevance is measured with a score of 0-100 where higher values indicate a story is more relevant to a company in the Dow 30.

  6. Sentiment: If only it were as easy as “green” for “good” and “red” for “bad”. Sentiment is not absolute! Is a story positive for one company but negative for another? Under what conditions, and within what trading horizon? Are some news events more negative than others? Do you have sentiment data from one perspective (one technique) or from several (various techniques)? Looking at news from various angles will likely yield a more representative interpretation of sentiment.
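Novelty (point 4) and relevance (point 5) can be turned into simple filters applied before any sentiment is used. A minimal sketch, assuming each item carries a 0-100 relevance score like the one in Fig. 1 and a hypothetical `event_key` that groups items about the same underlying story; the relevance threshold of 90 and the 24-hour novelty window are illustrative assumptions, not RavenPack definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Story:
    timestamp: datetime
    entity: str        # e.g. a ticker
    event_key: str     # hypothetical id grouping items about the same story
    relevance: int     # 0-100, higher = more relevant to `entity`
    sentiment: float


def filter_news(stories, min_relevance=90, novelty_window=timedelta(hours=24)):
    """Keep relevant items with no earlier item on the same story in the window."""
    last_seen = {}  # (entity, event_key) -> time of the most recent item
    kept = []
    for story in sorted(stories, key=lambda s: s.timestamp):
        key = (story.entity, story.event_key)
        previous = last_seen.get(key)
        last_seen[key] = story.timestamp
        if story.relevance < min_relevance:
            continue  # weakly related to the entity: likely noise
        if previous is not None and story.timestamp - previous < novelty_window:
            continue  # a repeat of something already seen: not novel
        kept.append(story)
    return kept
```

Every item, relevant or not, updates `last_seen`, so a highly relevant rewrite of a story that was already circulating is not counted as new. Whether that is desirable depends on the trading horizon, which is why the window is a parameter.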
There are perhaps other factors to consider when addressing the question of when news becomes noise. David Leinweber does a really good job of explaining some of these issues, and provides a structured view of the sources of investment news and securities trading rumours, in his book “Nerds on Wall Street”. I have also read some postings by Jason Goepfert in Sentiment's Edge, where he discusses the importance of accuracy in sentiment analysis and of the sources of content used to measure sentiment. Overall, I think understanding the various aspects of news production and consumption is key to generating alpha from public information.

Wednesday, July 1, 2009

News Sentiment as a Quant Factor

In a previous blog posting, I presented some of the key findings of an event study conducted by Macquarie Research. This posting focuses on the same report (the May edition of Factorial!, under the title “Breaking News: How to use news sentiment to pick stocks”), but presents the key results of their quant factor analysis. In summary, Macquarie shows that quant factors based on news sentiment add significant value to multi-factor alpha models. More importantly, they find that investing based on news sentiment has worked at a time when many traditional quant factors have failed. The study was conducted on the Russell 3000 index, covering both large and small cap companies, for the period January 2005 through March 2009. Sentiment data was supplied by RavenPack International.

To include news sentiment in a multifactor quant model, it is necessary to perform some kind of frequency transformation, translating sentiment into a quant factor defined on the same periodic intervals as the relevant market data. In their report, Macquarie tests a number of such factors, including the following:
  • Simple average: take the average of one of the RavenPack sentiment classifiers per company over the past week
  • Relevance-weighted average: scale each sentiment classifier score for the relevant company by the relevance score for that article and then take the average over the last week. This puts more emphasis on articles where the company was prominently mentioned.
  • Simple count: count the number of positive or negative scores per classifier for the relevant company in the past week.
  • Linear time-weighted average: use a linear decay profile to weight sentiment classifier scores for the past week. This puts more weight on more recent news articles.
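The four transformations above can be sketched for one company's stories over a one-week window. Assumptions made here: each story is an `(age_in_days, sentiment, relevance)` tuple, sentiment is centred on zero so that positive and negative counts are meaningful, and relevance is on a 0-100 scale. Macquarie's exact scaling and decay parameters are not described here, so the linear weight `(7 - age) / 7` is illustrative.

```python
WINDOW_DAYS = 7  # one-week look-back, as in the factors listed above


def simple_average(stories):
    if not stories:
        return None  # no news this week: factor value missing
    return sum(s for _, s, _ in stories) / len(stories)


def relevance_weighted_average(stories):
    # Scale each score by its relevance (0-100 -> 0-1), then average.
    if not stories:
        return None
    return sum(s * r / 100 for _, s, r in stories) / len(stories)


def simple_count(stories, positive=True):
    # Number of positive (or negative) scores in the window.
    return sum(1 for _, s, _ in stories if (s > 0 if positive else s < 0))


def linear_time_weighted_average(stories, window=WINDOW_DAYS):
    # Weight falls linearly from 1 (today) to 0 (at the edge of the window).
    weights = [max(window - age, 0) / window for age, _, _ in stories]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * s for w, (_, s, _) in zip(weights, stories)) / total


# Made-up week of stories for one company: (age_days, sentiment, relevance).
week = [(0, 0.8, 100), (2, -0.4, 50), (6, 0.2, 20)]
print(simple_average(week), relevance_weighted_average(week),
      simple_count(week), linear_time_weighted_average(week))
```

Evaluating each function for every company on every rebalancing date turns the irregular news stream into a regular weekly factor that can sit alongside market data.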
Summarizing the key points of the Macquarie Research quant factor study:
  • Composite factors tend to work better than any of the factors based on individual scores. This is partly because of diversification benefits: the five RavenPack sentiment scores tend to have high, but not perfect, correlations, and each score has reasonable predictive power in its own right. As a result, a composite factor is able to pick up some diversification benefits.
  • Relevance-weighted factors tend to have better performance than unweighted factors. Articles with a high relevance score are better at capturing news sentiment specific to a particular company. So weighting each sentiment score by the relevance score should enhance the predictive power of the factors.
  • Recent performance of news sentiment was excellent through the credit crisis, a time when many familiar quant factors have struggled (Fig. 33, below). More specifically, news sentiment-based quant factors account for 31 of the top 50 quant factors by average rank Information Coefficient (IC). Macquarie suggests this is because, through the severe market dislocation, investors lost faith in the company fundamentals on which many traditional quant factors are based. For example, investors stopped trusting not only factors based on sell-side analyst forecasts (e.g., earnings revisions, forward PER), but also factors that could be clouded by uncertain asset values (e.g., price/book, ROA).
  • Just being in the news is a positive sign. A simple count of the number of news articles (irrespective of whether they are positive or negative) actually has positive predictive power, and this is more pronounced in the monthly backtests. Macquarie argues that further research is needed on this issue, because the academic literature has been inconclusive on whether news coverage is positive or negative for stock returns. Another issue here is a potential size bias: buying stocks with many news stories is similar to buying large cap names, because of the correlation between size and the number of stories.
  • Rebalancing frequency is important. The weekly and monthly results are similar, but there are also some significant differences. In general, count-based factors tend to outperform average-based factors in the monthly backtests, whereas the weekly results show the opposite. This raises interesting questions about the information decay rate of different components of news sentiment. For example, does the ‘attention effect’ of just being in the news persist longer than the directional aspect (i.e., positive or negative sentiment) of the article? Again, Macquarie suggests that further research is needed to address this issue.
  • Is data mining a potential problem? According to Macquarie, the short answer is yes. Whenever testing a large number of factors, some will almost certainly give good performance purely due to random chance. How can one tell if this is the case here?
Macquarie argues that a few aspects of the results give them some comfort. First, the majority of results have the signs they would expect. For example, the count of negative stories for four of the five sentiment scores has a negative IC in the weekly backtests, which makes sense because one would expect stocks with a large number of negative stories to underperform. Second, performance is consistent across factors derived from the five news sentiment scores, even though those factors actually have a reasonably low correlation. Again, this provides some confidence that the results are not purely the result of spurious data variation.
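Two ideas from the list above, composite factors and rank IC, can be made concrete. A minimal sketch: the composite is the equal-weighted average of cross-sectionally z-scored factor values (one common way to combine scores, assumed here rather than taken from the report), and the rank IC is the Spearman correlation between factor values and subsequent returns across stocks on one date. Averaging the per-date rank ICs over a backtest gives an average rank IC. All inputs below are made up.

```python
from statistics import mean, pstdev


def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def composite(*factors):
    """Equal-weighted average of z-scored factors (one value per stock)."""
    scored = [zscore(f) for f in factors]
    return [mean(vals) for vals in zip(*scored)]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_ic(factor, forward_returns):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(factor), ranks(forward_returns)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / norm


# Hypothetical cross-section of five stocks on one date.
score_a = [0.6, -0.2, 0.1, 0.9, -0.5]
score_b = [0.4, 0.0, 0.3, 0.7, -0.6]
next_week_returns = [0.012, -0.004, 0.003, 0.015, -0.010]
print(rank_ic(composite(score_a, score_b), next_week_returns))
```

A positive rank IC means stocks with higher composite sentiment tended to earn higher subsequent returns on that date; the sign checks Macquarie describes amount to asking whether such ICs point the expected way.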


In my next blog posting, I will refer to some interesting findings by Macquarie on the correlations between news sentiment-based quant factors and more traditional quant factors. This is important because quants are interested not only in positive ICs, but also in how a factor correlates with other factors; otherwise there is little benefit in adding it to a multifactor model.