Trumped by conditionality: why many posts on this blog are not interesting
A large number of the posts on this blog do not get much attention – not too many hits and few if any comments. There could be several reasons for this, but I need to consider the possibility that readers find many of the things I write about uninteresting. Now, this isn’t for the want of effort from my side: I put a fair bit of work into research and writing, so it is a little disappointing. However, I take heart from the possibility that it might not be entirely my fault: there’s a statistical reason (excuse?) for the dearth of quality posts on this blog. This (possibly uninteresting) post discusses this probabilistic excuse.
The argument I present uses the concepts of conditional probability and Bayes Theorem. Those unfamiliar with these may want to have a look at my post on Bayes theorem before proceeding further.
Grist for my blogging mill comes from a variety of sources: work, others’ stories, books, research papers and the Internet. Because of time constraints, I can write up only a fraction of the ideas that come to my attention. Let’s put a number to this fraction – say I can write up only 10% of the ideas I come across. Assuming that my intent is to write interesting stuff, this number corresponds to the best (or most interesting) ideas I encounter. Of course, the term “interesting” is subjective – an idea that fascinates me might not have the same effect on you. However this is a problem for most qualitative judgements, so we’ll accept this and move on.
If we denote the event “I have an interesting idea” by and its probability by , we have:
Then, if we denote the event “I have an idea that is uninteresting” by , we have:
assuming that an idea must either be interesting or uninteresting (no other possibilities allowed).
Now, for me to write up an idea, I have to find it interesting (i.e. judge it as being in the top 10%). Let’s be generous and assume that I correctly recognise an interesting idea (as being interesting) 70% of the time. From this, the conditional probability of my writing a post given that I encounter an interesting idea, , is:
where is the event that I write up an idea.
On the flip side, let’s assume that I correctly recognise 80% of the uninteresting ideas that I encounter as being no good. This implies that I incorrectly identify 20% of the uninteresting stuff as being interesting. That is, 20% of the uninteresting stuff is wrongly identified as being blog-worthy. So, the conditional probability of my writing a post about an uninteresting idea, , is:
(If the above values for and are confusing remember that, by assumption, I write about all ideas that I find interesting – and this includes those ideas that I deem interesting but are actually uninteresting)
Now, we want to figure out the probability that a post that appears on my blog is interesting – i.e. that a post is interesting given that I have written it up. Using the notation of conditional probability, this can be written as . Bayes Theorem tells us that:
, which is the probability that I write a post, can be expressed as follows:
= probability that I write an interesting post+ probability that I write an uninteresting post
This can be written as,
Substituting this in the expression for Bayes Theorem, we get:
Using the numbers quoted above
So, only 28% of the ideas I write about are interesting. The main reason for is my inability to filter out all the dross. These “false positives” – which are all the ideas that I identify as interesting but are actually not – are represented by the term in the denominator. Since there are way more bad ideas than good ones floating around (pretty much everywhere!), the chance of false positives is significant.
So, there you go: it isn’t my fault really.🙂
I should point out that the percentage of interesting ideas written up will be small whenever the false positive term is significant compared to the numerator. In this sense the result is insensitive to the values of the probabilities that I’ve used.
Of course, the argument presented above is based on a number of assumptions. I assume that:
- Mostreaders of this blog share my interests.
- The ideas that I encounter are either interesting or uninteresting.
- There is an arbitrary cutoff point between interesting and uninteresting ideas (the 10% cutoff).
- There is an objective criterion for what’s interesting and what’s not, and that I can tell one from the other 70% of the time.
- The relevant probabilities are known.
…and so, to conclude
I have to accept that much of the stuff I write about will be uninteresting, but can take consolation in the possibility that it is a consequence of conditional probabilities. I’m trumped by conditionality, once more.
This post was inspired by Peter Rousseeuw’s brilliant and entertaining paper entitled, Why the Wrong Papers Get Published. Thanks also go out to Vlado Bokan for interesting conversations about conditional probabilities and Bayes theorem.