I was looking at Judea Pearl’s book on Bayesian networks, Probabilistic Reasoning in Intelligent Systems, on Amazon today, and ran into an interesting demonstration of why NLP is so difficult. I’m including the full review below, because it may be interesting for those interested in language processing to think about how they might prevent this kind of error.
Recently I needed to learn the principles of Bayesian networks quickly, so I bought three books: this one by Pearl, “Pattern Recognition and Machine Learning” by Bishop, and “Bayesian Artificial Intelligence” by Korb and Nicholson. Each has a very different audience and different set of strengths.
The Bishop book would probably be a great text for a serious student with a year to spend learning the theory of machine learning. But I found it a bit too concise, with a bias towards an “algebraic” description rather than a “geometric” one (my preference). I wound up spending a lot of time trying to translate equations into mental pictures in order to grasp the concepts. Too much work, so I dropped this after a couple of days.
Next I tackled the Korb and Nicholson book. This one’s aimed at the application engineer who wants to get a network up and running quickly, and is not too concerned about how it works. I’ve been in that position many times in my career, and have always welcomed books like this for giving me a quick start into a new field. But this time I needed to really understand how Bayesian networks worked, and for this the Korb and Nicholson book is not great. In the first 9 pages of chapter 3 they try to explain the belief propagation algorithm, but their hearts just weren’t in it–I found their explanation to be unintelligible. (I suspect most readers just skim this to get to the applications.) So after several days of struggling and getting nowhere, I tossed this aside as well.
The Pearl book was the only one left; I put it off to last since I was initially somewhat intimidated by it. After all, this is one of the books that kicked off the “Bayesian revolution,” so I was fearing a foundational math book consisting of one dry theorem after another. Not so! Although you have to read 174 pages to get through the belief propagation algorithm for trees, this took far less time than reading the first 62 pages of Korb and Nicholson, which cover roughly the same ground. The reason: Pearl is a gifted teacher and writer. His explanations are a series of baby steps, leaving nothing out, never assuming the reader will make “obvious” inferences, and supplying motivation every step of the way. Although he doesn’t have a lot of figures in the book, the ones he does have are excellent, and by the time I hit page 175, I had a clear picture in my head of not just how it all worked, but why it worked. In fact, after just two days of reading, I was able to implement the belief propagation algorithm in software in an afternoon (I tested the software with examples from the Korb and Nicholson book, so that book was ultimately useful). Pearl made the subject seem almost obvious. If you are looking for a book to help you get canned Bayesian software up and running for an application quickly, this is not it. But if you want to really understand how these things work, and don’t have a lot of time available, I cannot imagine a better book than this.
Amazon has a very cool feature under the “Customer Reviews” section of product pages that displays quotes from reviews that people found to be generally useful. The quoted reviews can be either positive or negative, and a review’s usefulness is determined by the number of people who click “Yes” when asked “Was this review useful to you?” The interesting part is that Amazon extracts quotes that it believes sum up or characterize the overall message of the review. For the particular review above, the following snippet was extracted and displayed at the top of the customer reviews section:
So after several days of struggling and getting nowhere, I tossed this aside as well.
If you find this sentence in the review (final sentence of the third paragraph), you’ll see that the customer was actually talking about another book he had bought to learn the same subject! When I saw this snippet, I was shocked to see that someone had had such a negative experience with this book—I’ve heard from many people that it’s incredibly lucid. I pulled up the full review, and was tickled to see that an intelligent system had made a mistake when summarizing user reviews on a book about intelligent systems.
My first thought was that anaphora resolution would be the way to prevent this type of mistake. After reading the paragraph, though, I think it would be tricky for an automatic system to resolve this reference. The antecedent is fairly far back in the text. I’ve never worked on this problem, but I imagine it’s fairly difficult to resolve a reference when it spans more than a few sentences.
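To make the difficulty concrete, here is a minimal sketch (entirely hypothetical data and heuristic, not Amazon’s actual pipeline) of the classic recency baseline for anaphora resolution: link a demonstrative like “this” to the most recently mentioned candidate entity. With the full review as context, recency happens to land on the right book; but once the snippet is lifted out and shown on the Pearl book’s page, the only entity in view is Pearl, and the same reading misattributes the complaint.

```python
# Toy recency-based anaphora resolver. All entity positions below are
# invented for illustration; "mentions" pairs a sentence index with the
# book mentioned there.
from typing import Optional


def naive_resolve(mentions: list[tuple[int, str]], pronoun_pos: int) -> Optional[str]:
    """Return the entity mentioned closest before the pronoun's sentence."""
    candidates = [(pos, ent) for pos, ent in mentions if pos < pronoun_pos]
    if not candidates:
        return None
    # Recency heuristic: the most recent prior mention wins.
    return max(candidates)[1]


# Abridged entity mentions from the review, by sentence index (hypothetical).
mentions = [
    (0, "Pearl"),               # "...this one by Pearl..."
    (1, "Bishop"),              # "The Bishop book would probably be..."
    (3, "Korb and Nicholson"),  # "Next I tackled the Korb and Nicholson book."
    (6, "Korb and Nicholson"),  # "...the Korb and Nicholson book is not great."
]

# "I tossed this aside as well." at sentence 8: with full context, the
# recency heuristic correctly resolves "this" to Korb and Nicholson.
print(naive_resolve(mentions, 8))

# But a snippet displayed on the Pearl product page carries no context
# except the page's own book, so the same heuristic picks Pearl.
print(naive_resolve([(0, "Pearl")], 8))
```

The sketch also suggests why the long-distance case is hard: any fixed window or recency preference breaks as soon as the true antecedent falls outside it, which is exactly what happens when a snippet is extracted and shown in isolation.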