In the pursuit of a complete model of human thought, one crucial component is an effective model of reasoning. Combining known facts to create new facts is something humans do effortlessly in our daily lives. When applied critically, the process of producing new facts from existing information advances most, if not every, human endeavor. Modeling this process has, consequently, been attempted and refined many times.
An aside on logical variables and notation:
Propositional Logic Terminology:
| Term | Meaning |
| --- | --- |
| x | x is true |
| ~x | x is false |
| x ∧ y | x and y are true |
| x → y | If x is true, then y is true |
One approach to producing new facts from old ones is binary propositional logic. Propositional logic allows us to capture relations between statements about the world. Say we have three propositions: “the temperature is below 32 degrees” (t), “there is precipitation” (p), and “it is snowing” (s). From these propositions we can make a few statements about the relationships among them: “if t and p are true, then s is true” (t ∧ p → s), “if t is false, then s is false” (~t → ~s), and “if p is false, then s is false” (~p → ~s). These rules let us take whatever limited information we have and infer more about the state of the world. Say we know that it’s snowing. From that fact and these rules, we can conclude both that it is below 32 degrees and that there is precipitation, since if either were false, it could not be snowing.
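To make the bookkeeping concrete, here is a minimal Python sketch of the snow example. The rule representation and the tiny forward-chaining loop are my own illustrative choices, not a standard library; the contrapositives of the last two rules are written out explicitly so that knowing s lets the program recover t and p, as in the example above.

```python
# A minimal sketch of the snow example in binary propositional logic.
# A literal is a (name, truth_value) tuple, so ("t", False) means ~t.
rules = [
    ([("t", True), ("p", True)], ("s", True)),   # t ∧ p → s
    ([("t", False)], ("s", False)),              # ~t → ~s
    ([("p", False)], ("s", False)),              # ~p → ~s
    # Contrapositives of the previous two rules, so that knowing s
    # lets us recover t and p.
    ([("s", True)], ("t", True)),                # s → t
    ([("s", True)], ("p", True)),                # s → p
]

def forward_chain(facts):
    """Repeatedly apply rules until no new facts are derived."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, (name, value) in rules:
            if all(facts.get(n) == v for n, v in antecedents):
                if facts.get(name) != value:
                    facts[name] = value
                    changed = True
    return facts

# We only know that it is snowing...
print(forward_chain({"s": True}))
# ...and infer {'s': True, 't': True, 'p': True}
```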
Bayesian Reasoning Terminology:
| Term | Meaning |
| --- | --- |
| P( A ) | The probability of A being true |
| P( ~A ) | The probability of A being false |
| P( A, B ) | The probability of A being true and B being true |
| P( A \| B ) | The probability of A being true in the context that we know B is true |
An alternative method for modeling reasoning is Bayesian probability, which, like propositional logic, consists of beliefs and relations between beliefs. The major benefit of Bayesian networks over classical logical networks is that the beliefs and relations are probabilistic rather than binary. This helps immensely because, outside of the frictionless vacuum that is classical logic, most events and the relations between them aren’t known with 100% certainty. Where a classical interpretation of the model above would be easily disproved with a counterexample (say… hail?), the Bayesian interpretation (sketched below) is far more robust.
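As a hedged illustration, with entirely made-up numbers, here is what the same snow model looks like with probabilities attached. Instead of t ∧ p → s holding absolutely, we store P( s | t, p ), and the hail counterexample simply becomes one of the unlikely outcomes the model already allows for.

```python
# A probabilistic version of the snow model. All numbers are invented
# for illustration; the point is only that relations are probabilities
# rather than hard implications.

P_t = 0.4   # P(temperature below 32)
P_p = 0.3   # P(precipitation)

# P(s | t, p): snow is very likely, but not certain, when it is cold
# and precipitating (hail is still possible), and very unlikely otherwise.
P_s_given = {
    (True, True): 0.90,
    (True, False): 0.01,
    (False, True): 0.02,
    (False, False): 0.001,
}

# Marginalize: P(s) = sum over t, p of P(s | t, p) * P(t) * P(p)
# (treating t and p as independent, purely for the sake of the sketch).
P_s = sum(
    P_s_given[(t, p)]
    * (P_t if t else 1 - P_t)
    * (P_p if p else 1 - P_p)
    for t in (True, False)
    for p in (True, False)
)

# Bayes' rule: P(t | s) = P(s | t) * P(t) / P(s),
# where P(s | t) = sum over p of P(s | t, p) * P(p).
P_s_given_t = sum(P_s_given[(True, p)] * (P_p if p else 1 - P_p) for p in (True, False))
P_t_given_s = P_s_given_t * P_t / P_s

print(f"P(s) = {P_s:.3f}")           # overall chance of snow
print(f"P(t | s) = {P_t_given_s:.3f}")  # given snow, cold is very likely, not certain
```

With these particular numbers, observing snow pushes the belief that it is below 32 degrees up to roughly 96%, rather than declaring it logically certain the way the propositional model does.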
In dealing with probabilistic events rather than binary facts, Bayesian models capture the incomplete knowledge that we hold about the world. Most statements humans make are collectively understood to be mostly true, but sometimes false. When I say that the bus arrives at 8, nobody would claim I’m a liar if the bus didn’t come one day. My statement clearly isn’t meant to mean that it would be impossible for the bus not to come at 8, but rather that I intend the listener to believe that the bus will arrive at 8. I hold some internal high probability of the bus arriving at 8 (say, 99%), and make statements to bring the listener’s probability close to mine. For this reason, taking statements about the world and feeding them into a binary logic machine would surprisingly often produce unexpected and intractable paradoxes when unlikely scenarios occur that contradict the statements programmed into it. By storing probabilities rather than binary facts, the model can recognize when an unlikely but possible event occurs and reason through it.
Even if we preferred the binary format for communicating information to the person running the inference machine, we could always write rules to convert probabilities into binary facts, whereas we couldn’t meaningfully do the reverse. We could tell the program to convey knowledge it holds at greater than 95% likelihood as true, and knowledge it holds at less than 5% likelihood as false, leaving everything in between as undecided. That would allow the machine to reason internally with probabilities but convey knowledge externally as facts. We cannot convert the other way around, because classical logic has only two values to project onto the continuous range of probabilities between zero and one. If you set every false statement to probability 0 and every true statement to probability 1, you would get a working Bayesian network that produces the same results as your binary network, but it would be similarly stuck declaring every statement 100% true or 100% false. Probabilistic models of reasoning are more expressive than binary models in capturing systems of imperfect knowledge.
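Here is a minimal sketch of that one-way conversion, using the 95%/5% cutoffs from the paragraph above (the function names are hypothetical):

```python
# Probabilities can be reported as binary facts via thresholds, but
# binary facts only map back onto the degenerate probabilities 0 and 1.

def to_fact(probability, high=0.95, low=0.05):
    """Report a probabilistic belief as true, false, or undecided."""
    if probability > high:
        return True
    if probability < low:
        return False
    return None  # still undecided

def to_probability(fact):
    """The only faithful reverse mapping: 0 or 1, nothing in between."""
    return 1.0 if fact else 0.0

print(to_fact(0.99))          # True
print(to_fact(0.50))          # None (undecided)
print(to_fact(0.01))          # False
print(to_probability(True))   # 1.0 -- all nuance is already gone
```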
Next week! The math behind Bayesian reasoning.